Interpolate time series: external data

When time series data are collected at widely spaced and possibly irregular intervals ("sparse" data), the values of the data between the collection times are formally unknown. However, a "best guess" estimate of such values can be obtained by joining the known sample values with a line that follows the general shape of the changing data. This is known as interpolation.

The simplest form of interpolation is linear: the known data points are joined by straight lines. This has the great advantage that it makes no assumptions about the real shape of any trends in the data (and it is very quick and easy to compute), but it is almost certainly inaccurate in predicting intermediate values. After all, it is very unlikely that the real data have a series of sharp angular changes in direction that just happen to occur at the precise times the data were sampled. It is often more useful to join the known sample values by curved lines that avoid sharp angular changes in the trendline. There are many algorithms for drawing such curves, and DataView allows you to explore some of them.
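As a point of reference, linear interpolation can be sketched in a few lines of Python. This is only an illustration of the idea, not DataView's own code; the function name lerp_series is hypothetical, and the sample values are the first three rows of the example table used in this tutorial.

```python
from bisect import bisect_right

def lerp_series(xs, ys, x):
    """Linearly interpolate y at x from fixed points (xs strictly increasing)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the fixed points")
    # Index of the fixed point at or just left of x, clamped to the last interval.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    # Fraction of the way from fixed point i to fixed point i + 1.
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

# First three fixed points from the tutorial's example table.
xs = [37, 84, 148]
ys = [-1.7266, 2.1104, 1.0697]

# x = 116 lies halfway between the fixed points at x = 84 and x = 148.
midpoint = lerp_series(xs, ys, 116)
```

The estimate at each X depends only on the two fixed points on either side of it, which is why the line can change direction sharply at every sample.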

Data source

The external data source should be plain-text numbers arranged in two tab- or comma-separated equal-length columns, with the left column containing the X (normally time) values and the right column containing the Y values. Any rows containing non-numeric values are ignored, and there must be at least 4 X-Y pairs. The X values of the data do not have to be evenly spaced, but they do have to be in strictly increasing order, with the smallest at the top.
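The format rules above can be sketched as a small Python parser. This is a hedged illustration rather than DataView's loader: the function name parse_xy is hypothetical, and treating any row that does not have exactly two fields as "non-numeric" is an assumption.

```python
import re

def parse_xy(text):
    """Parse two-column tab- or comma-separated text into (xs, ys) lists."""
    xs, ys = [], []
    for line in text.splitlines():
        fields = [f.strip() for f in re.split(r"[\t,]", line)]
        if len(fields) != 2:
            continue  # assumption: rows without exactly two fields are skipped
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric row (e.g. a header line) is ignored
        xs.append(x)
        ys.append(y)
    if len(xs) < 4:
        raise ValueError("need at least 4 X-Y pairs")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("X values must be strictly increasing")
    return xs, ys

# Header row plus the first four data rows of the tutorial's example table.
sample = "time\tvalue\n37\t-1.7266\n84\t2.1104\n148\t1.0697\n161\t0.4513\n"
xs, ys = parse_xy(sample)
```

The header row is dropped because "time" and "value" cannot be converted to numbers, which mirrors the rule that non-numeric rows are ignored.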

There are 3 ways to load data into the program:

Copy the data onto the clipboard outside of DataView, then either click the Paste button in the Interpolate Time Series dialog, or click inside the Fixed points edit box, press control-v (the normal edit paste command), and press the Tab key to accept the entry.
Click the Load button in the dialog and select a text file (.txt) containing the data.
Drag-and-drop a text file containing the data from File Explorer onto the dialog.

The table below shows data (19 X-Y pairs) in a suitable format.

time	value
37	-1.7266
84	2.1104
148	1.0697
161	0.4513
170	-2.7969
174	-1.0098
188	-0.6982
242	-0.1060
267	0.1238
395	-0.7872
416	0.9317
468	0.3044
470	1.6138
558	0.3223
628	0.1751
676	-0.7698
752	-0.2267
848	0.1580
914	0.2280

Copy the table above onto the clipboard.
In my browser (Firefox) you can do this by dragging across the table to select it, and then pressing control-c.
Select the Analyse (external data): Interpolate time series menu command to open the Interpolate Time Series dialog.
Note that this will be a top menu item if no data files are currently loaded into DataView, otherwise it will be a sub-menu item within the Analyse main menu.
Click the Paste button in the dialog.

Interpolation types

The dialog graph now shows the original 19 fixed points from the table above (open circles) and a red line drawn using the current Interpolation type selection, which by default is Linear.

The graph can be expanded if desired by dragging a corner of the dialog to resize it.

DataView provides 3 interpolation methods that connect the fixed points by curves which avoid sharp angles. For details of the mathematical algorithms underlying each curve the user should consult a textbook, but their properties can be visualized in DataView by selecting each type in turn.

Select Cubic spline.
Note that the Y axis of the graph rescales because the Autoscale Y checkbox has been pre-selected.

Cubic spline interpolation generally produces a very smooth curve which is ideal for slowly changing data. The shape of the curve is calculated using information from all the fixed points, so any change in a source X or Y value will change the shape of the whole curve. However, the shape change diminishes with distance from the changed point.

One problem with cubic spline interpolation is that it can produce major overshoot oscillations with data that change rapidly. One such oscillation is clearly visible near the centre of the display, where the peak of the interpolated curve projects far beyond the fixed points of the original data. The cause of this overshoot is that 2 data values on its rising slope are very close together in time, with X values of 468 and 470 respectively.

Uncheck the Autoscale Y box.
Manually edit the X value of 470 to 475.
Press the Tab key to accept the change.

The relatively small increase in the separation of the X values considerably reduces the overshoot, and the interpolation curve now looks more reasonable. However, one cannot arbitrarily change the sample time of real data, and it seems that a cubic spline is not a suitable interpolation method for these data.
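The sensitivity of a cubic spline to closely spaced X values can be reproduced outside DataView with a short sketch. The code below builds a natural cubic spline (second derivative set to zero at both ends), which is an assumption: this tutorial does not say which end conditions DataView uses, so the exact curve may differ. The names natural_cubic_spline and peak_between are hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating a natural cubic spline through the points."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]            # interval widths
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]   # secant slopes
    # Solve the tridiagonal system for the second derivatives m[i]
    # (Thomas algorithm); m[0] and m[n-1] stay at zero (natural ends).
    cp, dp = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        denom = 2 * (h[i - 1] + h[i]) - h[i - 1] * cp[i - 1]
        cp[i] = h[i] / denom
        dp[i] = (6 * (d[i] - d[i - 1]) - h[i - 1] * dp[i - 1]) / denom
    m = [0.0] * n
    for i in range(n - 2, 0, -1):
        m[i] = dp[i] - cp[i] * m[i + 1]

    def evaluate(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        t = x - xs[i]
        b = d[i] - h[i] * (2 * m[i] + m[i + 1]) / 6
        return (ys[i] + b * t + m[i] / 2 * t * t
                + (m[i + 1] - m[i]) / (6 * h[i]) * t ** 3)

    return evaluate

def peak_between(xs, ys, lo, hi, steps=2000):
    """Largest spline value found on an evenly spaced grid from lo to hi."""
    spline = natural_cubic_spline(xs, ys)
    return max(spline(lo + (hi - lo) * k / steps) for k in range(steps + 1))

# The 19 fixed points from the tutorial's first table.
XS = [37, 84, 148, 161, 170, 174, 188, 242, 267, 395,
      416, 468, 470, 558, 628, 676, 752, 848, 914]
YS = [-1.7266, 2.1104, 1.0697, 0.4513, -2.7969, -1.0098, -0.6982,
      -0.1060, 0.1238, -0.7872, 0.9317, 0.3044, 1.6138, 0.3223,
      0.1751, -0.7698, -0.2267, 0.1580, 0.2280]

peak_470 = peak_between(XS, YS, 416, 558)
xs_edited = [475 if x == 470 else x for x in XS]   # the manual edit from the steps above
peak_475 = peak_between(xs_edited, YS, 416, 558)
```

Comparing peak_470 with peak_475 and with max(YS) gives a numerical measure of how far the curve projects beyond the data before and after the edit.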

Manually edit the X value to return 475 to 470 and restore the overshoot.

Select Bezier (3rd order) spline.

Check the Autoscale Y box to zoom in on the curve.

The Bezier interpolation follows the source data much more closely than the cubic spline, but at the expense of less elegant curves! The Bezier spline uses just 4 adjacent fixed source points (control points) to determine the shape of the curve between the central pair, so any changes in X-Y values outside of the control points do not affect the curve within those points.

The curves generated by the Bezier option are generally very similar to the default curves Excel uses to connect points in a scatterplot (although to the best of my knowledge Microsoft has not published the actual method it uses). The code I use is modified from that provided by Brian Murphy at XLRotor, who has kindly published an Excel macro for Bezier curve generation, as described here.

Select PCHIP (Piecewise Cubic Hermite Interpolating Polynomial).

With these data, PCHIP interpolation produces curves very similar to the Bezier option, although there are subtle changes in shape.

With a data set containing flat regions with sharp changes, the differences between Bezier and PCHIP interpolation are more pronounced.

Copy the table below onto the clipboard.

1	-1
2	-1
3	-1
4	-1
5	0
6	1
7	1
8	1
9	1

Select Linear as the interpolation method.
Click Paste in the dialog to load the new data.

The graph shows what the data look like when connected by 3 straight lines.

Select Cubic as the interpolation method.

The points are now connected by curves, but these propagate as damped oscillations into the flat data on either side of the mid-range slope.

Select Bezier as the interpolation method.

There is a small degree of overshoot on either side of the central slope, but this now only propagates into the first flat interval. After that the flat regions are, indeed, flat.

Select PCHIP as the interpolation method.

With this method there is no overshoot into the flat region. Instead, the smoothing curve is restricted to the outer limits of the central slope region. The regions of same-valued (i.e. flat) data at the start and end of the series are completely flat.
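The flat-region behaviour follows from how PCHIP chooses the slope at each fixed point. The sketch below uses the commonly described Fritsch-Carlson rule: the slope is zero wherever the data are flat or change direction, and a weighted harmonic mean of the neighbouring secant slopes otherwise. Whether DataView uses exactly this variant, and this treatment of the two end points, is an assumption; the function names are hypothetical.

```python
from bisect import bisect_right

def _sign(v):
    return (v > 0) - (v < 0)

def _end_slope(h0, h1, d0, d1):
    """Three-point end slope, limited so the curve keeps the data's shape."""
    s = ((2 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if _sign(s) != _sign(d0):
        return 0.0
    if _sign(d0) != _sign(d1) and abs(s) > 3 * abs(d0):
        return 3 * d0
    return s

def pchip_slopes(xs, ys):
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    for k in range(1, n - 1):
        if d[k - 1] * d[k] > 0:   # same direction on both sides
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            m[k] = (w1 + w2) / (w1 / d[k - 1] + w2 / d[k])
        # otherwise the slope stays 0 (flat segment or local extremum)
    m[0] = _end_slope(h[0], h[1], d[0], d[1])
    m[-1] = _end_slope(h[-1], h[-2], d[-1], d[-2])
    return m

def pchip(xs, ys):
    """Return a function evaluating the piecewise cubic Hermite interpolant."""
    m = pchip_slopes(xs, ys)

    def evaluate(x):
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1     # cubic Hermite basis functions
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ys[i] + h10 * h * m[i]
                + h01 * ys[i + 1] + h11 * h * m[i + 1])

    return evaluate

# The step-like data set from the second table.
xs_step = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys_step = [-1, -1, -1, -1, 0, 1, 1, 1, 1]
curve = pchip(xs_step, ys_step)
```

Because every fixed point bordering a flat interval gets a slope of zero, the cubic between two equal-valued points reduces to a constant, so the curve cannot overshoot into the flat regions.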

The differences between the 3 methods can be seen more clearly by zooming in on a flat region.

Uncheck the Autoscale Y box.
Set the upper Y axis scale to -0.9 and the lower scale to -1.1.
Select the 3 curved interpolation methods in turn, and observe the difference in the lines drawn between fixed source points.

Best choice?

As can be seen above, the best choice of interpolation method depends in part on the characteristics of the data you are interpolating (rapidity of change, range of sample intervals etc.), and in part on possible external constraints imposed by the physical system being analysed (e.g. whether flat data regions should always remain flat). That being said, in my opinion the PCHIP method is one of the most reliable ways of connecting data points sampled at irregular intervals with smooth curves. However, in many cases simple linear interpolation is perfectly adequate, and has the advantage that it can be used with any data without producing unexpected side-effects like unwanted overshoot.

Extrapolate out-of-range

It is normally not advisable to extrapolate an interpolation line beyond the range of the source fixed points, but if you choose to do so in DataView, the Y values are calculated by linear extrapolation using the slope of the interpolated line at the end points of the fixed points.

Select the contents of the first data table above, and re-Paste it into the dialog.
Make sure the Autoscale Y box is checked, and Cubic is selected as the Interpolation type.
Check the box Extrapolate OoR (below the Interpolation type list).

Note how on the left of the graph the interpolated line continues downwards at a steep angle, reflecting the slope at the first fixed point, while on the right the line continues at a much shallower slope, reflecting the slope at the last fixed point.
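The extrapolation rule described in this section amounts to continuing a straight line from each end point with the slope the interpolated curve has there. A minimal Python sketch follows, assuming the interpolant and its two end slopes are already available; all names are hypothetical, and the toy curve y = x*x stands in for a real interpolation.

```python
def extrapolate_linear(x, x_end, y_end, slope_end):
    """Continue a straight line from (x_end, y_end) with the given slope."""
    return y_end + slope_end * (x - x_end)

def evaluate_with_oor(x, xs, interp, slope_first, slope_last):
    """Evaluate interp inside the fixed-point range, a straight line outside it."""
    if x < xs[0]:
        return extrapolate_linear(x, xs[0], interp(xs[0]), slope_first)
    if x > xs[-1]:
        return extrapolate_linear(x, xs[-1], interp(xs[-1]), slope_last)
    return interp(x)

# Toy stand-in for an interpolated curve: y = x*x on the range 0..2,
# whose slope is 0 at x = 0 and 4 at x = 2.
toy_xs = [0.0, 1.0, 2.0]
toy_curve = lambda x: x * x

right = evaluate_with_oor(3.0, toy_xs, toy_curve, 0.0, 4.0)   # 4 + 4 * (3 - 2)
left = evaluate_with_oor(-1.0, toy_xs, toy_curve, 0.0, 4.0)   # 0 + 0 * (-1 - 0)
inside = evaluate_with_oor(1.5, toy_xs, toy_curve, 0.0, 4.0)  # plain interpolation
```

A steep end slope therefore produces a rapidly diverging extrapolated line, which is one reason extrapolation beyond the fixed points is usually discouraged.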

Extract Interpolated Values

You can extract the full set of X-Y values used to draw the interpolated line in the graph thus:

Select Copy text from the drop-down menu of the Copy button to place the pairs of X-Y values onto the clipboard (select Save text to save them to a file).

If you want to find the interpolated Y value at a single particular X value:

Enter the desired X value into the X box under Single point.

The interpolated Y value is shown when you press Enter or Tab to accept the X value.
