Interpolate Time Series: External Data
When time series data are collected at widely spaced and possibly irregular intervals ("sparse" data), the values of the data between the times of collection are formally unknown. However, a "best guess" estimate of such values can be obtained by joining the known sample values by a line that follows the general shape of the changing data. This is known as interpolation. The simplest form of interpolation is linear: the known data points are joined by straight lines. This has the great advantage that it makes no assumptions about the real shape of any trends in the data (and it is very quick and easy to compute), but it is almost certainly inaccurate in predicting intermediate values. After all, it is very unlikely that the real data have a series of sharp angular changes in direction that just happen to occur at the precise times the data were sampled. It is often more useful to join the known sample values by curving lines that ensure that there are no sharp angular changes in the trendline. There are many algorithms for drawing such curving lines, and DataView allows you to explore some of them.
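As a minimal sketch of the linear case (this is an illustration in Python, not DataView's own code, and the function name `linear_interpolate` is hypothetical), the Y value at an intermediate X is found by locating the two neighbouring known points and moving proportionally along the straight line between them:

```python
from bisect import bisect_right

def linear_interpolate(xs, ys, x):
    """Return y at x by joining the neighbouring known points with a straight line."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the known points")
    # Index of the known point at or immediately to the left of x
    # (clamped so that the last point uses the final interval).
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    fraction = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + fraction * (ys[i + 1] - ys[i])

# The first two samples from the example table later in this tutorial.
xs = [37.0, 84.0]
ys = [-1.7266, 2.1104]

# X = 60.5 is halfway between the samples, so Y is halfway too.
midpoint = linear_interpolate(xs, ys, 60.5)
```

Because each interval is handled independently, a change to one known point only affects the two straight lines that meet at it.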
Data source
The external data source should be plain-text numbers arranged in two tab- or comma-separated equal-length columns, with the left column containing the X (normally time) values and the right column containing the Y values. Any rows containing non-numeric values are ignored, and there must be at least 4 X-Y pairs. The X values of the data do not have to be evenly spaced, but they do have to be in strictly increasing order, with the smallest at the top.
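If you are preparing data in another program, it can help to check the format before loading. The following Python sketch applies the rules described above; it is not DataView's parser, the name `parse_xy_text` is hypothetical, and skipping rows that do not have exactly two fields is my assumption.

```python
import re

def parse_xy_text(text):
    """Parse two tab- or comma-separated columns into X and Y lists."""
    xs, ys = [], []
    for line in text.splitlines():
        fields = [f.strip() for f in re.split(r"[\t,]", line)]
        if len(fields) != 2:
            continue  # assumption: rows without exactly two fields are skipped
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header or other non-numeric row is ignored
        xs.append(x)
        ys.append(y)
    if len(xs) < 4:
        raise ValueError("at least 4 X-Y pairs are required")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("X values must be in strictly increasing order")
    return xs, ys

# A header row followed by four tab-separated X-Y pairs.
sample = "time\tvalue\n37\t-1.7266\n84\t2.1104\n148\t1.0697\n161\t0.4513\n"
xs, ys = parse_xy_text(sample)
```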
There are 3 ways to load data into the program:
- Copy the data onto the clipboard outside of DataView, then click the Paste button in the Interpolate Time Series dialog, OR click inside the Fixed points edit box and press control-v (the normal edit paste command), then press the Tab key to accept the entry.
- Click the Load button in the dialog and select a text file (.txt) containing the data.
- Drag-and-drop a text file containing the data from File Explorer onto the dialog.
The table below shows data (19 X-Y pairs) in a suitable format.
time	value
37	-1.7266
84	2.1104
148	1.0697
161	0.4513
170	-2.7969
174	-1.0098
188	-0.6982
242	-0.1060
267	0.1238
395	-0.7872
416	0.9317
468	0.3044
470	1.6138
558	0.3223
628	0.1751
676	-0.7698
752	-0.2267
848	0.1580
914	0.2280
- Copy the table above onto the clipboard.
- In my browser (Firefox) you can do this by dragging across the table to select it, and then pressing control-c.
- Select the Analyse (external data): Interpolate time series menu command to open the Interpolate Time Series dialog.
- Note that this will be a top menu item if no data files are currently loaded into DataView, otherwise it will be a sub-menu item within the Analyse main menu.
- Click the Paste button in the dialog.
Interpolation types
The dialog graph now shows the original 19 fixed points from the table above (open circles) and a red line drawn using the current Interpolation type selection, which by default is Linear.
- The graph can be expanded if desired by dragging a corner of the dialog to resize it.
DataView provides 3 interpolation methods that connect the fixed points by curves which avoid sharp angles. For details of the mathematical algorithms underlying each curve the user should consult a textbook, but their properties can be visualized in DataView by selecting each type in turn.
- Select Cubic spline
- Note that the Y axis of the graph rescales because the Autoscale Y checkbox has been pre-selected.
Cubic spline interpolation generally produces a very smooth curve which is ideal for slowly changing data. The shape of the curve is calculated using information from all the fixed points, so any change in a source X or Y value will change the shape of the whole curve. However, the shape change diminishes with distance from the changed point.
One problem with cubic spline interpolation is that it can produce major overshoot oscillations with data that change rapidly. One such oscillation is clearly visible near the centre of the display, where the peak of the interpolated curve projects far beyond the fixed points of the original data. The cause of this overshoot is that 2 data values on its rising slope are very close together in time, with X values of 468 and 470 respectively.
- Uncheck the Autoscale Y box.
- Manually edit the X value of 470 to 475.
- Press the Tab key to accept the change.
The relatively small increase in the separation of the X values considerably reduces the overshoot, and the interpolation curve now looks more reasonable. However, one cannot arbitrarily change the sample time of real data, and it seems that a cubic spline is not a suitable interpolation method for these data.
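The overshoot, and its sensitivity to the 468/470 spacing, can be reproduced outside DataView. The Python sketch below uses a natural cubic spline (second derivative set to zero at both ends). That end condition is my assumption, since the end conditions DataView applies are not stated here, so the exact peak height may differ from the dialog graph. All function names are hypothetical.

```python
from bisect import bisect_right

def natural_spline_second_derivs(xs, ys):
    """Solve the tridiagonal system for the second derivative at each knot."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Natural end conditions: the first and last rows force M = 0.
    diag = [1.0] + [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)] + [1.0]
    lower = [0.0] + [h[i - 1] for i in range(1, n - 1)] + [0.0]
    upper = [0.0] + [h[i] for i in range(1, n - 1)] + [0.0]
    rhs = [0.0] + [
        6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        for i in range(1, n - 1)
    ] + [0.0]
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]
    return m

def spline_eval(xs, ys, m, x):
    """Evaluate the spline at x from the knot values and second derivatives."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    a = (xs[i + 1] - x) / h
    b = (x - xs[i]) / h
    return (a * ys[i] + b * ys[i + 1]
            + ((a**3 - a) * m[i] + (b**3 - b) * m[i + 1]) * h * h / 6.0)

def peak_between(xs, ys, lo, hi, steps=400):
    """Largest spline value found on an even grid between lo and hi."""
    m = natural_spline_second_derivs(xs, ys)
    return max(spline_eval(xs, ys, m, lo + (hi - lo) * k / steps)
               for k in range(steps + 1))

# The 19 fixed points from the first data table.
XS = [37, 84, 148, 161, 170, 174, 188, 242, 267, 395,
      416, 468, 470, 558, 628, 676, 752, 848, 914]
YS = [-1.7266, 2.1104, 1.0697, 0.4513, -2.7969, -1.0098, -0.6982,
      -0.1060, 0.1238, -0.7872, 0.9317, 0.3044, 1.6138, 0.3223,
      0.1751, -0.7698, -0.2267, 0.1580, 0.2280]

# Peak of the curve in the interval that follows the closely spaced pair.
peak_original = peak_between(XS, YS, 470, 558)

# Same data with the X value 470 moved to 475.
XS_shifted = [475 if x == 470 else x for x in XS]
peak_shifted = peak_between(XS_shifted, YS, 475, 558)
```

Comparing `peak_original` with the largest nearby data value (1.6138) and with `peak_shifted` shows the same behaviour as the dialog: the curve rises well above the fixed points, and less so once the two X values are further apart.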
- Manually edit the X value to return 475 to 470 and restore the overshoot.
- Select Bezier (3rd order) spline.
- Check the Autoscale Y box to zoom in on the curve.
The Bezier interpolation follows the source data much more closely than the cubic spline, but at the expense of less elegant curves! The Bezier spline uses just 4 adjacent fixed source points (control points) to determine the shape of the curve between the central pair, so any changes in X-Y values outside of the control points do not affect the curve within those points.
The curves generated by the Bezier option are generally very similar to the default curves Excel uses to connect points in a scatterplot (although to the best of my knowledge Microsoft has not published the actual method it uses). The code I use is modified from that provided by Brian Murphy at XLRotor, who has kindly published an Excel macro for Bezier curve generation, as described here.
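To illustrate the "4 adjacent points" property only, here is a Python sketch of one common way to build such a curve. It is not the XLRotor macro or DataView's code: placing the inner Bezier control points Catmull-Rom style, the 1/6 factor, and repeating the end point at the edges of the data are all my assumptions, and the function names are hypothetical. The curve is parametric (both X and Y depend on t), and a practical implementation would also need to keep X increasing along the curve when the spacing is very uneven, which this sketch does not attempt.

```python
def bezier_segment(p0, p1, p2, p3, t, tension=1.0 / 6.0):
    """Point at parameter t (0..1) on the cubic Bezier joining p1 to p2.

    The neighbours p0 and p3 only set the direction of the curve at p1 and p2
    (Catmull-Rom style placement of the two inner control points).
    """
    c1 = (p1[0] + tension * (p2[0] - p0[0]), p1[1] + tension * (p2[1] - p0[1]))
    c2 = (p2[0] - tension * (p3[0] - p1[0]), p2[1] - tension * (p3[1] - p1[1]))
    u = 1.0 - t

    def blend(a, b, c, d):
        # Cubic Bernstein polynomial for one coordinate.
        return u * u * u * a + 3 * u * u * t * b + 3 * u * t * t * c + t * t * t * d

    return (blend(p1[0], c1[0], c2[0], p2[0]), blend(p1[1], c1[1], c2[1], p2[1]))

def curve_between(points, i, t):
    """Point on the curve between points[i] and points[i + 1]."""
    p0 = points[max(i - 1, 0)]                 # repeat the end point if needed
    p3 = points[min(i + 2, len(points) - 1)]
    return bezier_segment(p0, points[i], points[i + 1], p3, t)

# The first six fixed points from the first data table.
points = [(37, -1.7266), (84, 2.1104), (148, 1.0697),
          (161, 0.4513), (170, -2.7969), (174, -1.0098)]

# Middle of the segment from X=161 to X=170; it uses the points at 148..174.
before = curve_between(points, 3, 0.5)

# Change a point outside those four and evaluate the same segment again.
points[0] = (37, 5.0)
after = curve_between(points, 3, 0.5)
```

Here `before` and `after` are identical, because the changed point is not one of the four that define that segment. With the cubic spline, by contrast, the same change would alter the whole curve to some degree.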
- Select PCHIP (Piecewise Cubic Hermite Interpolating Polynomial).
With these data, PCHIP interpolation produces curves very similar to the Bezier option, although there are subtle changes in shape.
With a data set containing flat regions with sharp changes, the differences between Bezier and PCHIP interpolation are more pronounced.
- Copy the table below onto the clipboard.
1	-1
2	-1
3	-1
4	-1
5	0
6	1
7	1
8	1
9	1
- Select Linear as the interpolation method.
- Click Paste in the dialog to load the new data.
The graph shows what the data look like when connected by 3 straight lines.
- Select Cubic as the interpolation method.
The points are now connected by curves, but these propagate as damped oscillations into the flat data on either side of the mid-range slope.
- Select Bezier as the interpolation method.
There is a small degree of overshoot on either side of the central slope, but this now only propagates into the first flat interval. After that the flat regions are, indeed, flat.
- Select PCHIP as the interpolation method.
With this method there is no overshoot into the flat region. Instead, the smoothing curve is restricted to the outer limits of the central slope region. The regions of same-valued (i.e. flat) data at the start and end of the series are completely flat.
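The reason the flat regions stay flat is visible in how PCHIP chooses the slope at each fixed point. The Python sketch below follows the widely used Fritsch-Carlson scheme (slope set to zero wherever the neighbouring intervals do not rise or fall together, otherwise a weighted harmonic mean of their slopes). I am assuming DataView's PCHIP behaves similarly; the function names are hypothetical.

```python
from bisect import bisect_right

def _sign(v):
    return (v > 0) - (v < 0)

def _end_slope(h0, h1, d0, d1):
    """One-sided three-point slope estimate, limited to preserve shape."""
    s = ((2.0 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if _sign(s) != _sign(d0):
        return 0.0
    if _sign(d0) != _sign(d1) and abs(s) > 3.0 * abs(d0):
        return 3.0 * d0
    return s

def pchip_slopes(xs, ys):
    """Slope of the curve at every fixed point."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # interval slopes
    slopes = [0.0] * n
    for i in range(1, n - 1):
        # Flat interval or change of direction on either side: slope stays 0.
        if d[i - 1] * d[i] > 0:
            w1 = 2.0 * h[i] + h[i - 1]
            w2 = h[i] + 2.0 * h[i - 1]
            slopes[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    slopes[0] = _end_slope(h[0], h[1], d[0], d[1])
    slopes[-1] = _end_slope(h[-1], h[-2], d[-1], d[-2])
    return slopes

def pchip_eval(xs, ys, slopes, x):
    """Evaluate the cubic Hermite piece that contains x."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return (h00 * ys[i] + h10 * h * slopes[i]
            + h01 * ys[i + 1] + h11 * h * slopes[i + 1])

# The step-like data set from the second table.
XS = [1, 2, 3, 4, 5, 6, 7, 8, 9]
YS = [-1, -1, -1, -1, 0, 1, 1, 1, 1]

slopes = pchip_slopes(XS, YS)
left_flat = [pchip_eval(XS, YS, slopes, 1 + 3 * k / 30) for k in range(31)]
whole_curve = [pchip_eval(XS, YS, slopes, 1 + 8 * k / 200) for k in range(201)]
```

For these data the only non-zero slope is at X = 5, in the middle of the rise, so every value in `left_flat` equals -1 (to within rounding) and no value in `whole_curve` goes outside the range -1 to 1.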
The differences between the 3 methods can be seen more clearly by zooming in on a flat region.
- Uncheck the Autoscale Y box.
- Set the upper Y axis scale to -0.9 and the lower scale to -1.1.
- Select the 3 curved interpolation methods in turn, and observe the difference in the lines drawn between fixed source points.
Best choice?
As can be seen above, the best choice of interpolation method depends in part on the characteristics of the data you are interpolating (rapidity of change, range of sample intervals etc.), and in part on possible external constraints imposed by the physical system being analysed (e.g. whether flat data regions should always remain flat). That being said, in my opinion the PCHIP method is one of the most reliable ways of connecting data points sampled at irregular intervals with smooth curves. However, in many cases simple linear interpolation is perfectly adequate, and has the advantage that it can be used with any data without producing unexpected side-effects like unwanted overshoot.
Extrapolate out-of-range
It is normally not advisable to extrapolate an interpolation line beyond the range of the source fixed points, but if you choose to do so in DataView, the Y values are calculated by linear extrapolation using the slope of the interpolated line at the end points of the fixed points.
- Select the contents of the first data table above, and re-Paste it into the dialog.
- Make sure the Autoscale Y box is checked, and Cubic is selected as the Interpolation type.
- Check the box Extrapolate OoR (below the Interpolation type list).
Note how on the left of the graph the interpolated line continues downwards at a steep angle, reflecting the slope at the first fixed point, while on the right the line continues at a much shallower slope, reflecting the slope at the last fixed point.
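The extrapolation rule itself is simple enough to write down. In this Python sketch the end-point coordinates come from the first data table, but the two slope values are made-up numbers for illustration only (the real slopes depend on the interpolation type), and the function name is hypothetical.

```python
def extrapolate_linear(x, x_first, y_first, slope_first,
                       x_last, y_last, slope_last):
    """Extend a curve beyond its end points along the end-point tangents."""
    if x < x_first:
        return y_first + slope_first * (x - x_first)
    if x > x_last:
        return y_last + slope_last * (x - x_last)
    raise ValueError("x is inside the fixed-point range; interpolate instead")

# End points of the first data table, with illustrative (invented) end slopes:
# steep at the first point, shallow at the last.
first = (37.0, -1.7266, 0.1)
last = (914.0, 0.2280, 0.001)

y_left = extrapolate_linear(27.0, *first, *last)    # 10 X units before the start
y_right = extrapolate_linear(924.0, *first, *last)  # 10 X units after the end
```

With a steep end slope, even a short step outside the data range moves the extrapolated value a long way from the nearest fixed point, which is one reason to treat out-of-range values with caution.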
Extract Interpolated Values
You can extract the full set of X-Y values used to draw the interpolated line in the graph thus:
- Select Copy text from the drop-down menu of the Copy button to place the pairs of X-Y values onto the clipboard (select Save text to save them to a file).
If you want to find the interpolated Y value at a single particular X value:
- Enter the desired X value into the X box under Single point.
The interpolated Y value is shown when you press Enter or Tab to accept the X value.
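If you want to produce a similar two-column listing from your own interpolation code, the Python sketch below samples a curve at evenly spaced X values and formats the result as tab-separated text. The number of points, the even spacing, the `%g`-style number format and the stand-in function `line` are all assumptions for illustration; they do not describe the exact output DataView produces.

```python
def sample_curve(f, x_start, x_end, n):
    """Return n evenly spaced (x, f(x)) pairs from x_start to x_end inclusive."""
    step = (x_end - x_start) / (n - 1)
    return [(x_start + k * step, f(x_start + k * step)) for k in range(n)]

def to_tab_text(pairs):
    """Format pairs as one tab-separated X-Y pair per line."""
    return "\n".join(f"{x:g}\t{y:g}" for x, y in pairs)

def line(x):
    """Stand-in for an interpolation function (here simply y = 2x + 1)."""
    return 2.0 * x + 1.0

pairs = sample_curve(line, 0.0, 4.0, 5)   # the full set of sampled values
text = to_tab_text(pairs)                 # ready to write to a file
single = line(2.5)                        # the Y value at one particular X
```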