Tutorial Contents

Interpolate time series: external data

Data source

Interpolation types

Extract interpolated values

Interpolate Time Series: External Data

When time series data are collected at widely spaced and possibly irregular intervals ("sparse" data), the values of the data between the collection times are formally unknown. However, a "best guess" estimate of such values can be obtained by joining the known sample values with a line that follows the general shape of the changing data. This is known as interpolation. The simplest form of interpolation is linear: the known data points are joined by straight lines. This has the great advantage that it makes no assumptions about the real shape of any trends in the data (and it is very quick and easy to compute), but it is almost certainly inaccurate in predicting intermediate values. After all, it is very unlikely that the real data have a series of sharp angular changes in direction that just happen to occur at the precise times the data were sampled. It is often more useful to join the known sample values by curved lines that avoid sharp angular changes in the trendline. There are many algorithms for drawing such curves, and DataView allows you to explore some of them.
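As an illustration of the linear case, here is a minimal Python sketch. It is not DataView's code, and the function name interp_linear is just a label chosen for this example. It finds the pair of samples that bracket the requested X value and reads the Y value off the straight line joining them.

```python
from bisect import bisect_right

def interp_linear(xs, ys, x):
    """Linearly interpolate Y at x; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the sampled range")
    # Index of the interval [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    # Fractional position of x within that interval (0 to 1).
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

# Three illustrative samples at irregular intervals.
xs = [37, 84, 148]
ys = [-1.7266, 2.1104, 1.0697]
print(interp_linear(xs, ys, 60.5))  # X halfway between the first two samples
```

The result at a sample time is the sample itself, and between samples it never goes outside the two neighbouring Y values, which is why linear interpolation cannot overshoot.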

Data source

The external data source should be plain-text numbers arranged in two tab- or comma-separated equal-length columns, with the left column containing the X (normally time) values and the right column containing the Y values. Any rows containing non-numeric values are ignored, and there must be at least 4 X-Y pairs. The X values of the data do not have to be evenly spaced, but they do have to be in strictly increasing order, with the smallest at the top.
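The format rules above can be checked before loading a file. The following Python sketch is one possible reading of those rules, not the parser DataView uses; the function name parse_xy and the error messages are assumptions made for illustration.

```python
import re

def parse_xy(text):
    """Parse two tab- or comma-separated columns into X and Y lists."""
    xs, ys = [], []
    for line in text.splitlines():
        fields = re.split(r"[\t,]", line.strip())
        if len(fields) != 2:
            continue  # not a two-column row
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header or other non-numeric row: ignored
        xs.append(x)
        ys.append(y)
    if len(xs) < 4:
        raise ValueError("need at least 4 X-Y pairs")
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("X values must be strictly increasing")
    return xs, ys

sample = "time\tvalue\n37\t-1.7266\n84\t2.1104\n148\t1.0697\n161\t0.4513\n"
xs, ys = parse_xy(sample)
print(len(xs), "pairs loaded")
```

The header row is skipped because it is non-numeric, so a file with or without column titles is handled the same way.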

There are 3 ways to load data into the program:

  1. Copy the data onto the clipboard outside of DataView, then EITHER click the Paste button in the Robust Fit dialog, OR click inside the Fixed points edit box, press Control-V (the normal edit paste command), and then press the Tab key to accept the entry.
  2. Click the Load button in the dialog and select a text file (.txt) containing the data.
  3. Drag-and-drop a text file containing the data from File Explorer onto the dialog.

The table below shows data (19 X-Y pairs) in a suitable format.

time value
37 -1.7266
84 2.1104
148 1.0697
161 0.4513
170 -2.7969
174 -1.0098
188 -0.6982
242 -0.1060
267 0.1238
395 -0.7872
416 0.9317
468 0.3044
470 1.6138
558 0.3223
628 0.1751
676 -0.7698
752 -0.2267
848 0.1580
914 0.2280

Interpolation types

The dialog graph now shows the original 19 fixed points from the table above (open circles) and a red line drawn using the current Interpolation type selection, which by default is Linear.

DataView provides 3 interpolation methods that connect the fixed points by curves which avoid sharp angles. For details of the mathematical algorithms underlying each curve the user should consult a textbook, but their properties can be visualized in DataView by selecting each type in turn.

Cubic spline interpolation generally produces a very smooth curve which is ideal for slowly changing data. The shape of the curve is calculated using information from all the fixed points, so any change in a source X or Y value will change the shape of the whole curve. However, the shape change diminishes with distance from the changed point.

One problem with cubic spline interpolation is that it can produce major overshoot oscillations with data that change rapidly. One such oscillation is clearly visible near the centre of the display, where the peak of the interpolated curve projects far beyond the fixed points of the original data. The cause of this overshoot is that 2 data values on its rising slope are very close together in time, with X values of 468 and 470 respectively.

The relatively small increase in the separation of the X values considerably reduces the overshoot, and the interpolation curve now looks more reasonable. However, one cannot arbitrarily change the sample time of real data, and it seems that a cubic spline is not a suitable interpolation method for these data.
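The overshoot can be reproduced outside DataView. The Python sketch below builds a natural cubic spline (zero second derivative at both ends) through the 19 points from the table above. The natural end condition is an assumption, since the end conditions DataView uses are not stated here, and the function name natural_cubic_spline is purely illustrative. The script reports the highest interpolated value between X = 470 and X = 558 so that it can be compared with the largest sampled Y value.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return f(x) for the natural cubic spline through (xs, ys)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m at each knot.
    # Rows 0 and n-1 encode the natural end conditions m = 0.
    a, b, c, d = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        a[i], b[i], c[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        d[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[n - 1] = d[n - 1] / b[n - 1]
    for i in range(n - 2, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0**3 + m[i + 1] * t1**3) / (6 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * t1)
    return f

xs = [37, 84, 148, 161, 170, 174, 188, 242, 267, 395,
      416, 468, 470, 558, 628, 676, 752, 848, 914]
ys = [-1.7266, 2.1104, 1.0697, 0.4513, -2.7969, -1.0098, -0.6982,
      -0.1060, 0.1238, -0.7872, 0.9317, 0.3044, 1.6138, 0.3223,
      0.1751, -0.7698, -0.2267, 0.1580, 0.2280]

spline = natural_cubic_spline(xs, ys)
peak = max(spline(x) for x in range(470, 559))
print(f"largest sampled Y: {max(ys):.4f}")
print(f"spline peak between X=470 and X=558: {peak:.4f}")
```

The steep rise between X = 468 and X = 470 fixes a steep slope at X = 470, and because the spline must stay smooth it carries that slope into the much longer interval that follows.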

The Bezier interpolation follows the source data much more closely than the cubic spline, but at the expense of less elegant curves! The Bezier spline uses just 4 adjacent fixed source points (control points) to determine the shape of the curve between the central pair, so any changes in X-Y values outside of the control points do not affect the curve within those points.

The curves generated by the Bezier option are generally very similar to the default curves Excel uses to connect points in a scatterplot (although to the best of my knowledge Microsoft has not published the actual method it uses). The code I use is modified from that provided by Brian Murphy at XLRotor, who has kindly published an Excel macro for Bezier curve generation, as described here.

With these data, PCHIP interpolation produces curves very similar to the Bezier option, although there are subtle changes in shape.

With a data set containing flat regions with sharp changes, the differences between Bezier and PCHIP interpolation are more pronounced.

1 -1
2 -1
3 -1
4 -1
5 0
6 1
7 1
8 1
9 1

The graph shows what the data look like when connected by 3 straight lines.

With cubic spline interpolation the points are now connected by curves, but these propagate as damped oscillations into the flat data on either side of the mid-range slope.

With Bezier interpolation there is a small degree of overshoot on either side of the central slope, but this now only propagates into the first flat interval. After that the flat regions are, indeed, flat.

With PCHIP interpolation there is no overshoot into the flat region. Instead, the smoothing curve is restricted to the outer limits of the central slope region. The regions of same-valued (i.e. flat) data at the start and end of the series are completely flat.
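The flat-stays-flat behaviour of PCHIP (piecewise cubic Hermite interpolating polynomial) follows from how it chooses the slope at each point. The Python sketch below uses a common textbook formulation: the slope is zero wherever the neighbouring straight-line slopes differ in sign or one of them is zero, and otherwise it is a weighted harmonic mean of the two. This is an illustration, not DataView's implementation. The end slopes are simplified to the slope of the first and last straight-line segments, and the function names are assumptions.

```python
from bisect import bisect_right

def pchip_slopes(xs, ys):
    """Shape-preserving slopes at each knot (simplified end slopes)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # secant slopes
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]  # assumption: one-sided secant at the ends
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:  # same sign on both sides: not a turning point
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
        # otherwise the slope stays 0, which keeps flat regions flat
    return m

def pchip(xs, ys):
    """Return f(x), a cubic Hermite interpolant using pchip_slopes."""
    m = pchip_slopes(xs, ys)

    def f(x):
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]
    return f

# The flat-step data set from the table above.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [-1, -1, -1, -1, 0, 1, 1, 1, 1]
f = pchip(xs, ys)
grid = [1 + k * 0.01 for k in range(801)]
print("lowest interpolated value:", min(f(x) for x in grid))
print("highest interpolated value:", max(f(x) for x in grid))
```

With these data the slope is zero at every point except X = 5, so the curve is confined to the interval between X = 4 and X = 6 and the interpolated values stay within the range of the data.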

The differences between the 3 methods can be seen more clearly by zooming in on a flat region.

Best choice?

As can be seen above, the best choice of interpolation method depends in part on the characteristics of the data you are interpolating (rapidity of change, range of sample intervals etc.), and in part on possible external constraints imposed by the physical system being analysed (e.g. whether flat data regions should always remain flat). That being said, in my opinion the PCHIP method is one of the most reliable ways of connecting data points sampled at irregular intervals with smooth curves. However, in many cases simple linear interpolation is perfectly adequate, and has the advantage that it can be used with any data without producing unexpected side-effects like unwanted overshoot.

Extrapolate out-of-range

It is normally not advisable to extrapolate an interpolation line beyond the range of the source fixed points, but if you choose to do so in DataView, the Y values are calculated by linear extrapolation using the slope of the interpolated line at the end points of the fixed points.

Note how on the left of the graph the interpolated line continues downwards at a steep angle, reflecting the slope at the first fixed point, while on the right the line continues at a much shallower slope, reflecting the slope at the last fixed point.
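The extrapolation rule described above amounts to continuing a straight line from each end point. A minimal Python sketch follows; the function name extrapolate_linear is illustrative, and the end slopes in the example are hypothetical values, not slopes taken from DataView.

```python
def extrapolate_linear(x, x_first, y_first, slope_first,
                       x_last, y_last, slope_last):
    """Continue the curve as a straight line beyond either end point."""
    if x < x_first:
        return y_first + slope_first * (x - x_first)
    if x > x_last:
        return y_last + slope_last * (x - x_last)
    raise ValueError("x lies inside the fixed-point range; interpolate instead")

# End points from the table above; both slopes are hypothetical.
left = extrapolate_linear(0, 37, -1.7266, 0.1, 914, 0.2280, 0.001)
right = extrapolate_linear(1000, 37, -1.7266, 0.1, 914, 0.2280, 0.001)
print(left, right)
```

A steep end slope moves the extrapolated value a long way from the last known sample within a short distance, which is one reason extrapolation is normally not advisable.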

Extract Interpolated Values

You can extract the full set of X-Y values used to draw the interpolated line in the graph thus:

If you want to find the interpolated Y value at a single particular X value:

The interpolated Y value is shown when you press Enter or Tab to accept the X value.
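If you need the same kind of output outside DataView, the idea can be sketched in Python: evaluate the interpolating function at evenly spaced X values and write the pairs as tab-separated text. Even spacing, the function names sample_curve and to_tsv, and the stand-in straight-line function are all assumptions made for this example.

```python
def sample_curve(f, x_start, x_end, n_points):
    """Evaluate f at n_points evenly spaced X values, ends included."""
    if n_points < 2:
        raise ValueError("need at least 2 points")
    step = (x_end - x_start) / (n_points - 1)
    return [(x_start + i * step, f(x_start + i * step)) for i in range(n_points)]

def to_tsv(pairs):
    """Format X-Y pairs as tab-separated lines."""
    return "\n".join(f"{x:g}\t{y:g}" for x, y in pairs)

# Stand-in for an interpolating function; any callable f(x) will do.
line = lambda x: 2 * x + 1
pairs = sample_curve(line, 0, 10, 5)
print(to_tsv(pairs))
```

A single interpolated value is just the same function called once with the X value of interest.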