Tutorial Contents

Events

Parameters

Quick Insert

Channel Properties

Channel Position

Single Event Properties

Navigating Events

Context Menu

Create Events

Edit Existing Events

Listing/Measuring Parameters

Scattergraph

Display as Data

Histogram

See also...


Events

One of the most powerful features of DataView is the ability to define and analyse events. Events are characterised by an on-time and an off-time, and are used to demarcate specific sections of a data record where something of interest has happened.

Event timing
Event timing. The event on time is 3 ms because that is the acquisition time of the first sample within the event (the very first sample has acquisition time 0). The event off time is 6 ms because that is the acquisition time of the first sample after the event. The event duration is 3 ms.
For clarity, the View: Dots not lines menu command has been activated so that individual data samples show as symbols not connected by lines.

We start by exploring some of the basic procedures involving events, and then look at specific editing and analysis features.

In general, operations that create, delete or change events are in the Event edit menu, while operations that use events for analysis are in the Event analyse menu. However, this is not a hard-and-fast distinction - some operations change events as a result of analysis, and so their command location is a bit arbitrary.

Parameters

Events themselves only carry timing information, but by associating event channels with data traces, much data-related information can also be extracted. A list of both time- and data-related parameters that can be extracted from events is given here.

Quick Insert

When you release the mouse button after completing the box, an event channel appears with an event marking the timing of the burst. Since this is the first event channel in the file, it is channel a.

Note: there are numerous ways of automatically detecting bursts like this – this is just a quick way to get started.

Channel Properties

Each event channel probably contains events relating to a particular feature or analysis of the data, and each channel can have specific properties associated with it.

Parent trace

Note that at the right-hand side of the main chart view the first event channel is labelled a: 1. This shows that this is event channel a, and that the parent trace of the channel is trace 1 (the upper trace). The parent trace is the trace from which data-related values will be calculated. The second event channel label shows b, which is correct, but it also shows the parent trace as 1, which is not. You drew the boxes around the second trace, and presumably you would want to measure data from that trace, so you want the parent to be trace 2. Obviously, DataView has no way of knowing which trace you are interested in, so you have to tell it. (However, if you construct the events from a data trace by, for instance, template or threshold recognition, then it will automatically set the trace used in construction as the parent trace.)

Note that the main display label for the second event channel now reads b: 2. Whenever you change a property in this dialog box, click Apply if you want it to take immediate effect, or if you have made changes that you don't want to lose when you move to a new channel.

Label and Description

We can give the channel an explicit label.

Note that the label is prepended to the channel ID text. If we wanted we could enter a more extended note about the channel in the Channel description box. This does not get displayed on the screen, but it is stored with the file and can be retrieved and read through this dialog box.

Numerical display

Each event in the main display can have a numeric value displayed under it, which by default is the sequential ID of the event within the channel.  However, we can change this.

Note that the numeric display for each event in channel a now shows the time (from the start of the recording) of the start of the event.

This is where setting the correct parent trace is crucial – if we had left the parent trace of channel b as trace 1, we would be seeing peak-to-peak amplitude measured from trace 1, even though we drew the events to bracket the bursts in trace 2.

The screen display is just a quick way of seeing simple event-related parameters. There is a very extensive further set of display and analysis options available through the Event analyse menu.

Visibility (Hide/Show)

Individual event channels can be hidden or displayed by checking/unchecking the Channel visible box within the Channel properties dialog box. You can also set channel visibility through the Event edit: View: Hide/Show channels menu command. A global hide/show option is available with the View: Show events menu command, and this acts with AND logic on the individual channel choices. When any event channels are hidden, a small plus (+) sign appears in the top right-hand corner of the display screen. If you click this, then the hide command is cancelled for all event channels, so that all channels containing events are shown.

Colour

The baseline colour of the event channel can be set by clicking the Base colour button, and then clicking the desired new colour in the Choose colour dialog box.
Now dismiss the Channel Properties dialog box by clicking OK.

Channel Position

If you move the mouse over the baseline of an event channel the cursor changes to an up-arrow, indicating that you are pointing at a draggable object. You can drag event channels to new positions with the mouse.

Alternatively you can get DataView to distribute them for you.

Event channels maintain their position relative to the data even if the view is re-sized, and this position will be saved if the data file itself is saved.

Single Event Properties

The active event becomes centred in the main display (unless it is at an extreme end) and its mark is slightly taller than the non-active events. The channel and ID of the selected event are displayed in the dialog, along with read-only details of the number of events in the channel and the parent trace.

Colours, Tags, Labels and Timing

Within the dialog box you can:

  1. Set the colour of individual events, which can be used to partition them into separate channels as described later.
  2. Set a tag and/or an individual label on each event. Tagged events show as highlighted, and have various special properties in analysis routines.
  3. Adjust the timing of that particular event with a variety of options (press F1 to activate the built-in help for the dialog to see more details of these options).

In each case you need to click the Apply button within that section to make your choice stick.

When you have made the desired adjustments to the selected event, you can navigate to another event by clicking on it, or you can move forwards and backwards to successive events within the selected channel using the spin button beside the event ID box.

When editing is complete click the Done button to dismiss the dialog box.

Navigating Events

You will often end up with many hundreds of events within many channels in your files, so DataView provides ways to navigate quickly between successive events.

By default, pressing ctrl/alt e moves to the next/previous event in any channel, but sometimes you want to navigate between events in a single channel, even though you have many channels with events in them.

Now ctrl/alt e navigates between events in the selected channel only.

Focus

When you navigate through events, you can either position the selected event on the left edge of the main view (the default), or in the centre.

Context Menu

If you right-click within an event (within the inverted-U shape) a pop-up menu offers many options relating to individual events or whole event channels. These are all available as menu options, but right-clicking is often more convenient.

Create Events

You can create events by detecting specific features in data traces, or in other event channels. These methods are detailed in tutorials in the Specific Analysis section:

You can also insert user-defined events.

You can now try out manual creation.

Cursor

Mouse

You have already seen the default option of dragging a box to insert an event in the Quick Insert Events section earlier. For now just note that you can also insert fixed-width events by selecting that option, and then just clicking on the screen where you want the event to start.

Regular or random

You can set events to occur at regular defined intervals, or with a random distribution, throughout the file. This may be useful if you wish to monitor data at regular intervals, or to construct simulated data with known parameters.

Paste or Import

There may be occasions when you want to set up a series of events with known but irregular times and durations. This can best be done by pasting or importing them from a text file. The format is very simple: each event occupies a separate line containing a tab-separated on-time and off-time, plus an optional event label. (Remember that the on-time of an event is the acquisition time of the first data sample within the event, and the off-time is the acquisition time of the first data sample after the event, i.e. a half-open range.) For example:

100 120
200 230 ev 2
300 340 3rd ev
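
If the event times come from another program or script, a few lines of code can produce a file in this format. The sketch below is illustrative Python, not part of DataView; the file name and times are arbitrary examples.

# Write tab-separated on/off times (and optional labels) for pasting or importing.
events = [(100, 120, ""), (200, 230, "ev 2"), (300, 340, "3rd ev")]
with open("events.txt", "w") as f:          # hypothetical file name
    for on, off, label in events:
        fields = [str(on), str(off)] + ([label] if label else [])
        f.write("\t".join(fields) + "\n")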

To see this in action:

Edit Existing Events

Event toolbar

A toolbar with some common event editing operations can be made visible by selecting it from the View menu.

Delete or Merge Events

The Event edit: Delete and Merge menu sections contain commands to delete or merge events. Most are self-explanatory, and if you hover over a command, the status bar text at the bottom-left of the main view gives a brief summary of the command action.

The Delete/Merge: Events within channel commands allow you to drag around a group of events within a channel and either delete them or merge them into a single event. Both commands remain in operation until you press Escape, right-click, or drag around a region of screen not containing events. You can use the navigation toolbar buttons while maintaining the command mode, which allows you to edit events within a whole file without repeatedly having to select the same menu commands.

If you right-click on or just below the baseline between events, one of the context menu options is Del event channel, which does what it says.

If you right-click within the ⌈⌉-shape of a single event, one of the context menu options is Del one event, which also does what it says.

If you right-click in the U-shaped gap between two events, one of the context menu options is Merge events, which merges the two events on either side of the gap into a single event.

Logical Operations on Events

You can perform logical operations between events in different channels with commands in the Event edit: Logical operation menu section.

Unary operations

These operate on a single source channel and place the result into a destination channel.

EQUALS (copy): The destination channel is set equal to the source channel.

NOT (invert): The destination channel is set equal to an inverted copy of the source channel. Inverted means that the inter-event periods become events, and the events are deleted, i.e. off becomes on and vice versa.

Binary plus operations

These operate on two or more source channels, and place the results in a destination channel.

AND: Destination events only occur during times when all source events are on.

OR: Destination events occur when any source event is on.

XOR: Destination events occur when exactly one source event is on.

AND NOT: Destination events occur when any source event is on (OR logic), except during events in a specified NOT channel.
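
To make the logic concrete, here is a small illustrative sketch (plain Python/NumPy, not DataView code) in which each event channel is represented as a boolean array with one element per data sample (True = event on):

import numpy as np

a = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)   # source channel a
b = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0], dtype=bool)   # source channel b
c = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0], dtype=bool)   # NOT channel for AND NOT

copy_a   = a.copy()        # EQUALS (copy)
invert_a = ~a              # NOT (invert): off becomes on and vice versa
and_ab   = a & b           # AND: on only while all sources are on
or_ab    = a | b           # OR: on while any source is on
xor_ab   = a ^ b           # XOR: on while exactly one source is on
and_not  = (a | b) & ~c    # AND NOT: OR of the sources, excluding the NOT channel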

Filtering Events

Options in the Event edit: Filter menu section allow you to filter events in a source channel according to either their own time characteristics, or the values in their parent data trace. The modified events are written to a destination channel, which can be the same as the source channel (hence overwriting it).

The filter dialogs are non-modal, and the results of the filter can be obtained (and if necessary changed) by clicking the Apply button without dismissing the dialog.

Time filter

This command activates the Time filter dialog, which has 3 filter parameters applied in sequence: minimum off time, minimum on time, and maximum on time.

Any events separated by less than the minimum off time are merged to produce a single longer-duration event. Then any events shorter than the minimum on time are deleted. Finally, any events longer than the maximum on-time are deleted.
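
The following is a minimal sketch of that three-step sequence (illustrative Python, not DataView's actual implementation); events are (on, off) time pairs in ms:

def time_filter(events, min_off, min_on, max_on):
    # 1. Merge events separated by less than the minimum off time
    merged = [list(events[0])]
    for on, off in events[1:]:
        if on - merged[-1][1] < min_off:
            merged[-1][1] = off
        else:
            merged.append([on, off])
    # 2. Delete events shorter than the minimum on time
    # 3. Delete events longer than the maximum on time
    return [(on, off) for on, off in merged if min_on <= off - on <= max_on]

print(time_filter([(0, 5), (7, 30), (100, 400)], min_off=10, min_on=10, max_on=200))
# [(0, 30)]  - the first two events are merged; the long event is deleted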

Value filter

This command activates the Value filter dialog. The user selects the trace parameter from the Filter factor drop-down list (e.g. peak-to-peak amplitude, average value etc.), and then chooses Minimum, Maximum or Minimum and maximum, plus the values that will be applied. Any event whose data falls outside the allowed limits is deleted.

Frequency filter

This command activates the Frequency filter dialog, which allows you to filter an event channel whose events enclose events in another channel. The enclosing event is copied to the destination channel if the enclosed events meet a specified criterion, such as a minimum or maximum frequency, or a maximum coefficient of variation.

Cross-channel merging and moving

Merge Events

If you have more than 1 event channel defined, you can combine events from separate channels, using the start times from one channel and the end times from the other, with the Event edit: Merge event channels (on/off) command. This displays the Merge event channels using on-off times dialog box. Adjacent pairs of events in the two source channels are merged: the start time of an event in the Chan defining start times channel determines the start of the event in the destination channel, and the end time of the immediately following event in the Chan defining end times channel determines its end time. Unpaired events in either source channel are ignored.
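
One plausible reading of that pairing rule is sketched below (illustrative Python, not DataView code); starts and ends are (on, off) pairs from the channel defining start times and the channel defining end times respectively:

def merge_on_off(starts, ends):
    merged, j = [], 0
    for s_on, _ in starts:
        # find the first event in the end-defining channel that follows this start
        while j < len(ends) and ends[j][0] < s_on:
            j += 1
        if j == len(ends):
            break                               # unpaired start events are ignored
        merged.append((s_on, ends[j][1]))       # start from one channel, end from the other
        j += 1
    return merged

print(merge_on_off([(100, 110), (300, 310)], [(150, 160), (350, 360)]))
# [(100, 160), (300, 360)]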

Move Events

You can move one or a group of contiguous events from one channel to another. Select the Event edit: Move events menu command, then drag a box around the event(s) that you wish to move. When you release the mouse a dialog pops up asking which event channel to move them to. Select the channel you want, and click OK. The events will then be moved.

Adjust timing

Point Process Events

Several histogram and scattergraph options use the start time of the event as the datum point on which to base the plot. In some cases this may not be appropriate. You can change the datum point using the Timing: Make point process command from the Event edit menu. This sets up a new event channel, in which the events are point processes (i.e. have the minimum possible duration of one sample interval) whose time can be set to some defined point within the original event. The choices are:

Relative fraction: The event is placed a fraction of the way through the original, where 0 means the first sample within the original, 1 means the last sample within the original (i.e. just before the off-time), 0.5 means half-way through the original etc.
Centre of gravity: This is the earliest time point at which the cumulative summed voltage in the parent channel from the start of the event is greater than or equal to half its total value.

Max, min: The time of the maximum or minimum value in the parent trace within the original event.
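
As a rough illustration of how these point-process times could be derived (plain Python/NumPy, not DataView code), given the sample times t and parent-trace values v within one event:

import numpy as np

def point_time(t, v, mode="centre_of_gravity", fraction=0.5):
    if mode == "relative_fraction":
        return t[int(round(fraction * (len(t) - 1)))]   # 0 = first sample, 1 = last sample
    if mode == "centre_of_gravity":
        csum = np.cumsum(v)
        return t[np.argmax(csum >= csum[-1] / 2)]       # earliest point reaching half the summed value
    if mode == "max":
        return t[np.argmax(v)]
    if mode == "min":
        return t[np.argmin(v)]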

The data channel shows square waveforms with 100 samples with amplitude 1000, followed by 100 samples with amplitude 500. Random noise has been added to the data. Event channel a is the original event that encompasses the step changes in the data. Event channels b - g contain point process events derived from channel a as per the channel labels.

Duration

The Event edit: Timing: Set duration command activates the Event duration dialog. This sets the duration of all events to the same value. However, you can add noise to the duration of events, so that their durations have a defined statistical profile.

Adjust start/end times

The Event edit: Timing: Adjust start/end time command activates the Adjust event start/end times dialog. This offers various options allowing you to do what the name suggests. Press the F1 key with the dialog active to see an explanation of the options.

The Event edit: Align command allows you to adjust event timing with respect to data within the event. If you want to display events using the Event analyse: Event-triggered scope view command, and you want all the events' data traces to have their peaks or troughs superimposed at the same time location, then this is the option for you. You define a Master event ID in the Align events dialog box, which determines which event is used as the baseline. All other events within the channel will be time-shifted so that the peak or trough within those events (in their original location) occurs at the same time relative to the event start as the peak or trough in the master event.

Event timing can also be adjusted visually, by activating the Event analyse: Event-triggered scope view dialog, setting the display parameters to show the data that you wish to encompass within the events, and then clicking the Ev=Disp button.

Shuffle or Resample

The Event edit: Timing: Random shuffle/resample intervals menu command activates the Shuffle/Resample event intervals dialog.

Shuffle: This option makes a list (internally) of all the intervals between events, and randomly shuffles them. It then takes each event after the first in turn and adjusts its start time so that the interval between it and the preceding event matches the corresponding interval in the shuffled list. This preserves the main statistical properties of the event channel intervals, but destroys any serial correlation in onset timing. This may be a useful control for various analysis procedures.

Resample: This option is similar to shuffling, but the new intervals are drawn at random from the original list, with replacement. Also, the timing of the first event is randomized from the list. This means that intervals present in the original channel may be missing in the new channel, and also there may be duplicate copies of intervals in the original channel present in the new channel. It destroys any relationship between the events and data values in the parent trace. This may be appropriate for Monte Carlo and bootstrap statistical tests.
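
The difference between the two options can be sketched in a few lines (illustrative Python/NumPy, not DataView code); for simplicity the first event time is left unchanged in both cases:

import numpy as np

def shuffle_intervals(times, rng=np.random.default_rng()):
    intervals = np.diff(times)
    rng.shuffle(intervals)                       # permute the original intervals (no replacement)
    return np.concatenate(([times[0]], times[0] + np.cumsum(intervals)))

def resample_intervals(times, rng=np.random.default_rng()):
    intervals = np.diff(times)
    drawn = rng.choice(intervals, size=intervals.size, replace=True)   # draw with replacement
    return np.concatenate(([times[0]], times[0] + np.cumsum(drawn)))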

Listing Event Parameters

In a real experiment you would probably want to do something with your event analysis.

The file contains an extracellular recording from the superficial 3rd root of an abdominal ganglion of a crayfish. A series of events which mark the timing of some medium-sized spikes have been inserted in event channel a. The events were defined using the template recognition facility. Event channel b also contains events marking spikes, but they are not all the same as those of channel a, and they have been colour-coded using the DataView spike sorting facility.

There are a wide variety of measurements you can make from events, and to select measurements you check the appropriate boxes. The Event-specific parameters on the left of the dialog box are parameters whose values depend only on the events, not on any underlying data. The parameters on the right of the dialog make measurements from data traces within the duration of each event. If a Parent trace is specified then measurements are made from just that trace. If there is no parent trace (value 0) then measurements are made and listed from all traces in the recording.

The numerical values of the on times and the peak-to-peak amplitudes are displayed. Note that the peak-to-peak amplitude of the second event is much smaller than the others, which fits with the rather small second spike in the main display (the red event).

If you click the Copy button, these will be placed on the clipboard and could be pasted into another program for graphing or statistical analysis. You could also click the Save button in the main dialog to save the event values directly to a text file.

Scattergraph of Event Parameters

Event parameters can be displayed using 2-D or 3-D scatter graphs. The latter are mainly intended for partitioning events into groups, and this is described in detail elsewhere.

The data record is an extracellular recording of the ventral nerve cord of a crayfish during light-activation of the caudal photoreceptor. The template-match facility has been used to mark CPR spikes with events in channel a. (The spike whose waveform was used as a template was chosen heuristically by inspecting the data - there is no magic way of knowing that it really is a CPR spike without the ground truth of a matching intracellular recording. However, as we will see, its response characteristics confirm that it was a good choice.) Note that the low frequency spikes at the start of the recording are not CPR spikes - they come from a different axon. Even though they are quite similar in shape and size to the CPR spikes, the template match is able to distinguish them from the CPR spikes.

Our ears are very good at detecting patterns of activity, and the increase and subsequent decline in CPR spike frequency should be obvious. However, it would be good to quantify this.

Event scattergraph
2D scattergraph showing a plot of instantaneous frequency versus time for events in channel a. Other options can be selected from the X, Y Axes drop-down lists located above the left-hand end of the graph.

As suggested by the audio play-back, the graph shows that the CPR spike frequency increases to a peak, and then declines to a plateau. Note that one datum is drawn with a thicker circle than the others. This is the single tagged event representing the exact template match used to identify the spikes.

Scale and Zoom

By default, all events in the selected channel will be included in the graph, and the axes will autoscale (note that the Autoscale X and Y boxes above the graph are checked). However, you can zoom in to a part of the graph:

The scattergraph should now be zoomed in on the selected data.

Note that the Autoscale boxes are now unchecked. In that state you can manually edit the axes scales, rather than using the zoom method. This might be appropriate if you want to set explicit scale values so that you can compare between experiments, for instance.

Analysis Region

If you have a large data file with a lot of events, but only want to analyse those that occur within a particular region, you may speed up the analysis by selecting an appropriate option from the Analysis region group.

If you select an analysis region other than the whole file and the X axis is showing Time or ID, parts of the graph might be empty either because there are no events in that time region in the source file, or because that time region of the graph is outside the analysis region. To help avoid confusion, valid sections of the graph are marked by a line drawn under the X axis. The colour of the line can be adjusted by clicking the coloured button (to hide the line, set the colour to white).

If the X axis is not showing time or ID, or the whole file is selected for analysis (in which case all times are valid), the valid range mark is not shown.

Navigating using the scattergraph

Although the frequency trend is clear in the graph, and generally quite stable, there are several places where the frequency appears to drop by about half. This either means that CPR spikes genuinely did not occur at these times, or that their waveforms did not match the template closely enough to be recognized as belonging to this category. We would like to examine the data in the region of the low frequency outlier events to try to determine which explanation is correct.

The main display screen will scroll to centre the event nearest to the spot that you clicked on the scatter graph (you can move or minimize the dialog box if it gets in the way of the main window). The marker for the event you clicked (number 235) is drawn taller than normal so that it is highlighted amongst the other events. (To remove the highlight, just right-click the mouse or press the ESC key.) To the left of the highlighted event you will see a giant-sized spike where there apparently ought to be a CPR spike. It is not surprising that this super-spike did not match the template of the normal spike. What you decide to do about this mismatch is a scientific problem, not a computational one. The giant spike may genuinely be a different spike, in which case it should not be included in the analysis, but I rather suspect it is a result of the CPR spike summing with some other spike, in which case it probably should be included.

Trendlines: Fitting and smoothing

A new Trendline/smooth floating dialog box is displayed, which allows you to plot either a smoothed trendline or a fitted polynomial trendline through the data. Smoothing is achieved by a locally-weighted scatterplot smoothing (LOWESS) algorithm. The fit is robust (i.e. reduces or ignores the influence of outliers) if the Robust iter parameter is set greater than 0 (if this parameter is set to 0 then outlier data points carry equal weight to non-outliers).
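
For readers who want to reproduce this kind of smoothing outside DataView, here is a minimal sketch using the LOWESS implementation in statsmodels (the data are made up; the it argument plays the same role as the Robust iter parameter):

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
x = np.linspace(0, 60, 300)                                 # e.g. event times (s)
y = 20 * np.exp(-x / 20) + 5 + rng.normal(0, 1, x.size)     # e.g. instantaneous frequency (Hz)
y[::25] *= 0.5                                              # a few low-frequency outliers

fit = lowess(y, x, frac=0.2, it=3)       # it > 0 gives a robust fit that down-weights outliers
x_fit, y_fit = fit[:, 0], fit[:, 1]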

After a brief delay you should see something like this:

Smoothed trendline through scattergraph
Scattergraph with smoothed trendline.

A smoothed line has been drawn through the data points, ignoring the outlier values (that’s the robust part of the algorithm).

If you select the Polynomial option, then a best-fit trendline following a polynomial of the selected degree is drawn through the data. A Degree 0 means that a simple average is drawn, Degree 1 means a linear trendline etc. Again, the fit is robust if the Robust iter parameter is set greater than 0.

The Copy text button places both the raw XY data, and the fitted XY data onto the clipboard in text format. This allows you to perform calculations using the residuals, if you so desire.

Note, the display of the fitted line is not persistent. If you change one of the axes in the scatter graph to display a different parameter, the fitted line is automatically erased since it is no longer appropriate to the new data. If you then change the axis back to display the original parameter, you will have to re-calculate the fit line. You can, of course, switch to another application and then switch back to DataView without losing the fit line.

Measuring from the scattergraph

As you move the mouse cursor over the scatter graph, the X and Y values for the cursor location (in axis units) display near the top-right of the dialog. This gives an instant read-out of the parameter values at that screen location in the graph. However, you can also get accurate measurements of the actual data values at each point.

Note that there is also a Go to & Measure option within the On click drop-down list. This does what its name suggests.

Display Event Parameters as Data

Should you so desire, you can save many event parameters and analysis outcomes as a data trace.

Details of this file are given in the preceding Scattergraph tutorial, but briefly it is an extracellular recording showing spikes of a photo-sensitive neuron in response to a light stimulus. The spikes have been marked with events using the template recognition facility.

The data values are only set at the times of the events themselves, so you have to specify how intervening data points should be handled. You can maintain data values from one event up to the next (step), linearly interpolate between events (linear), or set intervening data points as inactive (gap). For a Frequency plot, gap is the best option.
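
A rough sketch of how instantaneous frequency could be turned into a trace with those three options (illustrative Python/NumPy, not DataView code):

import numpy as np

def frequency_trace(event_times, t, mode="gap"):
    freq = np.empty(len(event_times))
    freq[1:] = 1.0 / np.diff(event_times)        # reciprocal of interval to the preceding event
    freq[0] = freq[1]                            # first event copies the second
    if mode == "step":                           # hold each value until the next event
        idx = np.clip(np.searchsorted(event_times, t, side="right") - 1, 0, None)
        return freq[idx]
    if mode == "linear":                         # interpolate between events
        return np.interp(t, event_times, freq)
    if mode == "gap":                            # defined only at the event times
        trace = np.full(len(t), np.nan)
        trace[np.clip(np.searchsorted(t, event_times), 0, len(t) - 1)] = freq
        return trace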

When the new file loads, you need to make some adjustments to the display:

At this point your display should look like this:

Frequency as data
Instantaneous event frequency displayed as data. CPR: Extracellular recording showing activity of a crayfish photoreceptor in response to a light stimulus. The spikes have been detected as events. Freq: The instantaneous frequency (Hz) of each event, measured as the reciprocal of the time interval between the event and the one preceding it (the first event is set to the same frequency as the second).

Histogram of Event Parameters

This dialog box is non-modal, so if you want you can switch between files, and the histogram updates to show the values for the changed file. You can display a histogram of a wide range of different event-related parameters, which you select from the drop-down Parameter list box. The default display is of an interval histogram. You select the event channel from the box near the top-left of the dialog. Some analyses (latency, count in, cross-correlation etc) look at relative timing between events in two separate channels. These channels are identified as 1 and 2. The channel 2 selection box is only available when a parameter that requires two event channels has been selected.

By default the X and Y axes will autoscale to show all the data, but if you uncheck the Autoscale X or Y options, you can hand edit the axis scales. You can also set the number of bins in the histogram (but see notes on the Poisson distribution below). Note: to appear in the histogram, data values must be equal to or greater than the lower bound (left-hand X axis scale) and less than the upper bound (right-hand X axis scale).

By default the histogram bins are drawn in a deep green colour, but this can be changed by clicking the colour button beside the Bin number option at the left of the dialog.

Axis scales are shown greyed out for scales that are read only (e.g. if the autoscale option is set). If the user clicks Copy these are temporarily un-greyed while the display is copied to the clipboard as a bitmap. This is purely for aesthetic reasons.

Basic descriptive statistics for the raw data contributing to the histogram are shown in the text box above the display.

Parent trace

Some histograms only display information about the timing of events (interval, phase, correlation etc), but others (maximum, minimum, average etc) display amplitude information derived from a waveform data trace during the on-time of events. The latter require that the event channel should have a Parent trace, which is the trace from which the measurements are made. The Parent trace for the selected Event channel is displayed towards the top of the histogram dialog box. If it shows 0 it means that the event channel currently has no parent trace, but you can edit the value to set it to the data trace you wish to analyse. Note that if you change the Parent trace, this change is recorded in the file itself, and can be made permanent by saving the file.

Display type

The default display (normal) has the Y axis simply showing the raw counts of data in each histogram bin. The Display type drop-down list also offers a %age(Y) option, which shows the Y axis as a percentage of counts, and a ln(Y) option, which shows the natural logarithm of the counts. The probability density option is only appropriate for interval or correlation parameter choices, and it shows the Y-axis scaled to counts per second, rather than just counts. The sqrt(Y) and average PSTH options are specialized display modes used in specific analyses.

Rug plot

A series of red marks appear under the histogram. Each mark represents an actual value from the data generating the histogram (in this case the interval between two events), and each mark is positioned on the X-axis at the appropriate place for its value. This is a rug plot, which is analogous to a histogram with zero-width bins, or a 1-dimensional scatter plot.

Partitioning Events with the Histogram

Note that this is just one way of spike sorting. There are multiple facilities available for this, and many of them are considerably more sophisticated than this simple method.

Fitting Parameters to Histograms

The peak-to-peak amplitude histogram is obviously bimodal (reflecting the two spike types marked by the events), but the descriptive statistics shown above the display assume a normal, unimodal distribution. So although they are mathematically accurate, they are not helpful. However, DataView allows the histogram to be analysed as a mixture of normal (or other) distributions, and this is appropriate here.

When the Fit mixtures dialog box is displayed, a red curve showing a scaled PDF (PDF = probability distribution function) is superimposed on the histogram, with parameters set in the Fit/Show frame of the dialog box. The default is to show the PDF of a single Gaussian (normal) distribution, with mean and standard deviation set from the data. You can always return to this simple fit by clicking the Update 1 button. However, this is clearly wrong for these data.

There are two ways of finding mixture parameters to fit the data: automatic (only available for Gaussian mixtures) and manual. With either method you can return to an earlier set of parameters by clicking the Back button.

Automatic

The automatic method applies a clustering algorithm to partition the raw data underlying the histogram and so find the best mixture parameters. (From Bouman's web page: "The ... program applies the expectation-maximization (EM) algorithm together with an agglomerative clustering strategy to estimate the number of clusters which best fit the data. The estimation is based on the Rissanen order identification criterion known as minimum description length (MDL). This is equivalent to maximum-likelihood (ML) estimation when the number of clusters is fixed, but in addition it allows the number of clusters to be accurately estimated." Charles A. Bouman, Copyright © 1995 The Board of Trustees of Purdue University. The software is supplied "as is", with no warranty and no liability accepted by Purdue University - and the same goes for DataView!)

Note that the Target cluster count edit box shows 0. This instructs the algorithm to automatically choose the "best" number of mixtures in the data.

After a brief interval the Cluster dialog shows that the data can best be described by a mixture of 3 normal distributions. It lists the proportion (fraction of the data belonging to that distribution), mean and standard deviation of each sub-class (component) of the mixture.

The Fit Mixtures dialog now shows the parameters of the 3 component distributions, and the red histogram line shows these superimposed on the histogram.

Rather surprisingly, the PDF line has just 2 peaks, and looks like it only contains 2 distributions. However, the distributions in the first and third rows in the Fit Mixtures dialog have rather similar mean values which approximately co-locate with the right-hand peak, but that in the third row, which only contains about 15% of the data, has a much broader standard deviation. The right-hand peak in the PDF thus has a broader base than that which would be produced by the standard "bell curve" of the first component normal distribution alone.

Physiologically, the 2 main distributions represent the two extracellular spike types marked by events in the main view. Each of these will have a characteristic shape, but individual spike shapes will be contaminated by noise, thus producing a spread about the "true" shape. The third, minor distribution is probably caused by spike collisions producing a more substantial deviation from the characteristic shape than that produced by noise alone.

The goodness-of-fit of the mixture to the data is indicated by the 3 values at the bottom of the Fit Mixtures dialog (-ve log likelihood, -ve log rel likelihood and BIC). The interpretation of these numbers is discussed below.

Fitting parameters to a bimodal histogram. a. The Event Parameter histogram dialog shows a bimodal distribution of peak-to-peak amplitudes in the parent trace. The PDF of a mixture of 3 Gaussian (normal) distributions is superimposed (red line). Note that the histogram bin colour has been set to pale grey to make the PDF more visible. b. The Fit Mixtures dialog shows the parameters of the PDF. The parameters of the 3 components were determined by using the automatic Cluster method.

Manual

The automatic method uses the raw data underlying the histogram to find the best mixture parameters, but the manual method uses the counts in the histogram bins to find a PDF that fits the shape of the histogram. The main purpose of the manual method is to fit to non-normal distributions (an example is given later), but it can be used for Gaussian distributions if desired.

In the manual method, the user first has to make a "best guess" of the distribution parameters to get the PDF into approximately the correct shape. The parameters are then fine-tuned to maximize the likelihood of the fit using Powell's minimization method (Press et al., 2007). However, since we have just applied the automatic method we already have a good fit to the histogram, so we can just skip to the fine-tuning stage.

There is a very slight change in the parameter values, but the overall shape of the PDF hardly alters.

As stated above, the manual method fits the parameters to the histogram shape, so, unlike the automatic method, it is sensitive to the bin width and X axis scales.

The parameters of the 2 main components do not change much, but the minor 3rd component has a significantly larger S.D. It is likely that the reduced bin count hides the details of this component, thus making the fit less accurate.

Three or Two Mixtures?

The histogram looks bimodal, and by-eye it is by no means obvious that the right-hand cluster in the histogram is made up from 2 superimposed normal distributions with different standard deviations. Occam's Razor therefore suggests that maybe 2 components provide an adequate fit to the data. We can force the automatic method to find just 2 components as follows:

The standard deviations of the two mixture components are very similar, so we can further reduce the complexity of the model by forcing the manual Fit to assume that the standard deviations are identical.

The S.D. value is now the same for each component, and has a value about mid-way between the values of the two S.D.s when they were fitted independently. Whether these simplifications are a good idea is discussed later.

Goodness-of-Fit Measures

Note that the goodness-of-fit measures all apply to how well the PDF mixture fits to the histogram (i.e. the bin counts), not to the underlying data. However, the measures are calculated for both automatic and manual fitting.

Likelihood

The –ve log likelihood is shown towards the bottom of the Fit mixtures dialog box. This is the calculated negative log of the likelihood that the proposed parameter values of the Gaussian mixture could have produced the histogram bin counts that we actually have. (Actual likelihood values are extremely small, like the probability of getting exactly 500 heads from 1000 tosses of a fair coin, so taking the negative log of the likelihood produces a number that is more easily human-readable. There are also computational advantages to do with rounding error, but I won't go into those here.) Thus the smaller the displayed value, the better the “fit” between the model and the data. The –ve log rel likelihood is the ratio between the actual likelihood, and the likelihood of achieving data which are exactly those predicted by the model if the model is correct (i.e. the best possible likelihood). This is thus a form of normalized likelihood that enables comparisons between different models and different data, and again, the smaller the value the better the fit.

BIC: Complexity vs Fit TradeOff

The Bayesian (Schwarz) Information Criterion (BIC) is a numerical value that helps choose between models with differing likelihoods and different numbers of parameters. (The BIC is similar in concept and usage to the Akaike Information Criterion (AIC), although the BIC usually penalizes complexity slightly more than the AIC.) The likelihood of a model can usually be increased by increasing the complexity of the model. Thus in theory one could exactly fit a Gaussian mixture to a histogram by setting the number of components in the mixture equal to the number of bins, and giving each component a low standard deviation and a mean value equal to the mid-bin value. This is known as overfitting! The BIC formula introduces a penalty for increasing complexity which will tend to counterbalance the increased likelihood of the more complex model.

In general, one should choose a model having a lower BIC value in preference to one having a higher BIC value. If the increased likelihood of a more complex model compared to a simpler model is large enough, it will have a lower BIC value despite its increased complexity, and the more complex model should be chosen. However, if the increased likelihood is not sufficient to "justify" the increased complexity, then the BIC value will be higher than the simpler model, and the simpler model should be chosen. The absolute value of BIC has little meaning, and it should not be used to choose between different types of model, but rather between models of the same type but of varying complexity.

The formula for the BIC in a likelihood estimate is:

BIC = -2 ln(L) + k ln(n)

where L is the maximized value of the likelihood of the model, n is the number of data points, and k is the number of parameters.
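
As a worked illustration (not DataView code; the numbers are hypothetical), a 3-component Gaussian mixture has 8 free parameters (3 means, 3 standard deviations and 2 independent proportions):

import numpy as np

def bic(neg_log_likelihood, k_params, n_data):
    # BIC = -2 ln(L) + k ln(n); smaller is better
    return 2.0 * neg_log_likelihood + k_params * np.log(n_data)

print(bic(neg_log_likelihood=120.0, k_params=8, n_data=500))   # hypothetical values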

Comparison of Fit Methods

For convenience, the Fit Mixture dialogs for the 6 fitting procedures described above are shown below.

Fitting a Gaussian mixture model to the same data with various methods. a. Automatic fit, with automatic selection of the number of mixtures resulting in 3 distributions. b. As (a) but followed by manual fit. c. As (b), but with the number of histogram bins reduced to 10. d. Automatic fit, with forced selection of 2 distribution mixtures. e. As (d) but followed by manual fit to 50 histogram bins. f. As (e), but with the standard deviations of the 2 mixtures forced to be equal.

In each case the manual method provides a marginal improvement in the fit measures compared to the automatic method (compare a vs b, and d vs e above). However, this is not surprising, because the manual fit explicitly maximizes the fit between the parameters and the histogram, and the likelihood and BIC measures quantify exactly that fit! This does not mean that the manual method gives a "truer" analysis of the underlying data, since changing the histogram bin count from 50 to 10 produces at least as big a change in manual fit parameter values as the change from the automatic to manual fit method (compare a, b and c above). Note that you cannot compare the goodness-of-fit measures between the 50 and 10 bin histograms (b and c above), because that change applies to the data, not to the parameters, and the goodness-of-fit measures the likelihood of the parameters, given a particular fixed set of data.

The manual goodness-of-fit measures do allow us to compare the 3 and 2 mixture counts, and the equal and unequal S.D. specification, because the data (histogram bin counts) are not changing. The manual 2-mixture fit negative log likelihood values are higher than those of the 3 mixture fit, so the fit is worse, and the BIC value is also higher (compare b and e above). So the substantial reduction in complexity achieved by dropping from 3 to 2 mixtures (which reduces the free parameters from 8 to 5) does not compensate for the worsening of the fit. However, if we force a fit to 2 mixtures, then also forcing the S.D. values to be equal reduces the BIC value, so this simplification would be justified. This is not too surprising, because the S.D. values are fairly similar anyway. (If we forced equal S.D.s in the 3 mixture model the BIC value would increase, because the minor component of the mixture has a substantially different S.D. to the 2 major components. We would also get a quite different mean and proportion for that component, and a worse likelihood fit.)

So, the bottom line for these data is that the 3-component fully automatic fit method probably provides the best description of the data.

Exponential Distributions

This file is an event-only file, with channel a containing a set of regular pulses (250 ms interval, 200 ms duration). Channel b contains random events with a Poisson distribution with a mean interval of 100 ms, while channel c contains a Poisson distribution with a mean of 20 ms. For both channels the intervals are restricted to the range from 2 to 5000 ms. Channel d contains a mixture; the first half is copied from channel c, the second half from channel b.

The default parameter for the histogram is inter-event Interval and the default event channel is a. All the events in channel a have the same 250 ms interval, so they are all in one bin in the histogram.

You should see a rather obvious exponential distribution in the histogram.

When the Fit mixtures dialog box is displayed, a red curve showing a scaled PDF is superimposed on the histogram. The default is to show the PDF of a single Gaussian (normal) distribution, but this is clearly wrong for these data.

An exponential distribution is defined by a single parameter, the mean interval, and as soon as you select the new Type, the PDF curve will update to an exponential distribution with the mean value read from the data. The mean value should be approximately 100 ms, and the PDF should be a good fit to the data.

The counts now display on a logarithmic scale, and these fall more or less on a straight line, as they should for an exponential distribution.
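
To see why the counts fall on a straight line in the log domain, here is a small sketch (plain Python/NumPy, not DataView code) that generates exponential intervals like those in channel b:

import numpy as np

rng = np.random.default_rng(1)
intervals = rng.exponential(scale=100.0, size=5000)            # mean interval 100 ms
intervals = intervals[(intervals >= 2) & (intervals <= 5000)]  # range restriction, as in the demo file

counts, edges = np.histogram(intervals, bins=50)
mean_est = intervals.mean()          # the single parameter of the exponential PDF
# ln(counts) declines approximately linearly with interval, with slope -1/mean_est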

The bin counts still fall on a straight line, but the PDF line is wildly wrong.

Mixed Exponentials

Event channel d contains a mixture of 2 exponential distributions. There is a clue to this in the histogram - a line through the peaks of the bins has a distinct kink in it at an X axis value of about 100 ms.

We would like to fit a mixed exponential PDF to the data. Dataview does not have an automatic method for fitting mixed non-Gaussian distributions, so we will have to do a manual fit, and that requires making an initial guess at the distribution parameters in the Fit Mixtures dialog.

The red PDF line now has a distinct kink in it too. It is still a lousy fit to the data, but it may do as a starting guess (figure panel b below).

The PDF now has a good fit to the histogram bin data (figure panel c below). The two detected means are close to their "correct" values of 20 and 100 ms. (We don't actually know the correct values: we know that the data were drawn from Poisson distributions with these means, but they were drawn randomly, and so the true means are unlikely to be exactly the same as the means of the distributions from which they were drawn.) The distribution with the short mean interval occupies a much higher proportion of the data. This is because the two distributions occupy equal time within the recording, and so the one with the shorter mean interval will generate many more data values than the one with the longer mean interval.

The three stages of fitting the mixed exponential PDF are shown below:

Fitting a PDF to a histogram containing a mixture of 2 exponential distributions of intervals. Bin counts are in the logarithmic domain. a. The PDF is fitted to a single exponential with the mean interval set to the global mean read from the data. b. A user-specified initial guess at the parameters of a mixture of 2 exponentials (proportions: 0.5, 0.5; means: 70, 300 ms). c. A best-fit 2-component PDF to the histogram display (proportions: 0.839, 0.161; means: 20.159, 101.499 ms).

The mixed exponential PDF shows a good fit to the normal data (non-log counts), but it is much harder to see the inflection point. This is why it is easier to make the initial guess at mixture parameters when counts are shown in the log domain.

Poisson Distributions

You may wonder why a series of events which are claimed to come from a Poisson distribution has an exponential distribution. This is because Poisson events do indeed have an exponential interval distribution – the Poisson part refers to the distribution of the number of events that occur in a fixed time period.

DataView divides the analysis window into a contiguous sequence of 200 ms distribution bin lengths, and counts how many Target events there are in each length. The histogram shows how many such lengths there are with 0, 1, 2 etc Target events within them. Thus the left-most bin in the histogram shows how many 200 ms lengths have no events within them - there are about 360 such lengths (the bin height is about 360). The next bin shows that there are 683 lengths with 1 event within them, and the next bin shows there are about 665 lengths with 2 events within them. And so on. The histogram X axis is set to autoscale, and it shows that there are no lengths containing more than 10 events (the right-hand X axis scale is 11, but it is the left bin boundary value that specifies the count). Note that the X axis scale will attempt to automatically adjust so that the bin width becomes an integer value. This is because fractional counts of occurrences make no sense.
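
The same counting procedure can be sketched in a few lines (illustrative Python/NumPy, not DataView code):

import numpy as np

rng = np.random.default_rng(2)
spike_times = np.cumsum(rng.exponential(scale=100.0, size=3000))   # Poisson events, mean interval 100 ms

bin_len = 200.0                                                     # distribution bin length (ms)
edges = np.arange(0.0, spike_times[-1] + bin_len, bin_len)
per_bin, _ = np.histogram(spike_times, bins=edges)                  # events in each 200 ms length

values, n_lengths = np.unique(per_bin, return_counts=True)
for v, n in zip(values, n_lengths):
    print(f"{n} lengths contain {v} event(s)")                      # Poisson-distributed counts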

The histogram shows the typical skewed shape of a Poisson distribution.

The Distribution count display measures Target events in the continuous sequence of all the Distribution bin lengths in the analysis window. However, you get the same distribution pattern even if you look at discontinuous lengths.

The histogram now shows the number of events in channel a which encompass 0, 1, 2 etc events in channel b. The shape of the histogram is the same as before, although of course the bin counts are lower because we are not counting channel b events from the entire recording.

The topic of how Poisson spike distributions can arise from membrane noise is explored in the Poisson from noise tutorial.

Log X Histogram Display

This file contains a mixture of events drawn from a gamma distribution with a shape factor of 3. This sort of distribution is like a Poisson distribution, except with a built-in “refractory period” that prevents the occurrence of very short intervals. (Gamma distributions often provide a reasonably good way of describing inter-spike intervals for randomly spiking neurons. The exponential interval distribution of a Poisson process is actually a gamma distribution with a shape factor of 1.) Channel a has high frequency events (mean interval 50 ms), b has lower frequency (mean interval 500 ms), and c and d are a mixture of a and b, with d being just a shuffled version of c.

Note that with the standard linear X-axis display, the histogram has a very long tail to the right, but with very few counts in the tail. This reflects the low frequency (long interval) part of the mixture, while the condensed large peak to the left reflects the high frequency component.

The histogram now shows the logarithm of the intervals, so that each bin of the histogram covers a successively larger range of intervals as you move to the right. This means that bins to the right of the histogram contain counts over a wider range of intervals than bins to the left, which in part compensates for the fact that the bins to the right represent the lower frequency component and thus have fewer counts. The histogram is now bimodal, and the 2 peaks fall on the 2 mean values of the mixed distribution.
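
The idea can be sketched as follows (illustrative Python/NumPy, not DataView code), using a 10:1 mixture of gamma-distributed intervals like those in channel c:

import numpy as np

rng = np.random.default_rng(3)
fast = rng.gamma(shape=3, scale=50.0 / 3, size=9000)    # mean interval 50 ms
slow = rng.gamma(shape=3, scale=500.0 / 3, size=900)    # mean interval 500 ms
intervals = np.concatenate([fast, slow])

counts, edges = np.histogram(np.log10(intervals), bins=50)
# Binning log10(interval) spreads the long-interval tail across more bins,
# so the mixture shows up as two peaks, near log10(50) and log10(500).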

You should see a bimodal PDF, but it is not a good fit to the data.

With luck, you will now have a much better fit between the PDF and the data, and the parameters will not be far off those used in constructing the data. Note that the 0.9 : 0.1 proportion reflects the fact that the two distributions occupy equal total time, but one has 10 times as many events as the other.

The take-home message here is that a histogram with the X axis displaying logarithmic data may provide a better visualization of intervals that follow a Poisson or gamma distribution than a histogram with a linear X axis. Since many spike trains seem to follow a gamma-like distribution, this may help the characterization of inter-spike intervals.

The fit method finds a local minimum rather than a global minimum, so if your initial "by-eye" estimates are wildly wrong, the pdf fit is likely to also be wildly wrong. However, this is usually very obvious, so you can just improve your initial estimates and try again.

Statistics and Phase Calculation

The text display above the histogram gives key statistical parameters of the data producing the histogram. For most histogram types the statistics have standard meaning, but for the Phase parameter circular statistics are used (see Zar, 2010 for details). Consider two event channels which are more-or-less in phase with each other. This would generate a histogram with phases tightly clustered around the 0 – 1 value, which would have a highly bimodal appearance. However, the mean value is calculated using circular statistics in which the range 0-to-1 represents points around the circumference, with 0 and 1 being actually the same point on the circle. The mean phase would thus have a value close to either 0 or 1, rather than close to 0.5 as would be the case if the mean were calculated in the standard way.
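
A minimal sketch of the circular mean for phases in the range 0 to 1 (illustrative Python/NumPy, not DataView code):

import numpy as np

def circular_mean_phase(phases):
    angles = 2 * np.pi * np.asarray(phases)              # map 0-1 phase onto the circle
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return (mean_angle / (2 * np.pi)) % 1.0               # back into the 0-1 range

print(circular_mean_phase([0.95, 0.98, 0.02, 0.05]))      # ~0.0, not 0.5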

The histogram should look like this:

phase histogram
A standard phase histogram where the phase is centred around 0/1 appears bimodal, but the phase itself is actually unimodal.

Now check the Show circular plot box. This displays the histogram as a polar (circular, angle or rose) plot. The 0/1 phase is at the top of the plot, and phase angle progresses clockwise around the circle. If you advance Chan 1 to c, you see an increase in the spread of phase angles. Advance Chan 1 to d, and the phase becomes random.

Circular phase plots. a. Tight phase cluster (derived from the histogram shown above). b. Looser cluster. c. Random cluster.

Analysis Region

This is an event-only file, constructed from DataView facilities. Event channel g contains events with different Gaussian distributions in different regions. It was constructed as follows:
Event channel a contains events with a Gaussian distribution with mean interval of 100 ms, s.d. 10, while channel b contains events with a Gaussian distribution with mean interval of 130 ms, s.d. 20. If you examine the Descriptive statistics produced by the Interval histogram, this will confirm these values.

Channel c divides the record into random intervals, and channel d is simply the inverse of c. These have been logically ANDed with channels a and b to produce e and f. Finally, e and f were combined with logical OR to produce g, which thus is non-stationary; it contains a mixture of Gaussians made up from stationary Gaussians in different regions.

By default, the histogram will analyse all the events in a channel. However, you can choose to analyse just a sub-region or regions of the file.

This displays a sub-dialog box with a drop-down list containing four options. Whole file analyses the whole file, as the name suggests. Visible screen analyses just the data visible on screen, again as the name suggests. You can have two windows showing views of different regions of the same file, and as you switch between windows, the histogram changes to reflect the different visible data. User time allows you to set a start and end time for analysis independent of the view. Finally, gate channel specifies an event channel to act as a “gate” – only data regions within events in that channel get analysed. If there are multiple events, each event acts as if it were an independent file in its own right, and the final histogram is the summed result of analysis of each of these sub-regions. This can be useful if your data are non-stationary (i.e. change their characteristics over time) and you only want to analyse data with particular characteristics.

Continuous Time Histograms

The Freq in time bin and Count in time bin options differ from the other histogram parameter choices in that the X axis represents explicit elapsed time within the recording. For these analyses the recording, or a section of the recording, is divided into a sequence of contiguous time bins. The number of events whose start times fall within each bin is calculated, and, for the Freq option, converted into frequency by dividing by the bin width.
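
In outline (illustrative Python/NumPy, not DataView code):

import numpy as np

def freq_in_time_bins(event_starts, t_start, bin_width, bin_number):
    edges = t_start + bin_width * np.arange(bin_number + 1)
    counts, _ = np.histogram(event_starts, bins=edges)     # event start times per contiguous bin
    return counts / bin_width      # divide by bin width for the Freq option; return counts for the Count option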

For normal displays (in which the Select analysis Rgn option is not selected) the user sets the analysis start time (as the left-hand X axis scale) and the Bin width and Bin number parameters. The analysis end time (right-hand X axis scale) is automatically calculated from these values. If the Select analysis Rgn option is selected, then the active section of the recording (as indicated by the X axis scale values) is determined by the options selected within the Analysis region dialog. The user sets the Bin number in the main histogram dialog, and this automatically adjusts the bin width to provide complete coverage of the active section.

Optimal Bin Width

One of the problems with histograms is choosing an appropriate bin width (or equivalently, bin number) for the display. If the width is too small, the output is dominated by spurious fluctuations due to random variations in the data, if too large, key details may be missed. Knuth (2013) suggested a Bayesian methodology for calculating an optimal bin width given the data themselves.  
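
The Knuth metric can be computed directly from the binned counts; a minimal sketch of the relative log posterior probability and the search over bin numbers is shown below (illustrative Python/SciPy, not DataView code; the test data are made up):

import numpy as np
from scipy.special import gammaln

def knuth_log_posterior(data, n_bins):
    n = len(data)
    counts, _ = np.histogram(data, bins=n_bins)
    return (n * np.log(n_bins)
            + gammaln(n_bins / 2.0) - n_bins * gammaln(0.5)
            - gammaln(n + n_bins / 2.0)
            + np.sum(gammaln(counts + 0.5)))

data = np.random.default_rng(4).normal(100, 20, 1000)        # e.g. intervals: mean 100, s.d. 20
best = max(range(1, 51), key=lambda m: knuth_log_posterior(data, m))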

The intervals are actually generated from random numbers with a mean of 100 and s.d. of 20, but with 50 histogram bins, the display is quite “lumpy”. Of course, with real data we would not know whether the lumps and bumps actually meant something, or whether they were just chance. Optimizing the bin number helps us decide.

A graph of LogP against bin number is displayed, showing the probability metric (the relative log posterior probability) for bin counts from 1 to 50; this graph peaks at a value of 16. The main histogram is set to have a Bin number of 16 (previously it was 50), and the X axis scales have been set to encompass the data range (which is a necessary condition for the algorithm). Note that the histogram now just shows the shape of the normal distribution.

The take-home message here is that the optimization algorithm works best if the initial Bin number (which sets the maximum number of bins that will be included in the algorithm) is set to a high but plausible value. Inspection of the graph can reveal whether the algorithm is working within the plausible range.

Channel d contains events with a Poisson distribution, which is the classic interspike interval distribution for non-pacemaker neurons. Optimization suggests that the underlying distribution probability is best captured with a 23 bin histogram.
Channel e contains a complex mix of 3 normal distributions superimposed on a wide-range uniform distribution. This is a case where it is best to increase the initial display above the default 50 bins.

The algorithm suggests that 81 bins is best to capture this distribution. But remember, the algorithm only searches for an optimum up to the value in the Bin number edit box. There may be a better optimum if a higher ceiling is set, but the strange visual appearance of histograms with very large numbers of bins means that this is rarely useful in practice.

Kernel density estimate

Kernel density estimation (KDE; also sometimes called the Parzen–Rosenblatt window method, after the statisticians who invented it) is a way of smoothing histograms without making assumptions about the underlying distribution of the data. It uses the raw data underlying the histogram, rather than the binned data, and so it is not affected by varying the bin number or width in the histogram. Further details on KDE are given here.
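
A minimal sketch of a Gaussian KDE over the raw values underlying a histogram (illustrative Python/SciPy, not DataView code; the data are made up):

import numpy as np
from scipy.stats import gaussian_kde

values = np.random.default_rng(5).normal(100, 20, 1000)   # raw data underlying the histogram
kde = gaussian_kde(values)                                 # bandwidth chosen automatically
x = np.linspace(values.min(), values.max(), 400)
density = kde(x)                                           # smooth estimate; no binning involved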

See also

Peri-stimulus time histogram in Stimulus-response analysis.

Event auto-correlation, event cross-correlation in Correlation analysis.

Extract sub-file(s).

 

