3D Graph and Cluster: External Data

You can use the DataView 3D graph and cluster facilities to display and cluster data from sources external to the program.

Help

The dialog is quite complex, and this tutorial does not use all of its facilities, but remember that you can get full built-in help on the dialog at any time by pressing the F1 key while the dialog is active.

Data source

The external data should consist of plain-text numbers arranged in multiple equal-length columns separated by tabs or commas. There can be one or more text header rows, but these are ignored.

There are 3 ways to load data into the program:

  1. Copy the data onto the clipboard, and then click the Paste button in the dialog.
  2. Click the Load button in the dialog and select a text file (.txt or .csv) containing the data.
  3. Drag-and-drop a text file containing the data from File Explorer onto the dialog.

When you load the tutorial data, a notification message tells you that 2 rows contained non-numeric data and were skipped (these were the header rows). The remaining data are organized into 6 columns, representing dimensions, and 326 rows, representing the individual 6-D items. The data values are displayed (read-only) in the grid, and the 3D scatter graph automatically displays the first 3 dimensions.
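The loading rules above (numeric columns separated by tabs or commas, with non-numeric header rows skipped and counted) can be sketched in Python as follows. This is a minimal illustration, not DataView's own parser; the function name `load_numeric_table` and the rule for choosing the delimiter are assumptions.

```python
import csv
import io


def load_numeric_table(text):
    """Parse tab- or comma-separated numeric columns from plain text.

    Returns (rows, skipped): rows is a list of equal-length lists of floats
    (one list per datum, one value per dimension) and skipped is the number
    of non-empty rows that could not be read as numbers, such as headers.
    """
    # Assumption: use tab if the text contains any tab, otherwise comma.
    delimiter = "\t" if "\t" in text else ","
    rows, skipped = [], 0
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # blank line: neither data nor header
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            skipped += 1  # non-numeric row, e.g. a text header
    # Every datum must have the same number of dimensions.
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("columns are not all the same length")
    return rows, skipped


sample = "x\ty\tz\nunits\tmV\tms\n1\t2\t3\n4\t5\t6\n"
rows, skipped = load_numeric_table(sample)
print(skipped, "rows skipped;", len(rows[0]), "columns,", len(rows), "rows")
# prints: 2 rows skipped; 3 columns, 2 rows
# The default 3D view corresponds to the first 3 columns of each row.
first_three = [row[:3] for row in rows]
```

Each inner list is one datum and each position within it is one dimension, which matches the column/row interpretation described above.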

3D Display

There are many facilities for tuning the display; this guide describes just some of the more frequently used ones.

Rotation

Zoom and Autoscale

Clustering

There are 2 obvious clusters visible in the default 3D display.

The Cluster dialog opens, and clustering immediately starts using the default Target cluster count of 0, which automatically selects an optimum number of clusters based on the data themselves.

Three clusters are detected, and the data points are automatically coloured according to the cluster to which they belong. Two clusters are the obvious left and right groups in the display, while the third cluster is a diffuse set of points that do not fit into either obvious cluster.
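The tutorial does not say which algorithm DataView uses to choose the cluster count when the Target cluster count is 0. Purely as an illustration of the idea, one common generic approach is to run k-means for several candidate counts and keep the count whose labelling has the best mean silhouette score. The sketch below is pure Python; the function names and the `k_max` limit are hypothetical and are not DataView's.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Move each centre to the mean of its members (keep it if it has none).
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_centres.append(centres[c])
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette score: close to 1 for compact, well-separated clusters."""
    total = 0.0
    for i, p in enumerate(points):
        dists = {}  # cluster label -> distances from p to that cluster's points
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], [])
        if not own or not dists:
            continue  # singleton, or only one cluster: contributes 0
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_cluster_count(points, k_max=6):
    """Try k = 2..k_max and keep the labelling with the best silhouette score."""
    best_labels, best_score = [0] * len(points), -1.0
    for k in range(2, min(k_max, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return len(set(best_labels)), best_labels


group_a = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
group_b = [(10, 10, 10), (10, 11, 10), (11, 10, 10), (10, 10, 11)]
k, labels = choose_cluster_count(group_a + group_b)
print(k)  # 2
```

Unlike the tutorial's result, this simple scheme has no notion of a "diffuse" leftover cluster; every point is forced into one of the k groups.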

Press F1 if you want more information on the Cluster dialog options and output.

Dimensions

There are 6 dimensions for each datum, but only 3 can be displayed in the graph at a time.

All the data points are drawn in the same colour, i.e. they are all assigned to the same cluster (which is not really a cluster at all, since it includes all the data).

The multiple colours reappear, reflecting the clusters found in the dimensions that are not currently displayed, but the coloured points are randomly scattered in the displayed dimensions.

Evidently the clusters in the first 3 dimensions are prominent enough to be detected by the clustering algorithm even though there are no clusters in the remaining 3 dimensions.
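The key point of this section is that the clustering uses all 6 columns, while the graph only shows 3 of them, so each point keeps its cluster colour whichever dimensions are displayed. A minimal sketch, assuming rows are lists of numbers and the displayed dimensions are given as zero-based column indices (the name `project` and the example values are hypothetical):

```python
def project(rows, dims):
    """Return the 3 columns listed in dims (zero-based) for every row."""
    if len(dims) != 3:
        raise ValueError("a 3D graph shows exactly 3 dimensions")
    return [[row[d] for d in dims] for row in rows]


# Hypothetical 6-D rows, with cluster IDs computed from all 6 dimensions.
rows = [
    [0.0, 0.0, 0.0, 5.0, 2.0, 7.0],
    [10.0, 10.0, 10.0, 4.0, 6.0, 2.0],
]
labels = [0, 1]
default_view = project(rows, (0, 1, 2))  # first 3 dimensions
other_view = project(rows, (3, 4, 5))    # dimensions 4 to 6
# The labels belong to the rows, not to the view, so each point keeps its colour.
coloured = list(zip(other_view, labels))
```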

Saving clusters

This adds a new column to the right-hand end of the data grid. It contains numbers identifying the cluster to which each row (data point) belongs.
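In terms of the grid's data, saving clusters amounts to appending one value per row. A minimal sketch, assuming the grid is a list of rows and the cluster IDs are a parallel list; the function name is hypothetical, and the tutorial does not say whether DataView numbers clusters from 0 or from 1 (the example simply uses 1 and 2):

```python
def append_cluster_column(rows, labels):
    """Return a copy of rows with each row's cluster ID added as a new last column."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one cluster ID per row")
    return [list(row) + [label] for row, label in zip(rows, labels)]


grid = [[0.0, 1.0, 0.0], [10.0, 11.0, 10.0]]
with_ids = append_cluster_column(grid, [1, 2])
# with_ids == [[0.0, 1.0, 0.0, 1], [10.0, 11.0, 10.0, 2]]
```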

You can export the data, including the cluster identification column, by clicking the Copy sheet or Save sheet button in the Data Source group. This copies the data grid to the clipboard or saves it to a file, respectively.
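If you process the exported sheet in your own scripts, it helps to know what plain delimited text of the grid looks like. The sketch below writes a grid as tab-separated text; it assumes (this is not stated in the tutorial) that a tab-separated layout like the input format is acceptable, and the function name and header labels are hypothetical rather than DataView's exact export format.

```python
import csv
import io


def sheet_to_text(rows, header=None, delimiter="\t"):
    """Serialise a data grid as delimited plain text, one row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    if header:
        writer.writerow(header)  # optional text header row
    writer.writerows(rows)
    return buf.getvalue()


text = sheet_to_text([[0.0, 1.0, 1], [10.0, 11.0, 2]], header=["x", "y", "cluster"])
print(text, end="")
```

The result can be written to a `.txt` file with an ordinary `open(..., "w")` call; because it has a single text header row and numeric columns, it also fits the input format described under Data source.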

Editing data

You cannot edit individual data values in the grid, but you can delete rows or columns, and you can add additional columns.

To delete a column, click on any cell in the column that you wish to delete to identify it, then click the Del col button. This is useful if you want to delete the cluster identification column before re-clustering with a different cluster option: if you leave the identification column in place, it could be included in the clustering algorithm, which is almost certainly not what you want. It is also useful if you want to export just the cluster identification column by deleting all the others.
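Why the identification column matters can be seen with a distance-based measure such as Euclidean distance (an assumption here; the tutorial does not say which distance DataView uses). A minimal sketch with a hypothetical `delete_column` helper: two points with identical measurements look different purely because of their cluster IDs until the ID column is removed.

```python
import math


def delete_column(rows, col):
    """Return a copy of rows with column index col removed (-1 means the last column)."""
    out = []
    for row in rows:
        new_row = list(row)
        del new_row[col]
        out.append(new_row)
    return out


# Two data points with identical measurements but different cluster IDs (last column).
a = [1.0, 2.0, 3.0, 1]
b = [1.0, 2.0, 3.0, 2]
print(math.dist(a, b))  # 1.0: the ID column alone makes identical points look different
a2, b2 = delete_column([a, b], -1)
print(math.dist(a2, b2))  # 0.0
```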

To delete a row, click on any cell in the row that you wish to delete, then click the Del row button. This is useful if a particular row contains an outlier that is distorting the clustering.
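The tutorial leaves it to you to decide which row is an outlier. One simple heuristic, shown here only as an illustration and not as a DataView feature, is to flag the row farthest from the column means and then remove it. Both function names are hypothetical.

```python
import math


def farthest_row(rows):
    """Index of the row farthest (Euclidean distance) from the column means."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return max(range(len(rows)), key=lambda i: math.dist(rows[i], means))


def delete_row(rows, index):
    """Return a copy of rows without the row at index."""
    out = [list(r) for r in rows]
    del out[index]
    return out


grid = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]]
suspect = farthest_row(grid)  # 3
cleaned = delete_row(grid, suspect)
```

A single extreme point also pulls the means towards itself, so with heavy contamination a more robust rule (for example, one based on medians) would be needed; inspecting the candidate row before deleting it is still advisable.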

To add one or more columns to an existing data set, select the add col(s) option. Then use the Paste, Load or drag-and-drop method to add the new data, which should be in the same format as the original. (Note that the default replace option replaces the existing data, as though the new data were loaded into an empty grid.)
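The difference between the add col(s) and replace options can be sketched on a list-of-rows grid. This is a minimal illustration under stated assumptions: the mode strings only mirror the option names, and the requirement that the new data have the same number of rows as the existing grid is inferred from "the same format as the original", not confirmed by the tutorial.

```python
def combine(rows, new_rows, mode="add"):
    """Merge newly loaded data into the grid.

    mode="add": append the new columns to the right of the existing ones.
    mode="replace": discard the existing data, as if loading into an empty grid.
    """
    if mode not in ("add", "replace"):
        raise ValueError("mode must be 'add' or 'replace'")
    if mode == "replace" or not rows:
        return [list(r) for r in new_rows]
    # Assumption: the new data must supply one row per existing datum.
    if len(new_rows) != len(rows):
        raise ValueError("new data must have the same number of rows")
    return [list(old) + list(new) for old, new in zip(rows, new_rows)]


grid = [[1.0, 2.0], [3.0, 4.0]]
extra = [[5.0], [6.0]]
added = combine(grid, extra, mode="add")         # [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
replaced = combine(grid, extra, mode="replace")  # [[5.0], [6.0]]
```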