3D Graph and Cluster: External Data
You can use the DataView 3D graph and cluster facilities to display and cluster data from sources external to the program.
- Select the Analyse: External data: 3D graph and cluster menu command to open the 3D Graph and Cluster External Data dialog. Unlike most analysis routines, this is available even if no data file is loaded into the main program.
Help
The dialog is quite complex, and this tutorial does not use all of its facilities, but remember that you can get full built-in help on the dialog by pressing the F1 key at any time while the dialog is active.
Data source
The external data source should be plain-text numbers arranged in multiple tab- or comma-separated equal-length columns. There can be one or more text header rows, but these are ignored.
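The format rules above can be illustrated with a minimal Python sketch of a reader. It is not DataView's own loader: the function name `parse_external_data` is hypothetical, and the sketch assumes that a row counts as data only if every field parses as a number, and that tabs take precedence over commas when a line contains both.

```python
def parse_external_data(text):
    """Parse tab- or comma-separated numeric columns of equal length.

    Returns (columns, skipped): columns holds one list of floats per
    column (dimension); skipped counts rows with non-numeric fields,
    such as header rows.
    """
    rows = []
    skipped = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: use tabs if the line has any, otherwise commas.
        fields = line.split("\t") if "\t" in line else line.split(",")
        try:
            rows.append([float(f) for f in fields])
        except ValueError:
            skipped += 1  # header or other non-numeric row
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all columns must have the same length")
    # Transpose the rows into columns, one list per dimension.
    columns = [list(col) for col in zip(*rows)]
    return columns, skipped


sample = "x,y,z\nunits,units,units\n1,2,3\n4,5,6\n"
columns, skipped = parse_external_data(sample)
print(skipped, len(columns), len(columns[0]))  # 2 3 2
```

Here the two text rows are skipped, as the header rows of the tutorial file are, leaving 3 columns (dimensions) of 2 rows each.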
There are 3 ways to load data into the program:
- Copy the data onto the clipboard, and then click the Paste button in the dialog.
- Click the Load button in the dialog and select a text file (.txt or .csv) containing the data.
- Drag-and-drop a text file containing the data from File Explorer onto the dialog.
This tutorial uses a sample file of 6-dimensional data.
- Click 6d cluster.txt.
- When you click the link, most browsers will open the file and display the numbers. You can then select them (usually control-a) and copy them to the clipboard (control-c). Then click the browser Back button to return to this page.
- Alternatively, you can right-click the link and download a local copy of the file, then open and copy its contents (or load or drag-and-drop as described above).
- Assuming you have placed the file contents onto the clipboard, click Paste in the 3D Graph and Cluster External Data dialog.
When you load the data, a notification message tells you that 2 rows contained non-numeric data and were skipped; these were the header rows. The remaining data are organized into 6 columns, representing dimensions, and 326 rows, representing the individual 6-D data points. The data values (read-only) are displayed in the grid, while the 3D scatter graph automatically displays the first 3 dimensions.
3D Display
There are many facilities for tuning the display; this guide describes only some of the more frequently used ones.
Rotation
- Click-and-drag on the box outline in the display to adjust the viewpoint.
- Click the Front button in the Rotation group above the display for a view that is essentially an X-Y scatterplot (the Z dimension runs into the screen).
- Click the Default button to return to the slightly offset default view.
- Select the Auto Y option to see the view rotate about the vertical (Y) axis.
- At this point you could make a video of the rotating view by clicking the Make AVI button, but that would take a while.
- Select the Mouse option to make the display draggable again, then click the Default button to return to where you started.
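Rotation about the vertical axis, as with the Auto Y option, can be illustrated with the standard rotation matrix about Y followed by a projection onto the screen. This is a sketch of the geometry only: the helper names are hypothetical, and it assumes a simple orthographic projection in which the Front view just drops Z, whereas DataView's actual rendering may differ (for example, by using perspective).

```python
import math


def rotate_about_y(point, angle_deg):
    """Rotate an (x, y, z) point about the vertical Y axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    # Standard rotation matrix about Y: Y is unchanged, X and Z mix.
    return (x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a))


def project_front(point):
    """Orthographic 'Front' view (assumption): drop Z, which runs into the screen."""
    x, y, _ = point
    return (x, y)


# Step the angle by 90 degrees per frame; a smooth animation would use smaller steps.
frames = [project_front(rotate_about_y((1.0, 2.0, 0.0), step * 90)) for step in range(5)]
print(frames)
```

Because the rotation leaves Y untouched, each point moves only horizontally on screen, which is why the view appears to spin about the vertical axis.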
Zoom and Autoscale
- Click the Sel button in the Scales & Colour group on the left of the display.
- Click-and-drag around the obvious left-hand cluster of points in the display. A free-form self-closing outline follows the mouse movement.
- If you are not happy with the data included in the selection, click Cancel and try again.
- Click the Zoom button.
- The scales expand to just show data within your selection.
- Check the X, Y and Z Autoscale boxes.
- The display autoscales each axis to show all the data.
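The selection and zoom steps can be sketched in two dimensions: test which points lie inside the closed outline using the even-odd (ray casting) rule, then set each scale to the range of the selected points. Autoscaling is, roughly, the same range calculation applied to all the points. This is a simplified illustration in screen (X-Y) coordinates only; the function names are hypothetical, and DataView's own implementation is not described here.

```python
def point_in_outline(pt, outline):
    """Even-odd (ray casting) test: is pt inside the closed polygon `outline`?"""
    x, y = pt
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]  # wrap round: the outline closes itself
        if (y1 > y) != (y2 > y):
            # X coordinate where this edge crosses the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def axis_ranges(points):
    """(xmin, xmax, ymin, ymax) of a non-empty set of 2-D points."""
    xs, ys = zip(*points)
    return min(xs), max(xs), min(ys), max(ys)


def zoom_to_selection(points, outline):
    """Scale ranges covering only the points inside the outline (compare Zoom)."""
    chosen = [p for p in points if point_in_outline(p, outline)]
    if not chosen:
        raise ValueError("the selection contains no points")
    return axis_ranges(chosen)


points = [(1, 1), (2, 2), (8, 8)]
square = [(0, 0), (3, 0), (3, 3), (0, 3)]  # a hand-drawn outline, simplified
print(zoom_to_selection(points, square))   # (1, 2, 1, 2)
print(axis_ranges(points))                 # (1, 8, 1, 8): autoscale to all data
```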
Clustering
There are 2 obvious clusters visible in the default 3D display.
- Click the Cluster button in the Cluster group under the display.
The Cluster dialog opens, and clustering immediately starts using the default Target cluster count of 0, which automatically selects an optimum number of clusters based on the data themselves.
Three clusters are detected, and the data points are automatically coloured according to the cluster to which they belong. Two clusters are the obvious left and right groups in the display, while the third cluster is a diffuse set of points that do not fit into either obvious cluster.
Press F1 if you want more information on the Cluster dialog options and output.
- Click OK to accept the clusters and dismiss the dialog.
- If you click Cancel the dialog is dismissed but the clusters are rejected.
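DataView's clustering algorithm is not described in this tutorial, so the following is only an illustration of the general idea behind a target cluster count of 0: run a clustering method for several candidate counts and keep the count that scores best. This sketch swaps in plain k-means with the mean silhouette width as the score, and the function names are hypothetical. Unlike the DataView result described above, it cannot report a diffuse "leftover" cluster, because k-means assigns every point to its nearest centre.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point seeding; returns labels."""
    # Seed with the first point, then repeatedly add the point farthest
    # from all centres chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centres))))
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: its silhouette counts as 0
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def auto_cluster(points, max_k=6):
    """Try k = 2..max_k and keep the labelling with the highest mean silhouette."""
    best_k, best_labels, best_score = 1, [0] * len(points), -1.0
    for k in range(2, min(max_k, len(points) - 1) + 1):
        labels = kmeans(points, k)
        score = mean_silhouette(points, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


# Two tight groups of 3-D points, loosely like the left and right clusters.
points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
k, labels = auto_cluster(points)
print(k, labels)  # 2 [0, 0, 0, 1, 1, 1]
```

Silhouette scoring is only one of several common ways to pick a cluster count automatically; it is used here because it needs no extra parameters.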
Dimensions
There are 6 dimensions for each datum, but only 3 at a time can be displayed in the graph.
- Set the X, Y and Z parameters in the Data Sets group to 4, 5 and 6 respectively.
- There are no obvious clusters in the display, although the data points are still coloured from the clusters identified in the other dimensions. You can drag the box around a bit to confirm this from different viewpoints.
- Select the 3 Displayed option in the Cluster group.
- This means that future clustering will not use all the dimensions, but only the 3 dimensions in the display.
- Click the Cluster button again, then click OK in the Cluster dialog.
All the data points are drawn in the same colour, i.e. assigned to the same cluster (which is not really a cluster at all, since it includes all the data).
- Select the All dim option, and click Cluster again.
The multiple colours reappear because of the clusters in the non-displayed dimensions (1 to 3), but the differently coloured points are randomly scattered within the displayed dimensions (4 to 6).
- Set the view to dimensions 1, 2 and 3 again.
Evidently the clusters in the first 3 dimensions are prominent enough to be detected by the clustering algorithm even though there are no clusters in the remaining 3 dimensions.
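One way to picture the difference between the 3 Displayed and All dim options is as a choice of which coordinates are handed to the clustering routine. The sketch below uses a hypothetical helper, `select_dimensions`, with the same 1-based dimension numbers as the X, Y and Z parameters. When dimensions 1 to 3 are left out, the distances that drive clustering no longer see the separation between the groups, which is consistent with the single-cluster result described above.

```python
import math


def select_dimensions(points, dims=None):
    """Keep only the given 1-based dimensions of each point.

    dims=None keeps every dimension (compare the All dim option); a list such
    as [4, 5, 6] keeps just those (compare 3 Displayed with X, Y, Z = 4, 5, 6).
    """
    if dims is None:
        return [tuple(p) for p in points]
    return [tuple(p[d - 1] for d in dims) for p in points]


# Two hypothetical 6-D points from different clusters: far apart in
# dimensions 1-3, close together in dimensions 4-6.
a = (0, 0, 0, 3, 4, 0)
b = (60, 80, 0, 0, 0, 0)
all_dims = select_dimensions([a, b])
shown = select_dimensions([a, b], [4, 5, 6])
print(math.dist(*all_dims))  # large: dominated by dimensions 1-3
print(math.dist(*shown))     # 5.0
```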
Saving clusters
- Click the List button in the Cluster group.
This adds a new column to the right-hand end of the data grid. The column contains numbers identifying the cluster to which each row (data point) belongs.
- Use the horizontal scroll bar in the data grid to view the right-most (seventh) column G.
- Note that this column only contains the numbers 0, 1 or 2, which identify the cluster.
- Set the Y data set parameter to 7, then click the Front button.
- The display now shows 3 horizontal bands on the Y axis, representing the cluster identification of each point.
You can export the data, including the cluster identification column, by clicking the Copy sheet or Save sheet button in the Data Source group. This copies the data grid to the clipboard or saves it to a file respectively.
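The List and Save sheet steps amount to appending one extra column and writing the grid out as delimited text. The sketch below uses Python's standard `csv` module; the helper names are hypothetical, and the output layout (tab-separated, no header row) is an assumption rather than a description of DataView's actual file format.

```python
import csv
import io


def sheet_with_cluster_ids(columns, labels):
    """Append cluster labels as a new right-most column (compare the List button)."""
    if len(labels) != len(columns[0]):
        raise ValueError("need one cluster label per row")
    return columns + [list(labels)]


def write_sheet(columns, stream, delimiter="\t"):
    """Write a list of columns to a text stream, one data row per line."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    for row in zip(*columns):
        writer.writerow(row)


columns = [[1.0, 5.0], [2.0, 6.0]]  # two dimensions, two data points
labels = [0, 1]                     # cluster ID for each data point
out = io.StringIO()
write_sheet(sheet_with_cluster_ids(columns, labels), out)
print(out.getvalue())
```

Writing to a `StringIO` corresponds to copying to the clipboard; passing an open file instead corresponds to saving.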
Editing data
You cannot edit individual data values in the grid, but you can delete rows or columns, and you can add additional columns.
To delete a column, click any cell in the column to identify it, then click the Del col button. This is useful if you want to delete the cluster identification column before re-clustering with a different cluster option: if you leave the identification column in place, it could be included in the cluster algorithm, which is almost certainly not what you want. It is also useful if you want to export just the cluster identification column by deleting all the others.
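To see why a leftover identification column matters, note that a distance-based clustering method treats every column as one more coordinate. In this sketch (hypothetical `delete_column` helper, 0-based column index, sheet stored as a list of columns), two data points that differ by 0.5 appear further apart until the cluster-ID column is removed.

```python
import math


def delete_column(columns, index):
    """Return a copy of the sheet without the column at 0-based index."""
    return columns[:index] + columns[index + 1:]


# Column A holds the data; column B holds cluster IDs from an earlier run.
sheet = [[0.0, 0.5], [0, 1]]

rows = list(zip(*sheet))
with_ids = math.dist(rows[0], rows[1])      # inflated by the ID column

rows = list(zip(*delete_column(sheet, 1)))
without_ids = math.dist(rows[0], rows[1])   # 0.5
print(with_ids, without_ids)
```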
To delete a row, click on any cell in the row that you wish to delete, then click the Del row button. This would be useful if a particular row contained an outlier that was distorting the evaluation.
To add one or more columns to an existing data set, select the add col(s) option. Then use the Paste, Load or drag-and-drop method to add the new data, which should be in the same format as the original. (Note that the default replace option replaces the existing data, as though the new data were loaded into an empty grid.)
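The difference between the replace and add col(s) options can be sketched as two ways of combining newly loaded columns with the existing grid. The helper `load_into_sheet` is hypothetical, and the requirement that added columns have the same number of rows as the existing ones is an assumption, not documented DataView behaviour.

```python
def load_into_sheet(sheet, new_columns, mode="replace"):
    """Combine newly loaded columns with the existing sheet (a list of columns).

    mode="replace" discards the existing data (compare the default replace option);
    mode="add" appends the new columns on the right (compare add col(s)).
    """
    if mode not in ("replace", "add"):
        raise ValueError("mode must be 'replace' or 'add'")
    new_columns = [list(c) for c in new_columns]
    if mode == "replace" or not sheet:
        return new_columns
    # Assumption: added columns must have as many rows as the existing ones.
    if any(len(c) != len(sheet[0]) for c in new_columns):
        raise ValueError("added columns must match the existing row count")
    return [list(c) for c in sheet] + new_columns


sheet = [[1, 2], [3, 4]]
print(load_into_sheet(sheet, [[5, 6]], mode="add"))  # [[1, 2], [3, 4], [5, 6]]
print(load_into_sheet(sheet, [[5, 6]]))              # [[5, 6]]
```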