
Statistics support

Our tutors are able to provide assistance with a wide array of topics, ranging from the fundamentals to more advanced statistical concepts. Below is a non-exhaustive list of topics that we are able to help with. 

Fundamentals of Statistics

Understanding the fundamentals of statistics is imperative if you intend to go on to draw meaningful conclusions from the analysis of data. An important first step is to correctly classify your data type and recognize the most appropriate way to display and/or explore your data. These relatively basic initial steps should not be underestimated, as any confusion at this level will ultimately lead to incorrect or meaningless inference.

This section of the website contains resources which focus mainly on the fundamentals, i.e. classification of data, the concepts behind summarizing data, and the basic probability theory which allows us to test hypotheses relating to data.

Data types & One number summaries: The correct classification of data type is essential, as only then can the correct statistical techniques be employed to draw meaningful conclusions from the data. Summarizing an important feature of a set of data using just one number, i.e. a statistic, gives a basic understanding of the central tendency, the most frequently occurring value, or the spread of the data. Displaying such data appropriately is an important step in exploring your data and assessing whether any trends/patterns exist.
- mean,med (PDF, 96 KB) & data types (PDF, 112 KB)
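As a rough illustration of one-number summaries, the sketch below uses Python's standard `statistics` module. The `scores` list is invented purely for this example and does not come from the linked PDFs.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [52, 61, 61, 67, 70, 74, 95]

mean = statistics.mean(scores)      # arithmetic average (central tendency)
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value
spread = statistics.stdev(scores)   # sample standard deviation (spread)

print(mean, median, mode, round(spread, 2))
```

Each statistic summarizes a different feature of the same data, which is why the data type matters: a mode is meaningful for categories, whereas a mean is not.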

Probability density/mass & hypothesis testing: The linked PDFs describe the basic concepts of probability theory, which lets us assess whether the occurrence of some event can reasonably be put down to chance. We may be interested in the probability of a bus turning up in the next ten minutes, or the chance of drawing a full house in poker. Given adequate data and under certain assumptions, probability theory allows us to calculate the probability of each of these events occurring. - probability (PDF, 126 KB) & Hypothesis testing (PDF, 95 KB)
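The full house example can be worked through by counting hands. This is a minimal sketch assuming a standard 52-card deck and a five-card hand dealt at random; the variable names are illustrative only.

```python
from math import comb

# Number of distinct 5-card hands from a standard 52-card deck.
total_hands = comb(52, 5)

# A full house is three cards of one rank plus two cards of another rank:
# choose the rank and suits of the triple, then the rank and suits of the pair.
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)

p_full_house = full_houses / total_hands
print(f"P(full house) = {p_full_house:.5f}")
```

The bus example would need a model for arrival times and some data, which is where the assumptions mentioned above come in.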

Statistical tests

Unscrupulous and uninformed statistical analysis, as often associated with rash governmental claims or crass newspaper headlines, gives statistics a bad name. However, choosing the correct statistical analysis is relatively simple once the scientific question of interest has been decided and the data have been correctly classified.

This section of the website focuses on the most commonly used statistical tests, and discusses when they can be used.

T-tests: T-tests are used to compare the means of one or two samples. We can compare the mean of one sample to a hypothesized value (one-sample t-test), compare the means of two independent samples (two-sample t-test), or compare the means of two dependent samples (paired t-test). - T-test (PDF, 154 KB)
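The three variants differ only in how the test statistic is built. The sketch below computes the t statistics by hand using the standard library; the function names are hypothetical, the two-sample version assumes equal variances (a pooled estimate), and obtaining a p-value would additionally require the t distribution, which is not shown here.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu0) / se

def paired_t(before, after):
    """Paired t statistic: a one-sample test on the differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)

def two_sample_t(x, y):
    """Two-sample t statistic, assuming equal variances (pooled estimate)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical measurements, invented for illustration.
print(one_sample_t([2, 4, 6], 2))
print(two_sample_t([1, 2, 3], [2, 3, 4]))
print(paired_t([1, 2, 3], [2, 4, 3]))
```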

Correlation: Correlation is used to measure the strength and direction of the linear relationship between two variables. - correlation (PDF, 115 KB)
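A minimal sketch of the Pearson correlation coefficient, computed from its definition. The function name `pearson_r` and the example data are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfectly increasing and a perfectly decreasing line.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```

The sign gives the direction of the linear relationship and the magnitude (between 0 and 1) gives its strength.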

ANOVA: ANOVA (analysis of variance) is a well-established technique for comparing a number of means based on a sample of observations. It is essentially an extension of the t-test that allows us to compare more than two samples. As with the t-test, there are several types of ANOVA for different situations; the linked PDF contains more details. - ANOVA (PDF, 142 KB)
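As an illustration only, the one-way ANOVA F statistic can be computed as the ratio of between-group to within-group variability. The function name and data below are hypothetical, and other ANOVA designs need different calculations.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of samples."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical samples, invented for illustration.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

With exactly two groups, this F statistic equals the square of the equal-variance two-sample t statistic, which is one way to see ANOVA as an extension of the t-test.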

MANOVA: MANOVA is essentially an extension of ANOVA that allows us to deal with more than one dependent variable. - MANOVA (PDF, 66 KB)

Chi-squared test: The chi-squared test allows us to test a theory by comparing observed counts with those expected under the theory. We assess whether any observed discrepancies from the theory can reasonably be put down to chance. - chi-square (PDF, 126 KB)
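A small sketch of the goodness-of-fit version of the statistic, assuming a hypothetical experiment in which a die is rolled 60 times and the theory is that it is fair (10 expected rolls per face). The counts are invented.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 to 6 from 60 rolls of a die.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6  # a fair die gives 60 / 6 = 10 per face

chi2 = chi_squared_statistic(observed, expected)
print(chi2)
```

The statistic is then compared with a chi-squared distribution with 5 degrees of freedom (number of categories minus one); the 5% critical value there is about 11.07, so a small statistic like the one above is consistent with chance.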

Non-parametric tests: T-tests, ANOVAs and MANOVAs are all based on certain assumptions, namely that we know which distribution the test statistic follows. Often these assumptions cannot be met, in which case non-parametric equivalents must be used.

For example, the t-test is based on the assumption of Normally distributed data, for which the mean is an adequate measure of central tendency. However, if our data were highly skewed, the mean would be pulled toward the tail of the data, so the median would be a better measure of central tendency. In such a case the equivalent non-parametric test, which compares medians, is used (e.g. Wilcoxon test, Mann-Whitney test). - non-parametric (PDF, 106 KB)
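The sketch below illustrates both points with invented data: how one extreme value pulls the mean but not the median, and how the Mann-Whitney U statistic is built from pairwise comparisons rather than from means. The function name is hypothetical, and no p-value is computed.

```python
import statistics

# Hypothetical skewed sample: one extreme value in the right tail.
skewed = [1, 2, 2, 3, 3, 4, 50]
print(statistics.mean(skewed), statistics.median(skewed))

def mann_whitney_u(x, y):
    """U statistic for x: count of (x, y) pairs with x > y, ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)

# U depends only on the ordering of the values, not on their size,
# so the extreme value 50 has no more influence than any other large value.
print(mann_whitney_u([1, 2, 3], [4, 5, 50]))
```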

Tests for analysing Likert type/scale data: A step-by-step guide to which of the above statistical tests should be used for analysing Likert type/scale data. - Likert (PDF, 86 KB)

Statistical regression

The objective of regression is to find the mathematical formula which best describes the relationship between the variables of interest. Hence regression analysis aims to explain the relationship between the dependent variable Y and the explanatory variables x_i.

For example, we may want to take a set of variables, say previous assignment scores and degree program, and find which are useful in predicting the dependent variable, say exam score.

Simple/multiple linear regression: This type of regression is used when the dependent variable (response) is continuous and the relationship between it and any continuous independent variables (predictors) is assumed to be linear. - Simple-linear-regression (PDF, 107 KB)
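For the simple (one predictor) case, the least-squares line can be computed directly. This is a minimal sketch with a hypothetical function name and made-up data; multiple regression needs matrix methods that are not shown.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx  # the fitted line passes through the means
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```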

Logistic regression: This type of regression is used when we have a dichotomous outcome variable (event occurs / event doesn't occur). Binary logistic regression models the log odds of the outcome as a linear combination of the predictor variables. We estimate a set of regression coefficients and, from a knowledge of the relevant independent variables, predict the probability (p) that the outcome is 1 (event occurring) rather than 0 (event not occurring). - Logistic regression (PDF, 103 KB)
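To make the link between log odds and probability concrete, the sketch below converts a linear combination of predictors into a predicted probability. The coefficients are invented for illustration, not estimated from any data, and the function name is hypothetical.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Probability that the outcome is 1, given predictor values x."""
    # The model is linear on the log-odds scale ...
    log_odds = intercept + sum(b * v for b, v in zip(coefs, x))
    # ... and the inverse logit maps log odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients: intercept -1.0, one predictor with slope 0.5.
p = predicted_probability(-1.0, [0.5], [2.0])
print(p)
```

Here the log odds are -1.0 + 0.5 * 2.0 = 0, which corresponds to even odds.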


Ordinal logistic regression: This type of regression is used when we have an ordinal outcome variable (e.g. a Likert type scale). Ordinal logistic regression models the log odds of observing a particular outcome or less as a linear combination of the predictor variables. - ordinal logistic regression (PDF, 86 KB)
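One common form of this model is the proportional odds model. The sketch below assumes the parameterization logit P(Y <= j) = threshold_j - linear_predictor; sign conventions differ between software packages, and the thresholds and function name here are made up for illustration.

```python
import math

def category_probabilities(thresholds, linear_predictor):
    """Probabilities of each ordered category under proportional odds.

    Assumes logit P(Y <= j) = thresholds[j] - linear_predictor, with
    thresholds in increasing order; this convention is an assumption.
    """
    cumulative = [1 / (1 + math.exp(-(t - linear_predictor)))
                  for t in thresholds]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical 4-point scale with three thresholds.
probs = category_probabilities([-1.0, 0.0, 1.0], 0.0)
print(probs)
```

Increasing the linear predictor shifts probability toward the higher categories under this convention.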


Linear mixed effects model: This type of regression is used when we want to account for some inter-dependence in the responses in relation to some grouping factor. To deal with this we add a random effect to the model, which allows a different baseline response value for each level of that factor. The model is called a mixed effects model because it includes both fixed and random effects. - Random effects (PDF, 85 KB)
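The structure of a random-intercept model can be illustrated by simulating data from it rather than fitting it. This is only a sketch: the function name, parameter names and values are hypothetical, and fitting such a model requires dedicated software.

```python
import random

def simulate_random_intercepts(groups, n_per_group, beta0, beta1,
                               sd_group, sd_noise, seed=0):
    """Simulate y = beta0 + u_group + beta1 * x + noise for each group."""
    rng = random.Random(seed)
    rows = []
    for g in range(groups):
        # Random effect: each group gets its own shift in baseline response.
        u = rng.gauss(0.0, sd_group)
        for _ in range(n_per_group):
            x = rng.uniform(0, 10)
            # Fixed effects (beta0, beta1) are shared by every group.
            y = beta0 + u + beta1 * x + rng.gauss(0.0, sd_noise)
            rows.append((g, x, y))
    return rows

rows = simulate_random_intercepts(groups=3, n_per_group=4, beta0=50.0,
                                  beta1=2.0, sd_group=5.0, sd_noise=1.0)
print(len(rows))
```

Observations in the same group share the same shift `u`, which is the inter-dependence the random effect is meant to capture.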

Examples of analysing data & how to write a report

This section of the website focuses on the presentation and methodology of basic statistical reports, from a guideline "decision tree" to examples of interpreting all the above-mentioned statistical procedures using the statistical package R.

Choosing the correct method of analysis: The statistical "decision tree" summarizes the most commonly used statistical procedures. To use it, first correctly classify the type of data you have, then follow the branches according to the questions you want to answer. This will lead you to the correct statistical test/procedure to use. - decisiontree (PDF, 462 KB)
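The idea of following branches can be sketched as a small function. This is not the linked decision tree: it is a simplified, hypothetical version covering only the comparison of means or medians for the tests named on this page, and the function and parameter names are invented.

```python
def suggest_test(n_groups, paired=False, normal=True):
    """Suggest a test for comparing the central tendency of samples.

    n_groups: number of samples being compared.
    paired:   whether two samples are dependent (used only for two groups).
    normal:   whether a Normal distribution is a reasonable assumption.
    """
    if n_groups == 1:
        return "one-sample t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "two-sample t-test" if normal else "Mann-Whitney test"
    # Three or more independent groups.
    return "ANOVA" if normal else "Kruskal-Wallis test"

print(suggest_test(2, paired=False, normal=False))
```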

Linear Regression example: This example starts out with basic exploratory data analysis, then moves on to T-tests, ANOVA and multiple linear regression whilst giving the relevant R code and interpreting the output. - regressionknit (PDF, 265 KB)

Logistic Regression example: This example starts out with basic exploratory data analysis, then moves on to correlation and fitting a Logistic regression whilst giving the relevant R code and interpreting the output. - logisticknit (PDF, 158 KB)

Ordinal Logistic Regression example: This example starts out with basic exploratory data analysis, then moves on to a Mann-Whitney test, Chi-squared test, Kruskal-Wallis test and fitting an Ordinal logistic regression whilst giving the relevant R code and interpreting the output. The example then moves on to discuss analysing Likert Scale data using parametric techniques. - OrdinalexampleR (PDF, 189 KB)

Linear Mixed Effects Regression example: This goes through the procedure of fitting and interpreting a linear mixed effects model in R, including all required code. - mixedeffectsknir (PDF, 143 KB)

How to write a report

The guides above are not templates for how a statistical report should be written. Rather, they illustrate the natural progression of statistical procedures used on certain data types and show how to correctly interpret the output produced by statistical software. For an example of a well-written report, follow this link: http://www.maths.manchester.ac.uk/saralees/68371_1.pdf

Online resources

  1. http://www.psych.utoronto.ca/courses/c1/spss/toc.htm - SPSS
  2. https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf - R
  3. http://www.stat.cmu.edu/hseltman/309/Book/chapter15.pdf - Mixed models in SPSS

CAPOD

CAPOD office
Hebdomadar's Block
St Salvators' Quad

Tel: (01334) 46 7175
Email: learning@st-andrews.ac.uk