ID5059 - Knowledge Discovery and Datamining

Academic year

2020-2021 (Semester 2)

Key module information

SCOTCAT credits

15

The Scottish Credit Accumulation and Transfer (SCOTCAT) system allows credits gained in Scotland to be transferred between institutions. The number of credits associated with a module gives an indication of the amount of learning effort required by the learner. European Credit Transfer System (ECTS) credits are half the value of SCOTCAT credits.

SCQF level

SCQF level 11

The Scottish Credit and Qualifications Framework (SCQF) provides an indication of the complexity of award qualifications and associated learning and operates on an ascending numeric scale from Levels 1-12 with SCQF Level 10 equating to a Scottish undergraduate Honours degree.

Availability restrictions

Not automatically available to General Degree students

Planned timetable

11.00 am Mon (odd weeks), Wed and Fri

This information is given as indicative. Timetable may change at short notice depending on room availability.

Module coordinator

Dr K Terzic

This information is given as indicative. Staff involved in a module may change at short notice depending on availability and circumstances.

Module Staff

Professor Tom Kelsey, Team taught

Module description

Contemporary data collection can be automated and conducted on a massive scale, for example in credit card transaction databases. Large databases potentially carry a wealth of important information that could inform business strategy, identify criminal activities, characterise network faults, etc. These large-scale problems may preclude the standard carefully constructed statistical models, necessitating highly automated approaches. This module covers many of the methods found under the banner of Datamining, building from a theoretical perspective but ultimately teaching practical application. Topics covered include: historical/philosophical perspectives, model selection algorithms and optimality measures, tree methods, bagging and boosting, neural nets, and classification in general. Practical applications build sought-after skills in programming (typically R, SAS or Python).

Relationship to other modules

Anti-requisites

You cannot take this module if you take CS5014

Assessment pattern

As used by St Andrews

2-hour Written Examination = 60%, Coursework = 40%

As defined by QAA

Written examinations = 60%
Practical examinations = 0%
Coursework = 40%

The Quality Assurance Agency (QAA) have compiled/developed an indicative list of learning and teaching methods:
  • Written: This category includes any assessment done under exam conditions (exams during diets, class tests) that does not involve the use of practical skills.
  • Practical: This category includes oral assessment and presentation as well as practical skills assessed in situ (in a classroom or laboratory, for instance). Performances in the performing arts context are also classed as practical assessment.
Further details can be found on the QAA website.

Re-assessment

2-hour Written Examination = 60%, Existing Coursework = 40%

Learning and teaching methods and delivery

Weekly contact

Lectures, seminars, tutorials and practical classes.

Scheduled learning hours

35

The number of compulsory student:staff contact hours over the period of the module.

Guided independent study hours

115

The number of hours that students are expected to invest in independent study over the period of the module.

Intended learning outcomes

  • Understand the mathematics underpinning common machine-learning/data-mining methods, including parameter estimation
  • Determine what models are applicable for different data and objectives
  • Understand complex regressions from the perspective of basis functions, tree methods, boosting/bagging/ensemble model variants, neural networks, deep learning, and other selected methods
  • Conduct hyperparameter-tuning/model-selection as appropriate to the model
  • Manipulate data, fit models, summarise/display their results/performance, and objectively compare models in R, Python or another suitable language
  • Conduct comprehensive analysis of large real-world data, within a group, covering: data preparation; model fitting, critique & refinement; and presentation of results to a range of audiences