ID5059 Knowledge Discovery and Datamining

Academic year

2023 to 2024 Semester 2

Key module information

SCOTCAT credits

15

The Scottish Credit Accumulation and Transfer (SCOTCAT) system allows credits gained in Scotland to be transferred between institutions. The number of credits associated with a module gives an indication of the amount of learning effort required by the learner. European Credit Transfer System (ECTS) credits are half the value of SCOTCAT credits.

SCQF level

SCQF level 11

The Scottish Credit and Qualifications Framework (SCQF) provides an indication of the complexity of award qualifications and associated learning. It operates on an ascending numeric scale from Level 1 to Level 12, with SCQF Level 10 equating to a Scottish undergraduate Honours degree.

Availability restrictions

Not automatically available to General Degree students

Planned timetable

11.00 am Mon (odd weeks), Wed and Fri

This information is indicative only. The timetable may change at short notice depending on room availability.

Module coordinator

Dr C M Fell

This information is given as indicative. Staff involved in a module may change at short notice depending on availability and circumstances.

Module Staff

Dr C Fell

Module description

Contemporary data collection can be automated and carried out on a massive scale, e.g. credit card transaction databases. Large databases potentially carry a wealth of important information that could inform business strategy, identify criminal activities, characterise network faults, etc. The scale of these problems may preclude the standard approach of carefully constructed statistical models, necessitating highly automated approaches instead. This module covers many of the methods found under the banner of data mining, building from a theoretical perspective but ultimately teaching practical application. Topics covered include: historical/philosophical perspectives, model selection algorithms and optimality measures, tree methods, bagging and boosting, neural nets, and classification in general. Practical applications build sought-after skills in programming (typically R, SAS or Python).

Relationship to other modules

Anti-requisites

You cannot take this module if you take CS5014

Assessment pattern

2-hour Written Examination = 60%, Coursework = 40%

Re-assessment

Oral examination = 60%, Existing Coursework = 40%

Learning and teaching methods and delivery

Weekly contact

Lectures, seminars, tutorials and practical classes.

Scheduled learning hours

35

The number of compulsory student:staff contact hours over the period of the module.

Guided independent study hours

115

The number of hours that students are expected to invest in independent study over the period of the module.

Intended learning outcomes

  • Understand the mathematics underpinning common machine-learning/data-mining methods, including parameter estimation
  • Determine what models are applicable for different data and objectives
  • Understand complex regressions from the perspective of basis functions, tree methods, boosting/bagging/ensemble model variants, neural networks, deep learning, and other selected methods
  • Conduct hyperparameter tuning and model selection as appropriate to the model
  • Manipulate data, fit models, summarise and display their results and performance, and objectively compare models in R, Python or another suitable language
  • Conduct comprehensive analysis of large real-world data, within a group, covering: data preparation; model fitting, critique & refinement; and presentation of results to a range of audiences