DATA423-20S2 (C) Semester Two 2020

Data Science in Industry

15 points

Details:
Start Date: Monday, 13 July 2020
End Date: Sunday, 8 November 2020
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Friday, 24 July 2020
  • Without academic penalty (including no fee refund): Friday, 25 September 2020

Description

In this course we will address core topics in the application of data science in industry.

This course is taught by a practising Data Scientist and aims to cover real-life issues that are not found in textbooks. The course will cover topics central to a career in Data Science.

This course is heavily focused on the "applied" side of data science rather than the theoretical. We will use R as the language of choice. Much of the material involving R and Shiny will require a degree of self-learning, especially in the early part of the course.

Learning Outcomes

There is an emphasis on three main themes:

  • Best statistical practice
    We will progressively look at each stage of analysing data and producing a model of it.
    Best practice is mainly about doing the right things in the right order. In particular, we look at the vexing issue of "data leakage."

  • Communication through visualisation
    We will employ “Shiny” to visualise our data science. Shiny is built upon R and enables you to write an interactive web page employing dynamic visualisations. This is a great way to “sell” your work to your “clients” through a clear message that non-technical decision makers can relate to.

  • Problems typical of the “real” world
    Real-life data is not like the numerous data sets available in the public domain. Real-life data sets are messy; they have ambiguity, missing data, useless variables, inconsistent units, data gaps, measurement uncertainty, correlation, near-zero variance, too many variables, unbalanced categories, etc.

Pre-requisites

Subject to approval of the Head of Department of Mathematics and Statistics.

Timetable 2020

Students must attend one activity from each section.

Lecture A
Activity | Day | Time | Location | Weeks
01 | Monday | 12:00 - 13:00 | Jack Erskine 315 | 13 Jul - 23 Aug, 7 Sep - 18 Oct

Lecture B
Activity | Day | Time | Location | Weeks
01 | Wednesday | 13:00 - 14:00 | Rehua 002 Lectorial | 13 Jul - 23 Aug, 7 Sep - 18 Oct

Computer Lab A
Activity | Day | Time | Location | Weeks
01 | Wednesday | 15:00 - 16:00 | Ernest Rutherford 464 Computer Lab | 13 Jul - 23 Aug, 7 Sep - 18 Oct
02 | Wednesday | 16:00 - 17:00 | Ernest Rutherford 464 Computer Lab | 17 Aug - 23 Aug, 7 Sep - 18 Oct

Course Coordinator / Lecturer

Nicholas Ward

Assessment

Assessment | Percentage
Quiz A | 5%
Assignment 1 | 20%
Quiz B | 5%
Assignment 2 | 20%
Quiz C | 5%
Assignment 3 | 20%
Quiz D | 5%
Assignment 4 | 20%
Total | 100%

Textbooks / Resources

There is no prescribed textbook.

Indicative Fees

Domestic fee: $1,022.00*

* Fees include New Zealand GST and do not include any programme level discount or additional course related expenses.

For further information, see Mathematics and Statistics.
