DATA423-20S1 (C) Semester One 2020

Data Science in Industry

15 points

Details:
Start Date: Monday, 17 February 2020
End Date: Sunday, 21 June 2020
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Friday, 28 February 2020
  • Without academic penalty (including no fee refund): Friday, 29 May 2020

Description

In this course we will address core topics in the application of data science in industry.

This course is taught by a practising data scientist and addresses real-life issues that are not found in textbooks. It covers topics deemed central to a career in data science.

This course focuses heavily on the “applied” side of data science rather than the theoretical. We will use R as the language of choice. Much of the material involving R and Shiny requires a degree of self-directed learning, especially in the early part of the course.

Learning Outcomes

The course emphasises three main themes.

1. Best statistical practice
We will progressively look at each stage of analysing data and producing a model of it.
Best practice is mainly about doing the right things in the right order. In particular, we look at the vexing issue of “data leakage.”

2. Communication through visualisation
We will employ “Shiny” to visualise our data science. Shiny is built upon R and enables you to write an interactive web page employing dynamic visualisations. This is a great way to “sell” your work to your “clients” through a clear message that non-technical decision makers can relate to.

3. Problems typical of the “real” world
Real-life data is not like the numerous data sets available in the public domain. Real-life data sets are messy: they have ambiguity, missing data, useless variables, units, data gaps, measurement uncertainty, correlation, near-zero variance, too many variables, unbalanced categories and so on.
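
The “right things in the right order” point under theme 1 can be illustrated with a minimal sketch of avoiding data leakage when scaling a variable. The course itself uses R; plain Python is used here purely for illustration, and the synthetic data, the 80/20 split and all function and variable names are assumptions rather than course material.

```python
import random
import statistics


def standardise(values, mean, sd):
    """Centre and scale values using the supplied mean and standard deviation."""
    return [(v - mean) / sd for v in values]


# Synthetic data (an assumption for illustration only).
random.seed(423)
data = [random.gauss(50, 10) for _ in range(100)]

# Step 1: split BEFORE estimating any preprocessing parameters.
train, test = data[:80], data[80:]

# Step 2: estimate the scaling parameters from the training data only.
train_mean = statistics.mean(train)
train_sd = statistics.stdev(train)

# Step 3: apply the training parameters to both splits.
train_scaled = standardise(train, train_mean, train_sd)
test_scaled = standardise(test, train_mean, train_sd)

# For contrast, the leaky order: parameters estimated from ALL the data,
# so information about the test set seeps into the preprocessing.
leaky_mean = statistics.mean(data)
leaky_sd = statistics.stdev(data)

print("mean estimated from train only:", round(train_mean, 3))
print("mean estimated from all data (leaky):", round(leaky_mean, 3))
```

The same ordering applies to any step that learns from the data, such as imputation or variable selection: fit it on the training data, then apply it unchanged to the test data.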

Pre-requisites

Subject to approval of the Head of Department of Mathematics and Statistics.

Timetable 2020

Students must attend one activity from each section.

Lecture A
Activity  Day       Time           Location                                        Weeks
01        Monday    10:00 - 11:00  E16 Lecture Theatre (17/2-16/3)                 17 Feb - 29 Mar
                                   - (23/3, 20/4, 4/5-25/5)                        20 Apr - 26 Apr
                                                                                   4 May - 31 May

Lecture B
Activity  Day       Time           Location                                        Weeks
01        Tuesday   12:00 - 13:00  Jack Erskine 442 (18/2-17/3)                    17 Feb - 29 Mar
                                   - (24/3, 21/4-26/5)                             20 Apr - 31 May

Computer Lab A
Activity  Day       Time           Location                                        Weeks
01        Thursday  09:00 - 10:00  Jack Erskine 442 (20/2-19/3)                    17 Feb - 22 Mar
                                   - (23/4-28/5)                                   20 Apr - 31 May
02        Thursday  12:00 - 13:00  Ernest Rutherford 464 Computer Lab (20/2-19/3)  17 Feb - 22 Mar
                                   - (23/4-28/5)                                   20 Apr - 31 May

Course Coordinator

For further information see Mathematics and Statistics Head of Department

Textbooks / Resources

There is no prescribed textbook.

Indicative Fees

Domestic fee $1,022.00

* Fees include New Zealand GST and do not include any programme-level discount or additional course-related expenses.

For further information see Mathematics and Statistics.
