STAT478-17S1 (C) Semester One 2017

Special Topic - Scalable Data Science

This occurrence is not offered in 2017

15 points

Details:

Start Date: Monday, 20 February 2017
End Date: Sunday, 25 June 2017

Withdrawal Dates
Last Day to withdraw from this course:

Without financial penalty (full fee refund): Friday, 3 March 2017
Without academic penalty (including no fee refund): Friday, 19 May 2017

Description

Scalable Data Science is a technical course in the area of Big Data, aimed at the needs of the emerging data industry in Christchurch and those of certain academic domain experts across UC's Colleges, including Arts, Science and Engineering. The course uses Apache Spark, a fast and general engine for large-scale data processing, via Databricks to compute with datasets that won't fit on a single computer. It introduces Spark's core concepts through hands-on coding, including resilient distributed datasets and map-reduce algorithms, DataFrames and Spark SQL on Catalyst, scalable machine learning algorithms, and vertex programs using the distributed graph processing framework GraphX. We will solve instances of real-world big data decision problems from various scientific domains.
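As a rough illustration of the map-reduce pattern mentioned above, the following sketch counts words in plain Python. It is not Spark code and is not taken from the course labs: the function names and the toy data are assumptions made for this example, and the two inner lists merely stand in for partitions that a cluster would hold on separate machines.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count the words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(partitions):
    """Apply the map step to every partition, then reduce the partial results."""
    partial_counts = map(map_partition, partitions)
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical toy data, already split into two partitions.
partitions = [
    ["spark makes big data simple", "big data needs big clusters"],
    ["data science at scale"],
]

totals = word_count(partitions)
print(totals.most_common(2))
```

Because the reduce step only ever merges two partial results, the partial counts could in principle be computed on different machines and combined in any order, which is the property that lets this style of algorithm scale.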

For a quick overview of the computing platform, read an introduction to Apache Spark. The course will cover topics from the first 8 of the "9 Must-Have Skills to Land Top Big Data Jobs in 2015" and prepare students to take the Spark certified developer exams that are available online.

Minimal prerequisites are some experience in Python programming (COSC121), knowledge of 200-level linear algebra (MATH203), and 100-level calculus with probability (MATH103). Additional courses in Mathematics, Statistics or Computer Science will be helpful.

Learning Outcomes

Concrete learning outcomes will include:
familiarity with map-reduce algorithms for processing big data, including robust data clean-up via regular expressions
basic skills to extract, transform and load data into distributed file systems such as Hadoop's HDFS
working with structured data using DataFrames and dynamic querying in Spark SQL on Catalyst
basic applications of some of the standard learning algorithms in Spark's machine learning and distributed graph processing libraries
basic data science analytics pathways for the following common data types:
- structured text data (logs generated by machines, tabular data from various open data sources)
- geospatial data (and their integration with other types of data)
- unstructured text data (a collection of text documents)
- social media data
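
To give a flavour of the regular-expression clean-up of machine-generated logs listed among the outcomes above, here is a minimal plain-Python sketch. The log format, the pattern and the sample lines are all hypothetical and chosen only for illustration; the course itself works in Spark, where the same parsing function would typically be applied to each record of a distributed dataset.

```python
import re
from collections import Counter

# Hypothetical log format: <ip> [<timestamp>] "<method> <path>" <status>
LOG_PATTERN = re.compile(
    r'^(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3})$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line is malformed."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Made-up sample lines, including one corrupted record.
raw_lines = [
    '10.0.0.1 [20/Feb/2017:09:00:01] "GET /index.html" 200',
    'corrupted ### line',
    '10.0.0.2 [20/Feb/2017:09:00:05] "GET /missing" 404',
    '10.0.0.1 [20/Feb/2017:09:00:09] "POST /login" 200\n',
]

parsed = [parse_line(line) for line in raw_lines]
records = [record for record in parsed if record is not None]
rejected = len(parsed) - len(records)

# Simple aggregate over the cleaned records: requests per status code.
status_counts = Counter(record["status"] for record in records)
print(status_counts, "rejected:", rejected)
```

Returning None for malformed lines, rather than raising an error, keeps one bad record from stopping the whole job, and counting the rejects gives a quick check on data quality.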

Students will be encouraged to showcase their completed labs by publishing them in public GitHub repositories, in order to appeal directly to potential employers. The basic labs offer plenty of opportunities for creative extension, even after the course is completed.

Prerequisites

Subject to approval of the Head of School

Course Coordinator

For further information see Mathematics and Statistics Head of Department

Course links

Mathematics and Statistics Honours Booklet
General information for students
LEARN

Indicative Fees

Domestic fee $993.00

International Postgraduate fees

* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.

For further information see Mathematics and Statistics.

All STAT478 Occurrences

STAT478-17S1 (C) Semester One 2017 - Not Offered
