DATA301-21S1 (C) Semester One 2021

Big Data Computing and Systems

15 points

Start Date: Monday, 22 February 2021
End Date: Sunday, 27 June 2021
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Sunday, 7 March 2021
  • Without academic penalty (including no fee refund): Friday, 14 May 2021


The course introduces distributed computational techniques, distributed algorithms and systems/programming support for large-scale processing of data.

Learning Outcomes

  • Description

    This course teaches parallel and distributed programming, algorithms, and systems principles that are relevant for large-scale processing of big data sets on high performance computing clusters and cloud computing resources.

    Learning Outcomes: At the end of this course, students will be able to...

  • Understand and explain the fundamentals of cloud computing systems (SaaS, PaaS, IaaS, storage and networking architectures, virtual machines and their management, job scheduling).
  • Understand and explain different programming models for parallel and distributed computing (shared memory, shared-nothing / message-passing architectures) and common design patterns for distributed computations on big data sets (e.g. leader/follower, Map/Reduce, Gossiping).
  • Understand the drawbacks and advantages of different cloud solutions and distributed programming models and select appropriate solutions for a given situation.
  • Understand and explain fundamental distributed algorithms (e.g. leader election, consensus) and their properties as well as selected specialized algorithms for distributed processing of big data (e.g. matrix algorithms in parallel / distributed environments, distributed optimization).
  • Be able to design, implement and evaluate distributed processing programs for large data sets using appropriate software frameworks like MPI, CUDA, Hadoop or Apache SPARK.
  • Be able to communicate the results and argue from evidence.
  • Be able to work in teams.


Timetable 2021

Students must attend one activity from each section.

Lecture A
Activity | Day | Time | Location | Weeks
01 | Monday | 11:00 - 12:00 | Ernest Rutherford 465 | 22 Feb - 4 Apr, 3 May - 6 Jun

Lecture B
Activity | Day | Time | Location | Weeks
01 | Friday | 09:00 - 10:00 | E16 Lecture Theatre | 22 Feb - 28 Mar, 26 Apr - 6 Jun

Computer Lab A
Activity | Day | Time | Location | Weeks
01 | Monday | 13:00 - 15:00 | Ernest Rutherford 464 Computer Lab | 22 Feb - 4 Apr, 3 May - 6 Jun
02 | Wednesday | 12:00 - 14:00 | Jack Erskine 001 Computer Lab | 22 Feb - 4 Apr, 26 Apr - 6 Jun

Course Coordinator

James Atlas

Textbooks / Resources

Recommended Reading

Blaise Barney; Introduction to Parallel Computing (and other tutorials).

CUDA; CUDA Toolkit Documentation; v10.0.130 (CUDA Programming Guide).

Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman; Mining of Massive Datasets; 2nd ed.; Cambridge University Press, 2014.

Additional Course Outline Information

Academic integrity

You are encouraged to discuss the general aspects of a problem with others. However, anything you submit for credit must be entirely your own work and not copied, with or without modification, from any other person. If you share details of your work with anybody else then you are likely to be in breach of the University's General Course and Examination Regulations and/or Computer Regulations (both of which are set out in the University Calendar) and/or the Computer Science Department's policy (see section 9). The Department treats cases of dishonesty very seriously and, where appropriate, will not hesitate to notify the University Proctor.

If you need help with specific details relating to your work, or are not sure what you are allowed to do, then contact your tutors or lecturer for advice.

Assessment and grading system

Lab assessment - 30%

In the labs, students will practise designing and implementing distributed algorithms and gain practical experience with contemporary Big Data and Cloud Computing frameworks such as Apache SPARK, MPI, CUDA and Google Cloud / Amazon Web Services. This assessment item addresses LO2, LO4 and LO5.

Project - 40%

Students will complete a short, application-focused project, delivered as a series of artifacts. Working in teams of two or three, they will carry out an analysis task on a big data set. This requires them to design, implement and test an appropriate distributed algorithm in an appropriate software framework, write progress reports, critique their design, and communicate the design and analysis results professionally in a written report. This assessment item addresses LO3, LO5, LO6 and LO7.

Final exam - 30%

The final exam provides a summative assessment of learning outcomes across the full semester. It can include theoretical aspects, algorithms, programming, and techniques covered in lectures and assignments. This assessment item addresses LO1, LO2, LO3 and LO4.

Course Outline

The topics covered in lectures will generally follow this progression:

  • Introduction: Big Data
  • 5 Vs (Variety, Velocity, Volume, Veracity, Value)
  • Storage and networking architectures
  • Divide and Conquer, Map, Reduce, Map/Reduce functional programming in SPARK
  • Algorithms in SPARK: Group By, Union, Intersection, Difference, Matrix-Vector and Matrix-Matrix Multiplication
  • Systems: SaaS, PaaS, IaaS, Google Cloud / Amazon Web Services, storage and networking architectures, virtual machines and their management, job scheduling, cloud resources
  • Algorithms in SPARK on cloud: Hashing, PageRank
  • Data Processing: Distributed Data Structures, Graphs, Leader Election, Consensus
  • Memory Hierarchy, Shared memory, Shared-nothing, distributed file systems, replication, communication cost, complexity theory
  • Programming: Message-Passing (MPI)
  • Programming: Threads, Locks and Atomics (CUDA)
  • Programming: Work Queues, Schedulers, Streaming
  • Heterogeneous Processing: Systems and Programming


The course assumes that you are proficient in Python, as taught in COSC121, and in algorithm design and analysis, as taught in COSC262. If you are enrolling in DATA301 but haven't already passed COSC121 and COSC262 or the equivalents, you should consult the course supervisor before enrolling.

Indicative Fees

Domestic fee $785.00

International fee $3,500.00

* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.

For further information see Computer Science and Software Engineering.

All DATA301 Occurrences

  • DATA301-21S1 (C) Semester One 2021