ENCI630-20S1 (C) Semester One 2020

Special Topic: Predictive Analytics for Civil and Natural Systems

15 points

Details:
Start Date: Monday, 17 February 2020
End Date: Sunday, 21 June 2020
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Friday, 28 February 2020
  • Without academic penalty (including no fee refund): Friday, 29 May 2020

Description

The course provides instruction on how to use machine learning and data analysis for predictive modelling and insight into complex, interdisciplinary problems. The course will introduce a series of machine learning models and will focus heavily on their appropriate use with respect to validation, communication, and potential ethical issues. This course will be suitable for students from a variety of fields, and examples will be taken from civil and natural systems engineering, disaster risk and resilience, urban planning, and public health. It will be primarily an applied course, with programming being a heavy component throughout. Knowledge, or quick study, of Python 3 is expected. A major component of the course will be a research-based term project where students will integrate multiple, diverse data sources to gain strong predictive accuracy and insight.

Learning Outcomes

  • Understand the data analysis process
  • Be able to use a range of machine learning methods for prediction and inference
  • Appreciate the ethical issues and other challenges inherent to data science in the context of civil and environmental systems engineering
  • Be able to rigorously validate, assess, and compare different types of machine learning models, including for spatial and temporal data
  • Be able to identify overfitting and inflated claims of accuracy in statistical modelling
  • Be able to draw insights into the influence and importance of factors in the societal or natural system
  • Be able to communicate these insights to civil and environmental systems engineers and decision makers
  • Produce a research paper based on predictive data modelling that is suitable for conference or journal submission

Prerequisites

Subject to approval of the Head of Department.

Timetable Note

Wk 1 - Data analytics, civil systems applications, and ethics

Wk 2 - Classification & Regression

Wk 3 - Uncertainty and bootstrapping; bias-variance trade-off, overfitting, and validation

Wk 4 - Classification and Regression Trees, MARS

Wk 5 - Bagging and Boosting. Ensemble methods

Wk 6 - Exploratory data analysis; variable, model, and [hyper]parameter selection

Wk 7 - Interpretability: Partial dependence and variable influence

Data challenge #1

Wk 8 - Neural Networks

Wk 9 - Student led: Choose from list of additional topics

Wk 10 - Student led: Choose from list of additional topics

Wk 11 - Student led: Choose from list of additional topics

Data challenge #2

Wk 12 - Spatial and temporal validation techniques; extrapolation: description, inference, prediction, and prescription

Study break
Exam week
Exam week - Report due (9am 19/6/2020)

Additional topics:
• Imbalanced data
• Clustering
• Hypothesis testing
• Mixed-effects, multi-level, and hierarchical modeling
• Artificial neural networks
• Deep learning

Course Coordinator

For further information see Civil and Natural Resources Engineering Head of Department

Assessment

Assessment            Due Date    Percentage
Journal paper                     60%
Data challenge (x2)               30%
Discussion lead                   10%


Journal Paper (60%)

The primary objective of the course is to give you the experience of working on a project with machine learning methods. The best way to do this is by undertaking a full project. Therefore, you are expected to prepare a report in the style of a journal article. To do this, you must undertake a rigorous approach to understanding the data and presenting both your results as well as your code.

I cannot stress this enough: keep a thorough record of where your data comes from. Include a table in an appendix, an Excel sheet, or similar that records the URL of each data source and details such as the year and spatial resolution.
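As a rough illustration only, a data source record of the kind described above could be kept as a small CSV file generated from Python. The field names, the `date_accessed` column, and the example entries below are assumptions made for this sketch, not a required format; the URLs are placeholders.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataSource:
    """One row of the data source log (field names are illustrative)."""
    name: str
    url: str
    year: int
    spatial_resolution: str
    date_accessed: str  # assumed extra field; useful when sources change over time


# Hypothetical entries; replace with the real sources used in your project.
sources = [
    DataSource("Rainfall grids (placeholder)", "https://example.org/rainfall",
               2018, "1 km grid", "2020-03-02"),
    DataSource("Census population (placeholder)", "https://example.org/census",
               2018, "meshblock", "2020-03-05"),
]


def write_source_log(entries, handle):
    """Write the log as CSV, one row per data source, with a header row."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


# Written to an in-memory buffer here; in a project, open a file in your repository instead.
buffer = io.StringIO()
write_source_log(sources, buffer)
source_log_csv = buffer.getvalue()
print(source_log_csv)
```

Keeping this file under version control alongside your code means the record is updated in the same commits that add or change a data source.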

At the end of week three, you are required to submit a project summary that identifies the data set and the question (or potential questions) you will explore, and to share your GitHub repository with the course instructor. Shortly after, you will meet with the course instructor and present an overview in the discussion section.

Data Challenge x2 (15% each)

In the data science industry, data challenges serve two main purposes:

1) They give the company insight into your thought process and how you deal with data.
2) They give you a feel for the types of problems the company solves.

These are often a fun, albeit intense, experience.

In my experience, a company will send you a zip file containing a description of the task and the data for analysis. You are expected to return the completed modelling exercise within 48 hours of receiving that email. They will provide a training set and a test set. The test set will have the response values removed, and your task is to send back the predictions for that test set, along with your code and a report describing what you did. Typically, it will take 8-12 hours of your time.

We will run these two assessments exactly in this way. An example from a company will be posted on Learn.
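The workflow described above (fit on a training set, then return predictions for a test set whose responses are withheld) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the format of the actual challenges: the data are synthetic, there is a single predictor, and the model is a plain least-squares line written by hand. A real challenge would involve the models covered in the course and a real data set.

```python
import csv
import io
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


# Synthetic stand-in for the supplied data (assumption for this sketch).
rng = random.Random(0)
train_x = [float(i) for i in range(40)]
train_y = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in train_x]
test_x = [40.0, 41.0, 42.0]  # responses for these rows are withheld

# Hold out part of the training set to estimate error honestly,
# because the test responses are not available to you.
idx = list(range(len(train_x)))
rng.shuffle(idx)
val_idx, fit_idx = idx[:10], idx[10:]
model = fit_line([train_x[i] for i in fit_idx], [train_y[i] for i in fit_idx])
val_rmse = rmse([train_y[i] for i in val_idx],
                predict(model, [train_x[i] for i in val_idx]))

# Refit on all training rows, then predict the test set.
final_model = fit_line(train_x, train_y)
test_pred = predict(final_model, test_x)

# Package the predictions as CSV (in memory here; normally a file you send back).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["x", "prediction"])
for x, p in zip(test_x, test_pred):
    writer.writerow([x, round(p, 3)])
predictions_csv = buffer.getvalue()
print(f"validation RMSE: {val_rmse:.2f}")
print(predictions_csv)
```

The held-out validation error is what you would report alongside the predictions, since it is the only estimate of accuracy available before the test responses are revealed.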

Discussion Lead (10%)

You are expected to prepare and lead a course discussion on a data science topic in weeks 9-11. Ideally this will be related to a challenge you are facing with your data set or an advanced machine learning model you wish to explore and present.

You will write up an approximately 2-page summary of the topic, identify a relevant journal article, and provide these to the class at least three days before the discussion. Then you will lead a 1-1.5-hour discussion on the topic.

Special Consideration (flexibility)

The assessments, and even the day and time of the discussions, are flexible. We can adjust these to suit people's needs. The due dates will be by agreement.

Ethical Standards

There will be no exam-based assessment in this course. Therefore, you will be asked to provide a signed statement with your submission attesting that you have received no undue external assistance on the assignments or data challenges. Naturally, you are expected to use the internet and similar resources for these, and you are encouraged to discuss your projects with one another. However, these must be your own work, and the data challenges must be done independently of your colleagues.

GitHub and Code Style

I will assess your use of GitHub and your code style (commenting etc.) in each of the assessments. The ability to use GitHub and write readable code is now a basic expectation for a literate data scientist.

Within the first week of the course, please register for a student GitHub account.

Textbooks / Resources

Electronic copies of course materials will be made available through Learn.

Indicative Fees

Domestic fee $1,102.00

* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.

Minimum enrolments

This course will not be offered if fewer than 5 people apply to enrol.

For further information see Civil and Natural Resources Engineering.
