The course provides instruction in using machine learning and data analysis for predictive modelling and for gaining insight into complex, interdisciplinary problems. The course will introduce a series of machine learning models, with a heavy focus on their appropriate use with respect to validation, communication, and potential ethical issues. This course will be suitable for students from a variety of fields, and examples will be drawn from civil and natural systems engineering, disaster risk and resilience, urban planning, and public health. It will be primarily an applied course, with programming a heavy component throughout. Knowledge of Python 3, or the ability to pick it up quickly, is expected. A major component of the course will be a research-based term project in which students integrate multiple, diverse data sources to achieve strong predictive accuracy and insight.
• Understand the data analysis process
• Be able to use a range of machine learning methods for prediction and inference
• Appreciate the ethical issues and other challenges inherent to data science in the context of civil and environmental systems engineering
• Be able to rigorously validate, assess, and compare different types of machine learning models, including for spatial and temporal data
• Be able to identify overfitting and inflated claims of accuracy in statistical modelling
• Be able to draw insights into the influence and importance of factors in the societal or natural system
• Be able to communicate these insights to civil and environmental systems engineers and decision makers
• Produce a research paper based on predictive data modelling that is suitable for conference or journal submission
Subject to approval of the Head of Department.
Wk 1 - Data analytics, civil systems applications, and ethics
Wk 2 - Classification & Regression
Wk 3 - Uncertainty and Bootstrapping; Bias-variance trade-off, overfitting, and validation
Wk 4 - Classification and Regression Trees, MARS
Wk 5 - Bagging and Boosting; Ensemble methods
Wk 6 - Exploratory Data Analysis; Variable, Model, and [hyper]Parameter selection
Wk 7 - Interpretability: Partial dependence and variable influence; Data challenge #1
Wk 8 - Neural Networks
Wk 9 - Student led: Choose from list of additional topics
Wk 10 - Student led: Choose from list of additional topics
Wk 11 - Student led: Choose from list of additional topics; Data challenge #2
Wk 12 - Spatial and temporal validation techniques; Extrapolation: Description, Inference, Prediction, and Prescription
Study break
Exam week
Exam week - Report due (9am 19/6/2020)
Additional topics:
• Imbalanced data
• Clustering
• Hypothesis testing
• Mixed-effects, multi-level, and hierarchical modeling
• Artificial neural networks
• Deep learning
For further information see Civil and Natural Resources Engineering Head of Department.
Journal Paper (50%)
The primary objective of the course is to give you experience of working on a project with machine learning methods, and the best way to do this is to undertake a full project. You are therefore expected to prepare a report in the style of a journal article. To do this, you must take a rigorous approach to understanding the data and present both your results and your code.
I cannot stress this enough: keep a thorough record of where your data comes from. There should be a table in an appendix, an Excel sheet, or something similar that records the URL of each data source and details such as the year, spatial resolution, etc.
At the end of week three you are required to submit a project summary that identifies the data set and the question (or potential questions) you will explore, and to share your GitHub repository with the course instructor. Shortly afterwards, you will meet with the course instructor and present an overview in the discussion section.
Data Challenge x2 (15% each)
In the data science industry, data challenges have two main purposes:
1) They give the company insight into your thought process and how you deal with data.
2) They give you a feel for the types of problems the company solves.
These are often a fun, albeit intense, experience.
In my experience, a company will send you a zip file containing a description of the task and the data for analysis. You are expected to return the modelling exercise within 48 hours of receiving that email. They will provide a training set and a test set. The test set will have the response values removed, and your task is to send back predictions for that test set, along with your code and a report describing what you did. Typically, it will take 8-12 hours of your time.
We will run these two assessments in exactly this way. An example from a company will be posted on Learn.
Discussion Lead (10%)
You are expected to prepare and lead a course discussion on a data science topic in weeks 9-11.
Ideally this will be related to a challenge you are facing with your data set, or to an advanced machine learning model you wish to explore and present. You will write an approximately 2-page summary of the topic, identify a relevant journal article, and provide these to the class at least three days before the discussion. You will then lead a 1-1.5 hour discussion on the topic.
Special Consideration (flexibility)
The assessments, and even the day and time of the discussions, are flexible. We can adjust these to suit people's needs, and due dates will be set by agreement.
Ethical Standards
There will be no exam-based assessment in this course. You will therefore be asked to provide a signed statement with your submission attesting that you have received no undue external assistance on the assignments or data challenges. Naturally, you are expected to use the internet and other resources for these, and you are encouraged to discuss your projects with one another. However, the work must be your own, and the data challenges must be done independently of your colleagues.
GitHub and Code Style
I will assess your use of GitHub and your code style (commenting etc.) in each of the assessments. The ability to use GitHub and write readable code is now a basic expectation for a literate data scientist.
Within the first week of the course, please register for a student GitHub account.
Electronic copies of course materials will be made available through Learn.
Domestic fee $1,102.00
International Postgraduate fees
* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.
This course will not be offered if fewer than 5 people apply to enrol.
For further information see Civil and Natural Resources Engineering.