COSC471-21S1 (C) Semester One 2021

Special Topic: Natural Language Processing

15 points

Start Date: Monday, 22 February 2021
End Date: Sunday, 27 June 2021
Withdrawal Dates
Last Day to withdraw from this course:
  • Without financial penalty (full fee refund): Sunday, 7 March 2021
  • Without academic penalty (including no fee refund): Friday, 14 May 2021


This course introduces central problems and methods in natural language processing, with a special focus on the challenges presented by low-resource languages in the Pacific. On completing this course, students will be able to describe the central problems and methods in natural language processing, apply standard methods and models to existing text datasets, compare standard methods by their assumptions and applications, design an application of existing methods to a New Zealand-specific context, and evaluate the performance of that application against reasonable baselines.

In this course we will examine Natural Language Processing theory and applications, with an emphasis on how NLP algorithms are typically, though not exclusively, built using statistical machine learning.

The theoretical topics we will cover include:

  • Encoding natural language as features.
  • Estimating features using smoothing, normalization, sampling, and expectation-maximization.
  • Classifying text, training and cross-validation.
  • Distributed word representations such as skip-grams and word2vec, and evaluating their stability and similarity.
  • Language models: training and evaluation (perplexity), word prediction, and other applications.
  • Sequence models: the problem of transitions, the Viterbi algorithm, and parsing.
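
As an informal illustration of two of the topics listed above (smoothing and language-model perplexity), here is a minimal sketch in Python. It is not taken from the course materials. The function names, the toy English sentences, and the choice of add-one (Laplace) smoothing over a bigram model are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram_counts(sentences):
    """Count contexts and bigrams over tokenised sentences (hypothetical helper)."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        # Pad with start and end markers so sentence boundaries are modelled.
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

def perplexity(sentences, context_counts, bigram_counts, vocab_size):
    """Perplexity: exp of the average negative log-probability per predicted token."""
    log_prob = 0.0
    n_predictions = 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            p = bigram_prob(prev, word, context_counts, bigram_counts, vocab_size)
            log_prob += math.log(p)
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)

# Toy data, invented purely for illustration.
train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
context_counts, bigram_counts, vocab = train_bigram_counts(train)

fluent = perplexity([["the", "cat", "sat"]], context_counts, bigram_counts, len(vocab))
scrambled = perplexity([["sat", "cat", "the"]], context_counts, bigram_counts, len(vocab))
print(fluent, scrambled)  # the scrambled word order receives the higher perplexity
```

The sketch assumes every test token appears in the training vocabulary. Handling unseen words (for example with an unknown-word token) is a separate design decision that this example leaves out.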

Applications of these concepts that we will look at include:

  • Corpus similarity measures
  • Building dictionaries
  • Named-entity recognition
  • Part-of-speech tagging
  • Language identification
  • Topic classification
  • Finding lexical clusters
  • Phrase completion
  • Predicting sentence probabilities


Prerequisites

(1) COSC262; (2) Approval by the Head of Department of Computer Science and Software Engineering

Timetable 2021

Students must attend one activity from each section.

Lecture A
Activity  Day        Time           Location               Weeks
01        Wednesday  14:00 - 16:00  Ernest Rutherford 141  22 Feb - 4 Apr, 26 Apr - 6 Jun

Course Coordinator

Ben Adams


Jonathan Dunn

Indicative Fees

Domestic fee $1,033.00

* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.

For further information see Computer Science and Software Engineering.
