This course will introduce students to the truthful art of visualizing data. The students will use an iterative design process to create visualizations that are truthful, functional, beautiful, insightful and enlightening. The lectures will consist of presentations, critiques, in-class exercises and discussions. This course will enable students to select appropriate visualization methods for their data and solve practical data science communication problems. They will consider the context and the intended reader to focus the story their data will tell. The students will learn to use the Tableau software, which will be made available for their own computers within the framework of this course. The course will provide a supportive environment in which students can experiment with the aesthetics of data visualization. Students will need to be familiar with basic data manipulation principles and the process of data gathering and cleaning.
Data comes in a variety of shapes and formats: text documents, images, tables, social network graphs, databases, webpages. Data is used for a variety of purposes: archiving, analysis, visualization, communication, and even art. Data wrangling is the process of reshaping data so that it can be used more efficiently. The process can be difficult because it is important to preserve, as much as possible, the relevant information contained in the dataset, while at the same time ensuring an ethical treatment of the data subjects, e.g., protecting people’s security and privacy. Data scientists thus need to make careful decisions, and it is estimated that up to 80% of a data scientist's working time is spent cleaning and wrangling data. Learning to do this efficiently therefore proves essential across many disciplines and industries.
The course aims to provide the students with the tools to handle different sources of data (CSV files, spreadsheets, web pages, APIs, …), some target formats (long / wide data frames, packages, …) and a variety of data kinds (dates, numeric, strings, text, …). Wherever possible, the students will work on real-world datasets, and ethical facets of data wrangling will be explicitly discussed in class. During the course, R will be the default programming language, and the use of JupyterLab and RStudio is strongly encouraged. References to other programming languages, e.g. Julia, will be provided. Peer, group, and class interaction will be explicitly required during the course.
Having engaged in learning during the course, students will be able to:
Access (read in) different data formats;
Interact with (manipulate) relational datasets (e.g., data frames) and hierarchical datasets;
Output (write to) different data formats;
Analyse a dataset in order to identify its format and possible errors;
Analyse a data wrangling problem: identify the available source format(s); define the suitable target format(s) and the relevant ethical / technical constraints; develop a flow to transform data from source to target formats.
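As a rough illustration of the source-to-target flow named in the last outcome, the sketch below reads a small wide-format CSV, reshapes it to long format, and writes the result as JSON. It is only a sketch under stated assumptions: the course itself uses R, whereas this example is written in Python with the standard library; the data, the column names and the helper `wide_to_long` are all invented for the example and are not course material.

```python
import csv
import io
import json

# Invented source data in wide format: one row per site, one column per year.
# The last cell of the first row is deliberately empty (a missing value).
WIDE_CSV = """site,2019,2020,2021
A,10,12,
B,7,8,9
"""


def wide_to_long(csv_text, id_column):
    """Read wide CSV text and return long-format records.

    Each output record holds one (id, year, value) triple, so a row with
    three year columns becomes three records.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue
            records.append({
                id_column: row[id_column],
                "year": int(column),
                # Keep empty cells as explicit missing values instead of dropping them.
                "value": int(value) if value != "" else None,
            })
    return records


# Source (wide CSV) -> reshape (long records) -> target (JSON text).
long_records = wide_to_long(WIDE_CSV, "site")
json_text = json.dumps(long_records, indent=2)
```

Keeping the empty cell as an explicit missing value, rather than silently dropping the record, is one possible design choice; which treatment is appropriate depends on the dataset and on the constraints identified when analysing the problem.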
This course will provide students with an opportunity to develop the Graduate Attributes specified below:
Critically competent in a core academic discipline of their award
Students know and can critically evaluate and, where applicable, apply this knowledge to topics/issues within their majoring subject.
Employable, innovative and enterprising
Students will develop key skills and attributes sought by employers that can be used in a range of applications.
Subject to approval of the Head of Department of Mathematics and Statistics.
Students must attend one activity from each section.
Giulio Dalla Riva
Heyang (Thomas) Li
Locke, Stephanie. Data Manipulation in R. 2nd edition, colour version. Locke Data, 2017.
Domestic fee $1,051.00
International Postgraduate fees
* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.
For further information see Mathematics and Statistics.