New dogs for old tricks: Improved Compression for Inverted Indexes
Speaker
Alistair Moffat
Institute
School of Computing and Information Systems, The University of Melbourne
Time & Place
Fri, 01 Mar 2019 14:00:00 NZDT in Jack Erskine 340
Abstract
The inverted index is a key element of text search systems. In typical form it consists of a set of posting lists, each of which contains a sequence of docid gaps and a sequence of term frequencies. Both streams are dominated by small integers, and display a (broadly) decreasing probability distribution. A wide range of techniques have been used to represent these two streams, including byte-aligned codes; word-aligned codes; and binary-packed approaches. However there has been relatively little application of entropy coders to index compression, a consequence of the complexity of and overheads associated with applying Huffman or arithmetic codes on a per-postings list basis. In this presentation we describe the use of the recently-developed asymmetric numeral systems entropy coding technique, and show that can be usefully combined with several different index compression approaches to yield improved compression effectiveness within reasonable additional resource costs.
Biography
Alistair Moffat has been involved in research in text compression and information retrieval for more than three decades, and has published numerous papers in the areas of index compression, text compression, and dynamic pruning mechanisms, all of which help support efficient ranked querying.
Alistair is a co- author of the 1994 (revised 1999) book Managing Gigabytes, and also co-author of the 2002 book Compression and Coding Algorithms.
Much of Alistair's recent work has examined the issue of IR system evaluation, and, with other co-authors in Australia, he has focused on the relationship between models of user interactions with search results pages, and the effectiveness metrics that those interactions correspond to.
Alistair was co-Chair for SIGIR 1998 in Melbourne, and for CIKM 2015, also held in Melbourne; and co-Program Committee Chair for SIGIR 2005 (Salvador, Brazil) and SIGIR 2015 (Santiago, Chile).
WSDM, held in Melbourne in February 2019, was Alistair's most recent conference project.
Alistair has been a teaching/research faculty member at the University of Melbourne for thirty years, and was Department Chair from 2007-2011.
During his career at Melbourne he has taught programming skills to well in excess of 15,000 undergraduate students, has authored a C programming textbook (Programming, Problem Solving, and Abstraction with C, 2002, revised 2012), and has received awards for his teaching and lecturing skills. Alistair's PhD was completed in 1985, at the University of Canterbury, in New Zealand, in the area of shortest path algorithms.