Department of Engineering

Google Award for the Automatic Statistician

The Automatic Statistician, a project led by Zoubin Ghahramani, Professor of Information Engineering, has won a US$750,000 Google Focused Research Award.

The award is a no-strings-attached donation to support research on this topic in the Cambridge Machine Learning Group.

Automating the process of statistical modeling would have a tremendous impact on fields that currently rely on expert statisticians, machine learning researchers, and data scientists. Such expertise in the data sciences is increasingly in demand, especially with the growth of Big Data problems in the sciences and in industry. The Automatic Statistician is a system that explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. The Cambridge group, including PhD students James Lloyd and David Duvenaud, working with Roger Grosse and Joshua Tenenbaum at MIT, has developed an early version of this system. It not only automatically produces a 10-15 page report describing patterns discovered in the data, but also returns a statistical model with state-of-the-art extrapolation performance when evaluated on real time-series data sets from various domains. The system is based on reasoning over an open-ended language of nonparametric models using Bayesian inference.
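To give a rough feel for what "exploring a space of models to find a good explanation" can mean, here is a toy sketch in Python. It is not the project's method: the article does not describe the search procedure, and the real system works with nonparametric models. As a stand-in, this sketch assumes a small set of hypothetical components (`linear`, `quadratic`, `periodic`), fits each combination by ordinary least squares, and greedily adds whichever component most improves the Bayesian information criterion (BIC). The synthetic data and all names are invented for illustration.

```python
import math

# Hypothetical candidate components (illustration only, not from the project).
# Each maps a time t to one or more regression features.
COMPONENTS = {
    "linear": lambda t: [t / 48.0],
    "quadratic": lambda t: [(t / 48.0) ** 2],
    "periodic": lambda t: [math.sin(2 * math.pi * t / 12),
                           math.cos(2 * math.pi * t / 12)],
}

def features(model, t):
    """Build one design-matrix row; every model includes a constant offset."""
    row = [1.0]
    for name in model:
        row.extend(COMPONENTS[name](t))
    return row

def least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def bic(model, ts, ys):
    """BIC under Gaussian noise, up to an additive constant (lower is better)."""
    X = [features(model, t) for t in ts]
    w = least_squares(X, ys)
    rss = sum((y - sum(wi * xi for wi, xi in zip(w, row))) ** 2
              for row, y in zip(X, ys))
    n, k = len(ys), len(w)
    return n * math.log(max(rss / n, 1e-12)) + k * math.log(n)

def greedy_search(ts, ys):
    """Start from a constant model; add components while the BIC improves."""
    model, best = [], bic([], ts, ys)
    while True:
        candidates = [(bic(model + [c], ts, ys), c)
                      for c in COMPONENTS if c not in model]
        if not candidates:
            break
        score, name = min(candidates)
        if score >= best:
            break  # no remaining component is worth its complexity penalty
        model, best = model + [name], score
    return model, best

# Synthetic series: trend + 12-step cycle + small deterministic wobble.
ts = list(range(48))
ys = [2.0 + 0.5 * t + 3.0 * math.sin(2 * math.pi * t / 12)
      + 0.1 * ((2 * t) % 7 - 3) for t in ts]

model, score = greedy_search(ts, ys)
print("selected components:", model)
```

The BIC penalty term is what keeps the search from simply adding every component: a new component is accepted only if it reduces the residual error by more than its extra parameters cost. A list of selected components like this is also the kind of structure that could, in principle, be translated into a sentence of a report ("the data show a linear trend with a periodic pattern").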

As Zoubin says: "Making sense of data is one of the great challenges of the Information Age we live in. While it's becoming easier to collect and store all kinds of data, from personal medical data, to scientific data, to public data, and commercial data, there are very few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data. Our Automatic Statistician project aims to build an artificial intelligence system for Data Science, helping people make sense of their data."

Kevin P. Murphy, Senior Research Scientist at Google, says: "In recent years, machine learning has made tremendous progress in developing models that can accurately predict future data. However, there are still several obstacles in the way of its more widespread use in the data sciences. The first problem is that current Machine Learning (ML) methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust. The 'automatic statistician' project from Cambridge aims to address both problems, by using Bayesian model selection strategies to automatically choose good models/features, and to interpret the resulting fit in easy-to-understand ways, in terms of human-readable, automatically generated reports. This is a very promising direction for ML research, which is likely to find many applications at Google and beyond."

The ultimate aim of the Automatic Statistician is to produce an artificially intelligent (AI) system for statistics and the data sciences.