
Department of Engineering

Professor Mark Girolami FRSE


Sir Kirby Laing Professor of Civil Engineering

Academic Division: Civil Engineering

Telephone: TBA

Email: mag92@eng.cam.ac.uk

Personal website

Publications


Research interests

Computational Statistics

Mathematical Statistical Methodology

Bayesian Statistical Methodology

Statistical Approaches to Numerical Methods

Applications of Probabilistic, Stochastic and Statistical Modeling in the Engineering and Natural Sciences

Research projects

Data Centric Engineering 

Computational Statistical Inference for Engineering and Security

Semantic Information Pursuit for Multimodal Data Analysis

Inference, Computation and Numerics for Insights into Cities

Digital Twin of 3-D Printed Stainless Steel Pedestrian Bridge: a collaboration with MX3D

Previous EPSRC funded research

Teaching activity

A First Course in Machine Learning (textbook, available from Amazon)

Research opportunities

There are always many opportunities to join a vibrant research team at undergraduate, postgraduate or doctoral level. If you are curiosity-led, ambitious, driven, and passionate about research, then I am always enthusiastic to speak with you. Even better, come and speak to the team of post-docs, PhD students and academic faculty who work with me.

Biography

Mark Girolami has had an unusual pathway to an active and distinguished research career: for the first decade after graduating from the University of Glasgow he worked as a professional chartered engineer with IBM, where he gained experience in high-volume manufacturing, process automation, and electronic system design. He left IBM in his early thirties to start from scratch and undertake a PhD funded by National Cash Registers, the remit being to investigate the statistical signal processing required to support biometric speaker verification.

Girolami completed the PhD in two years whilst working at a further education college. During this time his research came to the attention of senior research leaders on the international stage, and invitations were extended for him to spend time at their research laboratories. In particular, Girolami worked at both the Computational Neurobiology Laboratory of Prof. T. Sejnowski at the Salk Institute for Biological Studies and the Brain Science Institute headed by Prof. S. I. Amari at the Institute of Physical and Chemical Research (RIKEN) in Wako-Shi, Tokyo.

The early research of Girolami came to immediate international attention as it provided the theoretical and methodological generalization of the Infomax method for Independent Component Analysis, which had come from the original, highly influential Bell & Sejnowski 1995 paper (> 9K citations). The original Infomax algorithm was valid only for blind mixtures of super-Gaussian source signals (a fact unknown until it was highlighted in Girolami 1998). Girolami developed the methodology to separate both sub-Gaussian and super-Gaussian signals in his 1998 paper. This resulted in a string of highly cited contributions which form the basis of a number of widely distributed software tools employed by neuroscientists worldwide, e.g. Lee, Girolami, Sejnowski, 1999.

During this period Girolami worked at the University of Paisley, a low-ranked post-1992 UK higher education institute, where he rose from part-time PhD student to Professor and Head of Department. In 2004 he returned to the University of Glasgow as a lecturer in the Department of Computing Science, with a joint appointment in the Department of Statistics to follow. His ascent to a professorial chair was rapid: he gained promotion directly to Reader within 12 months and to Professor 12 months after that.

It was during this period that Girolami started to work with cellular biologists and made significant contributions at the interface of the statistical and biological sciences. It was at this time that he was awarded the first of his prestigious five-year research fellowships from the EPSRC. The culmination of this research programme was the Xu et al. paper, which appeared in Science Signaling. Girolami initiated, led and drove this large multidisciplinary collaborative programme to showcase the Bayesian approach to ranking hypothesized mathematical models of signaling dynamics, and thus to inform which experiments should be undertaken to further assess hypothesized molecular mechanisms and structures.

This was a highly complex programme. It required the design of experiments on growth factor induced activation of the extracellular regulated kinase (ERK) pathway, in which a number of perturbations to the pathway were induced and experimental time series data produced. A number of hypothesized signaling structures were then translated into mathematical models (nonlinear differential equations with mass-action kinetics). Subsequently, Bayes factors were estimated via MCMC-based thermodynamic integration, and a very careful statistical analysis of the posited models and data was conducted. This ranking of models then suggested additional gene silencing experiments to provide further evidence regarding the hypothesized molecular signaling structures. The paper was the culmination of the four-year study and provided new insights into the ERK signaling pathway. From the statistical side it was a large-scale demonstration of how the scientific process can be made more efficient by targeting the experimental search space via the formal statistical ranking of plausible explanatory models. This joint mathematical, statistical and experimental approach is now routinely employed by various systems biology laboratories worldwide, e.g. Schwenter et al., 2015, in their study of Ewing's sarcoma.

Further significant contributions followed at the interface of the biological and statistical sciences, as referenced in the list of 20 publications.

Girolami has made, and continues to make, significant contributions to the Machine Learning literature, as is evidenced by the complete publication list. His focus on statistical methodology brings innovations to the field, such as those in Girolami and Rogers, 2006, as well as more recent ones detailed in the full list of publications.

During this time Girolami moved to University College London to take up the Chair of Statistics in the historically important Department of Statistical Science. At this point he was awarded his second consecutive EPSRC research fellowship in the mathematical sciences, this time at the most senior level of award. He was also awarded a Royal Society Wolfson Research Merit Award and elected to the Royal Society of Edinburgh. It was in 2011 that his work on differential geometric approaches to stochastic simulation for statistical inference emerged (Girolami & Calderhead, 2011; Byrne & Girolami, 2013), with more recent developments listed in the publications document.

The paper of Girolami & Calderhead was selected to be read before the Royal Statistical Society, receiving the largest number of contributions to the discussion of any paper presented to the society in its entire 187-year history. It remains the most discussed paper of the Royal Statistical Society. Discussants included Sir David Cox FRS and C. R. Rao FRS. This work was motivated by what Girolami had observed in the induced likelihood functions of mathematical models of biochemical pathways in his earlier work with cellular biologists. The concentration of the measure on nonlinear manifolds of the space was typical and brought huge challenges for simulation-based inference. The symplectic nature of Hamiltonian dynamics and the natural Riemannian metric structure induced under statistical models provided the keys by which nonlinear concentration of measure could be overcome, through the implicit change of coordinates induced by the natural metric and Levi-Civita connection of the statistical models.

The results in this paper and the methodologies described have had an enormous impact on the computational statistics field as well as the Machine Learning, Mathematical Modelling, Engineering & Scientific Computing literature in terms of the complexity of the models over which simulation based inference can now be realistically considered. This work has also stimulated new results in the molecular dynamics literature and applied mathematics. Over 1000 citations have been made to this work to date and its influence on a number of areas of research is pronounced. 

Girolami was recruited to the University of Warwick Department of Statistics, where he took a number of research leadership roles. In particular, he became one of the founding executive directors of the Alan Turing Institute, the UK national institute for Data Science and, latterly, Artificial Intelligence. For personal family reasons Girolami moved back to London, this time to the Imperial College Department of Mathematics. During this time he provided scientific leadership at the Alan Turing Institute, where he defined and continues to drive the programme of research in an emerging field of the engineering sciences known as Data Centric Engineering. In recognition of his research leadership in this area he was awarded a Research Chair in Data Centric Engineering by the Royal Academy of Engineering, which runs for five years from 2018 to 2023.

During this period his research contributions to the Statistical Sciences continued. One important example is Dunlop et al., 2018, which shed light on the expressive power of deep models in machine learning. This paper was the first to address the question "how deep is a deep model?", in this case a Deep Gaussian Process (Deep GP). Prior to this work, no theoretical understanding or insight was available into the fundamental issues regarding Deep Gaussian Processes. The work framed the question formally by asking whether a Deep GP defines an ergodic process. If it does, then its finite mixing time indicates the depth beyond which the representational capacity of the structure is saturated. A formal analysis and proof of the ergodicity of Deep GPs was developed and published in the premier Machine Learning journal, and it now provides some of the first steps in understanding such structures.

Likewise, in Oates, Girolami and Chopin, 2017, he introduced a statistical approach that not only reduces variance in Monte Carlo estimators but also provides a way to exceed the associated root-N rates of convergence. This is a by-product of the Probabilistic Numerics research programme, in which it was shown that, when a reproducing kernel Hilbert space is defined using the Stein operator, the ensuing estimators exploit both the smoothness of the function class of the integrand and the smoothness of the reference measure. The enhanced rates of convergence of such estimators were then shown to be defined by the Sobolev spaces of both integrand and measure, with the smoothest defining the achievable rate of convergence. This work initiated a large and developing body of work on Stein's method. See the full list of publications for related papers.

Around this time Girolami started to consider the previously explored question "is numerical analysis an intrinsically statistical problem?". Previous authors such as Larkin, 1972, and Diaconis, 1988, had explored this problem in a restricted manner which gained no traction in the larger scientific community, and so Girolami initiated a programme of research to consider the concept of so-called Probabilistic Numerics. The paper by Chkrebtii et al., 2016, was the first from Girolami and his group which sought to develop statistical methodology that would provide a posterior measure over solutions of nonlinear differential equations. This was the first full attempt to provide a statistical approach to solving a nonlinear differential equation and to characterise numerical errors using the probabilistic calculus. Due to the novelty of and interest in this work, the journal published the paper with accompanying discussion, which included commentary from Bayesian statisticians. This paper then motivated a number of strands of work from various groups worldwide investigating the emerging area of Probabilistic Numerics.

The three follow-on papers from Girolami and his research team, as listed, are the current culmination of the Probabilistic Numerics programme of research. They are listed because (1) both statistical journals are publishing the papers with discussion, such is their novelty and perceived impact, and (2) the publication of the Bayesian Probabilistic Numerical Methods work in SIAM Review is a strong indication of how this research is having influence in mainstream mathematical research as well as mainstream statistical research. The main authors, Briol and Cockayne, both PhD students of Girolami, are driving many other publications on Probabilistic Numerics in the Machine Learning and Engineering Sciences literature. See the full list of publications.

Department role and responsibilities

Academic Director of the Centre for Digital Built Britain