Professor of Information Engineering, Mark Gales, has been elected a Fellow of the International Speech Communication Association (ISCA).
The appointment has been made in recognition of Professor Gales’ “wide-ranging, fundamental contributions to research and leadership in the fields of speech recognition, synthesis and statistical modelling algorithms”. He will formally receive his award at the ISCA INTERSPEECH conference to be held in Graz, Austria, in September.
ISCA promotes activities and exchanges in all fields related to speech communication science and technology. It has more than 2,000 members worldwide working on all aspects of human communication by speech.
Speech recognition – an introduction
Speech technology is everywhere: in our cars, mobile phones and home voice assistants. But 30 years ago, the dictation systems available worked for just one speaker (after that individual had spent time training the system to their voice) and had to be used with a head-mounted microphone in a quiet room. To take the technology out of the laboratory and into the wider world, the underlying speech recognition systems had to become robust not only to background noise, such as the roar of the motorway, but also to as many speakers as possible, each with their own unique accent. Professor Gales has made major contributions to enabling this transition.
It was during his PhD that Professor Gales developed one of the first model-based compensation techniques for noise robustness, called Parallel Model Combination (PMC). This powerful paradigm gives a speech recognition system the ability to adapt its model parameters to different acoustic conditions. It became the de facto standard against which noise compensation schemes developed since then are compared.
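To give a flavour of model-based compensation, the sketch below shows the log-normal approximation often associated with PMC: the Gaussian means of a clean-speech model and a noise model, expressed in the log-spectral domain, are combined to predict what noisy speech should look like. The function name and the numerical values are illustrative only, and variances are ignored for brevity; this is not Professor Gales' exact scheme.

```python
import numpy as np

def combine_means_log_spectral(mu_speech, mu_noise):
    """Approximate the mean of log(exp(s) + exp(n)) when s and n are
    log-spectral Gaussian means for clean speech and additive noise.
    (Log-normal approximation; variance terms omitted for brevity.)"""
    return np.log(np.exp(mu_speech) + np.exp(mu_noise))

# Illustrative two-dimensional log-spectral means (made-up numbers).
mu_s = np.array([2.0, 1.0])   # clean-speech model mean
mu_n = np.array([0.0, 0.5])   # noise model mean

mu_y = combine_means_log_spectral(mu_s, mu_n)
```

The combined mean always exceeds each component, reflecting the fact that adding noise can only increase the energy in each spectral band.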
A limitation of PMC is that it updates the model parameters, which can number in the millions, so it can be computationally very expensive. To overcome this, Professor Gales proposed the feature-transformation Constrained Maximum Likelihood Linear Regression (CMLLR), also known as Feature-space MLLR (FMLLR). This acts on the input features rather than the model parameters, resulting in a low computational cost when adapting to different speakers and conditions. It has become a standard transformation in Deep Learning frameworks and has been adopted by many speech groups in academia and industry. CMLLR allows canonical 'neutral' acoustic models to be trained on data from many speakers and then easily adapted to some target characteristic. As a result, it has found its way into speaker recognition and verification and text-to-speech synthesis.
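The key idea can be sketched in a few lines: instead of re-estimating millions of model parameters, CMLLR estimates a single affine transform per speaker or condition and applies it to the incoming feature vectors. The transform values and dimensions below are illustrative placeholders, not a real estimated transform.

```python
import numpy as np

def apply_fmllr(features, A, b):
    """Apply a CMLLR/fMLLR-style affine transform x' = A x + b
    to a (num_frames, feature_dim) array of acoustic features."""
    return features @ A.T + b

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 3))   # dummy acoustic features

# Illustrative per-speaker transform (in practice estimated by
# maximum likelihood from that speaker's adaptation data).
A = np.eye(3) * 0.9
b = np.array([0.1, -0.2, 0.0])

adapted = apply_fmllr(feats, A, b)
```

Because only the features change, the same canonical acoustic model serves every speaker, which is what makes the adaptation so cheap.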
In recent years, Professor Gales has worked with others on the latter to make the synthetic voices we hear more lively and expressive. One of his key contributions has been to define efficient frameworks and parameter estimation schemes to allow the factoring out and control of different elements of variability, such as the voice, the expression and the language being spoken.
About Professor Mark Gales
The first automatic speech recognition (ASR) system that Professor Gales built was for his final year undergraduate project on the BA in Electrical and Information Sciences (1988). He worked with Professor Steve Young (ISCA Fellow 2008) who also supervised his PhD on Robust Speech Recognition (1995).
Professor Gales was a Research Fellow at Emmanuel College from 1995 to 1997. He was then a Research Staff Member in the Speech Group at the IBM TJ Watson Research Center until returning to the Department of Engineering in 1999 as a University Lecturer, College Lecturer, and Official Fellow of Emmanuel College. In 2004, he was appointed Reader in Information Engineering, followed by Professor of Information Engineering in 2012.
His current research interests include: low resource speech processing (IARPA BABEL and MATERIAL projects); automatic assessment of spoken learner English (ALTA Institute); uncertainty, generalisation and interpretability for Deep Learning; high quality expressive, controllable and adaptable speech synthesis; and automatic lecture captioning.