A team from Professor Mark Gales' Speech Research Group and the ALTA Institute recently took part in the INTERSPEECH 2020 Shared Task on Automatic Speech Recognition for Non-Native Children's Speech.
The results have been released, with a special session to be held at INTERSPEECH 2020 in October. The Department's Automated Language Teaching and Assessment (ALTA) team consisting of Dr Kate Knill, Dr Linlin Wang, Dr Xixin Wu, Dr Yu Wang and Professor Mark Gales came top in the task by a significant margin.
This Shared Task was organised by researchers from Fondazione Bruno Kessler, Trento, Italy, and Educational Testing Service, USA. It was supported by SIG-CHILD, the International Speech Communication Association (ISCA) special interest group focusing on multimodal child-computer interaction. From the organisers' description:
This shared task will help advance the state-of-the-art in automatic speech recognition (ASR) by considering a challenging domain for ASR: non-native children's speech. A new data set containing English spoken responses produced by Italian students will be released for training and evaluation. ... The data set consists of spoken responses collected in Italian schools from students between the ages of 9 and 16 in the context of English speaking proficiency assessments.
Limited resources of children's speech data are available, and even fewer for non-native speakers. Children's speech differs from that of adults: their smaller size leads to higher pitch and greater variability in the speech signal. Non-native speech poses further challenges for automatic speech recognition (ASR), including pronunciation errors biased by the speaker's native language(s) and grammatically incorrect sentences. The Department's ALTA team found that the children were more likely than adults to code-switch to their native language when they were unsure of what to say in English. This meant that, depending on the child's level, around 4-9% of the "English" data was actually Italian or German words.
The Department's ALTA team built their ASR systems on the Shared Task Closed Track for which only the 49 hours of training data distributed for the Shared Task could be used to train the ASR models. State-of-the-art deep learning based acoustic models and language models were investigated, including a diversity of lexical representations, handling code-switching and learner pronunciation errors, and grade specific models. By combining multiple diverse systems, including both grade independent and grade specific models, an overall word error rate of 15.7% was achieved.
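Word error rate, the metric the Shared Task was scored on, is the minimum number of word substitutions, insertions and deletions needed to turn the recognised hypothesis into the reference transcript, divided by the number of reference words. A minimal sketch of the standard edit-distance computation (an illustration, not the team's scoring code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,                # substitution / match
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

So an ASR system that recognised "the cat sat on mat" against the reference "the cat sat on the mat" would score one deletion in six words, a WER of about 16.7% — roughly the level the combined system achieved across the whole evaluation set.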
As the organisers were partly based in northern Italy, the evaluation, originally scheduled for mid-March, was postponed to 17-25 April. This meant the Department's ALTA team had to run the evaluation, and much of their development work, entirely remotely. Linlin Wang and Yu Wang had to balance long evaluation days with childcare responsibilities shared with their respective partners. Having only joined the team in January from The Chinese University of Hong Kong, Xixin Wu was using many of the tools and the Cambridge systems for the first time. Not being able to sit down together at a computer caused some delays; a combination of daily Microsoft Teams meetings, virtual tea/coffee Zoom breaks, and WeChat messaging helped.
The Automated Language Teaching and Assessment (ALTA) Institute is a virtual institute of the University of Cambridge, supported by Cambridge Assessment and involving researchers in the Departments of Engineering and of Computer Science and Technology.