A Cambridge student working on a prize-winning computer plugin for the visually impaired is asking for feedback from potential users to help further its development.
Filip Kozera, a master's student in information engineering, is part of an international and cross-disciplinary team behind VisualCognition – a computer plugin which works with existing screen readers to generate image descriptions using machine learning.
The team of four, whose studies span information engineering, computer science, and mathematics, recently won the Microsoft Prize for their machine learning algorithm at the University of Cambridge’s hackathon, Hack Cambridge, and will now enter it into Microsoft’s Imagine Cup 2018, a UK competition searching for the most original student applications.
The team have released a survey for those who are visually impaired to provide anonymous feedback, both on their experience using the internet and the problems they encounter. It is hoped that the results of the Screen Reader Survey can be used to further develop the team’s VisualCognition tool and ultimately offer improved internet accessibility.
Filip said the team was inspired to create the plugin after watching Microsoft’s video of a camera app it developed called Seeing AI. The video demonstrated how the app’s creator, Saqib Shaikh, a visually impaired software developer at Microsoft, used the app to convert a visual world into an audible experience.
“We knew we wanted to build on the fantastic work of Seeing AI,” said Filip. “We initially discussed building an Android app equivalent. However, with five minutes to go before the Hackathon was due to start, and with much excitement, it dawned on us how much of the internet was still inaccessible to the visually impaired in this day and age.
“We are committed to solving a problem which affects millions of people worldwide, and in the hope of having a true social impact, we have developed VisualCognition to help the visually impaired navigate the internet better. It includes more descriptive image captions, such as facial and emotion recognition, and works alongside existing screen readers, regardless of the device.”
Currently, when a screen reader reaches an image on a web page, it relies on the ‘alt tag’ to provide a caption. All too often, alt tags are missing, superficial or inaccurate. The team’s Chrome extension therefore scans every image on a web page and, wherever an alt tag is absent, generates and inserts a description.
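The first step – finding the images on a page that lack usable alt text – can be sketched in a few lines. This is an illustrative Python version using the standard-library HTML parser, not the team’s actual extension code (a Chrome extension would do this in JavaScript against the live DOM):

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag with no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Treat an absent or empty alt attribute as a missing caption.
        if not (attrs.get("alt") or "").strip():
            self.missing.append(attrs.get("src"))

page = """
<img src="cat.jpg" alt="A cat on a sofa">
<img src="dog.jpg">
<img src="logo.png" alt="">
"""

finder = MissingAltFinder()
finder.feed(page)
print(finder.missing)  # → ['dog.jpg', 'logo.png']
```

Only the second and third images are flagged: the first already carries a descriptive alt tag, while an empty `alt=""` is treated the same as a missing one.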
“For each image where the alt tag is missing, we send the image URL to an Azure cloud function,” said Filip.
“The cloud function checks if we already have the alt tag saved and if so, it reinstates this alt tag. If not, we use Microsoft's Cognitive Services Vision API to perform object recognition and generate the description to be added as the alt tag.
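The lookup-then-recognise flow Filip describes is a classic cache-aside pattern, and can be sketched as below. The in-memory cache, the `recognise` stub and the function names are hypothetical stand-ins for the team’s Azure cloud function and Microsoft’s Cognitive Services Vision API:

```python
# In-memory stand-in for the team's store of saved alt tags.
alt_tag_cache = {}

def recognise(image_url):
    """Hypothetical stand-in for a Cognitive Services Vision API call
    that performs object recognition and returns a caption."""
    return f"generated description for {image_url}"

def get_alt_tag(image_url):
    """Cache-aside lookup: reuse a saved alt tag if one exists,
    otherwise generate a new description and save it."""
    if image_url in alt_tag_cache:
        return alt_tag_cache[image_url]
    caption = recognise(image_url)
    alt_tag_cache[image_url] = caption
    return caption

first = get_alt_tag("https://example.com/dog.jpg")   # generated, then cached
second = get_alt_tag("https://example.com/dog.jpg")  # served from the cache
print(first == second)  # → True
```

Caching by image URL means the (comparatively slow and costly) recognition call runs at most once per image, which is what makes the plugin fast enough for everyday browsing.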
“The speed, accuracy and ease of integration with which the image descriptions are being generated using our plugin would not have been possible just a few years ago,” he added.