
Improving Speech Recognition Accuracy Via Crowd Sourcing

For many in the medical and assistive communication industries, speech recognition is nothing new. Like eye tracking, speech recognition has changed the way people with limited movement can interact with the world. Personal computers have long shipped with speech recognition built in, but I have to admit (and fortunately so) that I have rarely, if ever, used the speech recognition feature on my laptop.

But now that mobile phones offer speech recognition software as a standard feature, people have started using these technologies more than before. And it’s good timing, really, as new laws are being passed that ban the use of handheld phones, dialing, and texting while driving.

As is often the case, higher demand makes for more sophisticated and accessible technology, and speech recognition in cell phones is no exception. The availability of 3G-enabled mobile devices is pushing the industry forward even more, as it’s possible to “crowdsource” data with so many people online at the same time. Crowdsourcing is a new term formed from the combination of “crowd” and “outsourcing” that was created to describe an open call for solutions, usually broadcast to the online community. Fast internet connections and devices that are online 24 hours a day make for a constant stream of data, and this steady input of information helps fuel development of new mobile speech recognition applications.

As an article on the subject published in Live Science put it, “speech recognition software has been around for years, but they were often frustrating to use because they typically required users to ‘train’ them for optimal word recognition or to speak slowly.” Lower computing power in phones and limitations in technology made it more difficult to capture data from specific users to “train” their devices. But as you can imagine, today’s phones have substantially higher levels of computing power, and the intense voice training is no longer needed. This is due in large part to the concept of crowdsourcing.

Smartphones and other mobile devices can now crowdsource from millions of users. With so many people talking at once, they provide a steady stream of data on accents, word pacing, and language tendencies from around the world. Online servers can combine all this data and draw broad generalizations that improve a device’s ability to recognize words.

Dave Grannen, president and CEO of speech recognition software maker Vlingo (which also makes iPhone apps), describes the idea in the Live Science article.

“The first time you speak to the phone, we put a cookie” – a kind of digital tag – “on your device and when you say something we call up your personal language model from our servers and use it to get better accuracy.” An individual’s voice model contains information about accent and word pronunciation. The servers can combine the voice models of several speakers with similar accents, strengthening the database and improving accuracy for that population.
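To make the idea concrete, here is a toy Python sketch of a server that keeps a personal model per device cookie and also pools models by accent. It is not Vlingo’s actual system: the class name `VoiceModelStore`, its methods, the accent labels, and the representation of a “voice model” as simple counts of which word a speaker meant by a given sound are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class VoiceModelStore:
    """Toy server-side store of voice models (hypothetical, for illustration only)."""

    def __init__(self):
        self.accent_of = {}  # device cookie -> accent label
        # cookie -> heard sound -> counts of the word the speaker meant
        self.personal = defaultdict(lambda: defaultdict(Counter))
        # accent -> heard sound -> counts pooled over all speakers with that accent
        self.pooled = defaultdict(lambda: defaultdict(Counter))

    def register(self, cookie, accent):
        """Associate a device cookie with an accent group."""
        self.accent_of[cookie] = accent

    def record(self, cookie, accent, heard, word):
        """Store one confirmed (heard sound, intended word) pair."""
        self.register(cookie, accent)
        self.personal[cookie][heard][word] += 1
        self.pooled[accent][heard][word] += 1

    def recognize(self, cookie, heard):
        """Prefer the speaker's own history, then fall back to the accent pool."""
        accent = self.accent_of.get(cookie)
        for model in (self.personal.get(cookie), self.pooled.get(accent)):
            if model and heard in model:
                return model[heard].most_common(1)[0][0]
        return None  # nothing known about this sound yet


store = VoiceModelStore()
store.record("phone-a", "scottish", "watter", "water")
store.record("phone-b", "scottish", "watter", "water")

# A new speaker with the same accent benefits from the pooled data at once.
store.register("phone-c", "scottish")
print(store.recognize("phone-c", "watter"))  # water
```

In this sketch, a new speaker gets useful results without any training because others with a similar accent have already contributed data, while a speaker’s own history takes priority once it exists.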

As it turns out, the more people who speak with a similar accent, the more accurate recognition becomes for that accent, so some accents are currently recognized more accurately than others. Still, this information should eventually reach a critical mass, improving the ability of mobile devices even more. That’s the beauty of crowdsourcing technology.

Perhaps the eye tracking industry could take a cue from the mobile phone industry. Who knows what potential crowdsourced information about eyes could have for the development of advanced eye tracking technologies?

Speech Recognition for Cell Phones Comes of Age
