Fluent.ai Inc. (www.fluent.ai), creator of the world’s first acoustic voice interface for intelligent devices and user interfaces, is demonstrating its platform's advantages over leading speech-recognition systems in trials with consumer-electronics OEMs and telecom service providers around the world.
Fluent.ai’s artificial intelligence product, the Fluent AI Engine, is an enterprise-grade voice interface built on a sophisticated neural network and machine-learning technologies. Fluent AI learns language on the fly the way children do: by learning to associate a specific sequence of sounds with an intended command or action. This makes Fluent AI the first entirely acoustic voice interface, eliminating the costly, complex and often inaccurate speech-to-text conversion step that AI voice assistants and speech-recognition systems typically rely on.
In recent third-party trials, Fluent AI was tested against AI voice assistants from the industry’s largest players. In the first trial, Fluent AI ran offline with a smaller, context-specific vocabulary against a larger, cloud-connected speech-recognition system. In the second, both systems ran offline with the same vocabulary.
Conventional speech-recognition platforms claim accuracy rates of up to 95 per cent, but only under ideal conditions: standard, unaccented American English in a clean (low-noise) environment. The two trials instead used speakers with a range of accents, plus background noise at signal-to-noise ratios of 5 to 20 decibels. Under these conditions, Fluent AI maintained an accuracy rate of 96 per cent, while the other industry players averaged only 72 per cent.
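For readers less familiar with the decibel scale: SNR in decibels is 10 × log10(signal power / noise power), so at 5 dB the speech carries only about 3.2 times the power of the background noise, while at 20 dB it carries 100 times the power. The short Python sketch below is purely illustrative and is not taken from the trial methodology; the helper names are hypothetical. It converts these figures and shows one common way to scale a noise recording to reach a target SNR when preparing test audio.

```python
import math


def db_to_power_ratio(snr_db: float) -> float:
    """Convert an SNR in decibels to a signal-to-noise power ratio."""
    return 10 ** (snr_db / 10)


def noise_gain_for_snr(signal_power: float, noise_power: float, snr_db: float) -> float:
    """Hypothetical helper: amplitude gain to apply to a noise clip so that
    mixing it with the signal yields the requested SNR in dB."""
    target_noise_power = signal_power / db_to_power_ratio(snr_db)
    # Power scales with the square of amplitude, so take the square root.
    return math.sqrt(target_noise_power / noise_power)


if __name__ == "__main__":
    # The noise range quoted for the trials: 5 to 20 dB.
    for snr in (5, 20):
        print(f"{snr} dB -> speech power is {db_to_power_ratio(snr):.1f}x the noise power")
```

Lower SNR values mean noisier audio, so the 5 dB end of the range is the harder test condition.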
Global consumer-electronics OEM chooses Fluent.ai for offline, on-device capabilities
A global consumer-electronics OEM needed an on-device voice interface with a custom vocabulary of 50 commands for a smartwatch that would allow children to communicate with their parents. The watch had to perform reliably in noisy conditions and cope with children’s voices, which are higher-pitched and less clearly enunciated than adults’. The OEM also required full offline functionality with a small memory footprint, as well as a custom wake phrase that eliminates the need to push a button to talk.
“Conventional speech-recognition technologies simply couldn’t achieve the performance this OEM needed under typical use conditions,” said Niraj Bhargava, CEO of Fluent.ai. “Only we could deliver the accuracy and functionality the OEM required, in a small footprint without the need for an Internet connection.”
European telecom service provider chooses Fluent.ai for multilingual cloud capabilities
A telecom service provider with operations in a dozen countries wanted to roll out a responsive, reliable multilingual automated voice-response (AVR) system so its customers could ask billing questions, top up pay-as-you-go phone plans and change service plans at any time.
Conventional speech-recognition systems can take months, or even years, to add support for additional languages or accents. In only 10 weeks, Fluent.ai built fully functional custom models for multiple languages as a proof of concept, covering data collection, model training and GUI development.
“Fluent.ai demonstrated how customer-service operations can quickly roll out AVR support for additional languages at a fraction of the cost, and in a fraction of the time, compared with conventional voice-assist platforms, for a much shorter time to market,” said Bhargava. “For our telecom customers, this extends beyond call centre operations to include smart devices for the connected home and the Internet of Things.”
Everyone deserves to be understood
Conventional speech-recognition systems require thousands of hours of recorded speech from hundreds of speakers to support a single language or accent. This scale of effort is required regardless of the application or the size of the vocabulary: it’s all or nothing. Even then, recognition accuracy drops sharply in the presence of background noise, accents and speech impairments.