Technology

Fluent.ai is a leader in speech understanding and voice user interface solutions.

How do we do it?

Built on over nine years of research in machine learning and artificial intelligence, and protected by multiple families of issued patents, Fluent.ai’s technology is unique and unmatched.

Conventional speech understanding solutions operate in two distinct steps: first transcribing speech into text in a target language, then applying natural language processing to the text to determine the user’s intent. This approach demands large-scale data collection and labeling efforts and substantial computing power to develop models for even a single language. It also chains together a number of disjointed modules, such as an acoustic model and a language model that map input speech to a string of words. Because these modules are not optimized jointly, they do not deliver optimal speech recognition performance, a weakness that becomes particularly evident in noisy environments or with variability in speaker accents.
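The two-step pipeline described above can be sketched as a composition of two independent stages. The function bodies below are hypothetical stubs, not any vendor's actual models; the point is the structure: errors made by the transcription stage propagate unrecoverably into the intent stage.

```python
# Conventional pipeline: speech -> text (ASR) -> intent (NLU).
# The two stages are built and tuned separately, so a transcription
# error in stage 1 silently corrupts the intent produced by stage 2.

def transcribe(audio: bytes) -> str:
    """Stage 1: acoustic model + language model map audio to a word string.
    (Stub: a real system would run a full ASR decoder here.)"""
    return "turn on the kitchen lights"  # placeholder transcription

def parse_intent(text: str) -> dict:
    """Stage 2: natural language processing maps the text to a structured intent."""
    intent = {"action": None, "object": None, "location": None}
    if "turn on" in text:
        intent["action"] = "activate"
    if "lights" in text:
        intent["object"] = "lights"
    for room in ("kitchen", "bedroom", "washroom"):
        if room in text:
            intent["location"] = room
    return intent

# The full system is simply the composition of the two stages:
print(parse_intent(transcribe(b"...")))
```

Note that each stage here must be developed, trained, and maintained on its own, which is precisely the disjointedness the paragraph above describes.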

Fluent.ai’s speech-to-intent technology employs unique neural network algorithms to map a user’s incoming speech directly to their intended action, without performing speech-to-text transcription. During training, Fluent.ai technology learns by directly associating semantic representations of a speaker’s intended actions with the spoken utterances; in a way, our models are inspired by how humans acquire vocabulary and language. Unlike conventional automatic speech recognition (ASR), Fluent.ai technology does not require phonetic transcription. Our text-independent approach enables speech understanding models that can learn to recognize a new language from a small amount of data, and allows end-users to interact with their devices in a language of their choice. The user does not need to conform to any preset phrases and is free to choose words of their preference.
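The contrast with the two-step pipeline can be illustrated with a toy classifier that maps pooled acoustic features straight to an intent label in one step, with no word string ever produced. This is only a generic end-to-end sketch with random placeholder weights and a made-up intent set; Fluent.ai's actual architectures are proprietary.

```python
import numpy as np

# Illustrative intent inventory (not Fluent.ai's actual label set).
INTENTS = ["lights_on", "lights_off", "volume_up", "volume_down"]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def speech_to_intent(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> str:
    """One mapping from utterance-level acoustic features directly to an
    intent label -- no intermediate transcription step exists to fail."""
    probs = softmax(W @ features + b)
    return INTENTS[int(np.argmax(probs))]

# Toy demonstration: frame-level features are pooled into a single
# vector, then classified in a single step by one (random) linear layer.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 16))   # 50 frames of 16-dim features
utterance = frames.mean(axis=0)          # simple mean pooling
W = rng.standard_normal((len(INTENTS), 16))
b = np.zeros(len(INTENTS))
print(speech_to_intent(utterance, W, b))
```

Because the whole mapping is one model, it can be trained end-to-end on (audio, intent) pairs, which is what lets such systems skip phonetic transcription entirely.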

Competitive Advantages

  • Supports Any Language
  • Lightweight and Faster
  • Highly Accurate
  • Better Performance in Noisy Environments
  • Allows for Multiple Concurrent Languages
  • Requires a Small Fraction of the Typical Training Data


Comparison: Leading Speech-to-Text Providers (A, B, C, D) vs. Fluent.ai Speech-to-Intent

                                          A             B             C             D             Fluent.ai
  Accuracy                                50%           75%           50%           50%           100%
  Noise Robustness                        50%           50%           50%           50%           100%
  Improvements with User Feedback         N/A           N/A           N/A           N/A           100%
  Offline Performance                     50%           N/A           50%           N/A           100%
  Recognition Speed                       25%           50%           50%           25%           100%
  Customizability                         N/A           N/A           N/A           N/A           100%
  Size of Typical Training Data           10,000+ hrs   10,000+ hrs   10,000+ hrs   10,000+ hrs   <10 hrs
  Speed to Launch New Languages/Accents   25%           25%           25%           25%           100%
  Ability to Handle a Mix of Languages    25%           25%           25%           75%           100%

Research

Fluent Speech Commands Dataset: A dataset for spoken language understanding research

At Fluent.ai, our primary research is focused on end-to-end SLU, i.e., directly extracting the intent from speech without converting it to text first. This is somewhat similar to how humans do speech recognition. Such SLU models have caught the attention of others in the research community in recent years; however, few SLU datasets are readily available to researchers.
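An end-to-end SLU dataset of this kind labels each utterance with a structured intent rather than (only) a transcript. A minimal loader sketch is shown below; the three-slot (action, object, location) intent follows the Fluent Speech Commands release, but the exact CSV column names and the sample rows here are assumptions for illustration.

```python
import csv
import io

# Assumed CSV layout (illustrative): path, speakerId, transcription,
# action, object, location. The rows below are made-up examples.
SAMPLE = """path,speakerId,transcription,action,object,location
wavs/speaker1/0.wav,speaker1,Turn on the lights in the kitchen,activate,lights,kitchen
wavs/speaker2/1.wav,speaker2,Decrease the heat,decrease,heat,none
"""

def load_intents(fileobj):
    """Yield (audio_path, intent) pairs. The intent triple is the label
    an end-to-end SLU model predicts directly from the audio file."""
    for row in csv.DictReader(fileobj):
        intent = (row["action"], row["object"], row["location"])
        yield row["path"], intent

for path, intent in load_intents(io.StringIO(SAMPLE)):
    print(path, intent)
```

Note that the transcription column is carried along but is not needed as a training target: the model trains on (audio, intent) pairs alone.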

DONUT: CTC-based Query-by-Example Keyword Spotting
Authors: Loren Lugosch, Samuel Myer, Vikrant Tomar
Conference: NeurIPS 2018 Workshop

Tone Recognition Using Lifters and CTC
Authors: Loren Lugosch, Vikrant Tomar
Conference: Interspeech 2018

Efficient Keyword Spotting Using Time Delay Neural Networks
Authors: Samuel Myer, Vikrant Tomar
Conference: Interspeech 2018

Enhance your devices with Fluent.ai's
offline, robust and multilingual voice AI engine
