“Ok Google, set up an alarm”. “Hey Siri, text mom”. “Hey Alexa, play the latest pop songs”. These are just a few of the simple phrases we use to ask our smartphones and speakers to carry out everyday tasks. Features like these are what make our phones smart. AI/ML developers continue to push the boundaries of innovation and build technologies that simplify human life.
One such innovation in recent times is the AI-powered digital assistant. Apple, Google, Amazon, and Microsoft have each developed their own conversational digital assistant to enhance the user experience. Siri, Google Assistant, Alexa, and Cortana are the names of these assistants, which have taken productivity to the next level.
“Ok Google, what makes you so smart?”
“Hey Siri, how are you so knowledgeable?”
“Hey Alexa, why so intelligent?”
Have you ever thought of asking voice assistants such questions? Don’t worry. Let’s look at the AI/ML concepts behind them, which are the key ingredients of their smart behavior.
How does a Voice-based Digital Assistant work?
The digital assistant takes in text or audio input. Audio input is first converted to text: an Automatic Speech Recognition (ASR) system works in the backend to record the input and break it down into phonemes, the basic units of sound, from which the words are recognized. The text is then parsed to bring out its syntactic structure and reveal the underlying grammar.
A computer doesn’t comprehend human language the way we do, so it looks for keywords in the input. Those keywords are analyzed and matched against pre-trained models, built by extensive training on massive datasets, which then produce a well-defined output.
To go into the details:
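The keyword-matching step described above can be illustrated with a minimal sketch. It assumes the audio has already been transcribed to text, and the intent names and keyword table are hypothetical; real assistants rely on trained models rather than a hand-written lookup.

```python
import re

# Hypothetical keyword table: each intent is triggered by a small set of words.
INTENT_KEYWORDS = {
    "set_alarm": {"alarm", "wake"},
    "send_text": {"text", "message"},
    "play_music": {"play", "song", "songs", "music"},
}

def tokenize(utterance):
    """Lowercase the transcribed text and split it into word tokens."""
    return re.findall(r"[a-z']+", utterance.lower())

def match_intent(utterance):
    """Return the intent sharing the most keywords with the utterance, or None."""
    tokens = set(tokenize(utterance))
    best_intent, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap
    return best_intent

for phrase in ("Ok Google, set up an alarm", "Hey Siri, text mom"):
    print(phrase, "->", match_intent(phrase))
```

A lookup like this breaks down quickly on paraphrases, which is one reason the trained models listed next are used instead.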
- NLP (Natural Language Processing) – deals with the parsing. It works on the text form of the input, parsing it to extract keywords and human emotions, using models such as LSTM.
- LSTM (Long Short-Term Memory) – a type of recurrent neural network that retains context across long sequences, which helps AI models interpret the meaning of whole sentences.
- DNN (Deep Neural Networks) – large DNN models are trained on huge datasets of human language data, against which the input is matched. They often make use of transfer learning to reduce the amount of data needed.
- Transfer Learning – a technique for using the knowledge gained by solving one problem to solve another, related problem. It cuts down on repeating training work.
- NLG (Natural Language Generation) – essentially does the opposite of the parsing step. It is used to generate text and speech, thereby enabling voice assistants to answer our queries.
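To make the transfer learning item in the list above more concrete, here is a minimal sketch in plain Python. It assumes a tiny "pretrained" layer whose weights are made up for illustration and kept frozen, and trains only a small new classification head on a handful of examples from a new task. All names and numbers are hypothetical; production systems use deep learning frameworks and far larger models.

```python
import math

# Stand-in for a layer pretrained on another task; its weights stay frozen.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

def extract_features(x):
    """Frozen layer: a linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=500, lr=0.5):
    """Train only a logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = extract_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = pred - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    """Return True when the head assigns probability >= 0.5 to class 1."""
    feats = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias) >= 0.5

# New task with very little data: label 1 when the first input exceeds the second.
NEW_TASK_SAMPLES = [[2.0, 0.0], [1.0, -1.0], [0.0, 2.0], [-1.0, 1.0]]
NEW_TASK_LABELS = [1, 1, 0, 0]

head_weights, head_bias = train_head(NEW_TASK_SAMPLES, NEW_TASK_LABELS)
print(predict([3.0, 0.0], head_weights, head_bias))
```

Only the head is updated during training, so the four examples are enough here because the frozen layer already provides a useful feature.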
What next for Siri, Cortana, Alexa, and Google?
One of the earliest to innovate, Apple introduced a feature that wakes up its assistant automatically, without the user having to touch the device. It uses a DNN model to turn the acoustic pattern of the user’s voice into a probability distribution. Only if the confidence score suggesting that the voice belongs to the user is high enough does Siri wake up and respond.
With its interests spread across diverse fields, Amazon decided to use transfer learning, reusing data models from other fields to support its latest pet project, Alexa. Essentially, if Amazon has a data model that can identify cars, that model’s knowledge is transferred to Alexa, which can then also identify cars.
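The confidence check that gates the assistant's wake-up, described earlier in this section, can be sketched roughly as follows. This is not Apple's implementation: the raw scores stand in for the output of an acoustic DNN, and the class labels and the 0.8 threshold are assumptions chosen for illustration.

```python
import math

WAKE_THRESHOLD = 0.8  # hypothetical cutoff, not a published value
LABELS = ("wake_phrase_by_owner", "other_speech", "silence")  # hypothetical classes

def softmax(scores):
    """Turn raw model scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def should_wake(scores, threshold=WAKE_THRESHOLD):
    """Wake only when the owner's wake phrase is confident enough."""
    probabilities = dict(zip(LABELS, softmax(scores)))
    return probabilities["wake_phrase_by_owner"] >= threshold

print(should_wake([4.0, 1.0, 0.5]))  # clear match for the owner's wake phrase
print(should_wake([1.0, 1.2, 0.8]))  # ambiguous audio
```

Raising the threshold reduces accidental wake-ups at the cost of occasionally ignoring the real user.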
Recently, companies have also been working on voice-enabled payments. It is still too early for most people to use the feature, owing to security concerns, but it might be the next fad among digital assistant lovers.
How the smart assistants are actually used
Concerns
No technology is without drawbacks. Almost 40% of users feel that their security is breached every time they use voice-based agents. The concern may be valid: several incidents have been reported in which voice assistants were found to be recording audio, transaction history, and other data and saving them to the cloud.
Companies have responded that this is a mechanism for the assistant to further enhance the user experience. However, in a few instances, cybersecurity experts found that the data on the cloud had been hacked. This raises several concerns.
The power of AI-based assistants is huge: they help with tasks that would otherwise take up a substantial amount of time. Only when security and privacy are regulated will we see AI-based assistants reach peak popularity.
Written by: Viivek Uppalapu
Reviewed by: Kothakota Viswanadh