The current generation of virtual assistants still struggles to be dependable. The next will use advances in machine learning and natural language processing to offer users a ridiculously seamless experience.
Assistants like Siri, Google Now, and Amazon Alexa have been around for less than a decade, and they’ve already become incredibly pervasive. People love the idea of an intelligent assistant that can do whatever they ask. But if you’ve ever tested the limits of any of these assistants, you quickly realize that they’re just not quite there yet. As JR Raphael put it in Computerworld:
The truth is, for all of their progress and the many ways in which they can be handy, voice assistants still fail far too frequently to be dependable. And the more Google and other companies push their virtual assistants and expand the areas in which they operate, the more pressing the challenge to correct this problem becomes.
What does the future hold for these assistants? Is it more skills? More features? My perspective is that, while these mainstream devices will improve, they’ll do so at a much slower rate than we’d like. Because these assistants are driven purely by natural language processing and machine learning, they fail whenever a request calls for more complex skills that have yet to be developed. And the cost of employing human operators to ensure that requests get fulfilled would be untenable, even for tech giants like Apple.
In January 2019, Amazon announced that more than 100 million Alexa devices had been sold. The rollout of these devices, backed by massive marketing campaigns, will result in the widespread adoption and societal normalization of voice assistant technologies. But how will they improve?
A HYBRID FUTURE
A number of companies in the assistant/concierge space have sprung up in the last few years. Companies like Velocity Black aim to create high-end concierge services geared toward the ultra-wealthy, while others, like the now-pivoted GoButler, tried to create a free concierge service for the masses.
These emerging companies are focusing on building solutions that create real value for users, going beyond the simple delivery of information. They don’t just help you with your calendar, but rather arrange things for you in the real world. What most of these companies have in common is that they are all almost entirely powered by humans sitting at a computer, tapping away at customers’ requests. This means that they are ultimately bound by human labor and capacity constraints that perpetually confine their ability to scale, limiting how useful they can be to masses of users. As a result, the law of supply and demand dictates that their services must get more expensive and less accessible, much like we have seen with Magic, another personal assistant, when users tried to sign up in droves.
The future is filled with dreams of an assistant driven entirely by artificial intelligence. Short of the discovery of artificial general intelligence, however, the immediate future will most likely be hybrid: While humans will still be part of the equation, they will likely complement cutting-edge, ever-learning AI systems that gradually take on more and more jobs. This will help expand capacity and lower the cost of these services to users. Over time, we’ll reach a point where a combination of artificial intelligence systems and strategic partnerships with existing applications powers the vast majority of the requests customers make. We don’t know what the future of purely AI-driven solutions will look like, but recent advances in machine learning give us clues about the features that might make the cut.
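One way to picture the hybrid model is as a routing decision: the AI handles requests it is confident about, and everything else goes to a human operator. The sketch below is only an illustration under that assumption. The `Request` type, the `confidence` score, and the 0.8 threshold are all hypothetical and do not describe any particular company's system.

```python
from dataclasses import dataclass

# Hypothetical cutoff: below this confidence, a human operator takes over.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Request:
    text: str
    confidence: float  # assumed model confidence that it can fulfill the request, 0.0 to 1.0


def route(request: Request, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "ai" when the model is confident enough, otherwise "human"."""
    return "ai" if request.confidence >= threshold else "human"


def automation_rate(requests: list[Request], threshold: float = CONFIDENCE_THRESHOLD) -> float:
    """Fraction of requests handled without a human operator."""
    if not requests:
        return 0.0
    handled = sum(1 for r in requests if route(r, threshold) == "ai")
    return handled / len(requests)
```

In this framing, "gradually taking on more jobs" corresponds to the automation rate rising as the model becomes confident about a wider range of requests.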
WHAT NEXT-GENERATION ASSISTANTS WILL BE ABLE TO DO
- UNDERSTAND EMOTION
Companies like Affectiva have built technologies that leverage recent advances in edge computing to train and run complex deep learning models on users’ local devices. These developments will enable virtual assistants to analyze voice cost-effectively and generate a list of emotional characteristics associated with a short audio snippet. In short, assistants could use technologies like this to understand how we’re feeling when we speak to them, then offer help or tailor their responses to be sympathetic.
- UNDERSTAND WHO YOU ARE
IBM Watson’s Personality Insights tool leverages recent advances in natural language processing and models of psychological traits to analyze simple user speech and user-generated text. The program creates a thorough picture of a user’s personality, associated tendencies, and even drivers of purchase. Assistant services could use technologies like this to “understand” us better, and use this information to modulate the way they speak to us.
- UNDERSTAND WHAT YOU WANT
My team and I at Wing are building technologies that allow users to ask us to do things, teach us about their preferences, and automate what they want on the fly. Our work employs novel techniques that leverage generative adversarial networks (GANs), in which neural networks are trained against each other: a generator proposes outputs, and a discriminator tries to tell them apart from real examples in the collected training data, pushing the generator toward the most human-like action. In short, the system can learn how to do things in the real world in response to human speech. An example would be saying “Good morning,” and your assistant ordering your favorite breakfast to arrive at work, providing you with a morning briefing, and letting you know that your Uber to work will be outside soon.
- GET ANYTHING DONE FOR YOU
Most of these modern assistants go far beyond simple question and answer. In fact, they work with hundreds of partner companies through APIs to get things done on your behalf.
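The “Good morning” routine described earlier, combined with partner integrations, can be sketched as a trigger phrase mapped to an ordered list of actions. This is a hand-written lookup table for illustration, not the learned system described above. The function names (`order_breakfast`, `morning_briefing`, `book_ride`) are hypothetical stand-ins for calls to partner APIs.

```python
from typing import Callable


# Hypothetical partner integrations; real ones would call each partner's API.
def order_breakfast(user: str) -> str:
    return f"Ordered {user}'s favorite breakfast for delivery to work"


def morning_briefing(user: str) -> str:
    return f"Prepared morning briefing for {user}"


def book_ride(user: str) -> str:
    return f"Booked a ride to work for {user}"


# Each trigger phrase maps to an ordered list of actions.
ROUTINES: dict[str, list[Callable[[str], str]]] = {
    "good morning": [order_breakfast, morning_briefing, book_ride],
}


def handle(utterance: str, user: str) -> list[str]:
    """Run every action registered for the utterance; unknown phrases run nothing."""
    key = utterance.strip().lower().rstrip(",.!?")
    return [action(user) for action in ROUTINES.get(key, [])]
```

Calling `handle("Good morning,", "Ada")` would run the three actions in order and return one confirmation message per action.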
These are just a few examples of what next-generation assistants will be able to do by leveraging machine learning and natural language processing. Ultimately, the capabilities of these machine learning systems are driven by usage; the more data they get from a large group of users, the faster they can improve.
WHAT ABOUT PRIVACY?
Some of the things these new assistants will be able to do will fundamentally change the relationships we have with our devices, spark major conversations about privacy, and make people more carefully consider what they should be sharing with these assistants. Will we still be able to trust that the companies behind these assistant services have our best interests at heart? Large corporations offering their assistant services for free or for an incredibly low price inherently have something to gain; for example, they could have assistants purchase goods and services exclusively from their parent companies. Hopefully, we’ll gravitate toward companies that create privacy-centric services for their users.
I believe the future of next-generation assistants lies in hybrid systems driven by humans and AI. With your consent, they will do things for you in the real world, use your data to learn about you and make informed decisions, and behave as a real assistant would. Advances in natural language processing, machine learning, and the speed and accessibility of training complex deep learning systems are driving these assistants to get smarter every minute. A real-life Jarvis-for-all is not too far away anymore.