startups and code

Conversational UI - Why it will never be mainstream.

Conversational UI is the new hotness right now. For the sake of this discussion, I'm referring to voice conversational UI. I know that you can have a conversation with text; ask any millennial. Hell, you can have a conversation with emojis only (I've seen it).

With Google Home, Alexa, Siri, and I'm sure many more to come, conversational UI is the buzzword that so many companies are chasing right now. That, along with machine learning, seems to be the future of all development. If you have a conversation (pun intended) with anyone right now, you will touch on one of those two topics.

I will start with why conversational UI is important before all those internet trolls start yelling anonymously at me.

Conversational UI (CUI from here on) is valuable for vision-impaired individuals. Without a doubt, it is truly life-changing for some people who have visual challenges. In addition to that, it is great in isolated environments where you can perform tasks without needing visual cues (like calling someone from your car). It is also OK for starting a process, but subsequent interactions can easily get misinterpreted.

Here are the problems with conversational UI:

It does not filter out ambient noise well enough

With so much noise in a city and multiple people talking, the mics for CUI have a difficult time working out who the intended speaker is. Talking directly into a mic or a connected Bluetooth device does help, but it is not an ideal solution.

It does not target users by voice patterns

If someone comes over and talks to Alexa, it will not know that I wasn't the one who ordered those 10 MacBooks. I know Google Home is better about that, but we need it to get to a point where my voice patterns are as unique as a fingerprint. I want a certain level of security, which leads to the next challenge.
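To make "target users by voice patterns" a bit more concrete, here is a rough sketch of the kind of check I have in mind: compare a voiceprint from the current speaker against the one I enrolled, and only act if they are close enough. Everything here is an assumption for illustration. The three-number voiceprints are made up (a real system would get them from a speaker model), `is_enrolled_speaker` is a hypothetical helper, and the 0.8 threshold is arbitrary. This is not how Alexa or Google Home actually do it.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_enrolled_speaker(enrolled, sample, threshold=0.8):
    """Hypothetical gate: accept the command only if the sample
    voiceprint is close enough to the enrolled one."""
    return cosine_similarity(enrolled, sample) >= threshold


# Made-up voiceprints, purely illustrative.
owner = [0.9, 0.1, 0.4]        # what I enrolled
owner_again = [0.85, 0.15, 0.42]  # me, on a different day
visitor = [0.1, 0.9, 0.2]      # the friend ordering MacBooks

print(is_enrolled_speaker(owner, owner_again))  # accepted
print(is_enrolled_speaker(owner, visitor))      # rejected
```

Note that a check like this only tells voices apart. It does nothing about someone playing back a recording of me, which is the next problem.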

Security of CUI will never be great

I can walk around recording audio of anyone around me and simply play it back to trigger the expected response. Imagine someone on their way to work logging into their bank and being asked, "What is your social security number?" And they say it out loud. PROBLEM! I can duplicate voices perfectly with some of the recording devices around. I know that with HDR cameras people can duplicate fingerprints too, but duplicating a fingerprint takes far more work than turning on a voice recorder.

The more popular CUI becomes, the noisier the world will become

Imagine a world where CUI is adopted by everyone and you are trying to speak or, worse, listen to the response with everyone yelling at their random connected devices. "Door open. Call Marie. What time is it? Tell me about my fantasy team. What should I have for dinner? Call my wife." And so on. That ambient noise will be overwhelming very quickly. We will create new places where we can go simply to get some quiet from all of it.

What do we do?

Think about the next generation of UI. Everyone references Minority Report, with people visually swiping screens in an AR/VR world. What is the ideal interaction with a device? Is it really conversational? If it is conversational, how will it know when to stop talking and listen based on my conversation? We have dialogues with each other, but we also pay attention to the breaks in conversation that allow each of us to speak and listen appropriately. Also, humans have a unique ability to tune out ambient noise and conversations. I am NOT good at that. I hear everything, and it is very frustrating when people are talking over each other. I think the "what do we do" is to consider the future of UI. What if it knew your behavior and did what you wanted without you having to ask? It could use your presence, the time of day, the season, and who else is present in the room to gather as much context as possible without requiring any direct command.
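Here is a tiny sketch of what "operating from context" could look like, just to get you thinking. It is a toy under my own assumptions: the `Context` fields mirror the signals I listed above, while `suggest_action`, the rules, and the action strings are all hypothetical. A real system would learn your behavior instead of hard-coding it.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Context:
    """Signals the UI could observe without any direct command."""
    owner_present: bool
    now: time
    season: str
    others_present: int


def suggest_action(ctx: Context) -> Optional[str]:
    """Hypothetical hand-written rules; returns None when the
    right move is to do nothing."""
    if not ctx.owner_present:
        return None
    if ctx.others_present > 0:
        # Company is over: stay out of the way.
        return None
    if ctx.now < time(9, 0):
        return "start coffee"
    if ctx.now >= time(21, 0) and ctx.season == "winter":
        return "dim lights and raise heat"
    return None


print(suggest_action(Context(True, time(7, 30), "summer", 0)))
print(suggest_action(Context(True, time(22, 0), "winter", 0)))
print(suggest_action(Context(True, time(22, 0), "winter", 2)))
```

Nobody said a word in any of those cases, and that is the point.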

Think about it and go build an amazing UI.