The next mobile interface: when bots and voice merge

The pieces are falling into place for the next major interface: voice.

Over breakfast the other morning, my elder daughter leaned over and tapped my watch. It sprang into life, and Hazel was disappointed I hadn’t chosen the Mickey Mouse watch face. And then she said:

“One of the practitioners at nursery has a watch. But hers doesn’t have Siri on it.”

I know many people are still sceptical of the idea of voice interfaces, but watching my daughter wander into my study, ask Siri a question via “Hey, Siri” and wander out contentedly suggests to me that voice interfaces are going to be something that the next generation just expects. And so they should – the ability to communicate via speech is very natural to us, and computing power is finally getting to the point where that becomes practicable.

This isn’t just going to be an alternative way of getting computers to do the things they already do, because it’s not only the voice recognition tech that matters here. There’s another set of tech rising in parallel that will make voice interfaces really interesting once the two – as they inevitably will – merge. And that’s bots.

When bots meet voice

Everyone from Microsoft to Facebook is working on bots that let you interact with information through a chat interface. Right now, we’re moving into a surprisingly chat-centric world, one that looks familiar to those of us who spent significant amounts of time in IRC or telnet-based chats back in the 90s. Slack, WhatsApp, Messenger and their ilk are all sophisticated evolutions of that idea.

But chatting with humans is one thing – chatting with bots is quite another. At their best, bots usefully guide you through processes and decisions, letting you quickly get done what you need to get done. Slackbot’s onboarding experience, guiding new Slack users through their profile setup, is a good – if basic – version of that. And it’s easier to make bots work well in that context: a defined and contained set of problems that can be resolved along a pathway. While some are sceptical of the power of chat-based commerce, Chris Messina, developer experience lead at Uber, is clear that this could become the major interface for mobile:

[…] the MUI (mobile user interface) will be much more conversational than graphical, but we shouldn’t restrict our imagination of the future to what’s familiar and well understood. Instead, the opportunity is to embrace conversation as the channel for delivering your product, service, brand, assistant, API, or platform offerings—and then explore and experiment with what kinds of apps and expressions serve the user’s needs most appropriately with specific attention paid to context and personalization.
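
Messina’s “conversation as the channel” framing is easier to see in code. Here’s a minimal sketch, in Python, of the kind of pathway-constrained bot that Slackbot’s onboarding represents – the step names and prompts are invented for illustration, but the shape is the point: a bounded flow of questions is far easier to get right than an open-ended conversation.

```python
# A minimal sketch of a pathway-constrained bot, in the spirit of
# Slackbot's onboarding flow. The fields and prompts are hypothetical.

ONBOARDING_STEPS = [
    ("full_name", "What's your full name?"),
    ("timezone", "Which timezone are you in?"),
    ("role", "What do you do? (e.g. designer, engineer)"),
]

def run_onboarding(ask):
    """Walk the user through a fixed set of questions.

    `ask` is whatever function returns a reply from the user --
    a typed chat message, or a transcribed voice response.
    """
    profile = {}
    for field, prompt in ONBOARDING_STEPS:
        reply = ask(prompt).strip()
        # The bot only has to handle "empty vs. not empty" here,
        # which is why contained flows like this work so well.
        while not reply:
            reply = ask("Sorry, I didn't catch that. " + prompt).strip()
        profile[field] = reply
    return profile

if __name__ == "__main__":
    print(run_onboarding(input))
```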

The problem with open-ended digital conversations

The problem with applying a voice interface to this is that it creates an open-ended situation, which we’re a long way from being able to cope with. As Benedict Evans wrote in a recent post, the problem with Siri is that, when it fails, you see it failing:

A good way to see this problem in action is to compare Siri and Google Now, both of which are of course bots avant la lettre. Google Now is push-based – it only says anything if it thinks it has something for you. In contrast, Siri has to cope with being asked anything, and of course it can’t always understand. Google Now covers the gaps by keeping quiet, whereas Siri covers them with canned jokes, or by giving you lists of what you can ask. The actual intelligence might (for the sake of argument) be identical, but you see Siri failing.

Watch Siri

I’ve been wearing an Apple Watch for a little under a year now – and one of the things it has become to me (and to my daughter) is a version of Siri that is only an arm raise away. I’ve become completely comfortable dictating messages and reminders via Siri, as well as turning on and off the Hue connected lights that are slowly proliferating around our home (more on those soon). The Watch is very good at reducing the friction of using Siri – and, interestingly, I find the voice recognition on the Watch to be far superior to that on the iPhone or iPad.
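
For a sense of how small the gap is between saying a sentence and a light changing, here’s a rough sketch of what happens once an assistant has resolved a phrase like “turn off the bedroom light”: the Hue bridge exposes a simple local REST API for exactly this. The bridge address, app key and light number below are placeholders for whatever your own bridge issues when you pair with it.

```python
# Rough sketch: driving a Philips Hue light via the bridge's local
# REST API. BRIDGE_IP and API_USERNAME are placeholders -- the real
# values come from pairing an app with your own bridge.
import requests

BRIDGE_IP = "192.168.1.10"          # your Hue bridge's local address
API_USERNAME = "your-app-key-here"  # issued by the bridge at pairing time

def set_light(light_id: int, on: bool) -> None:
    """Turn a single Hue light on or off."""
    url = f"http://{BRIDGE_IP}/api/{API_USERNAME}/lights/{light_id}/state"
    resp = requests.put(url, json={"on": on}, timeout=5)
    resp.raise_for_status()

# e.g. the assistant maps "turn off the bedroom light" to:
set_light(light_id=3, on=False)
```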

The emerging chat ecosystems

Amazon Echo family

These are the pieces of Apple’s voice/bot interface starting to fall into place – but there are some other big bets out there, too. Amazon has had a stealth success with its Echo smart speakers. While they’re only available in the US right now, the original has been well reviewed by users, and the company recently announced an expansion of the range. And anyone who has used an Amazon Fire TV box or stick will attest to the quality of the voice recognition they use. Amazon was doing voice on TV well long before Apple got the latest Apple TV out.

Google – which probably has the best voice recognition tech here – lacks a device to drive this forwards. But there are plenty of rumours that they’re developing an Echo-like device internally – and, interestingly, the rumours suggest that it isn’t being developed within their troubled Nest smart home division.

It doesn’t take much to start extrapolating where we might go from here. A semi-autonomous car that hands much of its ancillary controls over to voice rather than buttons or dials would be a huge boon to road safety. A home where the lights and other devices can simply be controlled with your voice is a lovely idea – although I think about my daughter’s sense of mischief and familiarity with Siri, and hope that some form of individual voice recognition comes into play, so that I can turn on our bedroom light at night, but not my daughter…
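
To make that “individual voice recognition” idea concrete, here’s a purely hypothetical sketch: a permission check keyed on whoever the assistant believes is speaking. The identify_speaker step stands in for a speaker-recognition model; none of the names here correspond to a real API.

```python
# Hypothetical sketch: gate home-automation commands on who is speaking.
# identify_speaker is a stand-in for a real speaker-recognition model.

PERMISSIONS = {
    "dad": {"bedroom_light", "living_room_light", "thermostat"},
    "hazel": {"hazel_bedroom_light"},   # mischief contained
}

def handle_command(audio, device: str, action: str, identify_speaker):
    """Act on a voice command only if the speaker may control the device."""
    speaker = identify_speaker(audio)   # e.g. "dad" or "hazel"
    if device not in PERMISSIONS.get(speaker, set()):
        return f"Sorry {speaker}, you can't control the {device}."
    return f"OK, turning {action} the {device}."

if __name__ == "__main__":
    fake_identify = lambda audio: "hazel"   # pretend the model heard Hazel
    print(handle_command(None, "bedroom_light", "on", fake_identify))
```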

It could be that, for her generation, the sitting-at-the-keyboard model of computing might feel as awkward as it did for one Montgomery Scott of the USS Enterprise: