Welcome to the age of ask: Understanding Voice Interaction

We have entered a new era of Voice Interaction, engaging with voice-driven digital assistants like Google Home and Amazon Alexa to do everything from receiving weather updates to ordering groceries and buying movie tickets.

At our recent ustwoTHINKS Sydney event, we were joined by ustwo Product Designers Scott Burns and Emily Loeb and our special guest Sam Payne, Strategist at Google. Together they unpacked this new and exciting future, looking at where Voice began, where it is now, and the implications and challenges for brands and consumers.


As humans, we can communicate with other humans using voice because we all understand context. There is an intuitive connection that occurs during the flow of an everyday conversation. Nuances, references, language, tone, and vocabulary all contribute to a natural form of communication. But what happens when the natural meets the artificial?

Can a machine understand the context of a conversation?

Past experiences interacting with machines and automated services have felt awkward, disengaging, and frustrating. Think back to answering ‘yes’ or ‘no’ to an automated phone menu while paying a bill, repeating yourself multiple times before giving up.

Thankfully, a major driver of the ongoing success of Voice interfaces is that the accuracy of automated speech recognition has risen 25% since 2010, bringing it to a near-human level of 95%.

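Andrew Ng, a professor at Stanford University and head of its Artificial Intelligence Group, has said that “Most people underestimate the difference between 95% and 99% (speech) accuracy. 99% is a game changer.” At 95% accuracy, roughly one word in twenty is misrecognised; at 99%, it is one word in a hundred, or five times fewer errors.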

We are nearly at the point where computer performance and machine learning meet human ability. The uncanny valley is nearly upon us.


‘What does your brand sound like?’

This question is the 2018 equivalent of asking ‘what does my brand look like?’ or ‘what does my brand feel like?’ But apart from the few companies that have had the luxury and context to develop sound as part of their branding (think of the MGM lion’s roar), most, if not all, brands have never had a reason to consider sound. Until now.

The use of voice in branding and marketing is as old as the voice-over. Brands routinely hire Morgan Freeman because they want his warm, deep timbre to resonate with their customers. What we want is a performance from a voice: a particular tone, delivery, and dynamic range of expression that delivers on the personality we are creating. Visual-led branding speaks volumes, and voice needs to do the same. People don’t respond rationally when they interact with a brand via voice; it is the myriad subtle elements that make up a voice personality that keep users engaged.

What brands need to remember is that this ‘voice’ is not temporary. Just as Coca-Cola doesn’t change from red to blue to purple every year, a brand cannot change what it sounds like too often, lest it confuse and alienate its customers.

Coke’s iconic logo has lasted more than 130 years; the question now is, can a ‘voice’ brand last just as long?


1. Crafting the brand personality

Brands need to plan and create their brand personality before their users do it for them. What does Bank A sound like? Is it male or female? Does it have an accent? How does it respond to queries? Does it make jokes, or is it always serious? What persona will resonate with Bank A’s customers?

Brands will need to partner with people who understand product development from the user’s point of view, to help them develop their brand ‘voice’ early in the process of building a Voice app. The risk of not getting the personality right is letting users down and never hearing from them again.

2. Public vs. Private

As smartphones have become an extension of ourselves, holding our most private and personal information, we have come to take privacy for granted. This becomes a problem when our most private queries can become public in a matter of seconds, simply by being spoken aloud. From a security point of view, there are considerations around vocalising bank account details and passwords, as well as more trivial but potentially embarrassing details such as your last Amazon order.

Beyond that, there is an interesting challenge around the cultural change that comes with having voice assistants in the privacy of our homes. How will future generations grow up with an all-knowing ‘person’ in the house? What if kids the world over ruin their Christmases by asking Google Home “Is Santa real?”

3. ‘Good conversation is hard to find.’

In its basic form, UX is about finding the best path from A to B. A conversation is not so straightforward: it can flow from A to G to B and back again in a matter of minutes. If a user changes their mind during an ordering process and goes back three steps, the voice app needs to know how to navigate back and follow the user’s intent.

The challenge is to develop a user experience that can track all the different threads and tangents of a typical conversation. Voice Interaction has some major catching up to do; as social animals, we have had millennia to develop the ability to follow complex conversations.
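As a rough illustration of the ‘go back three steps’ problem, here is a minimal Python sketch of one way a voice ordering flow might remember where it has been: it keeps a history stack of visited steps and rewinds it on request. The class, step names and methods are hypothetical and are not part of Dialogflow or any Google API; a real app would also need to recognise *which* step the user means, not just how many.

```python
from dataclasses import dataclass, field


@dataclass
class OrderConversation:
    """Hypothetical voice-ordering flow that remembers every step it has visited."""

    # Illustrative step names for a simple order; not taken from any real app.
    steps: list[str] = field(
        default_factory=lambda: ["choose_item", "choose_size", "choose_delivery", "confirm"]
    )
    history: list[int] = field(default_factory=list)  # indices of answered steps, in order
    answers: dict[str, str] = field(default_factory=dict)
    current: int = 0

    def answer(self, value: str) -> str:
        """Record the user's answer for the current step and return the next step."""
        step = self.steps[self.current]
        self.answers[step] = value
        self.history.append(self.current)
        # Stay on the last step rather than running past the end of the flow.
        self.current = min(self.current + 1, len(self.steps) - 1)
        return self.steps[self.current]

    def go_back(self, n: int) -> str:
        """Rewind up to n steps, discarding answers given from that point on."""
        for _ in range(min(n, len(self.history))):
            previous = self.history.pop()
            self.answers.pop(self.steps[previous], None)
            self.current = previous
        return self.steps[self.current]
```

For example, after answering item, size and delivery, `go_back(2)` returns the flow to `"choose_size"` while keeping the item the user already chose. A stack is used because conversational backtracking usually unwinds the most recent decisions first.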


As part of a three-day Google workshop, a team from ustwo designed and built a Voice app prototype for Google Home called Venti. Envisioned as a mental health app, Venti is designed for times of stress and anxiety, offering guidance, support, and a friendly ‘ear,’ while connecting people to professional health care providers if required.

Using Dialogflow, a backend interface for building Voice apps, the team mapped the conversation flow before building a catalogue of keywords that would trigger voice responses from Venti; the word ‘stress’, for example, would trigger a particular question.

Opening up Venti allows you to ‘vent’ to the app while it listens and responds as a friend might, prompting you to keep talking – ‘tell me more,’ ‘how does that make you feel’ – or it may offer you breathing or meditation exercises to help calm your nerves.
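The keyword-triggered behaviour described above can be sketched in a few lines of Python. This is a simplified stand-in, not the team’s Dialogflow setup: Dialogflow itself matches user input to intents defined by example training phrases rather than by a literal keyword lookup. The catalogue entries, `respond` and `FALLBACK` are all hypothetical.

```python
import re

# Hypothetical keyword catalogue: each keyword maps to the prompt Venti might say next.
# These entries are illustrative, not the team's actual configuration.
KEYWORD_RESPONSES: dict[str, str] = {
    "stress": "It sounds like you're under a lot of pressure. What's been on your mind?",
    "anxious": "That sounds hard. Would you like to try a short breathing exercise?",
    "work": "Tell me more about what's happening at work.",
}

# When nothing matches, keep the user talking instead of reporting an error.
FALLBACK = "I'm listening. Tell me more."


def respond(utterance: str) -> str:
    """Return the response for the first word that starts with a catalogue keyword."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for word in words:
        for keyword, response in KEYWORD_RESPONSES.items():
            # Prefix matching lets "stressed" and "stressful" hit "stress";
            # it is crude ("workshop" would also match "work").
            if word.startswith(keyword):
                return response
    return FALLBACK
```

For instance, `respond("I'm so stressed about everything")` returns the ‘stress’ prompt, while an utterance with no keyword falls back to a ‘tell me more’ style reply, echoing the friend-like prompts described above.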

Crafting the right brand persona was integral to Venti’s success: the user’s likely emotional state when using the app shaped every subsequent design decision. As a brand ‘persona,’ Venti needed to be warm, reassuring, helpful and calm, while stringently avoiding being judgemental or condescending, and never acting as a replacement for a medical professional.

Several key lessons emerged while designing the Venti conversation flow:

  • Sounding human is more important than correct grammar. This may mean adding extra punctuation to create longer pauses, or phrasing sentences differently.
  • Don’t overdo Yes/No questions. Too many of them bring back the bad phone automation mentioned earlier and remind people that they are, in fact, talking to a machine.
  • Never blame the user. If the app fails, it needs to take responsibility for its failings and respond appropriately, otherwise the user will feel frustrated or judged.
  • Make it feel like the machine is thinking. Adding pauses between delivering information and asking questions makes it feel more human.
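Besides punctuation, pauses like these can be written into speech markup. Here is a minimal Python sketch, assuming the voice platform accepts SSML (Speech Synthesis Markup Language) responses; Google Assistant supports a subset of SSML, including `<speak>` and `<break>`. The helper `with_pauses` and its default pause length are hypothetical, not values the team reported using.

```python
from xml.sax.saxutils import escape


def with_pauses(sentences: list[str], pause_ms: int = 600) -> str:
    """Join sentences into an SSML document with a pause between each one.

    <speak> and <break time="..."/> are standard SSML elements; the 600 ms
    default is an arbitrary illustrative value.
    """
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so user-facing text cannot break the markup.
    body = pause.join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"
```

For example, `with_pauses(["Okay.", "Let's take a breath together."], pause_ms=800)` inserts an 800 ms break between the acknowledgement and the next prompt, giving the impression that the app is considering what was said.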

The future of Voice is still unknown – many of the tools needed to navigate Voice Interaction do not exist yet. To successfully usurp the touchscreen as the best way to interact, brands need to prove that voice interactions are simpler and their use cases more compelling.

This event was hosted in the Sydney studio in November 2017.

ustwoTHINKS is a global event series where we explore challenging industry topics together with experts and practitioners in an engaging panel discussion. If you would like to attend our next ustwoTHINKS event, please email