Hacking Chatbots: Experimenting with a Product Mindset

ustwo London

6 Apr, 2017 · 11 min read

When it comes to chatbots and voice assistants, there are still leagues of uncharted territory to cover. For technologists, there are new devices, practices and languages to conquer. Designers are confronted with an interface that threatens the screen as our interaction paradigm and throws up new opportunities to create in a non-visual medium. And product people across the board are presented with untapped potential to create things that make a meaningful difference, especially in terms of enhancing user experience and accessibility. The ground is primed for exploration – which is why we recently let some of ustwo’s brightest minds loose on a couple of the most widely used systems.

We began by bringing together a group of developers and product designers to experiment, test things, break things and find out more. Three days of experimentation allowed us to get to grips with a technology that isn’t part of our day-to-day work – yet. It was a chance for unbridled creativity where we could be messy, move fast and have fun.

But first let’s dig a little deeper into the wider potential of conversational UIs – and why we wanted to explore them.

The State of Play

For most of our lives, voice-controlled AI existed firmly in the science fiction realm. J.A.R.V.I.S., HAL, even Samantha in Spike Jonze’s Her – this is technology that lived up to its name: sublimely intelligent, self-learning machines with personality that think independently. And when real life began to catch up, we were sorely disappointed (or relieved, if you’ve seen 2001: A Space Odyssey).

Currently, the most common examples of consumer-facing AI are the mobile assistants we find in our phones but, more recently, we have seen a spate of hardware produced with the purpose of being virtual, voice-controlled assistants: Amazon’s Alexa and Google Home are the most obvious examples. And let’s face it, these technologies have not met our science fiction fantasies. They misunderstand you, they get confused by background noise, they can’t decode accents – and they let six-year-olds order $170 playhouses.

But they are a work in progress and the technology is improving. More responsive, more discerning, more intuitive, more integrated, more human. All this and more is on the horizon. Truly Smart bots.

Moreover, the consumer demand is there. Voice assistant technologies are becoming more and more widely used – what was once a faddish feature is now firmly in the mainstream, especially among younger generations. 84% of 14-18 year olds currently use or are interested in using the voice-enabled digital assistants in their smartphone – and, perhaps more surprisingly, 50% of over-55s say the same. And it’s not only the number of people using mobile voice-controlled assistants that’s on the rise – the proportion of US homes with a voice assistant is predicted to rise from 2.3% in 2016 to 11.5% by 2020.

Industry leaders are already opting to automate key services – they have recognised the potential to expand their offering, generate revenue and cut costs. Japanese insurance firm Fukoku Mutual Life revealed it would be replacing 34 staff members with AI designed to calculate insurance payouts by March 2017. And Aviva has already developed a finance jargon-busting chatbot service for Amazon Echo.

In light of all these developments, it’s easy to understand why there was already interest in chatbots across the studio – with some already working on their own side-projects. It all seemed set perfectly for a hackathon-style approach, and we couldn’t wait to dive in.

Product Potential

To kick off, we split into three groups and began to ideate around what we could – and should – create. Even within our small experiment, the teams took vastly different approaches to the challenge and created a wide array of products – which in part demonstrates the breadth of this technology's potential.

Our first team chose to approach the problem incrementally - building a few bots and adding to the complexity each time:

They started with Comedy Bot which followed a tree-like conversation structure to deliver a punch line to the user.
Building on this, they aimed to use the tree-like structure to help distill copy-heavy information for users – this took the form of GovBot. The aim was to replace the form-filling UI you use to find out if you qualify for marriage allowance with a conversational experience that made the information more accessible. With this, they hoped to tap into the potential of voice assistants for making dense, bureaucratic processes clearer and more human – something that’s already proved successful with DoNotPay.
Next, they began to look at more conversational structures with What Am I? This was an educational guessing game aimed at kids where they would ask the bot questions in order to guess the animal.
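A tree-like conversation of the kind Comedy Bot and GovBot relied on can be sketched in a few lines. This is only an illustration under our own assumptions, not the team's code: the `Node` class, the `step` and `run` helpers and the joke itself are hypothetical, and replies are matched by exact text rather than by any speech or language service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the conversation: what the bot says and where each reply leads."""
    prompt: str
    # Maps a normalised user reply to the next node; empty means the conversation ends.
    replies: dict[str, Node] = field(default_factory=dict)


def step(node: Node, user_text: str) -> Node:
    """Follow the branch matching the reply; stay on the same node if nothing matches."""
    return node.replies.get(user_text.strip().lower(), node)


# Hypothetical content: a knock-knock joke laid out as a very narrow tree.
punchline = Node("Lettuce in, it's cold out here!")
setup = Node("Lettuce.", {"lettuce who?": punchline})
root = Node("Knock knock.", {"who's there?": setup})


def run(replies: list[str]) -> list[str]:
    """Play a scripted conversation and return everything the bot said."""
    node = root
    said = [node.prompt]
    for text in replies:
        next_node = step(node, text)
        if next_node is not node:
            said.append(next_node.prompt)
        node = next_node
    return said
```

A form-replacement bot in the spirit of GovBot would use the same shape with wider branching: each node asks one eligibility question and each reply leads to the next question or to an outcome.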

In a similar vein, our second team created The Mission – an application for children to ask questions and learn about the planets. This idea was informed by some competitor research: when the team looked at the Alexa ‘Skill’ store they saw a gap for a more sophisticated, ongoing experience rather than a one-time tool or gimmicky game. Claudia, a product lead at ustwo, explains more about their project:

'We got to the point that we had inputted key information about the planets of our solar system and you could travel between them to learn more about them. There are different ways of interacting with this – actions (e.g. look around, look up) and commands (tell me about the distance to the sun). It’s interesting to explore how they can be combined to create an immersive story and factual details.'

For this team, it was their sole project and the idea developed at pace. User testing as they went along, they were able to create something not only technically solid but more experiential. The power of audio storytelling is well established, but what impact could something like this have for children's learning and imagination? Where this sort of storytelling overlaps with the rise in conversational UIs is still to be seen.

Our final team proved overly ambitious with their first idea. The intention was to create a bot companion to Moodnotes – a CBT journaling app developed by ustwo and Thriveport – which you could converse with about how you're feeling and it could help track your mood as well as offer prompts and advice. However, the technology we have just wasn’t advanced enough to bring the idea to life – at this stage we had to see the bot as a machine that receives commands rather than being capable of the deeper conversation necessary to make this work.

Instead, the team went back to the drawing board, deciding to look at how this technology can make our digital lives more seamless, and less cluttered, with their bot Peter Peterson. Upon request, this bot delivered a reading list of curated content to your inbox based on a keyword or phrase – ‘I want to learn about the Vietnam War’, ‘Tell me about dog breeds’, ‘I want to start transcendental meditation – where do I start?’ and so on. This is an interesting proposition considering the value that conversational UIs could have in creating a more connected digital experience which looks beyond an ‘app-centric’ approach.

Our final project was meant to solve something for the whole studio – the idea was to create FAMbot. FAM is our weekly Friday afternoon meeting and because there are always people working remotely we share a live stream. Setting up the live stream is always a bit hit and miss, so the goal was to set up a simple voice command – ‘Start streaming FAM’ – that would set up the projector, the streaming, the audio and post links to Slack in one fell swoop. This use case threw up a complex tech challenge. Both Amazon Alexa and Google Home bots and voice triggers run in the cloud – and whatever they call has to be exposed to the internet. With FAMbot, exposing the streaming server wasn’t an option, so after some lateral thinking, we found a solution where we could still use voice triggers in a more secure way. This involved setting up an additional server which was exposed to Google Home and using WebSockets to trigger the streaming server.
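The relay idea can be sketched roughly as follows. This is a simplified, in-process illustration rather than the FAMbot code: an `asyncio.Queue` stands in for the WebSocket connection, and we assume the internal streaming server is the side that opens the connection to the public relay, so it never has to accept inbound traffic. The `Relay` class, the `start_fam_stream` command and the logged actions are all hypothetical names.

```python
from __future__ import annotations

import asyncio


class Relay:
    """Public-facing side: the only part the voice platform can reach."""

    def __init__(self) -> None:
        self._connections: list[asyncio.Queue] = []

    def connect(self) -> asyncio.Queue:
        """Called when the internal server dials out; the queue stands in for its open socket."""
        channel: asyncio.Queue = asyncio.Queue()
        self._connections.append(channel)
        return channel

    async def on_voice_trigger(self, command: str) -> int:
        """Push a command down every open connection; return how many received it."""
        for channel in self._connections:
            await channel.put(command)
        return len(self._connections)


async def streaming_server(channel: asyncio.Queue, log: list[str]) -> None:
    """Internal side: only reads commands from the connection it opened itself."""
    while True:
        command = await channel.get()
        if command == "start_fam_stream":
            log.append("projector on")
            log.append("stream started")
            log.append("link posted to Slack")
            return


async def demo() -> list[str]:
    relay = Relay()
    log: list[str] = []
    channel = relay.connect()  # internal server connects outwards
    worker = asyncio.create_task(streaming_server(channel, log))
    await relay.on_voice_trigger("start_fam_stream")  # voice platform calls the relay
    await worker
    return log
```

Running `asyncio.run(demo())` plays one trigger through the relay. In a real deployment the queue would be replaced by a WebSocket client on the internal machine and a WebSocket server on the relay, with authentication on both hops.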

Technical Challenges

A big part of the hackathon was about giving us a chance to understand how to build for this technology. In part, that was about skilling-up but we also wanted to understand what the current scope and shortcomings are. Paradoxically, our teams faced the frustrations of both getting off the ground and not being able to take concepts far enough – a potent combination for any developer.

Creating chatbots is more accessible than ever due to a wide range of tools and frameworks from Google, Amazon, Facebook and Microsoft. We felt that by leveraging these tools we could concentrate on testing out our hypotheses far more quickly and avoid getting bogged down in start-up difficulties. However, we quickly realised these tools had their limitations – as Nelson, a developer at ustwo, explains: ‘It’s a new technology, the documentation available falls into the easy tutorial or quick setup of a sample project but creating more complex solutions requires more knowledge of the platform and documentation – something that I don’t think was very well presented or even developed.’

There were also obvious differences when it came to getting started on the devices. The one team using Alexa were able to hit the ground running relatively quickly. In part, this was due to the work that was done prior to the experiment to make sure Alexa development would work without any headaches – if we hadn't sorted out AWS beforehand, progress would have been much slower. We had to arrange the Google Home equivalents on the day as we learned about requirements, slowing down our progress. Additionally, the Alexa facts example provided a great starting point for The Mission app, whereas api.ai had very limited examples of bots that had similar back-end logic to the other projects, so there was more to get our heads round in those early stages.

Once these basics were mastered, the runway they provided still proved too short to get some of the more ambitious features off the ground. Some of these limitations were centered around api.ai. Jamaine, one of the developers who worked on Google Home, explains:

In attempting to simplify the process of making your own bot, the api.ai website can sometimes seem a little obtuse. In order to do straightforward tasks you are able to do everything on the website and push to your device when ready to test.

However, for more complex and conversational style bots, you need a backend web service to process the user’s data. The api.ai website only allows you to provide one endpoint which is called for all intents; this means the backend must filter to figure out which intent it has received from the Google Home. In general, getting the backend up and running – and figuring out what was meant to happen when – was a massive time drain.

You must define your intents (things you expect the user to say) on api.ai, and then provide parameters (keywords or values from the sentence you expected). While this works very well, it also limits you to working with explicitly set or predefined sentences.
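To make the single-endpoint point concrete, here is a minimal sketch of a backend that receives every intent at one entry point and routes on the intent name. The payload shape (`intent`, `parameters`, `speech`) is deliberately simplified and is our assumption, not the actual api.ai webhook format; the intent names, the `intent` decorator and the planet data are likewise hypothetical.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[dict], str]

# Registry filled in by the decorator below: intent name -> handler function.
HANDLERS: dict[str, Handler] = {}

# Approximate average distances from the sun, in astronomical units.
DISTANCE_AU = {"earth": 1.0, "mars": 1.52, "jupiter": 5.2}


def intent(name: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one named intent."""
    def register(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return register


@intent("planet.distance")
def planet_distance(params: dict) -> str:
    planet = params.get("planet")
    if planet not in DISTANCE_AU:
        return "Which planet do you mean?"
    return f"{planet.title()} is about {DISTANCE_AU[planet]} AU from the sun."


@intent("planet.travel")
def planet_travel(params: dict) -> str:
    return f"Setting course for {params.get('planet', 'the unknown')}."


def webhook(payload: dict) -> dict:
    """The one endpoint: look up the handler for the intent, or fall back."""
    handler = HANDLERS.get(payload.get("intent", ""))
    if handler is None:
        return {"speech": "Sorry, I didn't catch that."}
    return {"speech": handler(payload.get("parameters", {}))}
```

Keeping the routing in a registry means a new intent only needs one new decorated function, which goes some way to taming the "what was meant to happen when" problem described above. The parameters still only contain what the platform extracted from a predefined sentence pattern.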

The group working on Amazon’s Alexa had similar challenges - the sophistication and complexity just isn’t there yet:

The architecture is split into two layers: the Alexa interface layer (the device itself, which interprets our requests) and the business logic layer (where we ask for the command to be executed). Understanding how it works was easy, but implementing changes in the projects sometimes required us to make changes in both locations – the files hosted on the Alexa developer website and the files of the functions hosted in a Lambda in AWS. That’s quite time consuming and it made us more vulnerable to human error.
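One way to reduce that kind of human error is to check the two layers against each other before deploying. The sketch below is an assumption-laden illustration, not the team's setup: the intent schema is a simplified stand-in for the interaction model kept on the Alexa developer site, the intent names and spoken replies are invented, and `out_of_sync` is a hypothetical helper. The request and response dictionaries follow the general shape of Alexa's custom-skill JSON, trimmed to the fields used here.

```python
from __future__ import annotations

import json

# Layer 1 (simplified, hypothetical): the interaction model kept on the Alexa developer site.
INTENT_SCHEMA = json.loads("""
{
  "intents": [
    {"intent": "TravelIntent", "slots": [{"name": "Planet", "type": "PLANET"}]},
    {"intent": "LookAroundIntent", "slots": []}
  ]
}
""")


# Layer 2: the business logic that would live in an AWS Lambda function.
def travel(slots: dict) -> str:
    planet = slots.get("Planet", {}).get("value", "somewhere")
    return f"Setting course for {planet}."


def look_around(slots: dict) -> str:
    return "You see rings of ice and dust."


HANDLERS = {"TravelIntent": travel, "LookAroundIntent": look_around}


def out_of_sync(schema: dict, handlers: dict) -> set[str]:
    """Intent names that exist in one layer but not the other."""
    declared = {item["intent"] for item in schema["intents"]}
    return declared ^ set(handlers)


def lambda_handler(event: dict, context: object = None) -> dict:
    """Entry point: route an intent request to its handler and wrap the reply."""
    request = event["request"]
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        handler = HANDLERS.get(intent["name"])
        text = handler(intent.get("slots", {})) if handler else "Sorry, I can't do that yet."
    else:
        text = "Welcome to The Mission."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }
```

Calling `out_of_sync(INTENT_SCHEMA, HANDLERS)` as part of a pre-deploy script would flag an intent that was added on the developer site but never implemented in the Lambda, or the other way round.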

For many of those involved, this experiment made clear what still needs to be done. For resident data scientist Yasir, ‘the tech isn’t there yet for proper smart chatbots. Chatbots only work in certain boxed environments, since Natural Language Processing isn’t advanced enough to exactly figure out what you mean when you say a sentence.’ Language is subjective and depends heavily on context and personal experience. As the machines only have access to your spoken words, there are often huge gaps and discrepancies in their understanding of human speech.

Non-Visual Design

The hackathon was also a great opportunity to see how multidisciplinary teams will work in this new space. Often new tech is just seen as something for devs to explore, but the rise of voice assistants, chatbots and non-visual interfaces more generally presents some interesting challenges for designers – most of whom have primarily learned their craft through designing for screens. With these mediums, even more consideration needs to be paid to elements like sound design, narrative design, storyboarding and, of course, user experience.

Interestingly the teams – which included designers and product leads – realised the importance of user testing at an early stage. Chris, an ustwo developer, expands on their motivations for testing The Mission with people around the studio:

We wanted to see what kind of questions users would ask – prompted and unprompted. We learnt a lot from these sessions. It’s easy to forget just how many different ways of asking for the same thing there are. Some people were polite, some gave commands, some asked open ended questions. We needed to make sure we covered as many eventualities as possible and after every user test we plugged some gaps and covered just a little bit more each time.

The team see the next step as testing and validating The Mission outside the studio with its intended audience, young people. With this testing, it will be possible to get a better understanding of what ages they should be aiming this product at – something they were unable to explore within the hackathon itself.

Where do we go next?

For us, our experimentation has only crystallised how far there is to go with chatbots and voice assistants. Speech Recognition Technology has improved vastly, but Natural Language Processing needs to be more sophisticated to make our interaction with chatbots feel human. There are also limitations when it comes to the complexity of what can be built – including how long it actually takes to build something – and the prospect of Smart, self-learning bots being widely and commercially available is still a few years away. Whilst at points we grumbled at these realities, we’re still very excited about where it can be pushed next – and still refusing to lose sight of those sci-fi fantasies. As Daniel, an Android developer at ustwo, put it ‘I want to push this technology to a point where we can have conversations with an AI. I want my own Jarvis. I know it’s possible, it’s only a matter of time.’

But the beauty of this technology is in both its versatility and how fast it is progressing. Already, conversational UIs are making digital environments, products and services more accessible – the value of voice assistants for blind people, the elderly, young children and people whose use of visual interfaces is impaired more generally cannot be overlooked.

In a more commercial sense, there is a wealth of ways this sort of technology can be deployed to save time, save money and improve user experience. The vast number of options that each team generated during their ideation phase went some way to demonstrating the number of ways this technology could be used – and used well. There are still so many untapped use cases that would have an immediate and valuable impact. Can you automate customer support with a bot? Can you use it to make information more transparent? Can you help more people access your service? Can you help people access your service whilst on the go? Is there an opportunity to grow your voice and brand?

Even from our three-day experiment, we have come close to creating a viable product with The Mission and overcome some big tech hurdles to build a bot that securely automates part of our studio setup. This experience gave us a real taste of the potential of chatbots in both functional and experiential use cases and we look forward to the next phase of our discovery.

Special thanks to everyone who took part and organised the event: Yasir, Jamaine, Daniel, Narges, Claudia, Kevin, Nelson, Chris, Mark, Marley, Jarek and Jack.