On the non-user-facing (internal) side of the “wizardry,” we prepared a testing plan with three concrete conversational starters:

- Hey, Emma! What is the weather today?
- Hey, Emma! What should I eat today?
- Hey, Emma! What movie should I watch on Netflix?

We decided to limit our behavioral prototype to a finite number of tasks because we wanted to balance the user’s flexibility against the feasibility of answering their questions.

There were two ways we could have pulled this off: we could have recorded general responses and played them back to our participant, or we could have run a live version in which a voice actor generated responses to the prompts. Our team felt it would be valuable to improvise and respond live, so we could get a sense of what a potential user might say or ask to spark a conversation.

Based on that, the voice actor (Kay) would answer the initial question and then guide the user toward a follow-up request by asking another question. From there, the conversation would be open-ended: the user could choose to ignore the prompted question, say “no,” or follow the path. To prepare Kay for questions on these tracks, we developed a script that gave her some information, prompts, and error handling to use while responding to the live queries. Since the exact flow of the interactions couldn’t be predicted, she was expected to Google the answers (with help from Dena, the scribe) to whatever the user asked, on the spot.
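To make the script’s structure concrete, here is a minimal sketch in Python of how its three tracks, follow-up prompts, and error-handling fallbacks could be organized. Every specific line of dialogue, along with the `respond` helper, is a hypothetical illustration, not the document we actually handed to Kay.

```python
# A minimal sketch of what the wizard's script could look like as data.
# Every track, follow-up question, and fallback line here is a hypothetical
# illustration, not the actual script used in the study.

WIZARD_SCRIPT = {
    "weather": {
        "starter": "Hey, Emma! What is the weather today?",
        "follow_up": "Would you like the forecast for the rest of the week?",
    },
    "food": {
        "starter": "Hey, Emma! What should I eat today?",
        "follow_up": "Do you want a suggestion for a nearby restaurant?",
    },
    "movie": {
        "starter": "Hey, Emma! What movie should I watch on Netflix?",
        "follow_up": "Should I narrow it down to a genre you like?",
    },
}

# Error handling: fallback lines for queries outside the three tracks.
FALLBACKS = [
    "Sorry, I didn't catch that. Could you say it another way?",
    "I can help with the weather, food ideas, or Netflix picks. Which would you like?",
]

def respond(track: str, live_answer: str) -> str:
    """Pair the wizard's live (Googled) answer with the scripted follow-up."""
    entry = WIZARD_SCRIPT.get(track)
    if entry is None:
        return FALLBACKS[1]  # steer the user back onto a supported track
    return f"{live_answer} {entry['follow_up']}"

if __name__ == "__main__":
    # One example turn: answer live, then guide with a question.
    print(respond("weather", "It looks sunny today, with a high of 65."))
    print(respond("sports", ""))  # off-track query falls back to a scripted line
```

Keeping the follow-up question separate from the live answer mirrors how Kay worked: the facts came from a quick search, while the guiding question came from the script.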
By examining these interactions, we hoped to determine whether our system was usable. Was it conversational? Was it pleasant to interact with? Did it give the user relevant information?
An example of the error states we encountered during the process is shown below: