On the non-user-facing (internal) side of the “wizardry,” we prepared a testing plan with three concrete conversational starters:

- Hey, Emma! What is the weather today?
- Hey, Emma! What should I eat today?
- Hey, Emma! What movie should I watch on Netflix?

We decided to limit our behavioral prototype to a finite number of tasks because we wanted to balance the user’s flexibility against the feasibility of answering their questions.

There were two ways we could have pulled this off: we could have recorded general responses and played them back to our participant, or we could have run a live version in which a voice actor generated responses to the prompts. Our team felt it would be valuable to improvise and respond live, so we could get a sense of what a potential user might say or ask to spark a conversation.

Based on that, the voice actor (Kay) would answer the initial question and then guide the user toward a follow-up request by asking another question. From there, the conversation would be open-ended: the user could choose to ignore the prompted question, say “no,” or follow the path. To prepare Kay for questions on these tracks, we developed a script that gave her some information, prompts, and error handling to use while responding to the live queries. Since the exact flow of the interactions couldn’t be predicted, she was expected to Google the answers (with help from Dena, the scribe) to whatever the user asked, on the spot.
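To make the script’s structure concrete, here is a minimal sketch in Python of how its three tracks, follow-up prompts, and error-handling fallbacks could be organized. Every specific line of dialogue, along with the `respond` helper, is a hypothetical illustration, not the document we actually handed to Kay.

```python
# A minimal sketch of what the wizard's script could look like as data.
# Every track, follow-up question, and fallback line here is a hypothetical
# illustration, not the actual script used in the study.

WIZARD_SCRIPT = {
    "weather": {
        "starter": "Hey, Emma! What is the weather today?",
        "follow_up": "Would you like the forecast for the rest of the week?",
    },
    "food": {
        "starter": "Hey, Emma! What should I eat today?",
        "follow_up": "Do you want a suggestion for a nearby restaurant?",
    },
    "movie": {
        "starter": "Hey, Emma! What movie should I watch on Netflix?",
        "follow_up": "Should I narrow it down to a genre you like?",
    },
}

# Error handling: fallback lines for queries outside the three tracks.
FALLBACKS = [
    "Sorry, I didn't catch that. Could you say it another way?",
    "I can help with the weather, food ideas, or Netflix picks. Which would you like?",
]

def respond(track: str, live_answer: str) -> str:
    """Pair the wizard's live (Googled) answer with the scripted follow-up."""
    entry = WIZARD_SCRIPT.get(track)
    if entry is None:
        return FALLBACKS[1]  # steer the user back onto a supported track
    return f"{live_answer} {entry['follow_up']}"

if __name__ == "__main__":
    # One example turn: answer live, then guide with a question.
    print(respond("weather", "It looks sunny today, with a high of 65."))
    print(respond("sports", ""))  # off-track query falls back to a scripted line
```

Keeping the follow-up question separate from the live answer mirrors how Kay worked: the facts came from a quick search, while the guiding question came from the script.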
By examining these interactions, we hoped to determine whether our system was usable. Was it conversational? Was it pleasant to interact with? Did it give the user relevant information?
An example of the error states we encountered during the process is shown below: