A podcast app with a built-in voice assistant to enable safer driving.
Concept demonstration video.
Voicepod is a podcast application concept with a fully functional built-in voice assistant that goes beyond the current capabilities of voice-integrated podcast applications. In particular, this application (app) is meant to aid the hands-free experience, especially for drivers.
UX designer (low-fidelity)
Most podcast apps don't support in-app voice interactions, nor are they compatible with in-device voice assistants. This makes podcast apps difficult to use for drivers.
Create a podcast app with a built-in voice assistant to lessen drivers' touch interactions with their mobile devices and enable safer driving.
Using the Kano model for customer satisfaction, we compared 8 direct competitors (podcast apps) and 1 indirect competitor (audiobook app) to identify features that are basic expectations, satisfiers, and delighters.
We conducted 6 interviews, asking a total of 32 questions, and had each participant demonstrate how they browse for podcasts in the app they currently use. Our questions covered four areas: demographic information, driving experience, VA experience, and podcast app experience. Then, we used an affinity diagram to organize and analyze our interview data.
Users browse differently.
Some rely on recommendations, while others just scroll around for an interesting title.
Some users prefer to complete an episode in one sitting, while others will start and stop an episode as needed.
Though many participants said they use a VA while driving, their device’s VA is unable to do more than open the podcast app.
Some care about episode duration, some don't.
No way to browse podcasts while driving.
Some wait for a stop/red light to look for something else.
It’s difficult to switch podcasts while driving.
Again, some wait for a stop/red light to use their phone or they switch to listening to music or the radio.
Limited assistance from in-device VA.
No way to mark & reference specific episode sections.
Participants expressed that they have no way to refer to specific parts of an episode, so they just have to make a mental note and hope they remember to relisten to the episode.
Based on our interviews, we created a story map to visualize a typical user’s interactions with the app and define the features we need to build. Each map is composed of three elements: activities, tasks, and features. An activity illustrates what users will do in a specific scenario. Tasks elaborate the activity into smaller actions, step by step. Then, features break down components of the app that users may use to accomplish the task. The three elements are arranged in a narrative flow to map the user’s story.
Scenario 1: Using voice assistant while driving.
Scenario 2: Combining podcast app with navigation.
We used Adobe XD to create our lo-fi prototype since it is currently the only tool that allows voice prototyping. We divided our design into four sections: browse, bookmark, library, and navigation.
We designed several ways for users to browse the application, including search-history-based tags, recommendations, categories, and a keyword search. On the podcast page, users can see the rating, description, and a sortable episode list.
We created a bookmark function in the app, not only because none of our competitors have this function yet, but also because some of our interviewees expressed interest in being able to refer to sections of the episode later.
Users would like to subscribe, favorite, bookmark, and download their favorite podcasts and episodes, so we created a library where they can collect what they want.
All interviewees said they have used a podcast app and a navigation app at the same time while driving. Also, some of our competitors have integrated their apps with navigation apps. As a result, we came up with the idea to have Voicepod recommend an episode that fits the user’s commute duration.
Using the behavioral patterns collected from our interviews, we produced an interaction storyboard to clarify the conversation flow when users interact with the podcast application through the voice assistant. It helped us prepare the conversation scripts for usability testing, in which we ourselves played the role of the voice assistant.
Utterance: What the user said
Response: What Voicepod says
Prompt: A question Voicepod asks
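The storyboard's three turn types can be sketched as a small data structure. This is only an illustration of the vocabulary above; the `Turn` class, its fields, and the follow-up prompt line are our assumptions, not part of the tested design.

```python
from dataclasses import dataclass

# Hypothetical representation of one exchange in the interaction storyboard.
# Turn kinds mirror the storyboard vocabulary: "utterance" (what the user
# said), "response" (what Voicepod says), "prompt" (a question Voicepod asks).
@dataclass
class Turn:
    speaker: str  # "user" or "voicepod"
    kind: str     # "utterance", "response", or "prompt"
    text: str

# A storyboarded exchange based on the true-crime example; the trailing
# prompt is invented here purely for illustration.
exchange = [
    Turn("user", "utterance", "Give me a true crime podcast."),
    Turn("voicepod", "response",
         "The most popular podcast related to true crime is Female Criminals."),
    Turn("voicepod", "prompt", "Would you like to hear the description?"),
]

def script_lines(turns):
    """Render a storyboard exchange as a readable testing script."""
    return [f"{t.speaker.capitalize()} ({t.kind}): {t.text}" for t in turns]
```

Rendering exchanges this way made it easy for us to hand testers a consistent script format across scenarios.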
Usability Testing Round One
What is the description of Female Criminals?
Small improvements to visual UI.
Participants prefer to receive information bit by bit.
Some participants prefer more info than others.
Participants confused the bookmark and like functions.
Participants were unlikely to use the bookmark function.
5 participants / 1 scenario / 2 VA scripts
Task 1: Use visual UI.
VA interaction expectations.
(no script, users say what feedback they expect)
Use VA with script 1.
(Natural back-and-forth conversation. For example:
Voicepod: “The true crime podcast where women aren’t just the victims.”
User: “What is the rating?”
Voicepod: “This podcast has 4.7 stars.”)
Use VA with script 2.
(Provide all information at once. For example:
User: “Give me some true crime podcasts.”
Voicepod: “The most popular podcast related to true crime is Female Criminals. Rated 4.7. The true crime podcast where women aren’t just the victims.”)
Use the VA to bookmark
Use visual UI to bookmark
*We found that the navigation app scenario was too easy to test, since it only involved the user choosing whether or not to select something. Moreover, several of our participants said they don’t care whether an episode’s duration fits their drive time. As a result, we dropped scenario 2 from usability testing.
Usability Testing Round Two
6 participants / 1 scenario / 1 VA script / Onboarding screen
Task 1: Use updated visual UI.
Task 2: Use VA with a loose script.
(no category limitation; participants could say whatever they wanted)
Some participants want more info than others.
Participants gave commands as phrases or single words.
The VA should prompt the user if their command doesn’t contain a verb.
If the user pauses for more than 10 seconds, the VA should prompt them.
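These two prompting rules can be sketched as a simple pre-check the VA might run on each command. This is a minimal sketch under assumptions of ours: the verb list, the prompt wording, and the `needs_prompt` helper are illustrative stand-ins, not the tested design (a real VA would use intent classification rather than a word list).

```python
from typing import Optional

# Illustrative stand-in for verb detection; a production VA would use
# intent classification instead of a fixed word list.
ACTION_VERBS = {"play", "give", "show", "find", "open", "bookmark", "search"}

PAUSE_LIMIT_SECONDS = 10  # prompt the user after this much silence

def needs_prompt(command: str, pause_seconds: float) -> Optional[str]:
    """Return a prompt for the VA to speak, or None if the command is actionable."""
    if pause_seconds > PAUSE_LIMIT_SECONDS:
        return "Are you still there?"
    # Participants often gave commands as bare phrases or single words,
    # which lack a verb and so need a follow-up prompt.
    if not any(word in ACTION_VERBS for word in command.lower().split()):
        return f"What would you like me to do with '{command}'?"
    return None
```

For example, `needs_prompt("newest episode", 0)` returns a clarifying prompt, while `needs_prompt("play the newest episode", 0)` returns `None` and the command proceeds.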
We referred to Amazon Alexa’s voice user interface design guidelines to analyze our participants’ commands based on the conversations in testing. We used searching for a true crime podcast as an example to present the happy path. The “happy path” is the simplest, easiest path to success a customer could follow (or that we hope they follow). I later followed the same conversation flow in the video demonstrating the Voicepod concept.
Due to a design limitation in Adobe XD, we couldn’t register multiple voice commands for the same action in the prototype. Therefore, we decided to use single keywords to activate the Voicepod VA, following the happy path we designed. For example, “newest” instead of “give me the newest” or “newest one.”
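The keyword workaround amounts to reducing a full utterance to whichever trigger word it contains, which is effectively what the Adobe XD prototype listens for. A minimal sketch, with an assumed trigger set drawn from our happy path:

```python
# Sketch of the keyword workaround: instead of recognizing full phrases,
# the prototype reacts to a single trigger word inside the utterance.
# The trigger set here is an assumption based on our happy-path commands.
TRIGGERS = {"newest", "popular", "rating", "description", "bookmark"}

def to_trigger(utterance: str):
    """Reduce a spoken phrase to the first trigger keyword it contains."""
    for word in utterance.lower().replace(",", " ").split():
        if word in TRIGGERS:
            return word
    return None  # off the happy path; the prototype cannot respond
```

So “give me the newest” and “newest one” both reduce to the keyword “newest”, which is the only form the prototype actually needs to match.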
After two rounds of testing, we finalized both our visual and voice interface designs.
We opted to make buttons bigger and casual browsing easier and faster. We did this because about half our participants from both testing rounds expressed that they would rather choose a podcast quickly and just switch to another episode if they don’t like it, rather than spend time browsing for exactly the right thing.
With testing completed and a happy path scenario created, we were finally able to design response screens for the Voicepod VA. Ideally, a user shouldn’t have to look at the screen while using Voicepod, but we have provided large buttons in case the user does want the screen. The quotation marks indicate an utterance, which represents what the user says.
Sometimes, less is more.
The challenge of designing a VA is creating functionality while limiting cognitive load for users. We tested two methods of information delivery and learned that, even though it requires more interactions with the VA, users preferred receiving less information at once while using a voice interface. We also learned that, because of the added cognitive load associated with a VA, several participants didn’t want functionality as robust as we were prepared to give them.
Challenges of VA testing.
Going into this project, we didn’t realize how difficult it would be to test a completely low-fidelity voice assistant. Since voice interfaces are a newer and less developed area than visual interfaces, we had a hard time finding authoritative guidelines or standards for testing a VA, especially one that wasn’t yet functional and built on data and machine learning. In the end, we conducted usability testing by preparing scripts and onboarding screens and acting as the VA ourselves to collect natural interactions with participants.
The trap of design thinking.
Since we had to design the GUI and VUI at the same time, we sometimes found it hard to break out of the box of traditional GUI design. A GUI relies on a vertical information hierarchy, and we believe a VUI should break away from those layers. In other words, a VUI should be able to satisfy all of a user’s requirements at once, with a command like “Give me an episode related to Harry Potter that’s 20 minutes long.” Instead, we still led users bit by bit through the search for information. Again, since the prototype wasn’t yet functional and would be built on data and machine learning we didn’t know much about, we followed the behavioral patterns found in the interviews to design the VUI.