Validation Testing
Conducting usability testing helped me gauge the effectiveness of my prototype and identify room for improvement. However, a well-designed app is far from the only ingredient a digital product needs to succeed.
According to IDEO, there are three criteria that are likely to predict a product’s success in the market: viability, desirability, and feasibility. A successful product sits at the intersection of these criteria.
In order to test whether my product meets these criteria, I wrote out a list of risky assumptions: the foundational beliefs underlying my app which, if proven false, could put the product in serious trouble once it reached the market. I organized these assumptions into the following matrix.
The top three risky assumptions I identified are as follows:
- The app will be able to attract a large enough user base to make the alert system effective
- Users will want to drop off their potentially contaminated samples for testing
- Users will want to log their symptoms in the app to identify potential sources of illness
When I began planning out steps to test these assumptions, the first seemed easy enough, but the second two presented a major obstacle. Between my own brainstorming and conversations with my advisors and colleagues, it seemed there would be no way to test these two suppositions without deliberately making someone sick or getting “lucky” (I use this word facetiously) with a participant coincidentally contracting food poisoning during a two-to-three-week experiment. Therefore, I decided to shift my list to include the next two riskiest assumptions.
- The app will be able to attract a large enough user base to make the alert system effective
- Users will want to log in and scan their receipts every time they go grocery shopping
- Users will find the lighthearted visual design appealing, rather than trivializing the seriousness of the issues
Pretotyping Design
I took the final three risky assumptions, turned them into actionable hypotheses, and designed an experiment for each using Alberto Savoia’s pretotyping framework.
For the first assumption, I built a Fake Door landing page in Wix, created an ad in Photoshop, and ran low-cost campaigns on Facebook and Reddit to draw traffic to the page over the course of one week.
Results
I received 308 unique visitors in total, with a maximum of 123 in a single day. 17 visitors signed up with their email, which comes to roughly 5.5% of visitors, well above my 3% success threshold. In Alberto Savoia’s language, I was able to get significant “skin in the game.”
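The conversion math is simple enough to sanity-check in a few lines. A minimal sketch follows, using only the visitor and signup counts reported above; the 3% threshold is the success criterion from my planning sheet.

```python
# Sanity check of the Fake Door conversion rate against the 3% threshold.
visitors = 308   # unique landing-page visitors over the week
signups = 17     # visitors who left their email address

conversion_rate = signups / visitors
print(f"Conversion rate: {conversion_rate:.1%}")        # -> 5.5%
print("Passes 3% threshold:", conversion_rate > 0.03)   # -> True
```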
A/B Testing
I had been curious about the next assumption since I first finalized my visual style: would users find the lighthearted visual design appealing, rather than feeling it trivialized the seriousness of the issues? As explained earlier, I strove to make my app’s style cute and approachable in an attempt to lighten the serious or frustrated mood users may be experiencing. However, I suspected there was a fine line to walk.
To test this assumption, I created an additional Wix landing page with exactly the same copy and layout, but with a more serious visual style evocative of a traditional healthcare website.
I switched to a serif font for headers and a deep blue common among the other healthcare services I looked at. I also made an alternate ad in the serious style and ran a week-long A/B test, a feature built into Meta’s ad platform.
Half the audience would be shown an ad in the original lighthearted style, and the other half would see the new serious ad. My hypothesis was that the lighthearted ad would receive at least 15% more clicks.
Results
I was somewhat surprised to find that not only were the results far closer than I expected, but the serious ad in fact did slightly better. The serious ad came in at a $0.33 cost per click versus $0.41 for the lighthearted ad.
In terms of clickthrough rate (CTR), the serious ad drew 2.54% versus 1.98% for the lighthearted.
While these results did not meet my personal benchmark, both ads scored well above the industry standard: as of 2022, the average Facebook CTR is 1.04% for technology and 0.83% for healthcare. My results were promising in terms of desirability.
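It is also worth noting that a CTR gap this small may or may not be statistically meaningful, depending on impression volume. As an illustration only, a two-proportion z-test can put a number on that; the CTRs below are from my results, but the impression counts are hypothetical stand-ins, since I am not reproducing the campaigns’ actual delivery figures here.

```python
from math import sqrt

# CTRs are from my results; impression counts are HYPOTHETICAL placeholders.
impressions_serious = 5000
impressions_light = 5000
ctr_serious, ctr_light = 0.0254, 0.0198

clicks_serious = round(impressions_serious * ctr_serious)  # 127
clicks_light = round(impressions_light * ctr_light)        # 99

# Two-proportion z-test: is the CTR gap larger than chance noise?
p_pool = (clicks_serious + clicks_light) / (impressions_serious + impressions_light)
se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_serious + 1 / impressions_light))
z = (ctr_serious - ctr_light) / se
print(f"z = {z:.2f}")  # |z| > 1.96 would indicate significance at the 95% level
```

With these made-up volumes, z comes out to about 1.88, just under the 1.96 cutoff; real delivery numbers could land on either side, which is part of why I hesitate to read too much into the gap.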
Mechanical Turk
The final assumption I sought to test, in place of symptom logging, was whether users would remember to scan their receipts every time they go grocery shopping. This is an important component of my app, ensuring that everyone at potential risk from a contamination can be properly reached and notified. I recruited six users (five of whom had also completed usability testing) and set up a simple experiment to validate this assumption.
Over the course of two and a half weeks, I asked users to send me their receipts after they went grocery shopping. They would send photos, and in return I would act as a proxy for my app, sending them any concerning information about their groceries. I was testing the hypothesis that at least 2 out of 6 testers would remember to log all of their grocery receipts.
I maintained a table in Figma to keep track of every receipt I received and who it came from, and I frequently checked the FDA’s web page for active recalls to keep my testers up to date.
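I did this recall-checking by hand, but for illustration, the same lookup could be scripted against openFDA’s public food enforcement endpoint. This is a sketch rather than part of my actual process; the query fields follow the openFDA documentation, and it assumes the requests library is available.

```python
import requests

# Illustrative only: fetch a few ongoing food recalls from openFDA.
resp = requests.get(
    "https://api.fda.gov/food/enforcement.json",
    params={"search": 'status:"Ongoing"', "limit": 5},
    timeout=10,
)
for recall in resp.json().get("results", []):
    print(recall.get("recall_initiation_date"), "-", recall.get("product_description"))
```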
Results
Notably, I received all of zero receipts within the first week of my experiment. I could have chalked this up to nobody having gone shopping that week, but I found that unlikely.
An important note: beyond the initial ask, I sent no reminders or updates during the first week. This was an intentional decision, because my app design did not incorporate push notifications reminding users to upload their receipts. I hoped it would be a self-motivated act, but again, this is exactly why I was conducting testing.
After the first week of no engagement, my advisor suggested that I change my strategy. For the remainder of the experiment, I sent each user a weekly text reminding them to send their grocery receipts, with an additional note about ongoing recalls.
As my tracking table showed, once I started sending these reminders the results improved dramatically. 4 out of 6 users sent every receipt for the period, surpassing the 2-out-of-6 mark, and one user sent me 3 receipts over the remaining week and a half. Despite the 2 users who never engaged, I felt confident about the success of this experiment.
Recommendations
Reviewing my initial planning sheet, my experiments on attracting a large enough user base and on scanning receipts succeeded, while the evaluation of style preferences fell short of my hypothesis by a small margin.
I feel confident in the results of the Fake Door experiment, but the other tests warrant further exploration and possible changes in future iterations.
For the grocery receipts, the Mechanical Turk experiment revealed the importance of building reminder notifications into my app design so that users regularly upload their receipts. I could send periodic notifications, but I had the idea to take it a step further: enable location tracking and notify users at the opportune moment, right as they are leaving the grocery store.
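To make that idea concrete, here is a minimal sketch of the exit-geofence logic. The store coordinates, radius, and send_push helper are all hypothetical placeholders; a production app would lean on the platform’s native geofencing (iOS region monitoring, Android’s Geofencing API) rather than hand-rolled distance checks.

```python
from math import radians, sin, cos, asin, sqrt

STORE = (40.7128, -74.0060)  # hypothetical grocery store location (lat, lon)
RADIUS_M = 150               # geofence radius in meters

def send_push(message):
    print("PUSH:", message)  # placeholder for a real notification call

def distance_m(a, b):
    """Haversine distance between two (lat, lon) points, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))

def on_location_update(prev, curr):
    """Fire the reminder on the inside-to-outside transition of the fence."""
    was_inside = distance_m(prev, STORE) <= RADIUS_M
    is_inside = distance_m(curr, STORE) <= RADIUS_M
    if was_inside and not is_inside:
        send_push("Leaving the store? Don't forget to scan your receipt!")

# Example: a location update that crosses the fence boundary
on_location_update(prev=(40.71285, -74.00605), curr=(40.7150, -74.0030))
```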
The visual style assessment also suggests changes may be necessary. I was happy that both ad campaigns came in above the industry benchmark, but the lighthearted style did not beat out the serious style as I had anticipated. My suspicion is that this reflects the power of convention: people are used to seeing digital healthcare products presented in a certain way, and therefore place more trust in a product that meets those expectations.
I believe the small difference between the two campaigns does not imply the necessity of a complete design overhaul, or at least not before further testing. In a future iteration, I would consider dialing back the lighthearted illustrations and perhaps finding more of a middle ground between the cute and serious styles.