SWIPE
Gamifying Bias Detection
REDDIT • Project
Tools ~ Figma, Miro, Google Suite
Encouraging everyday users of open discussion forums to audit and report AI biases through gamification.

Problem statement
Expert-led audits have made significant contributions in detecting harmful behaviors (e.g., biases), but they still fail to surface certain critical issues. Many biased algorithmic behaviors surface only in the presence of unanticipated circumstances or social dynamics in the contexts where a system is used, or due to changing norms and practices around the use of algorithmic systems over time.
Research
TAIGA Heuristic Evaluation Synthesis
To understand our target audience and see how we could feasibly improve the way users report biases on Reddit, we first conducted a heuristic evaluation and synthesis of an online bias-reporting platform called TAIGA. From this evaluation, we identified that...

Users might encounter difficulties primarily in two areas:
Navigation
Comprehension
The back-navigation issue, where users cannot easily return to the previous page without using the browser's back button or the top-left corner link, disrupts workflow and may lead to lost search results. This design flaw causes frustration and inefficiency, especially in task-oriented environments. Additionally, technical jargon, such as the unclear distinction between "Stable Diffusion" and "Google," confuses users reporting AI biases and makes the system harder to use and understand.
From testing usability with our team and other users, we identified both positive and negative heuristics of the application.

TAIGA features a simple, straightforward layout with minimal buttons, making it easy for users to navigate. The main page offers only five options—generate, compare, show examples, logout, and the question mark—which enhances readability and user experience.
Findings
EVIDENCE:
  • Frequency: The simple layout with few buttons was common, with about ¾ of users appreciating the ease of navigation on both the main and generated images pages.
  • Impact: Users found buttons and actions easy to identify, with one user noting it was "straightforward due to fewer clicks."
  • Persistence: This simplicity was consistent across the site, including the +Report and Submit Report buttons.

The consistent green color scheme helps users easily identify buttons, signaling actions through visual recognition.
Findings
EVIDENCE:
  • Frequency: 2/4 users appreciated the consistent color scheme, making tools easier to find.
  • Impact: The color scheme enhances efficiency by helping users recognize buttons, which several users noted they liked.
  • Persistence: The consistent color scheme is maintained across the site, helping users recognize tools.

The mandatory questions, highlighted and starred in red, helped users dig deeper into their own insights and discover biases.
Findings
EVIDENCE:
  • Frequency: Rare, appearing only after users attempted to report bias in images.
  • Impact: Helped users understand their goals in uncovering bias, prompting them to think critically about generative AI.
  • Persistence: Not persistent, but reinforced the theme of bias throughout the image-reporting process.

On the initial page after logging in, the user has no direction on how to post or report a thread. There are no instructions or words pointing toward "report" or "post" on the first page the user sees, other than a very small "Show Examples" toggle.

After posting a thread, the user cannot go back to the initial page for auditing images. Instead, they have to click the WeAudit logo in the top-left corner, which takes them all the way back to the beginning (extra steps to navigate back to the image-auditing page).

On the image-auditing page, there is a drop-down menu for choosing Stable Diffusion or Google. However, it is not intuitive what Stable Diffusion is, or why the option exists, without first looking up the term. This jargon is unfamiliar to users without much experience with generative AI, so users may disregard the option altogether.
Project Definition
Affinity Mapping
After discovering what users thought of biases in GenAI and how that could be reflected on Reddit, we evaluated our interpretation notes as a team and created affinity clusters. Some of the main findings from our usability testing helped us conclude the following:
UX/UI Fundamental Design
The TAIGA demonstration highlighted the importance of an intuitive, clean interface. All of the group members reported difficulties in progressing due to some form of faulty interface. This generalizes to a base-level requirement for all systems: a polished, intuitive interface free from bugs and visualization errors.
Empowering Reporting
Another issue users raised was uncertainty about whether they were qualified to conduct any form of auditing. For developers who depend on user feedback and auditing, it is crucial that the platform provide ample support to users, whether through textual guidance or educational material explaining what to look for when auditing.
Information Display
Any website that is not intuitive to use should have enough textual guidance to direct users through a successful run-through. Our testing revealed that users found the website's instructions inadequate, causing considerable friction and frustration. For a system to succeed at audience auditing, it must be straightforward and thoroughly explained.
Feature Accessibility
If the user is expected to monitor and audit, the available features should be clearly indicated and intuitive. Important features such as "Compare" and "Repost" actually caused uncertainty and hindered reports from being completed. For a system to succeed at audience auditing, its reporting features must clearly indicate their functionality.
All of this fed into our affinity mapping, where we created clusters and organized them by relevant topics, problems, and solutions. Below are a few of the brainstorming sessions we held with affinity mapping and diagramming.
Project Definition & Scope
To define our project and arrive at our research question, we "walked the wall" as a team, an effective method for identifying important information and key stakeholders. Walking the wall helped us surface the main issues from our initial user testing, and we were finally able to formulate a research question that let us see the issue from the user's perspective.
Initial Identified Goals/Questions
  • How might we design the interface of community discussion forums such as Reddit so users are motivated to report biases they find?
  • What biases does AI-generated content on Reddit contain?
  • How might we spread awareness on potential biases in generative AI algorithms?
Narrowed Identified Goals/Question
How might we utilize gamification in open discussion forums, such as Reddit, to empower and motivate everyday users to report biases in generative AI?
  • What motivates the general public to self-report or monitor biases in AI generation?
  • Which motivations within mini-games are most attractive to participants?
  • Which forms of AI are participants most likely to notice in their use of platforms?
  • What common issues do participants experience with AI platforms?
  • How frequently do participants interact with open-forum platforms like Reddit?
Chosen Contextual Method
For our contextual inquiry, we chose the Cultural Model as our primary method. Because our target audience is Reddit users, who have formed a strong identity as a platform base, we can look at related posts on AI bias and reporting across the Reddit community as a whole, then interview participants to understand what motivates them to use the platform.

Then, by examining and identifying common themes and motivations, we can create a game that captures users' motivation to report and monitor AI bias, tailored to the Reddit community, and hopefully increase interactions. A limitation we foresee is overgeneralizing from too few user tests. To combat this, we plan to interact with as many users as possible, look over a variety of forums, and create quantitative comparisons to accurately portray the user base.
Prototyping & Testing
With our research, affinity mapping, and project definition, we were able to translate these user needs into lo-fi and hi-fi designs. We mainly wanted to focus on a reward system of badges through gamification. Initially, we needed to identify the user tasks and assumptions we would be testing in our Reddit design.
  1. Navigate to Badge Collection: The user accesses their Reddit profile to view the badges section. This phase examines the clarity of navigation and the ease of finding the badge collection.
  2. Receive a Badge for Reporting Biases: The user receives a pop-up after reporting a bias, rewarding them with their first badge. This step assesses the impact of the badge system on user engagement and motivation to report biases.
  3. View and Interact with Badge Collection: The user returns to their profile to view and interact with their badge collection, verifying whether the prototype encourages continued engagement and exploration through badge-based rewards.
Designs
Our prototype seeks to answer the question: how can we encourage everyday users of open discussion forums (Reddit, Quora, X, etc.) to audit and report AI biases through gamification? We designed lo-fi prototypes that capture the process of a user finding and reporting a bias and earning a badge in the process. Over time, users can accumulate different badges (with different levels) for a more interactive and engaging user experience.
User Testing
In our prototype testing, we targeted Reddit users between the ages of 18 and 24. We reached out to students and conducted the sessions on campus, in spaces such as the HCII Lounge in Newell-Simon Hall, to gather user thoughts on our lo-fi prototypes. We also sent out surveys to learn more about badge designs and how they might encourage users to report AI biases.

Our survey showed that we needed to work on establishing the credibility of the authority figures. To address this, we recommend emphasizing and displaying the effort and time needed to achieve a certain badge, and providing a forum where people with more experience (shown through their number of badges) can guide new users. Our testing also found that users look forward to an indicator of progress in addition to earning badges. We therefore recommend creating badges for different categories of bias, with different levels for each badge type, and adding an element that signals current progress (e.g., "10 more to earn this badge"); a rough sketch of this idea follows below.
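To make these recommendations concrete, here is a minimal sketch of how a category-based, leveled badge model with a progress indicator might be represented. Every category name, level, and threshold in it is an illustrative assumption, not part of our prototype or of Reddit's actual systems.

```typescript
// Hypothetical data model for category-based, leveled badges with a
// "10 more to earn this" progress indicator. All categories, levels,
// and thresholds are illustrative assumptions.

type BiasCategory = "gender" | "race" | "age" | "culture";

interface Badge {
  category: BiasCategory;
  level: number;           // e.g., 1 = bronze, 2 = silver, 3 = gold
  reportsRequired: number; // verified reports needed to earn this level
}

interface UserProgress {
  category: BiasCategory;
  verifiedReports: number;
}

// Assumed ladder: each level requires more verified reports.
const badgeLadder: Badge[] = [
  { category: "gender", level: 1, reportsRequired: 5 },
  { category: "gender", level: 2, reportsRequired: 15 },
  { category: "gender", level: 3, reportsRequired: 30 },
];

// Progress indicator in the spirit of "10 more to earn this badge".
function reportsUntilNextBadge(progress: UserProgress): string {
  const next = badgeLadder
    .filter((b) => b.category === progress.category)
    .sort((a, b) => a.reportsRequired - b.reportsRequired)
    .find((b) => b.reportsRequired > progress.verifiedReports);
  if (!next) return "All badges in this category earned!";
  const remaining = next.reportsRequired - progress.verifiedReports;
  return `${remaining} more verified reports to earn level ${next.level}`;
}

// Example: a user with 5 verified gender-bias reports sees
// "10 more verified reports to earn level 2".
console.log(reportsUntilNextBadge({ category: "gender", verifiedReports: 5 }));
```

Keeping progress per category is what lets each bias type have its own level track, which is the design users told us they wanted.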
Insights #1:
Badges will effectively incentivize users to report biases in generative AI.
We want to test this assumption because, if badges do not effectively motivate users or increase the number of reports, we will know that rewarding users through badges should not be implemented. Biases in generative AI on Reddit would remain unreported without self-motivated users. If that is the case, we can look for other ways to motivate users, such as feedback loops or a different reward system.
Insights #2:
Users can accurately identify biases in posts that contain generative AI.
Because the lo-fi prototype already includes a pop-up screen that rewards users with a badge, we are assuming that users are knowledgeable about generative AI biases and can correctly identify when to report. It is important to test this, because if users do not know when to report biases, we would be awarding badges without knowing whether reports were completed accurately. Moreover, testing this lets us identify whether we need an additional feature that allows Reddit to review reports and choose which users to award badges to.
Insights #3:
The feedback accompanying badge rewards will enhance user engagement.
On the first prototype screen, we implemented feedback that thanks the user for their report and assures them that they are making an impact on the platform. By testing our assumption that this feedback will encourage users to continue reporting biases, we can decide whether we need additional, more detailed feedback. We can also observe whether users accept the badge or dismiss it as spam, to analyze its effectiveness. Because the current feedback is fairly generic, we can test whether it makes a significant impact.
Insights #4:
Users will not spam report posts in order to receive badges.
By understanding a user's rationale for reporting posts with biases, we can see whether they will simply report posts to receive a badge or to make a true impact on the Reddit platform. We can also probe why users refrain from spam reporting and are instead self-motivated. If we discover the opposite during testing, we will know we need to reanalyze our users' motivations and change our reward system.
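Insights #2 and #4 together suggest an implementation safeguard: badge credit should accrue only after a report is verified, and rapid-fire reporting should be rate-limited. The sketch below is one hypothetical way such a gate could work; the thresholds and field names are our assumptions, not a specification of Reddit's systems.

```typescript
// Hypothetical sketch: badge credit only accrues for verified reports
// (Insight #2), and rapid-fire reporting is rate-limited (Insight #4).
// All thresholds and names are illustrative assumptions.

interface BiasReport {
  userId: string;
  postId: string;
  submittedAt: number; // epoch milliseconds
  verified: boolean;   // set after moderator or automated review
}

const MAX_REPORTS_PER_HOUR = 5; // assumed spam threshold
const HOUR_MS = 60 * 60 * 1000;

// Only verified reports count toward badge progress, so spam reports
// earn nothing even if they are submitted.
function countBadgeCredit(reports: BiasReport[], userId: string): number {
  return reports.filter((r) => r.userId === userId && r.verified).length;
}

// Reject a new report if the user has exceeded the hourly limit.
function canSubmitReport(
  reports: BiasReport[],
  userId: string,
  now: number
): boolean {
  const recent = reports.filter(
    (r) => r.userId === userId && now - r.submittedAt < HOUR_MS
  );
  return recent.length < MAX_REPORTS_PER_HOUR;
}
```

Decoupling submission from credit this way means a spam-reporting user gains nothing, while honest reporters still see their progress advance once reports are reviewed.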
Measures
  • Rating and reception of the badges
  • Time it takes to navigate to the badges
  • Whether shorter badge-finding paths make navigation more efficient
  • Whether the barriers and benefits of reaching the badge collection seem reasonable
  • Whether people would view the badges as a form of authority
Participants & Setting
  • Participant Requirements: ages 18-24, Reddit users (including readers without an account)
  • Setting: April 2024, HCII Lounge, Newell-Simon Hall, Carnegie Mellon University
  • Procedure: go through the instructions, obtain a signed consent form, read the test scripts, and show the prototypes to gather feedback
Reflection
This project was very insightful to me, as I delved deeper into the world of user research and learned how much background research on users and stakeholders must happen beforehand. I was able to create designs and see how user feedback contributed to the final version of our reporting and badge-reward feature. In addition, working on a project that helped different audiences understand the repercussions of biases in generative AI made me feel as if I was making a true impact on important issues in the world. It was very interesting to focus on one specific topic: helping users uncover biases in generative AI.
In addition, collaborating with a large group was a great experience, as we held weekly meetings with organized agendas. We had great synergy, and our collaboration and teamwork were well-rounded. I really appreciated working with a great team and discovering important insights together.