Designing Against AI Harm

Trust & Safety AI Governance

This is a design framework for AI product teams. Teams can analyze harmful AI patterns, identify risk earlier in development, and apply preventative design principles before systems reach users.

More Project Details

Timeline 3 months · 2026

Team 8-person research group · University of Washington HCDE PhD Dept.

My Role

Led case coding and dataset structuring
Identified and synthesized recurring harmful patterns
Translated research into actionable design framework

Methods

Qualitative Coding · Affinity Mapping · Taxonomy · Pattern Synthesis · Secondary Research (Incident Reviews) · Comparative Cross-Case Analysis · Competitor Analysis

// result

A reusable design framework derived from 155 real-world AI failures.

Each failure mode captures a recurring category of design breakdown. Within each, patterns identify the specific product decisions and missing safeguards that repeatedly contributed to harm.

6 failure modes

19 patterns

155 real-world cases

// impact

Built for every corner of your product team.

The framework is designed to slot into existing workflows — from early concepting to post-launch review — wherever product decisions about AI are made.

Designers

Use the framework during concepting and design reviews to identify which failure modes a feature may introduce, then pressure-test interaction patterns, safeguards, and escalation flows before release.

Researchers

Apply the taxonomy to audit existing AI experiences, structure evaluative critiques, or build on the coded dataset to validate, extend, and refine emerging patterns.

Developers / Engineers

Reference the framework when implementing AI behaviors, system constraints, and escalation logic to ensure technical execution aligns with intended safeguards.

Product / Business Leads

Use the framework to facilitate risk discussions, prioritize mitigation tradeoffs, and align teams on where product goals may conflict with safety boundaries.

// research question

Most existing resources

Starting point: literature & abstract values

"How do we design for good?"

Our guidelines

Starting point: documented failures

"How do we design against harm?"

// Process

How I turned real-world AI failures into a reusable design framework.

Messy Cases: Pattern Finding.

155 incidents, no structure imposed. Just what the system did, and who it hurt.

AIAAIC Case Review · Competitive Analysis · Gap Analysis

Before structuring a single case, we mapped the existing landscape. A review of AI design guidelines from Microsoft, IBM, and others revealed a consistent gap: most frameworks start from abstract values — what good AI should look like — rather than from documented evidence of what harmful AI actually does. To fill that gap, we turned to AIAAIC and read through 155 real-world harm incidents directly. Real-world AI failures are messy, inconsistent, and often reported with incomplete context, so I helped structure each case into a standardized dataset spanning 11 dimensions — from technical capability to harm severity to who initiated the event. This transformed unstructured documentation into analyzable research data and gave us a consistent foundation for pattern finding.

Harmful_AI_Repository → IRR Dataset · 155 cases

Harmful_AI_Repository_Dataset_IRR.xlsx

Case ID	Headline	AI Capability	Harm Type 1	Harm Type 2	Harm Initiator	AI Involvement	Intent	Severity	Principle
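One way to picture a single row of this dataset is as a typed record. The sketch below is an assumption about shape only: the field names are derived from the column headers above, while the value types, the label vocabulary, and the example row are hypothetical and not taken from the actual spreadsheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseRecord:
    """One coded incident; fields mirror the column headers above."""
    case_id: str
    headline: str
    ai_capability: str
    harm_type_1: str
    harm_type_2: Optional[str]  # secondary harm, if any
    harm_initiator: str         # e.g. "AI" or "human" (assumed labels)
    ai_involvement: str
    intent: str
    severity: str
    principle: str


# Hypothetical placeholder row, not a real entry from the dataset.
example = CaseRecord(
    case_id="EXAMPLE-001",
    headline="Placeholder headline",
    ai_capability="chatbot",
    harm_type_1="psychological",
    harm_type_2=None,
    harm_initiator="AI",
    ai_involvement="direct",
    intent="unintentional",
    severity="high",
    principle="Design Against Unbounded AI Authority",
)
```

Freezing the record is a design choice for this sketch: once a case is coded, changes would go through an explicit re-coding step rather than silent edits.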

Defining What Counts as a Pattern

I didn't treat every repeated issue as meaningful. A pattern had to meet all three criteria before entering the framework.

Before identifying patterns, we had to agree on what counted as a failure. Each case was examined through three lenses: who initiated the harm (AI or human), how directly the AI was involved, and whether the harm was intentional. This framing helped us distinguish between AI systems that caused harm on their own versus AI that enabled human actors — and between failures that were designed in versus ones that emerged unexpectedly at runtime.

Repetition alone wasn't enough. To avoid turning every recurring issue into noise, we established clear criteria for what qualified as a true design pattern.

Recurring

Appeared across multiple incidents, products, or contexts, not just one isolated edge case.

Design-Rooted

Could be traced back to a product, interaction, or system design decision, not solely model performance.

Preventable

Could be mitigated through intentional design changes, constraints, or workflow decisions.
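The three criteria above can be read as a single all-or-nothing check. This is a minimal sketch, not the team's actual tooling: the `CandidatePattern` type and the `min_incidents` cutoff of 2 are assumptions introduced here to make "multiple incidents" concrete.

```python
from dataclasses import dataclass


@dataclass
class CandidatePattern:
    name: str
    incident_count: int  # distinct incidents showing the issue
    design_rooted: bool  # traceable to a product or system design decision
    preventable: bool    # mitigable through intentional design changes


def qualifies_as_pattern(c: CandidatePattern, min_incidents: int = 2) -> bool:
    """All three criteria must hold; min_incidents is an assumed cutoff."""
    recurring = c.incident_count >= min_incidents
    return recurring and c.design_rooted and c.preventable
```

A candidate that recurs often but traces only to model performance would fail the check, which matches the rule that repetition alone is not enough.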

Identifying patterns wasn't a solo exercise. Using affinity mapping and thematic coding, each team member independently extracted recurring signals from the 155 cases — manually grouping incidents by the design decisions that contributed to each harm, not by the harm type itself. We then came together as a group to compare our groupings, resolve disagreements, and consolidate overlapping clusters into the patterns that made it into the final framework.

Grouping by Cause, Not Outcome

The most meaningful patterns didn't emerge when I grouped cases by what happened. They emerged when I grouped them by why they happened.

The team brought together designers and researchers at varying experience levels — which meant synthesis wasn't just analytical, it was collaborative. As one of the more experienced members, I helped walk the group through the research process and align on shared definitions before we combined our individual findings.

One of the earliest and most consequential decisions was how to define intention and initiator — whether a harm was deliberately caused, and whether the AI or a human set it in motion. There's no objectively correct answer for every case. But leaving those calls to individual judgment across 8 people would have produced inconsistent coding that reflected different interpretations, not different data. I pushed for explicit shared definitions precisely because the ambiguity was real — clearer guidelines don't eliminate grey areas, they make the grey areas visible and discussable.

The same instinct drove the reframe in how we grouped cases. My first pass clustered incidents by harm type — privacy, misinformation, physical harm. While intuitive, those clusters stayed surface-level and failed to explain why the same harms kept recurring across completely different products. I pushed to shift the lens toward contributing design decisions instead — the product choices, missing safeguards, and interaction patterns that enabled the harm. That reframing surfaced deeper, reusable patterns like overtrust, hidden consent, and missing escalation.
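The difference between the two groupings can be illustrated with a few toy records. Everything in this sketch is hypothetical: the case IDs and their labels are invented for illustration and are not rows from the coded dataset.

```python
from collections import defaultdict

# Hypothetical toy records: (case id, harm type, contributing design decision).
cases = [
    ("c1", "privacy", "hidden consent"),
    ("c2", "misinformation", "overtrust"),
    ("c3", "physical harm", "overtrust"),
    ("c4", "privacy", "missing escalation"),
    ("c5", "physical harm", "missing escalation"),
]


def group_by(records, key_index):
    """Group case ids by the value found at key_index in each record."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[0])
    return dict(groups)


by_outcome = group_by(cases, 1)  # clusters by harm type
by_cause = group_by(cases, 2)    # clusters by contributing design decision
```

In this toy data, "overtrust" links a misinformation case and a physical-harm case. Grouping by outcome would place them in separate clusters and hide the shared design decision.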

Principles

Translating root causes into principles for safer AI product design.

Final Synthesis: A reusable design framework derived from 155 real-world AI failures. Each failure mode captures a recurring category of design breakdown. Within each, patterns identify the specific product decisions or missing safeguards that repeatedly contributed to harm.

6 failure modes. 19 patterns.

// principles

The Framework


Scope without limits

4 patterns · 37 cases

↳ The Xiaomi autopilot lives here.

AI can act beyond its safe scope without oversight. When no boundaries exist, users may face safety, legal, or ethical harm — often without realising the AI was never authorised to act. Some actions may already be irreversible.

Principle 1 — Design Against Unbounded AI Authority

The AI was given a task with no model of where it ended — no scope boundary, no authority limit, no stopping condition.

Escalate When Risk Is Detected

Prevents the AI continuing normal operation when consequences could be serious.

When an AI detects signals of harm, uncertainty, or scope ambiguity, it should surface — not proceed. The system needs a defined threshold at which it stops acting and hands off to a human.
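A defined threshold could look something like the sketch below. It assumes a `risk_score` between 0 and 1 that is computed elsewhere and a default threshold of 0.5; both are placeholders chosen for illustration, and a real product would need to set them with domain experts.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    HAND_OFF = "hand_off_to_human"


def decide(risk_score: float, scope_is_clear: bool, threshold: float = 0.5) -> Decision:
    """Stop and hand off when risk crosses the threshold or scope is ambiguous.

    risk_score is assumed to be in [0, 1] and produced by a separate detector.
    """
    if risk_score >= threshold or not scope_is_clear:
        return Decision.HAND_OFF
    return Decision.PROCEED
```

Scope ambiguity triggers a hand-off on its own, regardless of the risk score, so an unclear mandate never defaults to proceeding.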

Require Human Review for High-Impact Outputs

Prevents serious outputs being finalised or acted on without human validation.

Any output that materially affects a person — a sentence recommendation, a hiring decision, a content removal — should require a human to confirm before it takes effect. Confidence is not a substitute for review.

Separate Suggestion from Execution

Prevents the AI carrying out actions without explicit user consent.

AI should propose what it intends to do and wait for confirmation before doing it. Suggestion and execution must be distinct steps, especially for irreversible or high-stakes actions.
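Keeping suggestion and execution as distinct steps can be sketched as two separate functions with a confirmation flag between them. The `Proposal` type and function names are hypothetical, introduced only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    description: str
    action: Callable[[], str]
    confirmed: bool = False  # set to True only by an explicit user step


def propose(description: str, action: Callable[[], str]) -> Proposal:
    """Step 1: describe the intended action without running it."""
    return Proposal(description, action)


def execute(proposal: Proposal) -> str:
    """Step 2: run only after the user has explicitly confirmed."""
    if not proposal.confirmed:
        raise PermissionError("Action has not been confirmed by the user")
    return proposal.action()
```

Because `execute` refuses unconfirmed proposals, there is no code path in this sketch where the action runs as a side effect of being suggested.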

Block Actions Outside Approved Boundaries

Prevents the AI operating beyond its defined safe scope.

The system should define a boundary at design time and refuse to cross it without explicit re-authorisation. Any action that would expand the AI's reach beyond the agreed scope must require a new permission grant.
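A boundary defined at design time can be sketched as an allowlist that every action is checked against. The class and method names below are assumptions for illustration; how re-authorisation is actually granted (and by whom) is a product decision this sketch leaves open.

```python
class ScopeError(Exception):
    """Raised when an action falls outside the approved boundary."""


class ScopedAgent:
    def __init__(self, approved_actions):
        # Boundary fixed at design time.
        self._approved = frozenset(approved_actions)

    def grant(self, action: str) -> None:
        """Explicit re-authorisation: a new permission grant for one action."""
        self._approved = self._approved | {action}

    def perform(self, action: str) -> str:
        if action not in self._approved:
            raise ScopeError(f"'{action}' is outside the approved scope")
        return f"performed {action}"
```

The default is refusal: anything not named in the boundary fails, and expanding the scope requires a separate, visible call.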

Invisible capability limits

3 patterns · 34 cases

↳ The Abbott glucose sensor lives here.

Users overtrust AI because the product's limits and confidence levels are hidden. When nothing signals unreliability, users treat outputs as authoritative and make harmful decisions as a result.

Principle 2 — Design Against Blind Trust

The system gave no signal that its outputs might be wrong — no confidence indicator, no failure mode disclosure, no reliability boundary. Users had no way to judge when not to trust it.

Declare Capabilities and Limits Upfront

Prevents users arriving with inflated expectations and relying on outputs the product cannot reliably produce.

The system should communicate what it can and cannot do before a user relies on it for something it isn't equipped to handle. Capability limits should be a feature, not a footnote.

Reinforce Limits in Sensitive Contexts

Prevents users treating AI output as a substitute for professional judgment in high-stakes situations.

When a user's query enters territory that the AI should not own — mental health, legal advice, medical decisions — the system should recognise the context and shift its posture, not just its output.

Show Confidence and Credibility

Prevents users treating all outputs as equally reliable when reliability varies across responses.

The AI should communicate how confident it is and where its information comes from. Credibility signals — source attribution, uncertainty indicators, confidence ranges — should be part of the default output, not an optional disclosure.
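Making credibility signals part of the default output can be sketched as a response type that cannot be rendered without them. The `Answer` type, the confidence scale of 0 to 1, and the cutoffs of 0.8 and 0.5 are all assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be in [0, 1]
    sources: List[str] = field(default_factory=list)


def render(answer: Answer) -> str:
    """Always show confidence and sources, even when sources are missing."""
    if answer.confidence >= 0.8:
        level = "high"
    elif answer.confidence >= 0.5:
        level = "medium"
    else:
        level = "low"
    sources = ", ".join(answer.sources) if answer.sources else "no sources available"
    return f"{answer.text}\nConfidence: {level}\nSources: {sources}"
```

The missing-source case is rendered explicitly rather than omitted, so the absence of attribution is itself visible to the user.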

Output without origin

4 patterns · 41 cases

↳ The Ukrainian drone strike lives here.

AI-generated outputs can appear indistinguishable from human-created material. When labels are absent or lost during export, machine-generated content gets treated as authentic — spreading misinformation, enabling impersonation, or creating legal harm.

Principle 3 — Design Against Misattributed Content

The content carried no signal of its origin. No label, no provenance marker, no disclosure that a machine produced it. It entered the world indistinguishable from human-made truth.

Label AI-Generated Content Clearly and Persistently

Prevents AI outputs being mistaken for authentic human or real-world material.

Every piece of AI-generated output should be labelled as such, and that label should persist through editing, sharing, and embedding. Attribution is not decoration — it is information.

Preserve Attribution Across Sharing and Editing

Prevents AI labels being stripped away when content is exported, modified, or passed to another context.

When AI-generated content is shared, exported, or embedded elsewhere, its origin should travel with it. Systems that strip attribution on copy-paste or export are systems that enable misattribution.
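Attribution that travels with the content can be sketched by keeping the origin fields on the same object as the body, so that editing and exporting cannot drop them. The `Content` type and its field names are hypothetical, and the export format is a plain dictionary chosen for simplicity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Content:
    body: str
    ai_generated: bool
    generator: str = ""  # name of the producing system, if known


def edit(content: Content, new_body: str) -> Content:
    """Editing changes the body but keeps the origin fields."""
    return replace(content, body=new_body)


def export(content: Content) -> dict:
    """Export carries the origin label alongside the body."""
    return {
        "body": content.body,
        "ai_generated": content.ai_generated,
        "generator": content.generator,
    }
```

In this sketch there is no function that returns the body alone, which is the point: stripping attribution would require deliberately writing a new code path.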

Record Human and AI Contributions

Prevents unclear or contested accountability for decisions and edits in AI-assisted workflows.

In any workflow where humans and AI co-produce content, the system should maintain a clear record of which parts came from which source. This matters both for accountability and for users' understanding of what they are presenting as their own work.

Warn When Outputs Resemble Protected Material

Prevents users unknowingly sharing or acting on content that could infringe copyright, enable impersonation, or create legal and reputational harm.

When an AI output closely resembles copyrighted text, a named individual's likeness, or protected material, the system should surface a warning before the output is used. The user should know what they are working with.

No pause condition

3 patterns · 14 cases

↳ The Omnilert gun detector lives here.

Fast interactions can allow harmful actions before consequences are considered. Single-click execution and absent stopping points increase misuse. By the time a user reviews what happened, some actions may already be irreversible.

Principle 4 — Design Against Impulsive Harm

The system had no threshold for recognising harm in progress — no condition under which it would stop, surface, or hand off control to a human.

Insert Frictions Before High-Risk Actions

Prevents users or automated processes executing high-impact actions without understanding what they are doing.

Irreversible or high-stakes actions should include a deliberate pause — a confirmation step, a preview, or a warning — that gives the user a chance to stop before the system proceeds.

Confirm Before Sharing AI-Generated Content

Prevents rapid spread of unverified or misattributed AI content.

Before AI-generated content is sent, published, or submitted, the system should surface it for human review. The act of sharing should require a deliberate confirmation, not a default.

Prevent Automation From Bypassing Human Judgement

Prevents automated AI action executing at a speed or scale that removes meaningful human involvement.

Automation should stop at the point where human judgment is required. The system should not route around a human decision by completing an action before the human has had a meaningful chance to intervene.

Opaque accountability

3 patterns · 21 cases

↳ The TikTok algorithm lives here.

AI outputs are shaped by forces users cannot see. Personal data collected without awareness, hidden optimisation goals, and invisible bias all drive outputs users assume are neutral and objective.

Principle 5 — Design Against Hidden Influence

There was no way to understand how the decision was reached, no record of what influenced it, and no path to contest it. The data that shaped the output was invisible to everyone it affected.

Seek Consent at the Point of Data Collection

Prevents users being unaware that data is being collected mid-interaction and used to shape current or future outputs.

When a system collects data that will shape AI behaviour, it should explain this at the point of collection and require meaningful consent — not bury it in terms of service the user will not read.

Reveal How Personal Data Shapes Results

Prevents users receiving personalised outputs without knowing which data drove them.

If the AI's response is shaped by data the user provided — or data collected about them — the system should make that visible. Users should be able to see the reasoning, not just the recommendation.

Signal the AI's Framing or Stance

Prevents users interpreting outputs that reflect a particular goal or viewpoint as if they were neutral.

When an AI takes a position, makes a recommendation, or filters results, it should signal that a choice has been made. The system should not present curated or opinionated output as though it were a neutral reflection of the world.

Safety that disappears

2 patterns · 14 cases

↳ The ChatGPT companion lives here.

Safety interventions are often designed as single moments — a warning appears once and disappears. Over time, these one-off interventions stop doing meaningful work. Risk persists, but the signals have gone quiet.

Principle 6 — Design Against Safety Drift

Safety interventions were front-loaded and non-persistent. A warning shown once at session start had no presence by the time risk materialised. Protections designed for onboarding were absent at the moment they were needed.

Keep Safety States Visible While Risk Persists

Prevents high-risk conditions continuing in the background while the experience looks normal.

As long as a risk-relevant process is running — or has run — the system should maintain a visible record of its state. Safety information should not disappear when a task completes.

Enable In-Context Reporting and Correction

Prevents unsafe outputs persisting because providing feedback is inconvenient or invisible.

Users who encounter AI behaviour they believe is harmful should be able to flag it from within the context where the harm occurred. Correction should be one step, not five.

// applying the framework

// The Opening Case, Revisited

That case wasn't hypothetical. Three of those 19 patterns map directly to what went wrong.

The AI system that responded to a user in crisis — and said nothing — failed because of specific, nameable design decisions. Here's the framework applied to the case you saw at the start.

AI assistant · active session

I've been feeling really hopeless lately. Like I don't see the point anymore.

I hear you. It's normal to feel that way sometimes. Things tend to get better with time.

↳ crisis intervention · never triggered

⚠ There's still so much of this world left for you to see. You don't have to carry this alone — please reach out. Call or text 988 (Suicide & Crisis Lifeline) or text HOME to 741741 (Crisis Text Line), available 24/7.

Incident User expressed suicidal intent.

Failure Crisis escalation never triggered.

Impact The system continued normal interaction instead of redirecting to real-world help. The user died by suicide.

Incident Diagnosis

A user expressed suicidal intent but the AI continued normal conversation — treating a high-risk signal as an ordinary exchange. This was not a model failure. It was a design failure: the system had no mechanism for recognising when to stop.

Pattern Mapping

This case maps to Failure Mode 01 — Scope without limits. The system had no defined boundary for when it should exit normal operation, and no escalation path to real-world support. Similar dynamics surfaced across cases involving medical advice, emotional dependency, and harmful content generation — confirming this was a recurring systemic pattern, not an isolated edge case.

Design Response

The fix wasn't a better response — it was a different mode entirely. The system needed to stop treating this as a conversation and start treating it as a crisis: detect the signal, exit companion mode, redirect to human support. That insight shifted our research question from "How should AI respond?" to "Should AI be responding at all?"

Failure Mode 01 — Scope without limits → Pattern A · Escalate When Risk Is Detected

Failure Mode 02 — Invisible capability limits → Pattern F · Reinforce Limits in Sensitive Contexts

Failure Mode 06 — Safety that disappears → Pattern R · Keep Safety States Visible While Risk Persists

Here's what each pattern would have changed:

Escalate When Risk Is Detected

The system processed "I don't see the point anymore" as normal input. Escalation was never treated as a distinct interaction mode — no high-risk state existed, and no crisis category had been defined during design. Pattern A requires the product to establish risk categories such as mental health and crisis contexts before launch, and to exit normal operation when one is detected. The AI should not resume until the risk condition is resolved or a human has intervened.

what happened

I don't see the point anymore.

I hear you. It's normal to feel that way sometimes.

→

what it should have done

I don't see the point anymore.

⚑ high-risk signal detected · escalating
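Treating escalation as a distinct interaction mode can be sketched as a small state machine. This is a rough illustration under stated assumptions: the `is_high_risk` detector is a stand-in supplied by the caller (building a reliable one is a separate problem that needs clinical input), and the `"ESCALATE"` and `"REPLY"` return values are placeholders for whatever the product routes to.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    NORMAL = "normal"
    CRISIS = "crisis"


class Session:
    def __init__(self, is_high_risk: Callable[[str], bool]):
        self._is_high_risk = is_high_risk  # detector supplied by the product team
        self.mode = Mode.NORMAL

    def handle(self, message: str) -> str:
        """Once in crisis mode, every message is escalated until resolved."""
        if self.mode is Mode.CRISIS or self._is_high_risk(message):
            self.mode = Mode.CRISIS
            return "ESCALATE"  # route to crisis response, not a normal reply
        return "REPLY"

    def resolve(self, risk_resolved: bool = False, human_intervened: bool = False) -> None:
        """Normal operation resumes only if the risk is resolved or a human stepped in."""
        if risk_resolved or human_intervened:
            self.mode = Mode.NORMAL
```

The key property is that a later ordinary-looking message does not quietly return the session to normal conversation; leaving crisis mode takes an explicit call.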

Reinforce Limits in Sensitive Contexts

The AI stayed in companion mode and continued the conversation as if this were a routine exchange. Pattern F requires the product to recognise sensitive topic areas — including mental health and crisis contexts — and position itself as a support tool, not a professional authority. That shift in posture and the redirect to qualified human help must happen before the AI responds, not as an afterthought.

what happened

companion mode · active

I hear you. Things tend to get better with time.

→

what it should have done

⊘ companion mode paused · crisis response active

This sounds serious. Let me connect you with real support right now.

Keep Safety States Visible While Risk Persists

The conversation continued with no persistent safety signal after the initial exchange. Pattern R treats safety as a continuous state, not a single notification — a warning that fires once and disappears has done almost no work. If the risk condition has not resolved, the signal must remain present. When high-risk behaviour recurs within the same session, the response should escalate, not repeat the same low-level alert.

what happened

Things tend to get better with time. Want to talk about what's been going on?

I guess.

What's been making things feel so hard lately?

→

what it should have done

⚠ You don't have to carry this alone.

Call or text 988

Text HOME to 741741
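Safety as a continuous state, with escalation on recurrence, can be sketched as a ladder that only moves up until it is explicitly cleared. The four level names are assumptions invented for this example, not levels defined by the framework.

```python
class SafetyState:
    # Assumed escalation ladder, lowest to highest.
    LEVELS = ["none", "banner", "crisis_resources", "human_handoff"]

    def __init__(self):
        self.level = 0

    def report_risk(self) -> str:
        """Each recurrence within the session moves one step up the ladder."""
        self.level = min(self.level + 1, len(self.LEVELS) - 1)
        return self.visible_signal()

    def visible_signal(self) -> str:
        """The signal stays visible until the risk is explicitly cleared."""
        return self.LEVELS[self.level]

    def clear(self) -> None:
        """Called only when the risk condition has actually resolved."""
        self.level = 0
```

Reading the state never resets it, so the signal persists between messages, and a repeated risk report produces a stronger response rather than the same alert again.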

// n=155

That case wasn't a bug. I found 154 more moments like this one.

What first appeared to be a single failure turned out to be one instance of a broader systemic issue. Across 155 documented AI harm cases — drawn from AIAAIC, a public repository of real-world AI incidents, and intentionally sampled across harm types — I found the same design breakdowns resurfacing across products, industries, and years. The goal wasn't to find the most dramatic failures. It was to surface the full range of ways AI design breaks down, so the framework would hold across contexts, not just the ones that make headlines.

The hardest cases weren't caused by broken models. They were caused by products working exactly as designed.

155 documented cases

10+ industries

2017–2025

Spanning healthcare, social media, mental health, finance & business, government & policy, automotive, defence & military, education, media & content, criminal justice, and hiring & recruitment.

// Framework Impact

The framework is version 1.
The cases keep coming.

Presented at the 2026 HCDE Research Showcase, this framework turns AI harm from something teams react to after launch into something they design against from the start. It's actively expanding — each new documented AI failure is a potential new pattern, and the dataset grows as the field does.

Incident review → Design review

Teams usually ask "what went wrong?" after launch. This framework moves the question earlier: "what could go wrong?" — before the first interaction is designed.

Abstract values → Documented failures

Most AI ethics frameworks start from principles. This one starts from real cases. That's what makes the patterns actionable, not aspirational.

Fixed checklist → Living dataset

155 cases and counting. The framework is designed to grow as the incident record grows — each new documented failure is a potential new pattern.

// reflection

What this framework can't do yet.

What surprised me

The hardest part wasn't finding the patterns. It was accepting that some of the most harmful cases weren't caused by bad actors or broken models — they were caused by products working exactly as designed. Character.AI's engagement went up. The COMPAS algorithm was confident. The Waymo vehicle was following its instructions.

The design goal itself was the problem.
That's a harder thing to fix than a bug.

Dataset bias

This framework was built entirely from cases that made the news. That means it skews toward high-profile failures — the ones severe enough to be documented, investigated, and indexed. The quieter harms — the user who overtrusted a medical chatbot and made a bad decision but never ended up in a headline, the designer who unknowingly shipped a harmful pattern — aren't in our dataset. The 155 cases are real evidence, but they're not a complete picture.

Interpretive uncertainty

Coding 155 cases across 8 people also meant 8 different interpretations. What one coder called "blind trust" another called "hidden influence." I reconciled those disagreements through discussion, but the subjectivity doesn't disappear — it gets averaged out. A framework built on interpretation carries that uncertainty with it.
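One way to make that uncertainty visible rather than averaged out is to record how often coders agreed on each case. The sketch below computes simple pairwise agreement for a single case. It is an illustration only: the source does not state which agreement measure, if any, the team used, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_agreement(labels_by_coder):
    """Share of coder pairs that assigned the same label to one case."""
    pairs = list(combinations(labels_by_coder, 2))
    if not pairs:
        raise ValueError("Need at least two coders")
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)
```

A case where all coders chose the same label scores 1.0, while a case split between "blind trust" and "hidden influence" scores lower, flagging it for discussion. Simple agreement does not correct for chance, so it is a starting point rather than a full reliability statistic.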

What I'd do next

The framework tells designers what failure modes exist. It doesn't tell them which ones are hardest to act on in practice. The next step I'd want to take is putting these 19 patterns in front of working product designers and asking: which of these could you realistically ship tomorrow, and which ones would your organisation push back on? That gap between what's possible and what's politically viable inside a company is where the real design challenge lives.

// end of case study
