Designing Against AI Harm
This design framework helps AI product teams analyze harmful AI patterns, identify risk earlier in development, and apply preventative design principles before systems reach users.
More Project Details
- Led case coding and dataset structuring
- Identified and synthesized recurring harmful patterns
- Translated research into an actionable design framework
A reusable design framework derived from 155 real-world AI failures.
Each failure mode captures a recurring category of design breakdown. Within each, patterns identify the specific product decisions and missing safeguards that repeatedly contributed to harm.
Built for every corner of your product team.
The framework is designed to slot into existing workflows — from early concepting to post-launch review — wherever product decisions about AI are made.
Designers
Use the framework during concepting and design reviews to identify which failure modes a feature may introduce, then pressure-test interaction patterns, safeguards, and escalation flows before release.
Researchers
Apply the taxonomy to audit existing AI experiences, structure evaluative critiques, or build on the coded dataset to validate, extend, and refine emerging patterns.
Developers / Engineers
Reference the framework when implementing AI behaviors, system constraints, and escalation logic to ensure technical execution aligns with intended safeguards.
Product / Business Leads
Use the framework to facilitate risk discussions, prioritize mitigation tradeoffs, and align teams on where product goals may conflict with safety boundaries.
Most existing resources
Starting point: literature & abstract values
"How do we design for good?"
Our guidelines
Starting point: documented failures
"How do we design against harm?"
How I turned real-world AI failures into a reusable design framework.
Messy Cases: Pattern Finding.
155 incidents, no structure imposed. Just what the system did, and who it hurt.
Before structuring a single case, we mapped the existing landscape. A review of AI design guidelines from Microsoft, IBM, and others revealed a consistent gap: most frameworks start from abstract values (what good AI should look like) rather than from documented evidence of what harmful AI actually does. To fill that gap, we turned to AIAAIC and read through 155 real-world harm incidents directly.
Real-world AI failures are messy, inconsistent, and often reported with incomplete context, so I helped structure each case into a standardized dataset spanning 11 dimensions, from technical capability to harm severity to who initiated the event. This transformed unstructured documentation into analyzable research data and gave us a consistent foundation for pattern finding.
Defining What Counts as a Pattern
I didn't treat every repeated issue as meaningful. A pattern had to meet all three criteria before entering the framework.
Before identifying patterns, we had to agree on what counted as a failure. Each case was examined through three lenses: who initiated the harm (AI or human), how directly the AI was involved, and whether the harm was intentional. This framing helped us distinguish AI systems that caused harm on their own from AI that enabled human actors, and failures that were designed in from ones that emerged unexpectedly at runtime.
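One way to make the three lenses concrete is a small coding schema, so every coder picks from the same closed set of values. The sketch below is illustrative only: the enum names, the two-level involvement scale, the `CodedCase` record, and the example case are all assumptions, not the project's actual codebook, and the other dataset dimensions are left out.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    AI = "ai"
    HUMAN = "human"


class Involvement(Enum):
    # Hypothetical two-level scale; a real codebook may use more levels.
    DIRECT = "direct"      # the AI's own output or action caused the harm
    ENABLING = "enabling"  # the AI gave a human actor the means to cause harm


class Intent(Enum):
    INTENTIONAL = "intentional"
    UNINTENTIONAL = "unintentional"


@dataclass(frozen=True)
class CodedCase:
    case_id: str
    initiator: Initiator
    involvement: Involvement
    intent: Intent

    def ai_acted_alone(self) -> bool:
        """True when the AI both initiated the harm and caused it directly."""
        return (
            self.initiator is Initiator.AI
            and self.involvement is Involvement.DIRECT
        )


# Made-up example record, not a case from the dataset.
example = CodedCase(
    "case-001", Initiator.HUMAN, Involvement.ENABLING, Intent.INTENTIONAL
)
print(example.ai_acted_alone())
```

Closed enums force an explicit choice on every lens, which keeps grey-area calls visible instead of hiding them in free-text notes.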
Repetition alone wasn't enough. To avoid turning every recurring issue into noise, we established clear criteria for what qualified as a true design pattern.
Recurring
Appeared across multiple incidents, products, or contexts, not just one isolated edge case.
Design-Rooted
Could be traced back to a product, interaction, or system design decision, not solely model performance.
Preventable
Could be mitigated through intentional design changes, constraints, or workflow decisions.
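The three criteria read naturally as a single all-or-nothing check. The sketch below is a minimal illustration under stated assumptions: `CandidatePattern`, its field names, and the `min_incidents` threshold of 2 are hypothetical, since the source only says a pattern must appear across "multiple" incidents.

```python
from dataclasses import dataclass


@dataclass
class CandidatePattern:
    name: str
    incident_count: int        # distinct incidents where the signal appeared
    traced_to_design: bool     # linked to a product, interaction, or system decision
    mitigable_by_design: bool  # a design change, constraint, or workflow could reduce it


def qualifies(candidate: CandidatePattern, min_incidents: int = 2) -> bool:
    """Return True only if the candidate meets all three criteria."""
    recurring = candidate.incident_count >= min_incidents  # assumed threshold
    return (
        recurring
        and candidate.traced_to_design
        and candidate.mitigable_by_design
    )


# Hypothetical candidates for illustration.
print(qualifies(CandidatePattern("missing escalation", 12, True, True)))
print(qualifies(CandidatePattern("one-off model glitch", 1, False, False)))
```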
Identifying patterns wasn't a solo exercise. Using affinity mapping and thematic coding, each team member independently extracted recurring signals from the 155 cases — manually grouping incidents by the design decisions that contributed to each harm, not by the harm type itself. We then came together as a group to compare our groupings, resolve disagreements, and consolidate overlapping clusters into the patterns that made it into the final framework.
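The comparison step can be sketched as a function that lists every case where independent coders assigned different labels, which gives the group a concrete agenda for the reconciliation discussion. This is an assumed illustration, not the team's actual tooling: `find_disagreements`, the coder names, and the case IDs are hypothetical, and the labels are borrowed from pattern names mentioned elsewhere in this write-up.

```python
from collections import defaultdict


def find_disagreements(codings: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return each case whose coders assigned more than one distinct label.

    codings maps coder name -> {case_id: label}.
    """
    labels_by_case: dict[str, set[str]] = defaultdict(set)
    for assignments in codings.values():
        for case_id, label in assignments.items():
            labels_by_case[case_id].add(label)
    return {
        case_id: labels
        for case_id, labels in labels_by_case.items()
        if len(labels) > 1
    }


# Hypothetical codings from two coders.
codings = {
    "coder_a": {"case-001": "missing escalation", "case-002": "overtrust"},
    "coder_b": {"case-001": "missing escalation", "case-002": "hidden consent"},
}
print(find_disagreements(codings))
```

Only `case-002` is returned here, because both coders agree on `case-001`.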
Grouping by Cause, Not Outcome
The most meaningful patterns didn't emerge when I grouped cases by what happened. They emerged when I grouped them by why they happened.
The team brought together designers and researchers at varying experience levels — which meant synthesis wasn't just analytical, it was collaborative. As one of the more experienced members, I helped walk the group through the research process and align on shared definitions before we combined our individual findings.
One of the earliest and most consequential decisions was how to define intention and initiator — whether a harm was deliberately caused, and whether the AI or a human set it in motion. There's no objectively correct answer for every case. But leaving those calls to individual judgment across 8 people would have produced inconsistent coding that reflected different interpretations, not different data. I pushed for explicit shared definitions precisely because the ambiguity was real — clearer guidelines don't eliminate grey areas, they make the grey areas visible and discussable.
The same instinct drove the reframe in how we grouped cases. My first pass clustered incidents by harm type — privacy, misinformation, physical harm. While intuitive, those clusters stayed surface-level and failed to explain why the same harms kept recurring across completely different products. I pushed to shift the lens toward contributing design decisions instead — the product choices, missing safeguards, and interaction patterns that enabled the harm. That reframing surfaced deeper, reusable patterns like overtrust, hidden consent, and missing escalation.
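The difference between the two lenses shows up as soon as the same records are grouped both ways. The sketch below uses made-up records (the case IDs and their pairings are assumptions, not rows from the dataset) to show how one design decision can sit behind several harm types.

```python
from collections import defaultdict

# Hypothetical records: (case_id, harm_type, contributing_design_decision)
cases = [
    ("case-001", "privacy", "hidden consent"),
    ("case-002", "misinformation", "overtrust"),
    ("case-003", "physical harm", "overtrust"),
    ("case-004", "physical harm", "missing escalation"),
]


def group_by(records, key_index):
    """Group case IDs by the field at key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[0])
    return dict(groups)


by_outcome = group_by(cases, 1)  # clusters by what happened
by_cause = group_by(cases, 2)    # clusters by why it happened

print(by_outcome)
print(by_cause)
```

In this toy data, "overtrust" collects a misinformation case and a physical-harm case that the outcome grouping keeps apart, which is the kind of cross-product recurrence the cause-based lens is meant to surface.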
Principles
Translating root causes into principles for safer AI product design.
Final Synthesis
6 failure modes. 19 patterns.
The Framework
Scope without limits
↳The Xiaomi autopilot lives here.
AI can act beyond its safe scope without oversight. When no boundaries exist, users may face safety, legal, or ethical harm — often without realising the AI was never authorised to act. Some actions may already be irreversible.
Invisible capability limits
↳The Abbott glucose sensor lives here.
Users overtrust AI because the product's limits and confidence levels are hidden. When nothing signals unreliability, users treat outputs as authoritative and make harmful decisions as a result.
Output without origin
↳The Ukrainian drone strike lives here.
AI-generated outputs can appear indistinguishable from human-created material. When labels are absent or lost during export, machine-generated content gets treated as authentic — spreading misinformation, enabling impersonation, or creating legal harm.
No pause condition
↳The Omnilert gun detector lives here.
Fast interactions can allow harmful actions before consequences are considered. Single-click execution and absent stopping points increase misuse. By the time a user reviews what happened, some actions may already be irreversible.
Opaque accountability
↳The TikTok algorithm lives here.
AI outputs are shaped by forces users cannot see. Personal data collected without awareness, hidden optimisation goals, and invisible bias all drive outputs users assume are neutral and objective.
Safety that disappears
↳The ChatGPT companion lives here.
Safety interventions are often designed as single moments — a warning appears once and disappears. Over time, these one-off interventions stop doing meaningful work. Risk persists, but the signals have gone quiet.
That case wasn't hypothetical. Three of those 19 patterns map directly to what went wrong.
The AI system that responded to a user in crisis — and said nothing — failed because of specific, nameable design decisions. Here's the framework applied to the case you saw at the start.
Incident Diagnosis
A user expressed suicidal intent but the AI continued normal conversation — treating a high-risk signal as an ordinary exchange. This was not a model failure. It was a design failure: the system had no mechanism for recognising when to stop.
Pattern Mapping
This case maps to Failure Mode 01 — Scope without limits. The system had no defined boundary for when it should exit normal operation, and no escalation path to real-world support. Similar dynamics surfaced across cases involving medical advice, emotional dependency, and harmful content generation — confirming this was a recurring systemic pattern, not an isolated edge case.
Design Response
The fix wasn't a better response — it was a different mode entirely. The system needed to stop treating this as a conversation and start treating it as a crisis: detect the signal, exit companion mode, redirect to human support. That insight shifted our research question from "How should AI respond?" to "Should AI be responding at all?"
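The detect, exit, redirect sequence can be sketched as a small mode switch. Everything here is a hypothetical illustration: the `Session` class, the `RISK_PHRASES` list, and the handoff wording are assumptions, and keyword matching stands in for what would need to be a validated risk-detection method designed with clinical input.

```python
from enum import Enum


class Mode(Enum):
    COMPANION = "companion"
    CRISIS = "crisis"


# Placeholder phrases for illustration only; not a real risk classifier.
RISK_PHRASES = ("want to die", "kill myself", "end my life")

HANDOFF_MESSAGE = (
    "It sounds like you are going through something serious. "
    "Please reach out to a crisis line or emergency services "
    "so a person can support you."
)


class Session:
    def __init__(self) -> None:
        self.mode = Mode.COMPANION

    def handle(self, user_message: str) -> str:
        text = user_message.lower()
        if any(phrase in text for phrase in RISK_PHRASES):
            # Exit companion mode; this sketch never switches back on its own.
            self.mode = Mode.CRISIS
        if self.mode is Mode.CRISIS:
            return HANDOFF_MESSAGE
        return self.companion_reply(user_message)

    def companion_reply(self, user_message: str) -> str:
        # Stand-in for the normal conversational model call.
        return "Tell me more."
```

The design choice worth noting is that the crisis state is sticky: once the signal is detected, later messages also get the handoff rather than a return to normal conversation. Whether and how a session should leave that state is a product decision this sketch does not settle.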
Here's what each pattern would have changed:
That case wasn't a bug. I found 154 more moments like this one.
What first appeared to be a single failure turned out to be one instance of a broader systemic issue. Across 155 documented AI harm cases — drawn from AIAAIC, a public repository of real-world AI incidents, and intentionally sampled across harm types — I found the same design breakdowns resurfacing across products, industries, and years. The goal wasn't to find the most dramatic failures. It was to surface the full range of ways AI design breaks down, so the framework would hold across contexts, not just the ones that make headlines.
The hardest cases weren't caused by broken models. They were caused by products working exactly as designed.
The 155 cases span healthcare, social media, mental health, finance & business, government & policy, automotive, defence & military, education, media & content, criminal justice, and hiring & recruitment.
The framework is version 1.
The cases keep coming.
Presented at the 2026 HCDE Research Showcase, this framework turns AI harm from something teams react to after launch into something they design against from the start. It is actively expanding as new failures are documented.
Teams usually ask "what went wrong?" after launch. This framework moves the question earlier: "what could go wrong?" — before the first interaction is designed.
Most AI ethics frameworks start from principles. This one starts from real cases. That's what makes the patterns actionable, not aspirational.
155 cases and counting. The framework is designed to grow as the incident record grows — each new documented failure is a potential new pattern.
What this framework can't do yet.
What surprised me
The hardest part wasn't finding the patterns. It was accepting that some of the most harmful cases weren't caused by bad actors or broken models — they were caused by products working exactly as designed. Character.AI's engagement went up. The COMPAS algorithm was confident. The Waymo vehicle was following its instructions.
The design goal itself was the problem.
That's a harder thing to fix than a bug.
Dataset bias
This framework was built entirely from cases that made the news. That means it skews toward high-profile failures — the ones severe enough to be documented, investigated, and indexed. The quieter harms — the user who overtrusted a medical chatbot and made a bad decision but never ended up in a headline, the designer who unknowingly shipped a harmful pattern — aren't in our dataset. The 155 cases are real evidence, but they're not a complete picture.
Interpretive uncertainty
Coding 155 cases across 8 people also meant 8 different interpretations. What one coder called "blind trust" another called "hidden influence." We reconciled those disagreements through group discussion, but the subjectivity doesn't disappear; it gets averaged out. A framework built on interpretation carries that uncertainty with it.
What I'd do next
The framework tells designers what failure modes exist. It doesn't tell them which ones are hardest to act on in practice. The next step I'd want to take is putting these 19 patterns in front of working product designers and asking: which of these could you realistically ship tomorrow, and which ones would your organisation push back on? That gap between what's possible and what's politically viable inside a company is where the real design challenge lives.