
From Mata v. Avianca to Sullivan & Cromwell: The Hallucinated Citation Crisis

In June 2023, a New York federal court fined two lawyers $5,000 for submitting a legal brief containing six completely fabricated case citations generated by ChatGPT. The case — Mata v. Avianca — became the cautionary tale that introduced the legal profession to AI hallucinations.

Three years later, the problem has not been solved. It has scaled.

1,348 documented cases worldwide
915 in U.S. courts
17 court decisions in one day (March 2026)

These figures come from the AI Hallucination Cases Database maintained by French researcher Damien Charlotin, as of April 2026. The most recent high-profile incident did not involve a solo practitioner using ChatGPT for the first time. It involved Sullivan & Cromwell, one of the most prestigious law firms in the world.

The timeline

June 2023
Mata v. Avianca — $5,000 fine
Attorney Steven Schwartz submitted 6 fabricated ChatGPT case citations. Asked ChatGPT to verify — it confirmed they existed. They didn't. Judge Castel imposed Rule 11 sanctions.
2025
ByoPlanet — 8 matters, substantial sanctions
Attorney James Martin Paul used hallucinated citations across 8 different legal matters. Court found bad faith. 4 federal cases dismissed.
2025
Morgan & Morgan — 8 fake citations
Three attorneys at Morgan & Morgan were sanctioned for filing a motion with 8 AI-generated, non-existent case citations.
2025
California attorney — 21 of 23 quotes fabricated
State court appeal where 21 of 23 cited quotes were completely fabricated by ChatGPT. $10,000 fine.
October 2025
Deloitte Australia — partial refund on AUD 440,000 contract
237-page government report contained fake academics, fabricated references, invented judicial quotes. Produced using Azure OpenAI GPT-4o. Deloitte agreed to a partial refund.
November 2025
Deloitte Canada — second time
CAD 1.6M healthcare report contained 4+ fake citations, linking real researchers to fictional papers. Big Four firm caught twice in two months.
July 2025
Johnson v. Dunn — no sanctions, but noted
Large law firm submitted hallucinated citations. Court declined sanctions due to firm's "responsible attitude" but flagged citations as fabricated.
April 2026
Sullivan & Cromwell — 42 inaccuracies
Emergency apology to Chief Bankruptcy Judge Glenn. Prince Global Holdings Chapter 15 motion with fabricated citations and misquoted statutes. Boies Schiller Flexner flagged it.

Sullivan & Cromwell. Not a solo practitioner. Not a small firm. A firm with 1,000+ lawyers, a dedicated AI policy, and internal review processes. The safeguards existed. They did not catch it.

Why this keeps happening

The legal profession responded to Mata v. Avianca with policies: require attorneys to disclose AI use, mandate verification of citations, add AI-specific provisions to local court rules. As of early 2026, hundreds of courts have adopted AI disclosure requirements.

Policies have not solved the problem because the failure is not procedural — it is structural. Here is why:

Hallucinated citations look real. They follow the correct format. They reference real courts. They cite plausible docket numbers. They include convincing quotations that sound like they came from a judicial opinion. A human reviewer who is not specifically checking each citation against a legal database will often accept them at face value — which is exactly what citation is supposed to enable. The whole point of citing authority is so the reader doesn't have to re-derive the conclusion.

Verification is expensive and manual. Checking a citation means opening Westlaw or LexisNexis, searching for the case, reading the relevant passages, and confirming that the cited material says what the brief claims it says. For a motion with 30 citations, this is hours of paralegal time. The economic pressure to skip verification, or to spot-check a few citations rather than all of them, is enormous, especially on deadline.

The failure is invisible until it's caught. A brief with fabricated citations passes every quality check that does not specifically involve verifying citations against primary sources. Formatting is correct. Arguments are coherent. The writing is professional. Spelling and grammar are clean. The document looks like competent legal work. The only way to find the problem is to check every citation, one by one.

The problem compounds in multi-agent systems. When a legal AI pipeline uses one agent to research case law, another to draft arguments, and a third to format and cite-check the brief, attribution errors accumulate at each handoff. Agent A retrieves real cases. Agent B incorporates them into an argument but introduces subtle mischaracterizations. Agent C generates a formatted citation that looks correct but references the wrong section or misquotes the holding. Each agent's output is locally reasonable. The error is emergent.
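A back-of-the-envelope sketch shows why small per-handoff error rates matter. It assumes each agent independently preserves a citation's fidelity with some fixed probability. Both the independence assumption and the 97% figure are illustrative, not measurements from any real pipeline.

```python
def citation_survival(per_stage_fidelity: float, stages: int) -> float:
    """Probability one citation passes through every stage unaltered,
    assuming stages fail independently (an illustrative simplification)."""
    if not 0.0 <= per_stage_fidelity <= 1.0:
        raise ValueError("per_stage_fidelity must be in [0, 1]")
    return per_stage_fidelity ** stages


def brief_clean_probability(per_stage_fidelity: float, stages: int,
                            citations: int) -> float:
    """Probability that every citation in a brief survives all stages."""
    return citation_survival(per_stage_fidelity, stages) ** citations


if __name__ == "__main__":
    # Hypothetical numbers: 3 agents, 30 citations, 97% fidelity per handoff.
    print(f"one citation intact: {citation_survival(0.97, 3):.3f}")
    print(f"all 30 intact:       {brief_clean_probability(0.97, 3, 30):.3f}")
```

Under these assumed numbers, a single citation survives three handoffs about 91% of the time, but the chance that all 30 citations in a brief survive is only around 6%. Each stage looks reliable on its own; the brief as a whole does not.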

The numbers

The scale of the problem is no longer anecdotal:

  • 1,348 documented cases of AI-hallucinated citations worldwide as of April 2026, per the Charlotin database
  • 915 in U.S. courts alone
  • At least 8 appellate and trial rulings have imposed fines, bar referrals, or suspensions since Mata v. Avianca
  • 17 court decisions in a single day (March 2026) noting suspected AI hallucinations
  • Sanctions ranging from $5,000 fines to case dismissals and referrals to bar disciplinary committees

And these are only the cases that were caught. The true number of hallucinated citations in AI-assisted legal work is unknown, but given the difficulty of detection and the economic incentives to skip verification, it is almost certainly higher than the documented cases suggest.

Beyond the courtroom

Legal citations are the canary. The same failure mode — AI-generated references to sources that don't exist or don't say what they're claimed to say — appears in every domain where agents produce cited output:

Medical AI

Clinical summaries that cite guidelines or studies that don't exist. The consequence is not a fine — it is a treatment decision based on fabricated evidence.

Financial analysis

Investment memos that cite SEC filings or analyst reports that have been misquoted or don't exist. Capital allocation based on phantom data.

M&A diligence

Due diligence memos citing contract sections in the data room that don't exist or say something different from what's claimed. Deals closed on fabricated assurances.

Compliance

Audit reports citing regulatory provisions or internal policies incorrectly. Compliance posture built on AI-generated fiction.

In every case, the failure has the same structure: the AI produces a specific, authoritative-sounding reference that gives the human reader false confidence. The reference looks like verification. It is fabrication.

What automated investigation looks like

The Sullivan & Cromwell incident is especially instructive because the firm had safeguards. They had an AI policy. They had review processes. The policies were not followed for one document, and 42 errors reached the court.

Human policies fail under time pressure. Automated investigation does not.

Investigation: citation-verification / brief-2026-04-18
Document: S&C Emergency Motion — Prince Global Holdings
Citations checked: 34
Verified against primary source: 34

Findings:
  [ERROR] 3 citations reference non-existent cases
    → Case "In re GlobalTrust Holdings" not found in
      Westlaw, LexisNexis, or PACER
    → Case "Rivera v. Consolidated Partners" — docket
      number does not exist in cited jurisdiction

  [ERROR] 12 citations misquote Bankruptcy Code sections
    → §362(a) quoted as: "automatic stay applies to all
      proceedings against the debtor"
    → Actual text includes 8 enumerated exceptions not
      mentioned in brief

  [CONCERN] 27 citations technically exist but 8 contain
    paraphrased holdings that do not match source text
    → Semantic similarity score: 0.41 (threshold: 0.70)

Total: 3 fabricated, 12 misquoted, 8 mischaracterized
Recommendation: BLOCK filing. Return to drafting attorney.

This runs before the document leaves the firm. Every citation checked against the primary source. Every quoted passage compared to the actual text. Every holding verified. Not as a policy that someone might skip under deadline pressure — as an automated step in the pipeline that cannot be bypassed.
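A minimal sketch of what such a gate could look like follows. It is not the implementation behind the report above. The `SOURCES` dictionary is an invented stand-in for a lookup against a legal database, the case identifiers and quotations are made up, and `difflib`'s character-level ratio is a crude substitute for a real semantic similarity score. Only the 0.70 threshold is taken from the sample report.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

THRESHOLD = 0.70  # mirrors the sample report; the metric below is NOT semantic


@dataclass(frozen=True)
class Citation:
    source_id: str  # hypothetical lookup key for the cited authority
    quoted: str     # passage as quoted in the brief


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def best_match(quoted: str, source_text: str) -> float:
    """Highest character-level similarity between the quote and any sentence.

    Splitting on "." is deliberately naive and would mangle real opinions.
    """
    sentences = [s for s in source_text.split(".") if s.strip()]
    return max(
        (SequenceMatcher(None, _normalize(quoted), _normalize(s)).ratio()
         for s in sentences),
        default=0.0,
    )


def check(citation: Citation, sources: dict[str, str]) -> str:
    source_text = sources.get(citation.source_id)
    if source_text is None:
        return "NOT_FOUND"  # no such authority in the stand-in database
    if best_match(citation.quoted, source_text) < THRESHOLD:
        return "MISMATCH"   # authority exists but does not say this
    return "VERIFIED"


def may_file(citations: list[Citation], sources: dict[str, str]) -> bool:
    """Gate: allow filing only if every citation verifies."""
    return all(check(c, sources) == "VERIFIED" for c in citations)


# Invented example data, not real authorities.
SOURCES = {
    "example-case-1": "The court held that notice must be given within "
                      "thirty days. The appeal was dismissed.",
}
BRIEF = [
    Citation("example-case-1", "notice must be given within thirty days"),
    Citation("example-case-1", "notice is never required"),
    Citation("example-case-2", "any quotation at all"),
]

if __name__ == "__main__":
    for c in BRIEF:
        print(c.source_id, check(c, SOURCES))
    print("file?", may_file(BRIEF, SOURCES))
```

The design point is the last function: the filing decision is computed from the checks rather than left to a reviewer's discretion, so one unverified citation is enough to stop the document.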

The implication for agent workflows

Legal AI is where the citation problem is most visible because courts have the power to sanction and the incentive to investigate. But the problem exists everywhere agents produce cited output. The failure mode is identical:

  1. Agent retrieves information from a source
  2. Agent synthesizes the information into a new context
  3. Agent generates a citation linking the synthesized claim to the original source
  4. The citation is fabricated, misattributed, or mischaracterizes the source
  5. The human reader treats the citation as verification and acts on the claim

Traces capture steps 1 through 3 as successful tool calls and model completions. Step 4 is invisible to monitoring. Step 5 is where the damage happens.

Investigation catches step 4 by walking the citation chain backward from claim to source and comparing content at each step. This is what Galea does — not as a linter or a post-hoc check, but as an investigation layer that runs on every trace and produces findings scoped to what the customer cares about.
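The backward walk can be sketched in a few lines. This is an illustration of the idea only, not a description of how Galea works internally. The `Step` structure, the agent names, the word-overlap measure, and the 0.6 threshold are all assumptions made for the example, and the texts are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    agent: str                       # hypothetical agent or stage name
    text: str                        # what this stage asserted or returned
    parent: Optional["Step"] = None  # upstream step the text was derived from


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(child: str, parent: str) -> float:
    """Fraction of the child's distinct words that also occur in the parent."""
    child_tokens = _tokens(child)
    if not child_tokens:
        return 0.0
    return len(child_tokens & _tokens(parent)) / len(child_tokens)


def walk_back(final: Step, threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Walk from the final claim toward the source, flagging weak handoffs."""
    findings = []
    node = final
    while node.parent is not None:
        score = support(node.text, node.parent.text)
        if score < threshold:
            findings.append((node.parent.agent, node.agent, round(score, 2)))
        node = node.parent
    return findings


# Invented chain: the draft stage changes what the retrieved source says.
source = Step("source", "Notice must be served within thirty days of the order.")
research = Step("research", source.text, parent=source)
draft = Step("draft", "Notice may be served at any time before trial.", parent=research)
cite = Step("cite", draft.text, parent=draft)

if __name__ == "__main__":
    print(walk_back(cite))
```

In this example every individual step would show up in a trace as a successful call, yet the walk pinpoints the research-to-draft handoff as the place where the claim stopped matching its source. Word overlap is a weak signal: a dropped "not" would pass it easily, so a real system would need a stronger comparison at each hop.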


For legal teams: citation accuracy is not a model quality problem. It is a verification problem that compounds in multi-agent pipelines. If your AI workflow produces cited output that humans rely on, you need an investigation layer — not just a disclosure policy.

For everyone else: legal is where the problem is documented because courts keep records. But the same failure is happening in your M&A memos, your clinical summaries, your compliance reports, and your financial analyses. The question is whether you'll find out from an investigation system or from the consequences.
