In June 2023, a New York federal court fined two lawyers $5,000 for submitting a legal brief containing six completely fabricated case citations generated by ChatGPT. The case — Mata v. Avianca — became the cautionary tale that introduced the legal profession to AI hallucinations.
Three years later, the problem has not been solved. It has scaled.
As of April 2026, French researcher Damien Charlotin's AI Hallucination Cases Database tracks 1,348 documented cases worldwide, and the most recent high-profile incident did not involve a solo practitioner using ChatGPT for the first time. It involved Sullivan & Cromwell, one of the most prestigious law firms in the world.
The timeline
Sullivan & Cromwell. Not a solo practitioner. Not a small firm. A firm with 1,000+ lawyers, a dedicated AI policy, and internal review processes. The safeguards existed. They did not catch it.
Why this keeps happening
The legal profession's response to Mata v. Avianca was to adopt policies: require attorneys to disclose AI use, mandate verification of citations, and add AI-specific provisions to local court rules. As of early 2026, hundreds of courts have adopted AI disclosure requirements.
Policies have not solved the problem because the failure is not procedural — it is structural. Here is why:
Hallucinated citations look real. They follow the correct format. They reference real courts. They cite plausible docket numbers. They include convincing quotations that sound like they came from a judicial opinion. A human reviewer who is not specifically checking each citation against a legal database will often accept them at face value — which is exactly what citation is supposed to enable. The whole point of citing authority is so the reader doesn't have to re-derive the conclusion.
Verification is expensive and manual. Checking a citation means opening Westlaw or LexisNexis, searching for the case, reading the relevant passages, and confirming that the cited material says what the brief claims it says. For a motion with 30 citations, this is hours of paralegal time. The economic pressure to skip verification — or spot-check a few citations rather than all of them — is enormous, especially under deadline pressure.
The failure is invisible until it's caught. A brief with fabricated citations passes every quality check that does not specifically involve verifying citations against primary sources. Formatting is correct. Arguments are coherent. The writing is professional. Spelling and grammar are clean. The document looks like competent legal work. The only way to find the problem is to check every citation, one by one.
The problem compounds in multi-agent systems. When a legal AI pipeline uses one agent to research case law, another to draft arguments, and a third to format and cite-check the brief, attribution errors accumulate at each handoff. Agent A retrieves real cases. Agent B incorporates them into an argument but introduces subtle mischaracterizations. Agent C generates a formatted citation that looks correct but references the wrong section or misquotes the holding. Each agent's output is locally reasonable. The error is emergent.
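A rough way to see why handoffs compound: if each stage has some small chance of distorting an attribution, the chance that a citation survives every stage intact shrinks multiplicatively. The sketch below assumes the stages fail independently, and the 5% per-stage rate is a made-up illustration rather than a measured figure.

```python
from typing import Iterable


def chance_of_any_error(per_stage_error_rates: Iterable[float]) -> float:
    """Probability that at least one stage distorts a citation.

    Assumes stages fail independently, which is a simplification.
    """
    p_clean = 1.0
    for rate in per_stage_error_rates:
        p_clean *= 1.0 - rate  # citation must survive every stage
    return 1.0 - p_clean


# Hypothetical rates for the research, drafting, and cite-formatting agents.
risk = chance_of_any_error([0.05, 0.05, 0.05])
print(f"{risk:.3f}")  # about 0.143 per citation
```

Under these assumed rates, roughly one citation in seven would carry an error by the time it reaches the brief, even though each individual stage looks 95% reliable.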
The numbers
The scale of the problem is no longer anecdotal:
- 1,348 documented cases of AI-hallucinated citations worldwide as of April 2026, per the Charlotin database
- 915 in U.S. courts alone
- At least 8 appellate and trial rulings have imposed fines, bar referrals, or suspensions since Mata v. Avianca
- 17 court decisions in a single day (March 2026) noting suspected AI hallucinations
- Sanctions ranging from $5,000 fines to case dismissals and referrals to bar disciplinary committees
And these are only the cases that were caught. The base rate of undetected hallucinated citations in AI-assisted legal work is unknown, but given the difficulty of detection and the economic incentives to skip verification, it is almost certainly higher than the documented cases suggest.
Beyond the courtroom
Legal citations are the canary. The same failure mode — AI-generated references to sources that don't exist or don't say what they're claimed to say — appears in every domain where agents produce cited output:
Clinical summaries that cite guidelines or studies that don't exist. The consequence is not a fine — it is a treatment decision based on fabricated evidence.
Investment memos that cite SEC filings or analyst reports that have been misquoted or don't exist. Capital allocation based on phantom data.
Due diligence memos citing contract sections in the data room that don't exist or say something different from what's claimed. Deals closed on fabricated assurances.
Audit reports citing regulatory provisions or internal policies incorrectly. Compliance posture built on AI-generated fiction.
In every case, the failure has the same structure: the AI produces a specific, authoritative-sounding reference that gives the human reader false confidence. The reference looks like verification. It is fabrication.
What automated investigation looks like
The Sullivan & Cromwell incident is especially instructive because the firm had safeguards. They had an AI policy. They had review processes. The policies were not followed for one document, and 42 errors reached the court.
Human policies fail under time pressure. Automated investigation does not.
Investigation: citation-verification / brief-2026-04-18
Document: S&C Emergency Motion — Prince Global Holdings
Citations checked: 34
Verified against primary source: 34
Findings:
  [ERROR] 3 citations reference non-existent cases
    → Case "In re GlobalTrust Holdings" not found in Westlaw, LexisNexis, or PACER
    → Case "Rivera v. Consolidated Partners": docket number does not exist in cited jurisdiction
  [ERROR] 12 citations misquote Bankruptcy Code sections
    → §362(a) quoted as: "automatic stay applies to all proceedings against the debtor"
    → Actual text includes 8 enumerated exceptions not mentioned in brief
  [CONCERN] 27 citations technically exist but 8 contain paraphrased holdings that do not match source text
    → Semantic similarity score: 0.41 (threshold: 0.70)
Total: 3 fabricated, 12 misquoted, 8 mischaracterized
Recommendation: BLOCK filing. Return to drafting attorney.
This runs before the document leaves the firm. Every citation checked against the primary source. Every quoted passage compared to the actual text. Every holding verified. Not as a policy that someone might skip under deadline pressure — as an automated step in the pipeline that cannot be bypassed.
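A minimal sketch of such a gate, under loud assumptions: `PRIMARY_SOURCES` is a hypothetical in-memory stand-in for a lookup against Westlaw, LexisNexis, or PACER, the case name in it is invented, and `word_similarity` is a crude word-overlap proxy rather than a real semantic score. Only the 0.70 threshold is taken from the sample report above. It illustrates the shape of the check, not any particular product's implementation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class Citation:
    case_name: str
    quoted_text: str


# Hypothetical stand-in for a primary-source database; the entry is invented.
PRIMARY_SOURCES: Dict[str, str] = {
    "Example v. Sample": (
        "The court holds that the stay does not apply to the regulatory action."
    ),
}

SIMILARITY_THRESHOLD = 0.70  # threshold shown in the sample report


def word_similarity(quoted: str, source: str) -> float:
    """Word-level overlap in [0, 1]; a lexical proxy, not a semantic model."""
    return SequenceMatcher(None, quoted.lower().split(), source.lower().split()).ratio()


def check_citation(citation: Citation, sources: Dict[str, str] = PRIMARY_SOURCES) -> str:
    source_text = sources.get(citation.case_name)
    if source_text is None:
        return "ERROR: case not found in primary source"
    if citation.quoted_text.lower() in source_text.lower():
        return "OK: quotation appears verbatim in source"
    score = word_similarity(citation.quoted_text, source_text)
    if score < SIMILARITY_THRESHOLD:
        return f"CONCERN: quotation does not match source (score {score:.2f})"
    return "OK: quotation closely tracks source"


def filing_gate(citations: List[Citation]) -> str:
    """Return BLOCK unless every citation checks out."""
    findings = [check_citation(c) for c in citations]
    return "PASS" if all(f.startswith("OK") for f in findings) else "BLOCK"
```

The design choice that matters is that `filing_gate` fails closed: a single unverified citation blocks the filing, so skipping the check is not the path of least resistance.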
The implication for agent workflows
Legal AI is where the citation problem is most visible because courts have the power to sanction and the incentive to investigate. But the problem exists everywhere agents produce cited output. The failure mode is identical:
1. Agent retrieves information from a source
2. Agent synthesizes the information into a new context
3. Agent generates a citation linking the synthesized claim to the original source
4. The citation is fabricated, misattributed, or mischaracterizes the source
5. The human reader treats the citation as verification and acts on the claim
Traces capture steps 1 through 3 as successful tool calls and model completions. Step 4 is invisible to monitoring. Step 5 is where the damage happens.
Investigation catches step 4 by walking the citation chain backward from claim to source and comparing content at each step. This is what Galea does — not as a linter or a post-hoc check, but as an investigation layer that runs on every trace and produces findings scoped to what the customer cares about.
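To make "walking the citation chain backward" concrete, here is a small sketch. It assumes a trace can be reduced to an ordered list of (stage name, text) pairs, source first, and it reuses a word-overlap ratio as a stand-in for real content comparison. The function names, the stage names, and the threshold are illustrative assumptions, not a description of how Galea works internally.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def overlap(a: str, b: str) -> float:
    """Word-level overlap in [0, 1]; a crude proxy for content agreement."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def walk_back(chain: List[Tuple[str, str]], threshold: float = 0.70) -> List[str]:
    """Walk from the final claim back to the source, flagging drifting handoffs.

    `chain` is ordered source-first: [(stage_name, text), ...].
    """
    findings = []
    for i in range(len(chain) - 1, 0, -1):
        later_name, later_text = chain[i]
        earlier_name, earlier_text = chain[i - 1]
        score = overlap(later_text, earlier_text)
        if score < threshold:
            findings.append(f"{earlier_name} -> {later_name}: drift (score {score:.2f})")
    return findings


# Hypothetical three-stage trace: the drafting stage inverts the holding.
trace = [
    ("source", "the stay does not apply to regulatory actions"),
    ("research", "the stay does not apply to regulatory actions"),
    ("draft", "the stay applies to all actions against the debtor"),
]
for finding in walk_back(trace):
    print(finding)
```

Every stage in this trace would show up as a successful completion in ordinary monitoring. Only the pairwise comparison reveals that the drift entered at the research-to-draft handoff.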
For legal teams: citation accuracy is not a model quality problem. It is a verification problem that compounds in multi-agent pipelines. If your AI workflow produces cited output that humans rely on, you need an investigation layer — not just a disclosure policy.
For everyone else: legal is where the problem is documented because courts keep records. But the same failure is happening in your M&A memos, your clinical summaries, your compliance reports, and your financial analyses. The question is whether you'll find out from an investigation system or from the consequences.