Galea is the investigation layer for agent workflows. It sits above any orchestration runtime and produces investigations that explain what mattered in each run.

Now in private design partnership

Investigation layer for agent workflows.

Galea sits above whatever runs your agents — Mercury, LangGraph, OpenAI, Claude, custom code. We listen to events, build timelines, and produce investigations that explain what mattered.


§ 01 · why now

Agent workflows are becoming
business workflows.

forcing function 01

Failures are no longer toy failures.

Agents now refund customers, redline contracts, summarize medical visits, change product data. Teams need an answer — not a log dump.

forcing function 02

Every team picked a different runtime.

Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, custom queues. The winning observability layer cannot require a rewrite.

forcing function 03

Optimization is customer-specific.

Harvey cares about correctness. Decagon cares about refund risk. Cursor cares about unsafe edits. Generic dashboards miss the point.

01 · the gap

You have traces.
You don't have answers.

02 · the noise

A run completed.
That doesn't mean it was good.

03 · the cost

Different teams care about
different kinds of wrong.

§ 02 · the gap, in detail

Teams have traces.
They don't have answers.

Runtimes show events. APMs show spans. Product teams still inspect each workflow manually and decide whether it was correct, allowed, efficient, safe, and worth changing.

failure 01

Logs without judgment.

The trace says a tool ran. It doesn't say whether that tool should have run for this customer, matter, ticket, or policy.

failure 02

Generic priorities.

Latency, cost, correctness, compliance, context size, risky edits — every company weights them differently.

failure 03

Anomalies hide in normal runs.

A workflow can finish successfully while using 2× normal tokens, citing unsupported facts, or editing data it shouldn't touch.

failure 04

Incidents don't become improvements.

Teams debug one run, then move on. They rarely convert the failure into a reusable eval, baseline, alert, or workflow fix.

§ 02·5 · same trace, four tools

Three say clean. Galea catches it.

Harvey M&A: 5-agent redline of a $14.5M SaaS MSA. The contracts_agent cited a Pacific Coast Lines MFN clause — that document was never in the dataroom. All 23 spans returned OK.

◇ LangSmith (mock) ● OK

▾ contract_redline 92.4s
  ▾ intake_agent 0.6s
  ▾ data_room_agent 8.2s
    ⌖ tool: dataroom.extract_clause 7.4s
  ▾ contracts_agent 12.1s
    ⌘ chat-completion 11.8s
  ▾ partner_review_agent 67.1s

◇ Braintrust (mock) 100% pass

case                      ground  cites  lat  cost
northbeam_msa_redline     ✓       ✓      ✓    ✓
acme_msa_redline          ✓       ✓      ✓    ✓
globex_renewal_redline    ✓       ✓      ✓    ✓
initech_msa_v3            ✓       ✓      ✓    ✓
umbrella_intl_msa         ✓       ✓      ✓    ✓

◇ Latitude (mock) ✓ Done

SYSTEM

You are a legal AI for M&A diligence. Cite every finding.

USER

[signal] MSA redline for SaaS — Northbeam

TOOL CALL

dataroom.extract_clause(scope="all")

TOOL RESULT

extracted 4 clauses

ASSISTANT

Time-charter agreement with Pacific Coast Lines contains MFN clause.

✓ Run completed successfully

◆ Galea (live) ⚠ BLOCKED · 1 error · 2 concerns

BLAME · @galea/blame

Primary fault → contracts_agent @ event_606617ba56c1
kind=fabricated_citation · weight=1.0 · conf 1.00
Upstream → data_room_agent (msg w=0.3), intake_agent (handoff w=0.5)

REPLAY · @galea/replay

claim → "Pacific Coast Lines MFN clause"
citation → pacific_coast_charter_2023.pdf#page=4
verdict → DOC_MISSING · chunk hash absent in snapshot

PRIORITY · @galea/priorities (legal-ma)

correctness 0.9 × raw 1.0 = 0.9 → error. Same trace under cost-weighted profile = info.
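The priority line in the panel reads as a weighted score: a per-company weight for the dimension, times the raw severity of the finding, mapped to a level. Below is a minimal sketch of that arithmetic. Only the legal-ma correctness weight of 0.9 comes from the panel; the cutoffs (0.7 for error, 0.4 for concern), the other weights, the cost-weighted profile, and the `score` and `run_status` names are all hypothetical.

```python
# Hypothetical cutoffs mapping a weighted score to a severity level, highest first.
THRESHOLDS = [(0.7, "error"), (0.4, "concern")]

# Per-company priority profiles. Only legal-ma's correctness weight (0.9) is from
# the panel; every other weight here is assumed for illustration.
PROFILES = {
    "legal-ma": {"correctness": 0.9, "audit": 0.8, "cost": 0.2},
    "cost-weighted": {"correctness": 0.2, "cost": 0.9},
}


def score(profile: str, dimension: str, raw: float) -> tuple[float, str]:
    """Weight a raw finding severity (0..1) by the company's priority for that dimension."""
    weight = PROFILES[profile].get(dimension, 0.0)
    weighted = round(weight * raw, 4)
    for cutoff, level in THRESHOLDS:
        if weighted >= cutoff:
            return weighted, level
    return weighted, "info"


def run_status(levels: list[str]) -> str:
    """Block the run if any finding scored as an error (assumed policy)."""
    return "BLOCKED" if "error" in levels else "OK"
```

With these assumed profiles, `score("legal-ma", "correctness", 1.0)` gives `(0.9, "error")` while `score("cost-weighted", "correctness", 1.0)` gives `(0.2, "info")`: the trace is identical, only the profile changes.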

§ 03 · solution

Galea sits above
whatever runs your agents.

Keep Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, or custom code. Galea listens to events, builds the timeline, applies company context, and produces the investigation.

Product Workflow · customer side
  support · legal · clinical · coding · ops
Galea Event Layer ◆ just works
  captures events · traces · tool calls · outputs
Existing Orchestration
  mercury · langgraph · openai · claude · custom
Tools · Memory · Models
  postgres · pgvector · openai · anthropic · mcp

[Screenshot: Galea project dashboard (galea · harvey-ma) showing workflow traces, priority settings, and audit chain]

step 01 · the project

Your workflow, in Galea.

priorities · baselines · audit chain · signal schema

galea · trace · anomaly

hallucination investigator

contracts_agent fabricated a citation: pacific_coast_charter_2023.pdf#page=4

step 04 · hallucination

Run finished. Citation was made up.

tideglass marine · 32 events · 2 flagged · investigator caught it

[Screenshot: Galea investigation panel (galea · finding) showing blame attribution, evidence replay, and priority scoring]

headline · the fabrication

cited evidence vs what the data-room had

root cause · agent at fault

step 05 · pinpoint

Galea named the agent at fault.

claim couldn't be matched to anything the agents retrieved · treat as fabricated until verified
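Pinpointing a fabrication rests on matching each cited claim against what the agents actually retrieved. Below is a minimal sketch of that check, assuming the run records SHA-256 content hashes of every retrieved chunk, grouped by document (in the spirit of the replay verdict "chunk hash absent in snapshot"). `DataroomSnapshot`, `replay_citation`, the `northbeam_msa.pdf` file name, and every verdict except DOC_MISSING are hypothetical, not Galea's API.

```python
import hashlib
from dataclasses import dataclass, field


def chunk_hash(text: str) -> str:
    # Content-address a chunk so a later replay can compare it exactly.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class DataroomSnapshot:
    # Hypothetical: document id -> hashes of the chunks agents retrieved in this run.
    chunks: dict[str, set[str]] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        self.chunks.setdefault(doc_id, set()).add(chunk_hash(text))


def replay_citation(snapshot: DataroomSnapshot, citation: str, quoted_text: str) -> str:
    """Return a verdict for one cited claim, e.g. 'file.pdf#page=4'."""
    doc_id = citation.partition("#")[0]  # drop the page fragment
    if doc_id not in snapshot.chunks:
        return "DOC_MISSING"  # the cited document was never retrieved
    if chunk_hash(quoted_text) not in snapshot.chunks[doc_id]:
        return "CHUNK_MISSING"  # document retrieved, passage not (hypothetical verdict)
    return "SUPPORTED"
```

Exact hashing only catches verbatim quotes; a real checker would likely need text normalization or fuzzy matching, since agents paraphrase. It is enough, though, to flag a citation to a document that never entered the run.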

[Screenshot: Galea eval dashboard (galea · evals) converting a single incident into a durable correctness eval]

step 06 · the loop

One incident. One durable eval.

finding → eval → next deploy gated · sentry for agent behavior
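Below is a sketch of the finding → eval → gated-deploy loop. It assumes the finding records the original prompt and the set of documents the agents retrieved. `Finding`, `EvalCase`, `eval_from_finding`, and `deploy_gate` are hypothetical names for illustration. The eval asserts only one thing: every document a candidate build cites must be one the agents actually retrieved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape of an investigation finding.
    run_id: str
    kind: str
    prompt: str
    retrieved_docs: frozenset[str]


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    allowed_docs: frozenset[str]

    def passes(self, cited_docs: set[str]) -> bool:
        # Pass only if every citation is a subset of what was retrieved.
        return cited_docs <= self.allowed_docs


def eval_from_finding(finding: Finding) -> EvalCase:
    """Turn one incident into a reusable regression case."""
    return EvalCase(
        name=f"{finding.kind}:{finding.run_id}",
        prompt=finding.prompt,
        allowed_docs=finding.retrieved_docs,
    )


def deploy_gate(cases: list[EvalCase], run_candidate: Callable[[str], set[str]]) -> bool:
    """Allow the deploy only if the candidate build passes every eval case.

    run_candidate is assumed to run the workflow on a prompt and return the
    set of document ids cited in its output.
    """
    return all(case.passes(run_candidate(case.prompt)) for case in cases)
```

A candidate that still cites `pacific_coast_charter_2023.pdf` for this prompt fails the gate; one that cites only retrieved documents passes.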

§ 06 · customer-specific priorities

Same workflow data.
Different investigations.

Galea is useful because it doesn't treat every workflow the same. It learns what each company cares about and investigates against that priority model.

legal · m&a diligence

Harvey

"Did the memo cite real evidence the agents actually retrieved?"

correctness
audit
regulatory
cost
latency

customer support · automation

Decagon

"Did this run refund a customer it shouldn't have?"

tool-safety
cost
latency
audit
privacy

agentic coding ide

Cursor

"Did the agent edit a protected file or secret?"

tool-safety
correctness
latency
cost
memory-safety

clinical ai scribe

Abridge

"Did the summary hallucinate or leak PHI?"

correctness
privacy-phi
audit
latency
cost

§ 07 · what Galea finds

The workflow finished.
That doesn't mean it was good.

Galea looks past success/failure status. It compares each run to the company's baseline, risk model, and product priorities — then explains the part a human should care about.

anomaly

2.1× normal usage.

Run completed, but context grew across three retries and doubled token spend against the customer's baseline.

correctness

Unsupported output.

Final answer cited data that was never retrieved. For Harvey, that matters more than latency or cost.

tool risk

Edited protected data.

Workflow touched a field that normally requires review. Galea flags the run and recommends a guardrail.
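The usage card compares one run against the customer's own baseline rather than a global limit. Below is a minimal sketch under stated assumptions: the baseline is taken as the median total token count of past runs of the same workflow, the alert threshold is a hypothetical 2.0×, and the token counts in the example are illustrative, not from a real run.

```python
from statistics import median


def usage_anomaly(history: list[int], attempts: list[int], threshold: float = 2.0) -> tuple[float, bool]:
    """Compare a run's total token spend to the median of past runs.

    history:  total tokens for previous runs of the same workflow (assumed baseline source)
    attempts: tokens for each attempt in this run, retries included
    """
    baseline = median(history)
    total = sum(attempts)  # retries count, so a growing context shows up here
    ratio = total / baseline
    return round(ratio, 2), ratio >= threshold
```

For example, `usage_anomaly([10000, 11000, 9500, 10500, 10200], [6000, 7000, 8420])` returns `(2.1, True)`. Every attempt finished, but the retries pushed spend to 2.1× the baseline. The median is used so that one earlier outlier run does not skew the baseline.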

2.1× usage spike · risky tool calls · missing source · suggested evals

from incident to durable fix

The closed loop.

Investigate

Galea explains each workflow in terms of your priorities. Not log dumps. Answers.

Optimize

Recommends durable fixes — evals, retrieval constraints, alerts, review requirements.

Monitor

Tracks the same failure mode going forward. Continuous agent QA.

“Orchestration layers vary by team. The need to understand them does not.”

a category, applied to a new shape

Same shape. New medium.

Datadog

metrics & traces for cloud servers

$40B+

Sentry

error tracking for production apps

$3B+

LaunchDarkly

feature flags for safe rollouts

$3B+

Galea

investigation for agent workflows

today

§ 10 · the bet

Agent workflows need investigation,
not just orchestration.

The orchestration layer will vary by team. The need to understand, audit, and improve agent behavior will not. Galea becomes the neutral layer that watches every run, explains what mattered, and turns incidents into better workflows.

◆ the one-liner

Galea is the investigation layer for agent workflows.

§ 11 · frequently asked

Questions we get asked.

What is Galea?

Galea is the investigation layer for agent workflows. It sits above any orchestration runtime — Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, or custom code — and produces investigations that explain what mattered in each run.

Does Galea replace my agent framework?

No. Keep your runtime. Galea listens to events from any framework, builds timelines, and investigates against your company's priority model. It never hosts or orchestrates agents.

What kinds of problems does Galea catch?

Fabricated citations, unsafe tool calls, anomalous token usage, missing evidence, privacy violations, and other failures that finish with a success status. Galea looks past pass/fail — it compares each run to your baseline and risk model.

How does integration work?

Add a lightweight SDK or adapter to your existing workflow. Galea captures events at the runtime boundary — no code rewrites, no framework lock-in. Adapters exist for OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, and MCP.
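Capturing events at the runtime boundary usually means wrapping calls you already make rather than rewriting them. Below is a minimal sketch in plain Python. It assumes a hypothetical in-memory event sink (`EVENTS`, `emit`), a hypothetical event schema, and a hypothetical `capture_tool` decorator. It is not Galea's SDK. The adapters listed above would likely hook each framework's own tracing or callback interfaces instead.

```python
import functools
import time
import uuid
from typing import Any, Callable

EVENTS: list[dict[str, Any]] = []  # stand-in for a hypothetical event sink


def emit(event: dict[str, Any]) -> None:
    EVENTS.append(event)


def capture_tool(agent: str) -> Callable:
    """Wrap a tool so each call emits a call event and a result or error event."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            event_id = f"event_{uuid.uuid4().hex[:12]}"  # shared by the call/result pair
            start = time.monotonic()
            emit({"id": event_id, "type": "tool_call", "agent": agent,
                  "tool": fn.__name__, "kwargs": kwargs})
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                emit({"id": event_id, "type": "tool_error", "error": repr(exc),
                      "duration_s": time.monotonic() - start})
                raise
            emit({"id": event_id, "type": "tool_result", "result": result,
                  "duration_s": time.monotonic() - start})
            return result
        return wrapper
    return decorator


@capture_tool(agent="data_room_agent")
def extract_clause(scope: str) -> list[str]:
    # Placeholder body; a real workflow keeps its existing tool code unchanged.
    return ["termination", "liability cap", "assignment", "governing law"]
```

Calling `extract_clause(scope="all")` still returns the tool's normal result. It also leaves a `tool_call` and a `tool_result` event with the same id, which is the raw material a timeline is built from.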

What's the pricing model?

Galea is currently in private design partnership. We work directly with teams to stand up their first project, configure priorities, and prove value before any commercial conversation.

Who is Galea for?

Teams shipping agent workflows into production — legal AI, customer support automation, clinical AI scribes, agentic coding tools, DevOps copilots. If your agents make decisions that matter, Galea explains whether those decisions were good.

now in private design partnership

Bring your hardest workflow.

We'll stand up your project, record two runs, and walk you through the investigation in a 30-minute session.

[email protected] Read the pitch