Now in private design partnership

Investigation layer for agent workflows.

Galea sits above whatever runs your agents: Mercury, LangGraph, OpenAI, Claude, or custom code. It listens to events, builds timelines, and produces investigations that explain what mattered.

§ 01 · why now

Agent workflows are becoming
business workflows.

forcing function 01

Failures are no longer toy failures.

Agents now refund customers, redline contracts, summarize medical visits, and change product data. Teams need an answer, not a log dump.

forcing function 02

Every team picked a different runtime.

Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, custom queues. The winning observability layer cannot require a rewrite.

forcing function 03

Optimization is customer-specific.

Harvey cares about correctness. Decagon cares about refund risk. Cursor cares about unsafe edits. Generic dashboards miss the point.

01 · the gap

You have traces.
You don't have answers.

02 · the noise

A run completed.
That doesn't mean it was good.

03 · the cost

Different teams care about
different kinds of wrong.

§ 02 · the gap, in detail

Teams have traces.
They don't have answers.

Runtimes show events. APMs show spans. Product teams still inspect each workflow manually and decide whether it was correct, allowed, efficient, safe, and worth changing.

failure 01

Logs without judgment.

The trace says a tool ran. It doesn't say whether that tool should have run for this customer, matter, ticket, or policy.

failure 02

Generic priorities.

Latency, cost, correctness, compliance, context size, risky edits — every company weights them differently.

failure 03

Anomalies hide in normal runs.

A workflow can finish successfully while using 2× normal tokens, citing unsupported facts, or editing data it shouldn't touch.

failure 04

Incidents don't become improvements.

Teams debug one run, then move on. They rarely convert the failure into a reusable eval, baseline, alert, or workflow fix.

§ 02·5 · same trace, four tools

Three say clean. Galea catches it.

Harvey M&A: 5-agent redline of a $14.5M SaaS MSA. The contracts_agent cited a Pacific Coast Lines MFN clause — that document was never in the dataroom. All 23 spans returned OK.

◇ LangSmith (mock) ● OK
▾ contract_redline 92.4s
▾ intake_agent 0.6s
▾ data_room_agent 8.2s
⌖ tool: dataroom.extract_clause 7.4s
▾ contracts_agent 12.1s
⌘ chat-completion 11.8s
▾ partner_review_agent 67.1s
11 spans · 0 errors · 0 warnings — illustrative reconstruction
◇ Braintrust (mock) 100% pass
case · ground · cites · lat · cost
northbeam_msa_redline
acme_msa_redline
globex_renewal_redline
initech_msa_v3
umbrella_intl_msa
5 / 5 passed · no regressions vs baseline — illustrative reconstruction
◇ Latitude (mock) ✓ Done
SYSTEM

You are a legal AI for M&A diligence. Cite every finding.

USER

[signal] MSA redline for SaaS — Northbeam

TOOL CALL

dataroom.extract_clause(scope="all")

TOOL RESULT

extracted 4 clauses

ASSISTANT

Time-charter agreement with Pacific Coast Lines contains MFN clause.

✓ Run completed successfully
prompt log looks normal — illustrative reconstruction
◆ Galea (live) ⚠ BLOCKED · 1 error · 2 concerns
BLAME · @galea/blame
Primary fault → contracts_agent @ event_606617ba56c1
kind=fabricated_citation · weight=1.0 · conf 1.00
Upstream → data_room_agent (msg w=0.3), intake_agent (handoff w=0.5)
REPLAY · @galea/replay
claim → "Pacific Coast Lines MFN clause"
citation → pacific_coast_charter_2023.pdf#page=4
verdict → DOC_MISSING · chunk hash absent in snapshot
PRIORITY · @galea/priorities (legal-ma)
correctness 0.9 × raw 1.0 = 0.9 → error. Same trace under cost-weighted profile = info.
claim ≠ snapshot. fabricated citation, named, attributed, scored.
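The PRIORITY line in the panel above (correctness 0.9 × raw 1.0 = 0.9 → error, and info under a cost-weighted profile) can be read as a weighted score mapped to a level. The sketch below is one way to express that arithmetic, not Galea's implementation. Only the 0.9 correctness weight and the raw 1.0 come from the panel. The thresholds, the `Finding` type, the `score` function, and every other weight are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the panel only shows that a weighted 0.9 is an "error".
ERROR_AT = 0.7
CONCERN_AT = 0.4


@dataclass(frozen=True)
class Finding:
    kind: str        # e.g. "fabricated_citation"
    dimension: str   # priority dimension the finding belongs to
    raw: float       # raw severity in [0, 1]


def score(finding: Finding, profile: dict[str, float]) -> tuple[float, str]:
    """Weight a raw severity by the profile and map the result to a level."""
    weighted = profile.get(finding.dimension, 0.0) * finding.raw
    if weighted >= ERROR_AT:
        level = "error"
    elif weighted >= CONCERN_AT:
        level = "concern"
    else:
        level = "info"
    return weighted, level


# The 0.9 correctness weight is from the panel; all other weights are made up.
LEGAL_MA = {"correctness": 0.9, "cost": 0.2}
COST_WEIGHTED = {"correctness": 0.2, "cost": 0.9}

finding = Finding("fabricated_citation", "correctness", raw=1.0)
print(score(finding, LEGAL_MA))       # (0.9, 'error')
print(score(finding, COST_WEIGHTED))  # (0.2, 'info')
```

The same finding lands at two different levels because only the profile changes, which is the point the panel makes.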
§ 03 · solution

Galea sits above
whatever runs your agents.

Keep Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, or custom code. Galea listens to events, builds the timeline, applies company context, and produces the investigation.

Product Workflow (customer side): support · legal · clinical · coding · ops
Galea Event Layer (◆ just works): captures events · traces · tool calls · outputs
Existing Orchestration: mercury · langgraph · openai · claude · custom
Tools · Memory · Models: postgres · pgvector · openai · anthropic · mcp
galea · harvey-ma
Galea project dashboard showing workflow traces, priority settings, and audit chain
step 01 · the project

Your workflow, in Galea.

priorities · baselines · audit chain · signal schema

galea · trace · anomaly
Galea trace view highlighting a fabricated citation flagged by the hallucination investigator
hallucination investigator
contracts_agent fabricated a citation: pacific_coast_charter_2023.pdf#page=4
step 04 · hallucination

Run finished. Citation was made up.

tideglass marine · 32 events · 2 flagged · investigator caught it

galea · finding
Galea investigation panel showing blame attribution, evidence replay, and priority scoring
headline · the fabrication
cited evidence vs what the data-room had
root cause · agent at fault
step 05 · pinpoint

Galea named the agent at fault.

claim couldn't be matched to anything the agents retrieved · treat as fabricated until verified
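Matching a claim against what the agents retrieved can be sketched as a lookup into a snapshot of retrieved documents and chunk hashes. This is a minimal illustration under assumptions, not Galea's replay engine: the snapshot shape, `chunk_hash`, `verify_citation`, the example document `northbeam_msa.pdf`, and the `CHUNK_MISSING` and `SUPPORTED` labels are hypothetical. Only the `DOC_MISSING` verdict and the cited file name appear on this page.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Content hash used to identify a retrieved chunk."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_citation(citation: str, quoted: str, snapshot: dict[str, set[str]]) -> str:
    """Check a citation against a snapshot of {document: {chunk hashes}}."""
    document = citation.split("#", 1)[0]  # drop the "#page=4" fragment
    if document not in snapshot:
        return "DOC_MISSING"
    if chunk_hash(quoted) not in snapshot[document]:
        return "CHUNK_MISSING"
    return "SUPPORTED"


# Hypothetical snapshot: the one document the agents actually retrieved.
retrieved_text = "Either party may terminate on 30 days notice."
snapshot = {"northbeam_msa.pdf": {chunk_hash(retrieved_text)}}

verdict = verify_citation(
    "pacific_coast_charter_2023.pdf#page=4",
    "Pacific Coast Lines MFN clause",
    snapshot,
)
print(verdict)  # DOC_MISSING
```

Anything that does not come back as supported is treated as fabricated until a human verifies it.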

galea · evals
Galea eval dashboard converting a single incident into a durable correctness eval
step 06 · the loop

One incident. One durable eval.

finding → eval → next deploy gated · sentry for agent behavior
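The finding → eval → gated deploy loop can be sketched as turning one finding into a reusable check and refusing to ship while that check fails. The names below (`EvalCase`, `eval_from_finding`, `failing_evals`) and the run dictionary layout are hypothetical, chosen for illustration, and do not describe Galea's eval format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    check: Callable[[dict], bool]  # returns True when a run passes


def eval_from_finding(finding: dict) -> EvalCase:
    """Turn one incident finding into a check that can run on every future run."""
    if finding["kind"] != "fabricated_citation":
        raise ValueError(f"no eval template for {finding['kind']}")

    def every_citation_was_retrieved(run: dict) -> bool:
        retrieved = set(run["retrieved_docs"])
        return all(c.split("#", 1)[0] in retrieved for c in run["citations"])

    name = f"{finding['agent']}_no_fabricated_citation"
    return EvalCase(name, every_citation_was_retrieved)


def failing_evals(run: dict, cases: list[EvalCase]) -> list[str]:
    """Names of evals the run fails; an empty list means the deploy gate opens."""
    return [case.name for case in cases if not case.check(run)]


case = eval_from_finding({"kind": "fabricated_citation", "agent": "contracts_agent"})
bad_run = {
    "retrieved_docs": ["northbeam_msa.pdf"],
    "citations": ["pacific_coast_charter_2023.pdf#page=4"],
}
good_run = {
    "retrieved_docs": ["northbeam_msa.pdf"],
    "citations": ["northbeam_msa.pdf#page=2"],
}
print(failing_evals(bad_run, [case]))   # ['contracts_agent_no_fabricated_citation']
print(failing_evals(good_run, [case]))  # []
```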

§ 06 · customer-specific priorities

Same workflow data.
Different investigations.

Galea is useful because it doesn't treat every workflow the same. It learns what each company cares about and investigates against that priority model.

legal · m&a diligence

Harvey

"Did the memo cite real evidence the agents actually retrieved?"

  • correctness
  • audit
  • regulatory
  • cost
  • latency

customer support · automation

Decagon

"Did this run refund a customer it shouldn't have?"

  • tool-safety
  • cost
  • latency
  • audit
  • privacy

agentic coding ide

Cursor

"Did the agent edit a protected file or secret?"

  • tool-safety
  • correctness
  • latency
  • cost
  • memory-safety

clinical ai scribe

Abridge

"Did the summary hallucinate or leak PHI?"

  • correctness
  • privacy-phi
  • audit
  • latency
  • cost
§ 07 · what Galea finds

The workflow finished.
That doesn't mean it was good.

Galea looks past success/failure status. It compares each run to the company's baseline, risk model, and product priorities — then explains the part a human should care about.

anomaly

2.1× normal usage.

Run completed, but context grew across three retries and doubled token spend against the customer's baseline.

correctness

Unsupported output.

Final answer cited data that was never retrieved. For Harvey, that matters more than latency or cost.

tool risk

Edited protected data.

Workflow touched a field that normally requires review. Galea flags the run and recommends a guardrail.
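The usage anomaly above (a run that completes but spends about twice its baseline) reduces to comparing a run against prior runs of the same workflow. A minimal sketch, assuming the baseline is the median token count of earlier runs and that a ratio of 2.0 or more counts as a spike. The threshold, the function names, and the sample numbers are illustrative assumptions, not Galea's baseline model.

```python
from statistics import median

SPIKE_RATIO = 2.0  # hypothetical threshold for flagging a run


def usage_ratio(run_tokens: int, baseline_tokens: list[int]) -> float:
    """Token spend of one run relative to the median of prior runs."""
    if not baseline_tokens:
        raise ValueError("baseline needs at least one prior run")
    return run_tokens / median(baseline_tokens)


def is_spike(run_tokens: int, baseline_tokens: list[int],
             threshold: float = SPIKE_RATIO) -> bool:
    """True when the run spends at least `threshold` times its baseline."""
    return usage_ratio(run_tokens, baseline_tokens) >= threshold


baseline = [9_500, 10_000, 10_500]  # made-up token counts from earlier runs
ratio = usage_ratio(21_000, baseline)
print(f"{ratio:.1f}x", is_spike(21_000, baseline))  # 2.1x True
```

A median is used here so that one earlier outlier run does not shift the baseline much. Other baselines (per-customer percentiles, rolling windows) would fit the same shape.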

2.1× usage spike
3 risky tool calls
1 missing source
4 suggested evals
from incident to durable fix

The closed loop.

01

Investigate

Galea explains the workflow in terms of your priorities. Not log dumps. Answers.

02

Optimize

Recommends durable fixes — evals, retrieval constraints, alerts, review requirements.

03

Monitor

Tracks the same failure mode going forward. Continuous agent QA.

Orchestration layers vary by team. The need to understand them does not.
a category, applied to a new shape

Same shape. New medium.

Datadog · metrics & traces for cloud servers · $40B+
Sentry · error tracking for production apps · $3B+
LaunchDarkly · feature flags for safe rollouts · $3B+
Galea · investigation for agent workflows · today
§ 10 · the bet

Agent workflows need investigation,
not just orchestration.

The orchestration layer will vary by team. The need to understand, audit, and improve agent behavior will not. Galea becomes the neutral layer that watches every run, explains what mattered, and turns incidents into better workflows.

◆ the one-liner
Galea is the investigation layer for agent workflows.
§ 11 · frequently asked

Questions we get asked.

What is Galea?

Galea is the investigation layer for agent workflows. It sits above any orchestration runtime — Mercury, LangGraph, OpenAI, Claude, CrewAI, Temporal, or custom code — and produces investigations that explain what mattered in each run.

Does Galea replace my agent framework?

No. Keep your runtime. Galea listens to events from any framework, builds timelines, and investigates against your company's priority model. It never hosts or orchestrates agents.

What kinds of problems does Galea catch?

Fabricated citations, unsafe tool calls, anomalous token usage, missing evidence, privacy violations, and other failures in runs that still finish with a success status. Galea looks past pass/fail and compares each run to your baseline and risk model.

How does integration work?

Add a lightweight SDK or adapter to your existing workflow. Galea captures events at the runtime boundary — no code rewrites, no framework lock-in. Adapters exist for OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, and MCP.
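To make "captures events at the runtime boundary" concrete, here is a sketch of the general pattern: wrap each tool function so a call emits an event, while the tool body stays unchanged. This is not the Galea SDK. The `capture` decorator, the in-memory `EVENTS` sink, and the event fields are hypothetical stand-ins for whatever an adapter actually sends.

```python
import functools
import time
from typing import Any, Callable

EVENTS: list[dict[str, Any]] = []  # stand-in for a real event sink


def capture(agent: str, sink: Callable[[dict[str, Any]], None] = EVENTS.append):
    """Wrap a tool function so every call emits one event to the sink."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            started = time.monotonic()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # the caller still sees the original exception
            finally:
                sink({
                    "agent": agent,
                    "tool": fn.__name__,
                    "status": status,
                    "duration_s": time.monotonic() - started,
                })
        return wrapper
    return decorator


@capture(agent="data_room_agent")
def extract_clause(scope: str) -> list[str]:
    return ["clause-1", "clause-2"]  # placeholder tool body


extract_clause(scope="all")
print(EVENTS[0]["agent"], EVENTS[0]["tool"], EVENTS[0]["status"])
# data_room_agent extract_clause ok
```

Because the wrapper only observes the call, removing the decorator leaves the workflow behaving exactly as before, which is the sense in which no rewrite is needed.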

What's the pricing model?

Galea is currently in private design partnership. We work directly with teams to stand up their first project, configure priorities, and prove value before any commercial conversation.

Who is Galea for?

Teams shipping agent workflows into production — legal AI, customer support automation, clinical AI scribes, agentic coding tools, DevOps copilots. If your agents make decisions that matter, Galea explains whether those decisions were good.

now in private design partnership

Bring your hardest workflow.

We'll stand up your project, record two runs, and walk you through the investigation in a 30-minute session.