XC-01
Observability / Tracing
OTel-style spans for every model call, tool call, sub-agent hand-off. Single pane of glass for prod.
langsmith · langfuse · helicone · arize phoenix · braintrust · datadog llm obs · new relic ai · openlit · lunary · whylabs · logfire (pydantic)
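To make "a span for every model call, tool call, sub-agent hand-off" concrete, here is a minimal stdlib-only sketch of nested spans. It is not the OpenTelemetry SDK or any listed vendor's API; `Span`, `Tracer`, the `kind` values, and the attribute names are all hypothetical, chosen only to show how parent/child links tie one agent run together.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work; hypothetical shape, loosely OTel-like."""
    name: str
    kind: str  # assumed values: "model_call", "tool_call", "handoff"
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Collects finished spans for a single trace (one agent run)."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.finished = []
        self._stack = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name, kind, **attributes):
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(name=name, kind=kind, trace_id=self.trace_id,
                 parent_id=parent, attributes=dict(attributes),
                 start=time.time())
        self._stack.append(s)
        try:
            yield s
        finally:
            # Close the span even if the wrapped call raised.
            s.end = time.time()
            self._stack.pop()
            self.finished.append(s)


tracer = Tracer()
with tracer.span("agent.run", "handoff", agent="planner"):
    with tracer.span("llm.chat", "model_call", model="example-model") as llm:
        llm.attributes["output_tokens"] = 42  # illustrative value
    with tracer.span("search", "tool_call", tool="web_search"):
        pass
```

Spans finish innermost-first, so `tracer.finished` ends with the root `agent.run` span; a real exporter would ship these records to one of the backends above.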
XC-01.5 · new category, 2026
Investigation / Agent QA
Sits above traces. Ingests events from any runtime (mercury · langgraph · openai · claude · custom), applies a customer-specific priority model (correctness for Harvey, tool-safety for Cursor, PHI for Abridge), and produces investigations that explain what mattered — not log dumps. Catches "successful but wrong" runs (fabricated citations, unsafe tool calls, anomalous token usage) that observability misses and pre-deploy eval suites can't anticipate. Closes the loop: incident → durable eval → gated deploy.
galea: first mover; category being defined
positioning: Datadog (cloud) → Sentry (apps) → Galea (agents). Reads from L03 + XC-01; writes to XC-02 + XC-09. Not a replacement for langfuse/braintrust; sits on top of them.
risk: incumbents (braintrust, langfuse) absorb the "priority model" feature before the category stabilizes.
XC-02
Evaluation
Offline + online eval. Regression tests for prompts, agent trajectories, end-to-end task success.
braintrust · patronus ai · galileo · openpipe · ragas · promptfoo · deepeval · inspect-ai (uk aisi) · vellum eval · langsmith eval
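As an illustration of an offline regression check, the sketch below scores a candidate agent against a fixed case set and gates on a baseline score. The names `EvalCase`, `run_suite`, `gate`, and `stub_agent` are hypothetical, and substring matching stands in for whatever grader a real eval tool would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # simplest possible grader: substring match


def run_suite(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = sum(
        1 for c in cases
        if c.must_contain.lower() in agent(c.prompt).lower()
    )
    return passed / len(cases)


def gate(candidate_score: float, baseline_score: float,
         tolerance: float = 0.0) -> bool:
    """Allow a deploy only if the candidate does not regress past tolerance."""
    return candidate_score >= baseline_score - tolerance


def stub_agent(prompt: str) -> str:
    # Stand-in for a real model call (assumption for the example).
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "I don't know")


cases = [
    EvalCase("capital of France?", "paris"),
    EvalCase("2 + 2?", "4"),
    EvalCase("largest planet?", "jupiter"),
]
score = run_suite(stub_agent, cases)
```

Here the stub answers two of three cases, so `gate(score, 0.5)` passes while `gate(score, 0.9)` blocks the change.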
XC-03
Safety / Guardrails
Input/output filtering — jailbreak detection, PII scrubbing, prompt injection blocking, tool-use policy.
lakera guard · llama guard (meta) · nemo guardrails (nvidia) · guardrails ai · protect ai · pangea · openai moderation · azure content safety · aws bedrock guardrails · cisco robust intelligence · hidden layer
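A toy version of two of these checks, PII scrubbing and a tool-use allowlist, might look like the following. The regexes are deliberately simplistic and the tool names are made up; production guardrails rely on trained classifiers and far broader patterns.

```python
import re

# Simplistic patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical policy: the only tools this agent may call.
ALLOWED_TOOLS = {"search", "read_file"}


def scrub_pii(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return US_SSN.sub("[SSN]", text)


def tool_allowed(tool_name: str) -> bool:
    """Tool-use policy check: deny anything not explicitly allowlisted."""
    return tool_name in ALLOWED_TOOLS


cleaned = scrub_pii("mail jane.doe@example.com, ssn 123-45-6789")
```

The same filter can run on both sides of the model: on user input before it reaches the prompt, and on model output before it reaches the user or a tool.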
XC-04
Auth / Identity
Two flavors: (1) end-user auth into your agent product, (2) agent-on-behalf-of-user OAuth into 3rd-party tools. The second is harder.
workos · clerk · auth0 / okta · stytch · descope · aws cognito
arcade (agent auth) · pylon · pomerium
XC-05
Data Pipelines · Warehouse · Streaming
For ingesting customer data into agent-readable form. RAG corpora, training data, eval datasets.
fivetran · airbyte · dlt · estuary · stitch
snowflake · databricks · bigquery · clickhouse · motherduck (duckdb) · tinybird
kafka (confluent) · redpanda · materialize
XC-06
Billing / Usage Metering
Usage-based pricing is the default for AaaS. Token costs pass through to the customer at a margin.
stripe · orb · metronome · openmeter · lago · chargebee
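The pass-through arithmetic can be sketched as below. The per-million-token prices and the 30% figure are invented for the example, and the margin is modeled as a simple markup on cost, which is only one of several ways a team might define it.

```python
from decimal import Decimal


def customer_charge(input_tokens: int, output_tokens: int,
                    in_price_per_mtok: Decimal,
                    out_price_per_mtok: Decimal,
                    markup: Decimal) -> Decimal:
    """Provider token cost plus a markup, rounded to 4 decimal places.

    Prices are per one million tokens; markup 0.30 means cost * 1.30.
    """
    cost = (Decimal(input_tokens) * in_price_per_mtok
            + Decimal(output_tokens) * out_price_per_mtok) / Decimal(1_000_000)
    return (cost * (Decimal(1) + markup)).quantize(Decimal("0.0001"))


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
charge = customer_charge(
    input_tokens=200_000,
    output_tokens=50_000,
    in_price_per_mtok=Decimal("3"),
    out_price_per_mtok=Decimal("15"),
    markup=Decimal("0.30"),
)
```

With these numbers the provider cost is 0.60 + 0.75 = 1.35, so the customer is charged 1.7550. `Decimal` avoids the float rounding drift that accumulates across many metered events.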
XC-07
Compliance / Trust Posture
SOC 2, HIPAA, ISO 27001, GDPR. Required to sell to enterprises. Single-tenant + BYOC for regulated buyers.
vanta · drata · sprinto · secureframe · tugboat logic
XC-08
Deploy / CI-CD / Secrets
App hosting + pipeline + secret management. Mostly normal SaaS plumbing.
vercel · modal · railway · fly.io · render · cf workers
github actions · gitlab ci
doppler · hashicorp vault · infisical · aws secrets mgr
XC-09
Error Tracking · Feature Flags
Standard SRE tooling, plus feature flags for safe rollout of new prompts / models / tools.
sentry · rollbar · bugsnag
launchdarkly · statsig · posthog · growthbook
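One common way to roll out a new prompt behind a flag is deterministic percentage bucketing, sketched here with the standard library. The flag key, prompt names, and function names are hypothetical and do not reflect any listed vendor's SDK.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets for this flag.

    The same (flag, user_id) pair always lands in the same bucket, so a
    user's experience stays stable as the rollout percentage increases.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def pick_prompt(user_id: str, rollout_percent: int) -> str:
    """Serve the new prompt version only to users inside the rollout."""
    if in_rollout("prompt-v2", user_id, rollout_percent):
        return "prompt_v2"
    return "prompt_v1"
```

Setting the percentage to 0 acts as a kill switch and 100 as full release; raising it in steps while watching error tracking is the usual pattern.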
XC-10 · R&D
Fine-Tuning / Distillation
Less common than 2024 hype suggested — most teams use frontier models + RAG + good prompts. But essential for cost / latency / proprietary capability.
openpipe · predibase · together fine-tuning · fireworks fine-tuning · anthropic tuning api · openai fine-tuning · modal + axolotl · runpod + unsloth · aws sagemaker · huggingface trl / peft
XC-11 · R&D
Data Labeling / Synthetic
Domain-expert labels are the moat for vertical AI (Harvey lawyers, Abridge doctors). Synthetic generation fills the long tail.
scale ai · surge ai · labelbox · snorkel · prolific
gretel · tonic ai · mostly ai · snowflake cortex synthetic
XC-12 · R&D
Prompt / Experiment Management
Version control + A/B harness for prompts & agent configs. Often folded into observability vendors.
promptlayer · pezzo · latitude · helicone prompts · braintrust playground · vellum · langsmith hub