doc/aaas-stack/v2 · 2026.05 · galea.foo interactive reference

The full Agents-as-a-Service stack —
R&D through production.

Every layer a vertical AI workflow product (Harvey, Glean, Abridge, Decagon, Sierra, Cresta, EvenUp) sits on top of, or has rebuilt internally. 9 functional layers in the request path, 13 cross-cutting concerns spanning all of them — including the newly-defined investigation / agent QA slot (galea). Vendor lists are deliberately exhaustive — ★ marks the load-bearing default at each layer.

channel gateway orchestration protocol memory tools model compute cross-cutting
§00

Architecture — request flow & cross-cutting concerns

visual / overview
REQUEST PATH ─────────────▼ CROSS-CUTTING ──▶ L01 CHANNEL / DISTRIBUTION web · mobile · voice/phone · email · slack · teams · api · browser-ext · mcp-client L02 · CONDITIONAL VOICE / REALTIME STACK livekit · pipecat · deepgram · cartesia · vapi · twilio L03 LLM GATEWAY / ROUTER litellm · portkey · openrouter · cf-ai-gateway · helicone L04 · CORE AGENT ORCHESTRATION / RUNTIME plan · tool-call · loop · branch · checkpoint · handoff · sub-agent framework: langgraph · crewai · autogen · openai-agents-sdk · claude-agent-sdk · pydantic-ai · mastra durable: temporal · inngest · restate · trigger.dev hosted: bedrock-agentcore · vertex · cf-agents L05a MEMORY / CONTEXT vector-db pinecone · turbopuffer · pgvector memory-fw mem0 · letta · zep cache + state redis · upstash · momento L05b TOOL / ACTION browser browserbase · stagehand · steel code-sandbox e2b · modal · daytona search · doc-parse · ipaas exa · firecrawl · reducto · composio L05c PROTOCOL / INTEROP tools ↔ agent MCP (anthropic, broadly adopted) agent ↔ agent A2A (google) · ACP (ibm) agent ↔ user UI AGUI · openai-realtime L06 MODEL LAYER frontier api: anthropic claude · openai gpt-5 · gemini · grok open weights: llama · qwen · deepseek · mistral · phi · gemma inference providers: together · fireworks · groq · cerebras · sambanova · baseten · modal embeddings/rerank: voyage · cohere · jina · openai-embed L07 · BARE METAL COMPUTE / HARDWARE gpu cloud: coreweave · lambda · crusoe · runpod hyperscaler: aws · azure · gcp silicon: nvidia h100/h200/b200 · groq lpu · cerebras wse-3 · aws trainium · google tpu CROSS-CUTTING spans all layers ◐ OBSERVABILITY / TRACING langsmith · langfuse · helicone · arize braintrust · phoenix · datadog-llm INVESTIGATION / AGENT QA ★ galea — new category, 2026 incident → durable eval · priority-aware EVALUATION braintrust · patronus · galileo · ragas openpipe · promptfoo · inspect-ai SAFETY / GUARDRAILS lakera · llama-guard · nemo-guardrails protect-ai · guardrails-ai · pangea AUTH / IDENTITY workos · clerk · auth0 · stytch arcade (agent-auth) · pomerium DATA PIPELINES & WAREHOUSE fivetran · airbyte · dlt · estuary snowflake · databricks · clickhouse BILLING / METERING stripe · orb · metronome openmeter · lago COMPLIANCE / TRUST vanta · drata · sprinto + HIPAA/SOC2/GDPR posture DEPLOY / CI-CD / SECRETS vercel · modal · railway · fly doppler · vault · gh-actions ERROR / FEATURE FLAGS sentry · rollbar launchdarkly · statsig · posthog R&D-ONLY ───────── FINE-TUNING / TRAINING openpipe · together-ft · predibase modal+axolotl · runpod+unsloth DATA LABELING / SYNTHETIC scale · surge · labelbox · snorkel gretel · tonic · mostly-ai PROMPT / EXPERIMENT MGMT promptlayer · pezzo · latitude helicone-prompts · vellum galea ⇠ events R&D LIFECYCLE ─────────────────────────────────────────────────────────────▼ STEP 01 DOMAIN DATA collect·dedupe·label scale · surge labelbox · snorkel unstructured · firecrawl s3 · snowflake · clickhouse STEP 02 EVAL HARNESS build before training braintrust · langsmith patronus · galileo ragas · promptfoo inspect-ai (UK AISI) STEP 03 FT / DISTILL if needed (often not) openpipe · predibase together-ft · fireworks-ft modal+axolotl unsloth · TRL/peft STEP 04 AGENT BUILD framework + tools langgraph / crewai claude-agent-sdk composio · arcade · mcp letta · mem0 (memory) STEP 05 RED TEAM break before ship lakera red · hidden-layer patronus simian cisco robust-intelligence promptfoo redteam STEP 06 DEPLOY → PROD to live request path vercel · modal · railway k8s · ecs · fly + feature flags + canary · ab prod traces ─→ feed back into eval harness ─→ retrain · iterate PROD REQUEST — DETAILED HOPS ───────────────────────────────────────────▼ USER browser · phone EDGE / CDN cloudflare · vercel AUTH GATE workos · clerk ORCHESTRATOR langgraph · temporal PLANNER decompose task GUARDRAIL-IN lakera · llama-grd LLM ROUTER litellm · portkey MODEL CALL claude · gpt · groq GPU INFER h100 · b200 · lpu ⤷ planner emits tool calls — fanned out to: MEMORY READ turbopuffer · pinecone BROWSER browserbase · stagehand CODE SBX e2b · modal · daytona SEARCH / RAG exa · firecrawl · tavily 3RD-PARTY API composio · arcade · mcp SUB-AGENT recursive · A2A VOICE / TTS elevenlabs · cartesia ⤴ tool results fold back to planner → loop until done → SYNTHESIZE final answer GUARDRAIL-OUT pii · jailbreak filter MEMORY WRITE + trace store METER USAGE orb · openmeter EMIT TRACE langfuse · helicone RESPONSE stream → user GALEA INVESTIGATE → durable eval ─ all hops emit OTel-style spans → langfuse / langsmith / datadog ─ galea ingests events & produces investigations ─ $$ metered → orb ─ failures → temporal retries → sentry alerts ─ "successful but wrong" runs (fabricated cite, unsafe tool call) → galea catches ─ semantic cache cuts cost ~30-60%

Vendor lists collapsed to load-bearing defaults inside the diagram. The full enumeration follows below — every layer expanded.

§01

Layers — every vendor, grouped by sub-function

7 layers · ~200 vendors
L01
Channel
How the customer reaches the agent. Surface + transport + UI SDK. Voice is its own sub-stack.
Surfaces
web appmobilevoice / phonesmsemailslackteamsdiscordapibrowser extchrome extdesktopmcp client (claude · cursor · windsurf)
Frontend SDK / Generative UI
vercel ai sdkassistant-uicopilotkittambo (gen-ui)letta uiag-ui protocolcustom react / vue / svelte
Voice Stack — orchestrators
vapiretell aibland aisynthflowvoiceflowopenai realtime api
Voice Stack — transport / STT / TTS / telephony
livekitdailypipecat deepgram (stt)assemblyai (stt)gladia (stt)whisper cartesia (tts)elevenlabs (tts)playhtinworld twiliotelnyxplivo
L02
Gateway / Router
Single entrypoint for all model calls. Auth, retries, fallback, semantic cache, cost cap, rate limit.
LLM Gateways
litellmportkeyopenroutervellumhelicone proxyrequesty
Hyperscaler / CDN AI Gateways
cloudflare ai gatewaykong ai gatewayaws bedrock (built-in)azure ai content safetyf5 ai gateway
Semantic Cache · Cost · Throttle
heliconeportkey cachegptcacheredis (manual)
L03
Orchestration
The brain. Plans, calls tools in a loop, branches, checkpoints, hands off to sub-agents. The main fight in the category.
Open-Source Frameworks (Python)
langgraphcrewaiautogen (microsoft)llamaindexpydantic aismolagents (hf)haystackdspylangroid
Open-Source Frameworks (TS / JS)
mastravercel ai sdklanggraph.jsbamlvoltagent
Vendor SDKs
openai agents sdkclaude agent sdkgoogle adkamazon bedrock agents sdk
Durable Workflow Engines
temporalinngestrestatetrigger.devhatchetdapr workflows
Hosted Agent Runtimes
aws bedrock agentcorevertex ai agent builderazure ai foundry agent servicecloudflare agentssnowflake cortex agents
Visual / Low-Code Builders
n8nzapier agentsmakelindyrelevance aistack aisema4.ai
Vertical / Multi-Agent Platforms
sierra (cx)glean (work)decagon (cx)cresta (contact center)letta (agent os)
L04
Protocol / Interop
The wire format between agents, tools, and UIs. MCP won the tools layer in 2025; A2A and AGUI are still contested.
Tool ↔ Agent
MCP — model context protocol (anthropic, broadly adopted)
Agent ↔ Agent
A2A (google · linux foundation)ACP (ibm research)AgentCard
Agent ↔ UI
AGUI protocolopenai realtime wsanthropic realtime
MCP Server Marketplaces / Registries
smitherymcp.sopulsemcpcomposio mcpopentoolsofficial mcp registry
L05a
Memory / Context
Vector + graph + KV. Short-term scratchpad, long-term episodic, retrieval from corpus. Most vertical agents collapse this into one managed service.
Specialized Vector DBs
pineconeturbopufferweaviateqdrantmilvuschromalancedbvespamarqo
Postgres-Native
pgvectorpgvectorscale (timescale)neonsupabase vectoraurora pg
Hybrid in Other DBs
mongo atlas vectorelasticredis vectorclickhouseduckdb vss
Memory Frameworks (managed)
mem0lettazepcognee
Knowledge Graph
neo4jgraphitimemgraphkuzu
Cache / Session State
redisupstashmomentocf kv / durable objects
L05b
Tools / Actions
Where agents actually do things. The fastest-growing layer — every category here has 5-10 funded startups.
Browser Automation
browserbasestagehandsteelbrowserlesshyperbrowseranthropic computer useopenai operator / cuaskyvernmultionbrowser-use
Code / Compute Sandboxes
e2bmodaldaytonarizacloudflare sandboxescodesandbox sdkvercel sandboxreplit agent
Web Search / Crawl / Scrape
exatavilyfirecrawlbrave search apiserpapijina readerbright dataapifyspider.cloud
Document Parsing / OCR
reductollamaparseunstructuredmistral ocrmarkerdoclingmathpixazure doc intelligence
Integration Hubs / iPaaS for Agents
composioarcadepicaactivepiecesparagonmergenangoapideckworkatopipedream
Specialized Action Tools
stripe agent toolkitdefog (text-to-sql)clay (data enrichment)zapier agents
L06
Model
Intelligence. ~3 frontier labs + a long tail of open weights + a fast-shrinking moat for inference providers.
Frontier API
anthropic claude (opus / sonnet / haiku)openai gpt-5 / o-seriesgoogle geminixai grok
Open-Weight Foundation
llama (meta)qwen (alibaba)deepseekmistralphi (microsoft)gemma (google)command r (cohere)
Specialized / Domain
med-palm · medical-llmbloomberggpt (financial)code llama · qwen-coderharvey custom (legal)protein language models
Embeddings · Rerankers
voyage aicohere embed / rerankopenai text-embeddingjinabgenomic
Inference Providers (host open weights)
together aifireworks aigroqcerebras inferencesambanovareplicateanyscalemodalbasetenleptondeepinfra
Cloud Foundries
aws bedrockazure ai foundrygoogle vertex aisnowflake cortexdatabricks foundation modelsoracle genai
L07
Compute / Hardware
Where the floating-point math actually happens. Most AaaS companies never touch this directly — abstracted by the model layer.
GPU Cloud (neoclouds)
coreweavelambda labscrusoerunpodpaperspacevast.aifluidstacknebiustensorwave
Hyperscalers
awsazuregcporacle cloud
Edge / App Runtimes
cloudflare workersvercel edgefastly computedeno deploy
Inference Silicon
nvidia h100 / h200nvidia b100 / b200 / gb200amd mi300xgroq lpucerebras wse-3aws trainium2 / inferentia2google tpu v5e / v5p / trilliumtenstorrent
§02

Cross-cutting concerns — span every layer above

13 concerns · ~80 vendors · 1 emerging category
XC-01
Observability / Tracing
OTel-style spans for every model call, tool call, sub-agent hand-off. Single pane of glass for prod.
langsmithlangfuseheliconearize phoenixbraintrustdatadog llm obsnew relic aiopenlitlunarywhylabslogfire (pydantic)
XC-01.5 · new category, 2026
Investigation / Agent QA
Sits above traces. Ingests events from any runtime (mercury · langgraph · openai · claude · custom), applies a customer-specific priority model (correctness for Harvey, tool-safety for Cursor, PHI for Abridge), and produces investigations that explain what mattered — not log dumps. Catches "successful but wrong" runs (fabricated citations, unsafe tool calls, anomalous token usage) that observability misses and pre-deploy eval suites can't anticipate. Closes the loop: incident → durable eval → gated deploy.
galea— first mover; category being defined
positioning: Datadog (cloud) → Sentry (apps) → Galea (agents). Reads from L03 + XC-01; writes to XC-02 + XC-09. Not a replacement for langfuse/braintrust — sits on top of them. risk: incumbents (braintrust, langfuse) absorb the "priority model" feature before the category stabilizes.
XC-02
Evaluation
Offline + online eval. Regression tests for prompts, agent trajectories, end-to-end task success.
braintrustpatronus aigalileoopenpiperagaspromptfoodeepevalinspect-ai (uk aisi)vellum evallangsmith eval
XC-03
Safety / Guardrails
Input/output filtering — jailbreak detection, PII scrubbing, prompt injection blocking, tool-use policy.
lakera guardllama guard (meta)nemo guardrails (nvidia)guardrails aiprotect aipangeaopenai moderationazure content safetyaws bedrock guardrailscisco robust intelligencehidden layer
XC-04
Auth / Identity
Two flavors: (1) end-user auth into your agent product, (2) agent-on-behalf-of-user OAuth into 3rd-party tools. The second is harder.
workosclerkauth0 / oktastytchdescopeaws cognito arcade (agent auth)pylonpomerium
XC-05
Data Pipelines · Warehouse · Streaming
For ingesting customer data into agent-readable form. RAG corpora, training data, eval datasets.
fivetranairbytedltestuarystitch snowflakedatabricksbigqueryclickhousemotherduck (duckdb)tinybird kafka (confluent)redpandamaterialize
XC-06
Billing / Usage Metering
Usage-based pricing is the default for AaaS. Token costs flow through to customer at margin.
stripeorbmetronomeopenmeterlagochargebee
XC-07
Compliance / Trust Posture
SOC2, HIPAA, ISO 27001, GDPR. Required to sell to enterprise. Single-tenant + BYOC for regulated buyers.
vantadratasprintosecureframetugboat logic
XC-08
Deploy / CI-CD / Secrets
App hosting + pipeline + secret management. Mostly normal SaaS plumbing.
vercelmodalrailwayfly.iorendercf workers github actionsgitlab ci dopplerhashicorp vaultinfisicalaws secrets mgr
XC-09
Error Tracking · Feature Flags
Standard SRE tooling, plus feature flags for safe rollout of new prompts / models / tools.
sentryrollbarbugsnag launchdarklystatsigposthoggrowthbook
XC-10 · R&D
Fine-Tuning / Distillation
Less common than 2024 hype suggested — most teams use frontier models + RAG + good prompts. But essential for cost / latency / proprietary capability.
openpipepredibasetogether fine-tuningfireworks fine-tuninganthropic tuning apiopenai fine-tuningmodal + axolotlrunpod + unslothaws sagemakerhuggingface trl / peft
XC-11 · R&D
Data Labeling / Synthetic
Domain-expert labels are the moat for vertical AI (Harvey lawyers, Abridge doctors). Synthetic generation fills the long tail.
scale aisurge ailabelboxsnorkelprolific greteltonic aimostly aisnowflake cortex synthetic
XC-12 · R&D
Prompt / Experiment Management
Version control + A/B harness for prompts & agent configs. Often folded into observability vendors.
promptlayerpezzolatitudehelicone promptsbraintrust playgroundvellumlangsmith hub
§03

R&D vs Production — what's used when

phase split

R&D / Pre-prod

Building, training, evaluating, breaking — the loop you do once before launch and continuously after.

Data Acquisition scale · surge · firecrawl · unstructured · custom scrapers — collect & label domain corpus
Eval Harness braintrust · langsmith · ragas · inspect-ai — build before you train, run after every change
Synthetic Data gretel · tonic · custom LLM pipelines — for long-tail edge cases & PII-safe training
Fine-Tuning openpipe · predibase · together-ft · modal+axolotl — only when frontier+RAG hits a ceiling
Prompt Engineering braintrust · helicone prompts · promptlayer · vellum — version, diff, A/B
Agent Trajectory Eval langsmith · langfuse traces · braintrust · phoenix — replay & rate prod traces offline
Red Team lakera red · patronus simian · promptfoo redteam · cisco robust-intel — break before customer does
Benchmarks SWE-bench · GAIA · τ-bench · WebArena · domain-specific (legal, clinical)

Production / Runtime

What actually serves a paying customer's request, end-to-end. Hot path — every ms counts.

Edge / Auth cloudflare · vercel · workos · clerk — terminate TLS, identify user, rate limit
Orchestration langgraph · temporal · openai-agents-sdk · claude-agent-sdk — durable agent loop
Guardrails lakera · llama-guard · openai moderation — pre-LLM input filter, post-LLM output filter
LLM Gateway litellm · portkey · openrouter — model fallback, semantic cache, cost cap
Model Inference anthropic · openai · groq · together · cerebras — frontier or self-hosted open weight
Tool Execution browserbase · e2b · exa · composio · arcade — sandboxed action surface
Memory turbopuffer · pinecone · pgvector · mem0 · letta · redis — read context, write episode
Observability langsmith · langfuse · helicone · datadog — every span captured, replay-able
Metering & Billing orb · metronome · stripe · openmeter — per-token, per-action, per-task pricing
Error / Alerting sentry · pagerduty · datadog — agent crashes are different from API crashes
§04

One request — the full hop chain in plain text

narrative
L01user types in chat edgecloudflare terminates TLS XC-04workos validates JWT L03langgraph orchestrator picks up the run L03temporal persists run state (so a crash doesn't lose work)

XC-03lakera scans input for prompt injection L02litellm routes to claude-sonnet-4-7 (with gpt-5 fallback) L02helicone semantic cache checks for near-dup → miss L06anthropic api gets the request L07runs on h200 cluster L06tokens stream back

model emits 3 tool calls in parallel:
  L05aturbopuffer.query() for past tickets
  L05bbrowserbase.navigate() to fetch live data
  L05bcomposio.salesforce.lookup() via MCP

all 3 results fold back to planner L03claude synthesizes final answer XC-03llama-guard scans output for PII / harmful L05aletta writes episodic memory XC-06orb meters tokens + tool calls XC-01langfuse stores full trace L01response streams back to user

async, post-response: XC-01.5galea ingests the full trace + tool outputs + memory reads runs investigator agents against your priority model (correctness / tool-safety / PHI / cost) verdict: "contracts_agent cited pacific_coast_charter_2023.pdf#page=4 but that doc was never retrieved → fabricated citation, weight 0.9, BLOCKED" XC-02galea writes a durable eval into braintrust next deploy gated on it trace becomes input to next R&D iteration

Every node in this chain is a different vendor decision. A typical AaaS company makes ~25-40 of these decisions before they ship v1.

§05

Where the actual fight is

opinion

Layers that are commoditizing

Race to the bottom — pick the cheapest reliable option, optionality matters more than which vendor.

L02 Gateways litellm is open-source and good enough; portkey/openrouter compete on margin
L06 Inference groq vs together vs cerebras — fight on $/M tokens; switching cost ≈ zero with a gateway
L07 Compute neoclouds (coreweave, crusoe) compete on GPU availability windows; abstracted from app builder

Layers that are still wide open

Where the next $10B companies likely emerge. No clear winner.

L03 Orchestration langgraph leads OSS but feels heavy; openai/anthropic SDKs eating the simple case; Temporal coming in from durability angle
L05a Memory no winner. mem0 / letta / zep all small. real Q: does memory live in the agent framework or beside it?
L05b Browser / Computer Use browserbase has a lead, but anthropic computer use + openai operator could absorb the category
XC-04 Agent Auth arcade is early; OAuth-on-behalf-of-agent is genuinely unsolved at scale
XC-02 Eval braintrust + langsmith + galileo all real businesses; vertical-specific eval still wide open
XC-01.5 Investigation galea defining the slot — "successful but wrong" runs aren't caught by observability or pre-deploy eval. real bet: do incumbents (braintrust, langfuse) absorb the priority-model feature first, or does a standalone investigation layer emerge? customer-specific priorities is the wedge

For a vertical AaaS founder (Harvey, Abridge, Kubera-style): own L03 + L05a + your domain data. Buy everything else — including XC-01.5 (galea-style investigation) when it stabilizes. The moat is workflow + data + compliance — not the underlying stack.