AaaS Infrastructure Stack — R&D to Production

§01

Layers — every vendor, grouped by sub-function

7 layers · ~200 vendors

L01

Channel

How the customer reaches the agent. Surface + transport + UI SDK. Voice is its own sub-stack.

Surfaces

web app · mobile · voice / phone · sms · email · slack · teams · discord · api · browser ext · chrome ext · desktop · mcp client (claude · cursor · windsurf)

Frontend SDK / Generative UI

vercel ai sdk · assistant-ui · copilotkit · tambo (gen-ui) · letta ui · ag-ui protocol · custom react / vue / svelte

Voice Stack — orchestrators

vapi · retell ai · bland ai · synthflow · voiceflow · openai realtime api

Voice Stack — transport / STT / TTS / telephony

livekit · daily · pipecat
deepgram (stt) · assemblyai (stt) · gladia (stt) · whisper
cartesia (tts) · elevenlabs (tts) · playht · inworld
twilio · telnyx · plivo

L02

Gateway / Router

Single entrypoint for all model calls. Auth, retries, fallback, semantic cache, cost cap, rate limit.

LLM Gateways

litellm · portkey · openrouter · vellum · helicone proxy · requesty

Hyperscaler / CDN AI Gateways

cloudflare ai gateway · kong ai gateway · aws bedrock (built-in) · azure ai content safety · f5 ai gateway

Semantic Cache · Cost · Throttle

helicone · portkey cache · gptcache · redis (manual)
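
The retry, fallback, and cost-cap behavior described for this layer can be sketched in a few lines. This is a minimal illustration, not how litellm or portkey implement it: `call_with_fallback`, `BudgetExceeded`, the stub providers, and the flat per-call cost are all hypothetical names and simplifications.

```python
import time
from typing import Callable, Sequence, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the next attempt would push spend past the cap."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str], float]],
    budget_usd: float,
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str, float]:
    """Try providers in order, retrying each before falling back.

    Each provider is (name, call_fn, flat_cost_usd). The flat cost is an
    assumption for illustration; real gateways price per token.
    Returns (provider_name, completion, spent_usd).
    """
    spent = 0.0
    for name, call_fn, cost in providers:
        for attempt in range(retries):
            if spent + cost > budget_usd:
                raise BudgetExceeded(f"cap {budget_usd} hit after {spent}")
            spent += cost  # assume failed attempts are still billed
            try:
                return name, call_fn(prompt), spent
            except Exception:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError("all providers failed")


def flaky(prompt: str) -> str:  # stand-in for a primary model that is down
    raise TimeoutError("primary is down")


def stable(prompt: str) -> str:  # stand-in for the fallback model
    return f"echo: {prompt}"


result = call_with_fallback(
    "hi", [("primary", flaky, 0.01), ("fallback", stable, 0.02)], budget_usd=1.0
)
```

Here the primary is tried twice, then the fallback answers. Lowering `budget_usd` below the cost of the retries raises `BudgetExceeded` instead of silently overspending.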

L03

Orchestration

The brain. Plans, calls tools in a loop, branches, checkpoints, hands off to sub-agents. The main fight in the category.

Open-Source Frameworks (Python)

langgraph · crewai · autogen (microsoft) · llamaindex · pydantic ai · smolagents (hf) · haystack · dspy · langroid

Open-Source Frameworks (TS / JS)

mastra · vercel ai sdk · langgraph.js · baml · voltagent

Vendor SDKs

openai agents sdk · claude agent sdk · google adk · amazon bedrock agents sdk

Durable Workflow Engines

temporal · inngest · restate · trigger.dev · hatchet · dapr workflows

Hosted Agent Runtimes

aws bedrock agentcore · vertex ai agent builder · azure ai foundry agent service · cloudflare agents · snowflake cortex agents

Visual / Low-Code Builders

n8n · zapier agents · make · lindy · relevance ai · stack ai · sema4.ai

Vertical / Multi-Agent Platforms

sierra (cx) · glean (work) · decagon (cx) · cresta (contact center) · letta (agent os)
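
The "plan, call tools in a loop, checkpoint" cycle that this layer owns can be sketched without any framework. Everything below is a hypothetical stand-in: `RunState`, `run_agent`, the stub planner, and the in-memory checkpoint list are illustrative names, not the API of langgraph, temporal, or any vendor SDK.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunState:
    goal: str
    steps: List[dict] = field(default_factory=list)  # tool-call history
    done: bool = False
    answer: Optional[str] = None


def run_agent(
    state: RunState,
    plan: Callable[[RunState], dict],
    tools: Dict[str, Callable[..., str]],
    checkpoint: Callable[[RunState], None],
    max_steps: int = 8,
) -> RunState:
    """Loop: ask the planner for an action, run it, persist state."""
    while not state.done and len(state.steps) < max_steps:
        action = plan(state)
        if action["type"] == "final":
            state.answer = action["answer"]
            state.done = True
        else:
            output = tools[action["tool"]](**action["args"])
            state.steps.append({**action, "output": output})
        checkpoint(state)  # a durable engine would write this to storage
    return state


def stub_planner(state: RunState) -> dict:
    # Stand-in for a model call: look something up once, then answer.
    if not state.steps:
        return {"type": "tool", "tool": "lookup", "args": {"order_id": 42}}
    return {"type": "final", "answer": state.steps[-1]["output"]}


snapshots: List[RunState] = []
final_state = run_agent(
    RunState(goal="where is order 42?"),
    plan=stub_planner,
    tools={"lookup": lambda order_id: f"order {order_id} is shipped"},
    checkpoint=lambda s: snapshots.append(copy.deepcopy(s)),
)
```

Because state is checkpointed after every step, a crashed run could resume from the last snapshot instead of replaying tool calls, which is the property the durable workflow engines above sell.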

L04

Protocol / Interop

The wire format between agents, tools, and UIs. MCP won the tools layer in 2025; A2A and AG-UI are still contested.

Tool ↔ Agent

MCP — model context protocol (anthropic, broadly adopted)

Agent ↔ Agent

A2A (google · linux foundation) · ACP (ibm research) · AgentCard

Agent ↔ UI

AG-UI protocol · openai realtime ws · anthropic realtime

MCP Server Marketplaces / Registries

smithery · mcp.so · pulsemcp · composio mcp · opentools · official mcp registry

L05a

Memory / Context

Vector + graph + KV. Short-term scratchpad, long-term episodic, retrieval from corpus. Most vertical agents collapse this into one managed service.

Specialized Vector DBs

pinecone · turbopuffer · weaviate · qdrant · milvus · chroma · lancedb · vespa · marqo

Postgres-Native

pgvector · pgvectorscale (timescale) · neon · supabase vector · aurora pg

Hybrid in Other DBs

mongo atlas vector · elastic · redis vector · clickhouse · duckdb vss

Memory Frameworks (managed)

mem0 · letta · zep · cognee

Knowledge Graph

neo4j · graphiti · memgraph · kuzu

Cache / Session State

redis · upstash · momento · cf kv / durable objects
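
The "write episode, retrieve by similarity" pattern behind long-term memory can be shown with a toy in-process store. This is a sketch under heavy assumptions: `EpisodicMemory` is a hypothetical class, the 2-d vectors stand in for real embeddings, and a production system would use one of the vector stores listed above rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class EpisodicMemory:
    """Append-only store of (text, embedding) pairs with top-k lookup."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Sequence[float]]] = []

    def write(self, text: str, embedding: Sequence[float]) -> None:
        self.items.append((text, embedding))

    def read(self, query_embedding: Sequence[float], k: int = 2) -> List[str]:
        ranked = sorted(
            self.items,
            key=lambda item: cosine(item[1], query_embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


memory = EpisodicMemory()
memory.write("refund policy", [1.0, 0.0])
memory.write("shipping times", [0.0, 1.0])
memory.write("refund for order 42", [0.9, 0.1])
top = memory.read([1.0, 0.0], k=2)
```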

L05b

Tools / Actions

Where agents actually do things. The fastest-growing layer — every category here has 5-10 funded startups.

Browser Automation

browserbase · stagehand · steel · browserless · hyperbrowser · anthropic computer use · openai operator / cua · skyvern · multion · browser-use

Code / Compute Sandboxes

e2b · modal · daytona · riza · cloudflare sandboxes · codesandbox sdk · vercel sandbox · replit agent

Web Search / Crawl / Scrape

exa · tavily · firecrawl · brave search api · serpapi · jina reader · bright data · apify · spider.cloud

Document Parsing / OCR

reducto · llamaparse · unstructured · mistral ocr · marker · docling · mathpix · azure doc intelligence

Integration Hubs / iPaaS for Agents

composio · arcade · pica · activepieces · paragon · merge · nango · apideck · workato · pipedream

Specialized Action Tools

stripe agent toolkit · defog (text-to-sql) · clay (data enrichment) · zapier agents

L06

Model

Intelligence. ~3 frontier labs + a long tail of open weights + a fast-shrinking moat for inference providers.

Frontier API

anthropic claude (opus / sonnet / haiku) · openai gpt-5 / o-series · google gemini · xai grok

Open-Weight Foundation

llama (meta) · qwen (alibaba) · deepseek · mistral · phi (microsoft) · gemma (google) · command r (cohere)

Specialized / Domain

med-palm · medical-llm · bloomberggpt (financial) · code llama · qwen-coder · harvey custom (legal) · protein language models

Embeddings · Rerankers

voyage ai · cohere embed / rerank · openai text-embedding · jina · bge · nomic

Inference Providers (host open weights)

together ai · fireworks ai · groq · cerebras inference · sambanova · replicate · anyscale · modal · baseten · lepton · deepinfra

Cloud Foundries

aws bedrock · azure ai foundry · google vertex ai · snowflake cortex · databricks foundation models · oracle genai

L07

Compute / Hardware

Where the floating-point math actually happens. Most AaaS companies never touch this directly — abstracted by the model layer.

GPU Cloud (neoclouds)

coreweave · lambda labs · crusoe · runpod · paperspace · vast.ai · fluidstack · nebius · tensorwave

Hyperscalers

aws · azure · gcp · oracle cloud

Edge / App Runtimes

cloudflare workers · vercel edge · fastly compute · deno deploy

Inference Silicon

nvidia h100 / h200 · nvidia b100 / b200 / gb200 · amd mi300x · groq lpu · cerebras wse-3 · aws trainium2 / inferentia2 · google tpu v5e / v5p / trillium · tenstorrent

§02

Cross-cutting concerns — span every layer above

13 concerns · ~80 vendors · 1 emerging category

XC-01

Observability / Tracing

OTel-style spans for every model call, tool call, sub-agent hand-off. Single pane of glass for prod.

langsmith · langfuse · helicone · arize phoenix · braintrust · datadog llm obs · new relic ai · openlit · lunary · whylabs · logfire (pydantic)

XC-01.5 · new category, 2026

Investigation / Agent QA

Sits above traces. Ingests events from any runtime (mercury · langgraph · openai · claude · custom), applies a customer-specific priority model (correctness for Harvey, tool-safety for Cursor, PHI for Abridge), and produces investigations that explain what mattered — not log dumps. Catches "successful but wrong" runs (fabricated citations, unsafe tool calls, anomalous token usage) that observability misses and pre-deploy eval suites can't anticipate. Closes the loop: incident → durable eval → gated deploy.

galea — first mover; category being defined

positioning: Datadog (cloud) → Sentry (apps) → Galea (agents). Reads from L03 + XC-01; writes to XC-02 + XC-09. Not a replacement for langfuse/braintrust — sits on top of them.

risk: incumbents (braintrust, langfuse) absorb the "priority model" feature before the category stabilizes.
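
One concrete "successful but wrong" check is comparing the documents an answer cites against the documents the run actually retrieved. The sketch below is an assumption about how such a check could look, not Galea's implementation; `find_fabricated_citations`, `investigate`, and the trace field names are hypothetical.

```python
from typing import Dict, List


def find_fabricated_citations(cited: List[str], retrieved: List[str]) -> List[str]:
    """Return cited documents that never appear in the retrieval log.

    URL-style fragments such as '#page=4' are stripped before comparing.
    """
    retrieved_docs = {doc.split("#", 1)[0] for doc in retrieved}
    missing = {c.split("#", 1)[0] for c in cited} - retrieved_docs
    return sorted(missing)


def investigate(trace: Dict[str, List[str]]) -> Dict[str, object]:
    """Block the run if any citation has no matching retrieval."""
    fabricated = find_fabricated_citations(trace["cited"], trace["retrieved"])
    return {
        "status": "BLOCKED" if fabricated else "OK",
        "fabricated_citations": fabricated,
    }


# Hypothetical trace: one citation is backed by a retrieval, one is not.
verdict = investigate(
    {
        "cited": ["pacific_coast_charter_2023.pdf#page=4", "msa_2022.pdf#page=1"],
        "retrieved": ["msa_2022.pdf"],
    }
)
```

The run itself completed without errors, which is why a span-level trace viewer would show it as healthy. The failure only appears when the output is checked against what the agent had access to.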

XC-02

Evaluation

Offline + online eval. Regression tests for prompts, agent trajectories, end-to-end task success.

braintrust · patronus ai · galileo · openpipe · ragas · promptfoo · deepeval · inspect-ai (uk aisi) · vellum eval · langsmith eval

XC-03

Safety / Guardrails

Input/output filtering — jailbreak detection, PII scrubbing, prompt injection blocking, tool-use policy.

lakera guard · llama guard (meta) · nemo guardrails (nvidia) · guardrails ai · protect ai · pangea · openai moderation · azure content safety · aws bedrock guardrails · cisco robust intelligence · hidden layer
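
Two of the filters named above, PII scrubbing and tool-use policy, can be illustrated with plain Python. This is a deliberately naive sketch: the regexes cover only simple email and US-SSN shapes, and `scrub`, `tool_allowed`, and the allowlist contents are hypothetical. The vendors listed use classifiers and far broader pattern sets.

```python
import re
from typing import Set

# Naive patterns for illustration only; real PII detection is much broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scrub(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


def tool_allowed(tool_name: str, allowlist: Set[str]) -> bool:
    """Tool-use policy as a default-deny allowlist check."""
    return tool_name in allowlist


clean = scrub("mail jo@example.com, ssn 123-45-6789")
policy = {"search_tickets", "lookup_account"}  # hypothetical tool names
```

The same `scrub` function can run on both sides of the model call: on user input before it reaches the LLM, and on model output before it reaches the user.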

XC-04

Auth / Identity

Two flavors: (1) end-user auth into your agent product, (2) agent-on-behalf-of-user OAuth into 3rd-party tools. The second is harder.

workos · clerk · auth0 / okta · stytch · descope · aws cognito
arcade (agent auth) · pylon · pomerium

XC-05

Data Pipelines · Warehouse · Streaming

For ingesting customer data into agent-readable form. RAG corpora, training data, eval datasets.

fivetran · airbyte · dlt · estuary · stitch
snowflake · databricks · bigquery · clickhouse · motherduck (duckdb) · tinybird
kafka (confluent) · redpanda · materialize

XC-06

Billing / Usage Metering

Usage-based pricing is the default for AaaS. Token costs flow through to customer at margin.

stripe · orb · metronome · openmeter · lago · chargebee
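
"Token costs flow through to customer at margin" is simple arithmetic, sketched below with `Decimal` to avoid float rounding on money. The prices, the 30% margin, and the `invoice_line` helper are hypothetical values chosen for illustration, not any vendor's rates.

```python
from decimal import Decimal
from typing import Tuple

TOKENS_PER_MILLION = Decimal(1_000_000)


def invoice_line(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
    margin: Decimal,
) -> Tuple[Decimal, Decimal]:
    """Return (provider_cost_usd, customer_price_usd) for one usage record."""
    cost = (
        Decimal(input_tokens) * input_price_per_m
        + Decimal(output_tokens) * output_price_per_m
    ) / TOKENS_PER_MILLION
    return cost, cost * (Decimal(1) + margin)


# Hypothetical rates: $3 per 1M input tokens, $15 per 1M output tokens.
cost, price = invoice_line(
    input_tokens=200_000,
    output_tokens=50_000,
    input_price_per_m=Decimal("3"),
    output_price_per_m=Decimal("15"),
    margin=Decimal("0.30"),
)
```

With these numbers the provider cost is (200,000 × 3 + 50,000 × 15) / 1,000,000 = $1.35, and the customer price at a 30% margin is $1.755.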

XC-07

Compliance / Trust Posture

SOC2, HIPAA, ISO 27001, GDPR. Required to sell to enterprise. Single-tenant + BYOC for regulated buyers.

vanta · drata · sprinto · secureframe · tugboat logic

XC-08

Deploy / CI-CD / Secrets

App hosting + pipeline + secret management. Mostly normal SaaS plumbing.

vercel · modal · railway · fly.io · render · cf workers
github actions · gitlab ci
doppler · hashicorp vault · infisical · aws secrets mgr

XC-09

Error Tracking · Feature Flags

Standard SRE tooling, plus feature flags for safe rollout of new prompts / models / tools.

sentry · rollbar · bugsnag
launchdarkly · statsig · posthog · growthbook
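
Safe rollout of a new prompt usually means a deterministic percentage split per user. The sketch below shows one common way to do that with a hash bucket; `in_rollout`, `pick_prompt`, and the flag and prompt names are hypothetical, and this is not the bucketing algorithm of any specific flag vendor.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets per flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def pick_prompt(user_id: str, percent: int) -> str:
    """Serve the new prompt version only to users inside the rollout."""
    if in_rollout("new-prompt", user_id, percent):
        return "prompt_v2"
    return "prompt_v1"


always_old = pick_prompt("user-123", 0)
always_new = pick_prompt("user-123", 100)
```

Hashing the flag name together with the user id keeps each user's assignment stable across requests while making different flags split the population independently.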

XC-10 · R&D

Fine-Tuning / Distillation

Less common than 2024 hype suggested — most teams use frontier models + RAG + good prompts. But essential for cost / latency / proprietary capability.

openpipe · predibase · together fine-tuning · fireworks fine-tuning · anthropic tuning api · openai fine-tuning · modal + axolotl · runpod + unsloth · aws sagemaker · huggingface trl / peft

XC-11 · R&D

Data Labeling / Synthetic

Domain-expert labels are the moat for vertical AI (Harvey lawyers, Abridge doctors). Synthetic generation fills the long tail.

scale ai · surge ai · labelbox · snorkel · prolific
gretel · tonic ai · mostly ai · snowflake cortex synthetic

XC-12 · R&D

Prompt / Experiment Management

Version control + A/B harness for prompts & agent configs. Often folded into observability vendors.

promptlayer · pezzo · latitude · helicone prompts · braintrust playground · vellum · langsmith hub

§03

R&D vs Production — what's used when

phase split

R&D / Pre-prod

Building, training, evaluating, breaking — the loop you do once before launch and continuously after.

Data Acquisition: scale · surge · firecrawl · unstructured · custom scrapers — collect & label domain corpus

Eval Harness: braintrust · langsmith · ragas · inspect-ai — build before you train, run after every change

Synthetic Data: gretel · tonic · custom LLM pipelines — for long-tail edge cases & PII-safe training

Fine-Tuning: openpipe · predibase · together-ft · modal+axolotl — only when frontier+RAG hits a ceiling

Prompt Engineering: braintrust · helicone prompts · promptlayer · vellum — version, diff, A/B

Agent Trajectory Eval: langsmith · langfuse traces · braintrust · phoenix — replay & rate prod traces offline

Red Team: lakera red · patronus simian · promptfoo redteam · cisco robust-intel — break before customer does

Benchmarks: SWE-bench · GAIA · τ-bench · WebArena · domain-specific (legal, clinical)

Production / Runtime

What actually serves a paying customer's request, end-to-end. Hot path — every ms counts.

Edge / Auth: cloudflare · vercel · workos · clerk — terminate TLS, identify user, rate limit

Orchestration: langgraph · temporal · openai-agents-sdk · claude-agent-sdk — durable agent loop

Guardrails: lakera · llama-guard · openai moderation — pre-LLM input filter, post-LLM output filter

LLM Gateway: litellm · portkey · openrouter — model fallback, semantic cache, cost cap

Model Inference: anthropic · openai · groq · together · cerebras — frontier or self-hosted open weight

Tool Execution: browserbase · e2b · exa · composio · arcade — sandboxed action surface

Memory: turbopuffer · pinecone · pgvector · mem0 · letta · redis — read context, write episode

Observability: langsmith · langfuse · helicone · datadog — every span captured, replay-able

Metering & Billing: orb · metronome · stripe · openmeter — per-token, per-action, per-task pricing

Error / Alerting: sentry · pagerduty · datadog — agent crashes are different from API crashes

§04

One request — the full hop chain in plain text

narrative

L01: user types in chat
→ edge: cloudflare terminates TLS
→ XC-04: workos validates JWT
→ L03: langgraph orchestrator picks up the run
→ L03: temporal persists run state (so a crash doesn't lose work)

→ XC-03: lakera scans input for prompt injection
→ L02: litellm routes to claude-sonnet-4-7 (with gpt-5 fallback)
→ L02: helicone semantic cache checks for near-dup → miss
→ L06: anthropic api gets the request
→ L07: runs on h200 cluster
→ L06: tokens stream back

→ model emits 3 tool calls in parallel:
  L05a: turbopuffer.query() for past tickets
  L05b: browserbase.navigate() to fetch live data
  L05b: composio.salesforce.lookup() via MCP

→ all 3 results fold back to planner
→ L03: claude synthesizes final answer
→ XC-03: llama-guard scans output for PII / harmful
→ L05a: letta writes episodic memory
→ XC-06: orb meters tokens + tool calls
→ XC-01: langfuse stores full trace
→ L01: response streams back to user

async, post-response:
→ XC-01.5: galea ingests the full trace + tool outputs + memory reads
→ runs investigator agents against your priority model (correctness / tool-safety / PHI / cost)
→ verdict: "contracts_agent cited pacific_coast_charter_2023.pdf#page=4 but that doc was never retrieved → fabricated citation, weight 0.9, BLOCKED"
→ XC-02: galea writes a durable eval into braintrust
→ next deploy gated on it
→ trace becomes input to next R&D iteration

Every node in this chain is a different vendor decision. A typical AaaS company makes ~25-40 of these decisions before they ship v1.
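
The middle of the chain, where three tool calls run in parallel and their results fold back to the planner, maps naturally onto `asyncio.gather`. The three coroutines below are hypothetical stubs standing in for the vector query, browser fetch, and CRM lookup; they do not call turbopuffer, browserbase, or composio.

```python
import asyncio
from typing import Dict


async def query_past_tickets(customer: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a vector store round trip
    return f"2 past tickets for {customer}"


async def fetch_live_data(url: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a headless browser session
    return f"status page at {url}: all systems normal"


async def lookup_crm(customer: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an MCP tool call
    return f"{customer} is on the enterprise plan"


async def run_tool_calls(customer: str) -> Dict[str, str]:
    """Run all three tool calls concurrently and collect results by name."""
    tickets, live, crm = await asyncio.gather(
        query_past_tickets(customer),
        fetch_live_data("https://status.example.com"),
        lookup_crm(customer),
    )
    return {"tickets": tickets, "live": live, "crm": crm}


tool_results = asyncio.run(run_tool_calls("acme"))
```

`asyncio.gather` returns results in the order the coroutines were passed, so the planner can match each output to the tool call that produced it even though the calls finish at different times.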

§05

Where the actual fight is

opinion

Layers that are commoditizing

Race to the bottom — pick the cheapest reliable option; keeping the ability to switch matters more than which vendor you pick.

L02 Gateways: litellm is open-source and good enough; portkey/openrouter compete on margin

L06 Inference: groq vs together vs cerebras — fight on $/M tokens; switching cost ≈ zero with a gateway

L07 Compute: neoclouds (coreweave, crusoe) compete on GPU availability windows; abstracted from app builder

Layers that are still wide open

Where the next $10B companies likely emerge. No clear winner.

L03 Orchestration: langgraph leads OSS but feels heavy; openai/anthropic SDKs eating the simple case; Temporal coming in from durability angle

L05a Memory: no winner. mem0 / letta / zep all small. real Q: does memory live in the agent framework or beside it?

L05b Browser / Computer Use: browserbase has a lead, but anthropic computer use + openai operator could absorb the category

XC-04 Agent Auth: arcade is early; OAuth-on-behalf-of-agent is genuinely unsolved at scale

XC-02 Eval: braintrust + langsmith + galileo all real businesses; vertical-specific eval still wide open

XC-01.5 Investigation: galea defining the slot — "successful but wrong" runs aren't caught by observability or pre-deploy eval. real bet: do incumbents (braintrust, langfuse) absorb the priority-model feature first, or does a standalone investigation layer emerge? customer-specific priorities are the wedge

For a vertical AaaS founder (Harvey, Abridge, Kubera-style): own L03 + L05a + your domain data. Buy everything else — including XC-01.5 (galea-style investigation) when it stabilizes. The moat is workflow + data + compliance — not the underlying stack.

The full Agents-as-a-Service stack — R&D through production.

Architecture — request flow & cross-cutting concerns