
The Full Agents-as-a-Service Stack: 9 Layers, 200 Vendors, and the Slot Nobody Filled

At GTC 2025, Jensen Huang described agentic AI as a trillion-dollar computing inflection. He was being conservative. Sequoia Capital now frames long-horizon agents as targeting the $10 trillion in services revenue that software has never been able to reach. The analyst consensus is clear: agentic AI is the next platform shift.

What is less clear is what "agentic AI" actually requires in production. Most coverage focuses on one layer at a time. But a production agent system is a full vertical stack — nine functional layers, thirteen cross-cutting concerns, roughly 200 vendors.

We mapped the entire thing.

9 functional layers · 13 cross-cutting concerns · ~200 vendors mapped · 1 slot unfilled

The complete interactive diagram — every vendor across all layers, with the full request flow.


What “agentic” actually means in 2026

The word has been stretched to near-meaninglessness by marketing. Let us be precise.

An agentic system is one where the model controls the execution flow. It decides which tools to call, in what order, whether to loop back, when to hand off to a sub-agent, and when to stop. This is categorically different from a chatbot (model generates text), a RAG pipeline (model generates text given retrieved context), or a workflow automation (deterministic steps with an LLM call embedded).
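To make "the model controls the execution flow" concrete, here is a minimal sketch of an agent loop. Everything in it is an assumption for illustration: `scripted_model` stands in for a real LLM call, and the `search` and `lookup` tools are hypothetical. The point is only that the harness is a dumb loop, while the choice of tool, order, and stopping point comes from the model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One decision from the model: call a tool, or stop with an answer."""
    action: str          # "tool" or "stop"
    tool: str = ""
    arg: str = ""
    answer: str = ""


def scripted_model(history):
    """Stand-in for an LLM call: chooses the next action from the history so far."""
    if not any(entry.startswith("search:") for entry in history):
        return Step(action="tool", tool="search", arg="refund policy")
    if not any(entry.startswith("lookup:") for entry in history):
        return Step(action="tool", tool="lookup", arg="order-123")
    return Step(action="stop", answer=f"answered after {len(history)} tool calls")


# Hypothetical tools; real ones would hit a search API, a database, and so on.
TOOLS = {
    "search": lambda query: f"top hit for {query!r}",
    "lookup": lambda order_id: f"{order_id} shipped",
}


def run_agent(model, tools, max_steps=8):
    """The loop is fixed, but the path through it is chosen by the model each turn."""
    history = []
    for _ in range(max_steps):
        step = model(history)
        if step.action == "stop":
            return step.answer, history
        result = tools[step.tool](step.arg)
        history.append(f"{step.tool}: {result}")
    raise RuntimeError("step budget exhausted before the model chose to stop")


answer, history = run_agent(scripted_model, TOOLS)
```

A workflow automation would hard-code the `search` then `lookup` sequence. Here the sequence exists only because the model picked it, which is why the step budget is there as a backstop.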

As Huang said at GTC: agentic AI can “use tools because it understands multimodal information… at the foundation of agentic AI is reasoning.” That reasoning loop is exactly what makes the infrastructure stack so deep.

From SaaS to AaaS: the pricing revolution

CIO Magazine

“SaaS gave you the tools. AaaS gives you the workers.”

Agents-as-a-Service is a subscription-based cloud model for deploying autonomous AI agents that make decisions and execute tasks with limited supervision. The pricing implications are seismic:

  • IDC: Pure seat-based pricing will be obsolete by 2028, forcing 70% of vendors to redesign business models.
  • Bloomberg: Subscription-based pricing declines from 60% to 30% of models over the next decade; outcome-based rises from 10% to 60%.
  • Gartner: At least 40% of enterprise SaaS spend shifts to usage-, agent-, or outcome-based pricing by 2030.

The stack at a glance

Every vertical AI product, such as Harvey, Abridge, Decagon, Sierra, or Glean, sits on top of this same fundamental stack.

L01 · Channel
web · mobile · voice · email · slack · teams · api · browser ext · mcp client
L02 · Gateway
litellm · portkey · openrouter · cf ai gateway · helicone · semantic cache · cost cap
L03 · Orchestration
langgraph · crewai · openai agents sdk · claude agent sdk · temporal · inngest · mastra
L04a · Memory
pinecone · pgvector · mem0 · letta · redis
L04b · Tools
browserbase · e2b · exa · composio · firecrawl
L04c · Protocol
MCP · A2A · AG-UI
L05 · Model
claude · gpt-5 · gemini · llama · deepseek · together · fireworks · groq
L06 · Compute
nvidia h200/b200 · coreweave · groq lpu · cerebras wse-3 · aws trainium · google tpu

+ 13 cross-cutting concerns spanning all layers ↓

The nine layers — expanded

L01 · Channel / Distribution
How the user reaches the agent

This sounds simple until you count the surfaces. Each has different latency expectations, auth flows, and streaming requirements. The voice sub-stack alone involves four or five vendor decisions before you reach orchestration.

Surfaces
web app · mobile · voice / phone · sms · email · slack · teams · discord · api · browser ext · mcp client
Frontend SDK / Generative UI
vercel ai sdk · assistant-ui · copilotkit · tambo · ag-ui protocol
Voice Stack
vapi · retell ai · bland ai · synthflow · voiceflow · livekit · deepgram (stt) · elevenlabs (tts) · cartesia (tts) · twilio · telnyx

L02 · LLM Gateway / Router
Single entrypoint for all model calls

Handles auth, retries, fallback routing, semantic caching, cost caps, and rate limiting. Semantic caching at this layer cuts costs 30–60% for repetitive queries. Most teams underinvest here.
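As a rough sketch of what a gateway does on each call, the snippet below combines fallback routing, a cost cap, and a cache. It is not the API of any gateway listed here. The `Gateway` class, the provider functions, and the per-call costs are all made up for illustration, and the cache is a plain exact-match cache on a normalized prompt, a much simpler stand-in for the embedding-based semantic caching described above.

```python
class BudgetExceeded(Exception):
    """Raised when every provider would push spend past the cap."""


class Gateway:
    def __init__(self, providers, cost_cap_cents):
        # providers: ordered list of (name, callable, cost_per_call_cents)
        self.providers = providers
        self.cost_cap_cents = cost_cap_cents
        self.spent_cents = 0
        self.cache = {}

    def complete(self, prompt):
        # Exact-match cache on a normalized prompt. A semantic cache would
        # compare embeddings instead; that part is out of scope here.
        key = " ".join(prompt.lower().split())
        if key in self.cache:
            return self.cache[key]

        errors = []
        for name, call, cost in self.providers:
            if self.spent_cents + cost > self.cost_cap_cents:
                errors.append(f"{name}: over budget")
                continue
            try:
                text = call(prompt)
            except Exception as exc:  # fall through to the next provider
                errors.append(f"{name}: {exc}")
                continue
            self.spent_cents += cost
            self.cache[key] = text
            return text

        if errors and all(e.endswith("over budget") for e in errors):
            raise BudgetExceeded("; ".join(errors))
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical providers: the primary always times out, the backup answers.
def flaky_primary(prompt):
    raise TimeoutError("primary timed out")


def steady_backup(prompt):
    return f"answer to: {prompt}"


gateway = Gateway(
    [("primary", flaky_primary, 3), ("backup", steady_backup, 1)],
    cost_cap_cents=4,
)
first = gateway.complete("What is MCP?")
second = gateway.complete("  what is  mcp? ")  # served from cache, no spend
```

Costs are tracked in integer cents to avoid floating-point drift in the cap check. Failed calls are not billed in this sketch, which is an assumption; real providers may bill differently.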

LLM Gateways
litellm · portkey · openrouter · vellum · helicone proxy · requesty
Hyperscaler / CDN Gateways
cloudflare ai gateway · kong ai gateway · aws bedrock · azure ai content safety
Semantic Cache
helicone · portkey cache · gptcache · redis

L03 · Agent Orchestration / Runtime
The brain: most contested layer

This layer plans, calls tools in a loop, branches, checkpoints, and hands off to sub-agents. LangGraph leads in monthly searches at 27,100 versus 14,800 for CrewAI. CrewAI has the best developer experience (a multi-agent system in under 20 lines), but teams often migrate to LangGraph for production.
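Checkpointing is the part of this layer that is easiest to underestimate, so here is a minimal sketch of the idea, assuming a run is a list of named steps and a JSON file is an acceptable store. It is not how LangGraph, Temporal, or any other engine named here implements durability; `run_with_checkpoints` and the three demo steps are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, checkpoint_path):
    """Run named steps in order, persisting results so a restart skips finished work."""
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier attempt
        done[name] = fn(done)
        path.write_text(json.dumps(done))  # persist after every step
    return done


def demo():
    calls = []

    def plan(state):
        calls.append("plan")
        return "search, then summarize"

    def search(state):
        calls.append("search")
        if calls.count("search") == 1:
            raise ConnectionError("simulated crash mid-run")
        return "3 results"

    def summarize(state):
        calls.append("summarize")
        return f"summary of {state['search']}"

    steps = [("plan", plan), ("search", search), ("summarize", summarize)]
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "run.json"
        try:
            run_with_checkpoints(steps, checkpoint)
        except ConnectionError:
            pass  # the process "died"; the checkpoint file survives
        state = run_with_checkpoints(steps, checkpoint)
    return calls, state


calls, state = demo()
```

On the second attempt `plan` is not re-run, only the step that failed and the ones after it. With real LLM and tool calls, that is the difference between paying for a step once and paying for it on every retry.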

Open-Source Frameworks (Python)
langgraph · crewai · autogen · llamaindex · pydantic ai · smolagents · haystack · dspy
TypeScript / JS
mastra · vercel ai sdk · langgraph.js · baml · voltagent
Vendor SDKs
openai agents sdk · claude agent sdk · google adk · bedrock agents sdk
Durable Workflow Engines
temporal · inngest · restate · trigger.dev · hatchet
Hosted Agent Runtimes
aws bedrock agentcore · vertex ai agent builder · azure ai foundry · cloudflare agents · snowflake cortex
Visual / Low-Code
n8n · zapier agents · make · lindy · relevance ai

L04a · Memory
Vector DBs
pinecone · turbopuffer · weaviate · qdrant · milvus · chroma
Postgres-native
pgvector · neon · supabase
Memory Frameworks
mem0 · letta · zep · cognee
Graph / KV
neo4j · graphiti · redis · upstash

L04b · Tools
Browser
browserbase · stagehand · steel · skyvern · browser-use
Code Sandbox
e2b · modal · daytona · riza
Search / Crawl
exa · tavily · firecrawl · jina
iPaaS for Agents
composio · arcade · pica · nango

L04c · Protocol

The protocol wars are settled. MCP won tools. A2A won agent-to-agent. The Agentic AI Foundation (AAIF) under the Linux Foundation ensures neutrality.

Tool ↔ Agent
MCP (anthropic)
Agent ↔ Agent
A2A (google) · ACP (ibm)
Agent ↔ UI
AG-UI · openai realtime
Registries
smithery · mcp.so · pulsemcp

L05 · Model Layer
Intelligence: shrinking moat for inference

A handful of frontier labs, a long tail of open weights, and a fast-commoditizing inference provider market.

Frontier API
anthropic claude · openai gpt-5 / o-series · google gemini · xai grok
Open Weights
llama (meta) · qwen (alibaba) · deepseek · mistral · phi (microsoft) · gemma (google)
Inference Providers
together ai · fireworks · groq · cerebras · sambanova · replicate · modal · baseten
Embeddings / Rerankers
voyage ai · cohere · openai embed · jina

L06 · Compute / Hardware
Where the math happens

Most AaaS companies never touch this layer directly; the model layer abstracts it away. NVIDIA says Blackwell Ultra (shipping H2 2025) delivers 40x the performance of Hopper.

GPU Cloud (Neoclouds)
coreweave · lambda · crusoe · runpod · vast.ai · nebius
Inference Silicon
nvidia h200 / b200 · amd mi300x · groq lpu · cerebras wse-3 · aws trainium2 · google tpu v5
Hyperscalers
aws · azure · gcp · oracle

Thirteen cross-cutting concerns

Nine layers handle the request path. These thirteen concerns span all of them — the infrastructure decisions most teams discover too late.

XC-01
Observability / Tracing
langsmith · langfuse · helicone · arize phoenix · braintrust · datadog llm obs · openlit · logfire
XC-02
Evaluation
braintrust · patronus · galileo · ragas · promptfoo · deepeval · inspect-ai
XC-03
Safety / Guardrails
lakera · llama guard · nemo guardrails · guardrails ai · protect ai · pangea
XC-04
Auth / Identity
workos · clerk · auth0 · stytch · arcade (agent auth) · pomerium
XC-05
Data Pipelines
fivetran · airbyte · dlt · snowflake · databricks · clickhouse · kafka
XC-06
Billing / Metering
stripe · orb · metronome · openmeter · lago
XC-07
Compliance / Trust
vanta · drata · sprinto · secureframe
XC-08
Deploy / CI-CD / Secrets
vercel · modal · railway · fly · doppler · vault · gh actions
XC-09
Error / Feature Flags
sentry · rollbar · launchdarkly · statsig · posthog
XC-10 · R&D
Fine-Tuning
openpipe · predibase · together ft · modal+axolotl · unsloth
XC-11 · R&D
Data Labeling / Synthetic
scale ai · surge ai · labelbox · snorkel · gretel · tonic
XC-12 · R&D
Prompt / Experiment Mgmt
promptlayer · helicone prompts · braintrust playground · vellum
XC-01.5 · New category, 2026
Investigation / Agent QA

Sits above traces. Ingests events from any runtime, applies a customer-specific priority model (correctness for Harvey, tool-safety for Cursor, PHI for Abridge), and produces investigations that explain what mattered — not log dumps. Catches “successful but wrong” runs that observability misses and pre-deploy eval suites can’t anticipate.
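One way to picture a customer-specific priority model is as a set of weights over failure types, used to rank flagged runs. The sketch below is purely illustrative and makes no claim about how any product implements this. The profile names, flag names, weights, and `rank_runs` are all invented for the example.

```python
# Hypothetical weights: how much each kind of finding matters to each customer type.
PRIORITY_PROFILES = {
    "legal": {"fabricated_citation": 1.0, "unsafe_tool_call": 0.4, "phi_exposure": 0.6},
    "coding": {"fabricated_citation": 0.3, "unsafe_tool_call": 1.0, "phi_exposure": 0.2},
    "clinical": {"fabricated_citation": 0.7, "unsafe_tool_call": 0.5, "phi_exposure": 1.0},
}


def rank_runs(runs, weights):
    """Order flagged runs by weighted severity; runs with no weighted flags are dropped."""
    scored = []
    for run in runs:
        score = sum(weights.get(flag, 0.0) for flag in run["flags"])
        if score > 0:
            scored.append((score, run["run_id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties by id
    return [run_id for _, run_id in scored]


# Made-up runs, each tagged with the issues some upstream detector found.
RUNS = [
    {"run_id": "run-a", "flags": ["unsafe_tool_call"]},
    {"run_id": "run-b", "flags": ["fabricated_citation"]},
    {"run_id": "run-c", "flags": []},
]

legal_queue = rank_runs(RUNS, PRIORITY_PROFILES["legal"])
coding_queue = rank_runs(RUNS, PRIORITY_PROFILES["coding"])
```

The same three runs produce a different review order for each profile, which is the sense in which "what mattered" depends on who is asking.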

galea — first mover; category being defined

One request through the stack

Here is what actually happens when a user sends a message to a production agent:

User → Edge / CDN → Auth Gate → Orchestrator → Guardrail-in → LLM Router → Model Call → GPU Infer
↳ model emits tool calls, fanned out to:
Memory Read · Browser · Code Sandbox · Search / RAG · 3rd-party API · Sub-agent
↲ results fold back → loop until done →
Synthesize → Guardrail-out → Memory Write → Meter Usage → Emit Trace → Response → Galea Investigate

All hops emit OTel-style spans. “Successful but wrong” runs — fabricated citations, unsafe tool calls — are only caught at the investigation layer.
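For readers who have not worked with spans, here is a small stdlib-only sketch of what "OTel-style" means in practice: every hop records a name, a parent, attributes, a duration, and a status. This is a toy recorder, not the OpenTelemetry SDK. The `Tracer` class and the span names are assumptions for illustration.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    """Toy span recorder: one trace id, nested spans linked by parent id."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []    # finished spans, in completion order
        self._stack = []   # currently open spans

    @contextmanager
    def span(self, name, **attributes):
        record = {
            "trace_id": self.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": self._stack[-1]["span_id"] if self._stack else None,
            "name": name,
            "attributes": dict(attributes),
            "status": "ok",
        }
        self._stack.append(record)
        start = time.perf_counter()
        try:
            yield record
        except Exception as exc:
            record["status"] = "error"
            record["attributes"]["error"] = repr(exc)
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.spans.append(record)


tracer = Tracer()
with tracer.span("orchestrator", user="u1"):
    with tracer.span("model_call", model="example-model"):
        pass  # the LLM call would go here
    with tracer.span("tool_call", tool="search") as tool_span:
        tool_span["attributes"]["results"] = 3
```

Every span in this run ends with status `ok`, and that would still be true if the search results were irrelevant or the final answer cited a source that does not exist. Spans record that a hop completed, not that it was right.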

The math nobody is talking about

  • Adoption: 79% of companies actively adopting AI agents (PwC)
  • At scale: 2% have actually deployed at scale (PwC)

Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026. But they also predict over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.

Deloitte · 2026 State of AI · 3,235 leaders · 24 countries

“Agentic AI is scaling faster than guardrails.”

More numbers that tell the same story:

  • 3% of companies successfully scaling agentic AI across multiple departments (IDC/AWS, 2025)
  • 95% of generative AI pilots stall due to integration, not model issues (MIT/NANDA)
  • 15% of AI decision-makers reported EBITDA lift in past 12 months (Forrester)
  • 21% of enterprises have mature governance models for agentic AI (Deloitte)
  • 55% cite lack of skilled personnel as greatest challenge (IDC/AWS)

What this means for builders

Four things worth internalizing from this map:

Depth

The stack is deeper than you think. You are making decisions across nine layers and thirteen cross-cutting concerns. Your orchestration framework constrains your durable execution options. Your voice sub-stack adds four vendor dependencies before your first LLM call.

Protocols

The protocol layer has settled. MCP won tools. A2A won agent-to-agent. The AAIF under the Linux Foundation ensures neutrality. Build on them. Do not build your own wire format.

Governance

The governance gap is real and immediate. Gartner predicts that over 40% of agentic AI projects will be canceled by the end of 2027. The teams that survive will close the loop between production behavior and systematic improvement.

Flywheel

The feedback loop is the product. Production traces feed eval harnesses, investigation catches the failure modes that pre-deploy testing cannot anticipate, and each finding becomes a durable eval that prevents recurrence. The teams that build this will compound.
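The last step of that loop, turning a finding into a durable eval, can be sketched in a few lines. The example assumes the finding is a fabricated citation and that answers cite sources as `[doc_id]`. The suite structure, `add_regression_case`, and both toy agents are hypothetical and not taken from any eval framework.

```python
import re

REGRESSION_SUITE = []  # grows by one case for every investigated failure


def fabricated_citations(answer, retrieved_ids):
    """Return cited ids that were never retrieved, assuming [doc_id] citation syntax."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    return cited - set(retrieved_ids)


def add_regression_case(finding):
    """Keep the inputs of a bad production run so the check can be replayed forever."""
    REGRESSION_SUITE.append(
        {"prompt": finding["prompt"], "retrieved_ids": list(finding["retrieved_ids"])}
    )


def run_suite(agent):
    """Replay every stored case against an agent; return the cases that still fail."""
    failures = []
    for case in REGRESSION_SUITE:
        answer = agent(case["prompt"], case["retrieved_ids"])
        bad = fabricated_citations(answer, case["retrieved_ids"])
        if bad:
            failures.append((case["prompt"], sorted(bad)))
    return failures


# A finding from production: the agent cited a document it never retrieved.
add_regression_case({"prompt": "Summarize the lease terms", "retrieved_ids": ["doc1", "doc2"]})


def careless_agent(prompt, retrieved_ids):
    return "The lease runs 5 years [doc1] with a renewal option [doc9]."


def grounded_agent(prompt, retrieved_ids):
    return "The lease runs 5 years [doc1] with a renewal option [doc2]."


before_fix = run_suite(careless_agent)
after_fix = run_suite(grounded_agent)
```

The case stays in the suite after the fix ships, so the same failure cannot quietly return in a later prompt or model change.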


Agentic AI is the next platform shift. The infrastructure to build it is maturing fast. The infrastructure to trust it is just getting started.