The Full Agents-as-a-Service Stack: 9 Layers, 200 Vendors, and the Slot Nobody Filled

May 12, 2026 · Galea Team

At GTC 2025, Jensen Huang described agentic AI as a trillion-dollar computing inflection. He was being conservative. Sequoia Capital now frames long-horizon agents as targeting the $10 trillion in services revenue that software has never been able to reach. The analyst consensus is clear: agentic AI is the next platform shift.

What is less clear is what "agentic AI" actually requires in production. Most coverage focuses on one layer at a time. But a production agent system is a full vertical stack — nine functional layers, thirteen cross-cutting concerns, roughly 200 vendors.

We mapped the entire thing.

Functional layers

Cross-cutting concerns

~200

Vendors mapped

Slot unfilled

The complete interactive diagram — every vendor across all layers, with the full request flow.

Open full diagram →

What “agentic” actually means in 2026

The word has been stretched to near-meaninglessness by marketing. Let us be precise.

An agentic system is one where the model controls the execution flow. It decides which tools to call, in what order, whether to loop back, when to hand off to a sub-agent, and when to stop. This is categorically different from a chatbot (model generates text), a RAG pipeline (model generates text given retrieved context), or a workflow automation (deterministic steps with an LLM call embedded).

As Huang said at GTC: agentic AI can “use tools because it understands multimodal information… at the foundation of agentic AI is reasoning.” That reasoning loop is exactly what makes the infrastructure stack so deep.

From SaaS to AaaS: the pricing revolution

CIO Magazine

“SaaS gave you the tools. AaaS gives you the workers.”

Agents-as-a-Service is a subscription-based cloud model for deploying autonomous AI agents that make decisions and execute tasks with limited supervision. The pricing implications are seismic:

IDC: Pure seat-based pricing will be obsolete by 2028, forcing 70% of vendors to redesign business models.
Bloomberg: Subscription-based pricing declines from 60% to 30% of models over the next decade; outcome-based rises from 10% to 60%.
Gartner: At least 40% of enterprise SaaS spend shifts to usage-, agent-, or outcome-based pricing by 2030.

The stack at a glance

Every vertical AI product — Harvey, Abridge, Decagon, Sierra, Glean — sits on top of this same fundamental stack. Click any layer below for the full vendor breakdown.

L01 · Channel

web · mobile · voice · email · slack · teams · api · browser ext · mcp client

↓

L02 · Gateway

litellm · portkey · openrouter · cf ai gateway · helicone · semantic cache · cost cap

↓

L03 · Orchestration

langgraph · crewai · openai agents sdk · claude agent sdk · temporal · inngest · mastra

↓

L04a · Memory

pinecone · pgvector · mem0 · letta · redis

L04b · Tools

browserbase · e2b · exa · composio · firecrawl

L04c · Protocol

MCP · A2A · AGUI

↓

L05 · Model

claude · gpt-5 · gemini · llama · deepseek · together · fireworks · groq

↓

L06 · Compute

nvidia h200/b200 · coreweave · groq lpu · cerebras wse-3 · aws trainium · google tpu

+ 13 cross-cutting concerns spanning all layers ↓

The nine layers — expanded

L01 Channel / Distribution How the user reaches the agent ▶

This sounds simple until you count the surfaces. Each has different latency expectations, auth flows, and streaming requirements. The voice sub-stack alone involves four or five vendor decisions before you reach orchestration.

Surfaces

web appmobilevoice / phonesmsemailslackteamsdiscordapibrowser extmcp client

Frontend SDK / Generative UI

vercel ai sdkassistant-uicopilotkittamboag-ui protocol

Voice Stack

vapiretell aibland aisynthflowvoiceflowlivekitdeepgram (stt)elevenlabs (tts)cartesia (tts)twiliotelnyx

↓

L02 LLM Gateway / Router Single entrypoint for all model calls ▶

Handles auth, retries, fallback routing, semantic caching, cost caps, and rate limiting. Semantic caching at this layer cuts costs 30–60% for repetitive queries. Most teams underinvest here.

LLM Gateways

litellmportkeyopenroutervellumhelicone proxyrequesty

Hyperscaler / CDN Gateways

cloudflare ai gatewaykong ai gatewayaws bedrockazure ai content safety

Semantic Cache

heliconeportkey cachegptcacheredis

↓

L03 Agent Orchestration / Runtime The brain — most contested layer ▶

Plans, calls tools in a loop, branches, checkpoints, hands off to sub-agents. LangGraph leads monthly searches at 27,100 vs CrewAI at 14,800. CrewAI has the best DX — multi-agent system in under 20 lines — but teams often migrate to LangGraph for production.

Open-Source Frameworks (Python)

langgraphcrewaiautogenllamaindexpydantic aismolagentshaystackdspy

TypeScript / JS

mastravercel ai sdklanggraph.jsbamlvoltagent

Vendor SDKs

openai agents sdkclaude agent sdkgoogle adkbedrock agents sdk

Durable Workflow Engines

temporalinngestrestatetrigger.devhatchet

Hosted Agent Runtimes

aws bedrock agentcorevertex ai agent builderazure ai foundrycloudflare agentssnowflake cortex

Visual / Low-Code

n8nzapier agentsmakelindyrelevance ai

↓

L04a Memory ▶

Vector DBs

pineconeturbopufferweaviateqdrantmilvuschroma

Postgres-native

pgvectorneonsupabase

Memory Frameworks

mem0lettazepcognee

Graph / KV

neo4jgraphitiredisupstash

L04b Tools ▶

Browser

browserbasestagehandsteelskyvernbrowser-use

Code Sandbox

e2bmodaldaytonariza

Search / Crawl

exatavilyfirecrawljina

iPaaS for Agents

composioarcadepicanango

L04c Protocol ▶

Protocol wars are settled. MCP won tools. A2A won agent-to-agent. AAIF under Linux Foundation ensures neutrality.

Tool ↔ Agent

MCP (anthropic)

Agent ↔ Agent

A2A (google)ACP (ibm)

Agent ↔ UI

AGUIopenai realtime

Registries

smitherymcp.sopulsemcp

↓

L05 Model Layer Intelligence — shrinking moat for inference ▶

Three frontier labs, a long tail of open weights, and a fast-commoditizing inference provider market.

Frontier API

anthropic claudeopenai gpt-5 / o-seriesgoogle geminixai grok

Open Weights

llama (meta)qwen (alibaba)deepseekmistralphi (microsoft)gemma (google)

Inference Providers

together aifireworksgroqcerebrassambanovareplicatemodalbaseten

Embeddings / Rerankers

voyage aicohereopenai embedjina

↓

L06 Compute / Hardware Where the math happens ▶

Most AaaS companies never touch this directly. Abstracted by the model layer. NVIDIA’s Blackwell Ultra (shipping H2 2025) delivers 40x the performance of Hopper.

GPU Cloud (Neoclouds)

coreweavelambdacrusoerunpodvast.ainebius

Inference Silicon

nvidia h200 / b200amd mi300xgroq lpucerebras wse-3aws trainium2google tpu v5

Hyperscalers

awsazuregcporacle

Thirteen cross-cutting concerns

Nine layers handle the request path. These thirteen concerns span all of them — the infrastructure decisions most teams discover too late.

XC-01

Observability / Tracing

langsmith · langfuse · helicone · arize phoenix · braintrust · datadog llm obs · openlit · logfire

XC-02

Evaluation

braintrust · patronus · galileo · ragas · promptfoo · deepeval · inspect-ai

XC-03

Safety / Guardrails

lakera · llama guard · nemo guardrails · guardrails ai · protect ai · pangea

XC-04

Auth / Identity

workos · clerk · auth0 · stytch · arcade (agent auth) · pomerium

XC-05

Data Pipelines

fivetran · airbyte · dlt · snowflake · databricks · clickhouse · kafka

XC-06

Billing / Metering

stripe · orb · metronome · openmeter · lago

XC-07

Compliance / Trust

vanta · drata · sprinto · secureframe

XC-08

Deploy / CI-CD / Secrets

vercel · modal · railway · fly · doppler · vault · gh actions

XC-09

Error / Feature Flags

sentry · rollbar · launchdarkly · statsig · posthog

XC-10 · R&D

Fine-Tuning

openpipe · predibase · together ft · modal+axolotl · unsloth

XC-11 · R&D

Data Labeling / Synthetic

scale ai · surge ai · labelbox · snorkel · gretel · tonic

XC-12 · R&D

Prompt / Experiment Mgmt

promptlayer · helicone prompts · braintrust playground · vellum

XC-01.5 · New category, 2026

Investigation / Agent QA

Sits above traces. Ingests events from any runtime, applies a customer-specific priority model (correctness for Harvey, tool-safety for Cursor, PHI for Abridge), and produces investigations that explain what mattered — not log dumps. Catches “successful but wrong” runs that observability misses and pre-deploy eval suites can’t anticipate.

galea — first mover; category being defined

One request through the stack

Here is what actually happens when a user sends a message to a production agent:

User→ Edge / CDN→ Auth Gate→ Orchestrator→ Guardrail-in→ LLM Router→ Model Call→ GPU Infer

↳ model emits tool calls, fanned out to:

Memory Read Browser Code Sandbox Search / RAG 3rd-party API Sub-agent

↲ results fold back → loop until done →

Synthesize→ Guardrail-out→ Memory Write→ Meter Usage→ Emit Trace→ Response→ Galea Investigate

All hops emit OTel-style spans. “Successful but wrong” runs — fabricated citations, unsafe tool calls — are only caught at the investigation layer.

The math nobody is talking about

Adoption

79%

of companies actively adopting AI agents (PwC)

At Scale

have actually deployed at scale (PwC)

Gartner predicts 40% of enterprise applications will feature task-specific AI agents by end of 2026. But they also predict over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls.

Deloitte · 2026 State of AI · 3,235 leaders · 24 countries

“Agentic AI is scaling faster than guardrails.”

More numbers that tell the same story:

3% of companies successfully scaling agentic AI across multiple departments (IDC/AWS, 2025)
95% of generative AI pilots stall due to integration, not model issues (MIT/NANDA)
15% of AI decision-makers reported EBITDA lift in past 12 months (Forrester)
21% of enterprises have mature governance models for agentic AI (Deloitte)
55% cite lack of skilled personnel as greatest challenge (IDC/AWS)

What this means for builders

Four things worth internalizing from this map:

Depth

The stack is deeper than you think. You are making decisions across nine layers and thirteen cross-cutting concerns. Your orchestration framework constrains your durable execution options. Your voice sub-stack adds four vendor dependencies before your first LLM call.

Protocols

The protocol layer has settled. MCP won tools. A2A won agent-to-agent. The AAIF under the Linux Foundation ensures neutrality. Build on them. Do not build your own wire format.

Governance

The governance gap is real and immediate. 40% of agentic AI projects canceled by 2027. The teams that survive will close the loop between production behavior and systematic improvement.

Flywheel

The feedback loop is the product. Production traces feeding into eval harnesses, with investigation catching failure modes that pre-deploy testing cannot anticipate, with findings generating durable evals that prevent recurrence. The teams that build this will compound.

Explore every vendor across all nine layers and thirteen cross-cutting concerns.

Open the full interactive diagram →

Agentic AI is the next platform shift. The infrastructure to build it is maturing fast. The infrastructure to trust it is just getting started.