Every SaaS Company Will Become a GaaS Company — Now What?

At GTC 2026, Jensen Huang coined a term that reframed the entire software industry:

"Every SaaS company will become a GaaS company."

— Jensen Huang, GTC 2026 keynote

GaaS — agentic as a service. Not selling dashboards humans operate — selling agents that do the work themselves. SaaS was built for humans to log in, navigate, and act. GaaS is built for agents to receive instructions, plan execution, and deliver outcomes. The product shifts from selling tools to selling outcomes. The software doesn't assist the worker — the software is the worker.

Two months later, he sharpened the point at ServiceNow Knowledge:

"For the first time, service is software. Software is service, and the service industry is 100x larger than the software industry."

— Jensen Huang, ServiceNow Knowledge, May 2026

This isn't a prediction anymore. It's already happening.

The agentic AI companies are here

A wave of companies has already made this transition — or was born GaaS-native:

Legal: Harvey

AI agents that do M&A diligence, contract review, and legal research. The product isn't a document viewer — it's an agent that reads 10,000 pages, identifies material risks, and produces a memo a partner can sign off on.

Support: Decagon / Sierra

AI agents that handle customer support end-to-end. Not chatbot scripts — agents that access order systems, process refunds, escalate edge cases, and resolve tickets autonomously.

Engineering: Cursor / Replit / Devin

AI agents that write, test, deploy, and maintain code. Entire development workflows delegated to agents with access to production infrastructure.

Enterprise: Glean / Moveworks / Adept

AI agents for enterprise search, IT automation, and cross-system workflows. Agents that navigate internal tools, pull data from multiple sources, and take actions across systems.

Every one of these companies ships agent workflows as its core product. When the agent is wrong, the product is wrong. When the agent hallucinates, the company has shipped a defective product to a paying customer.

The infrastructure to build agents exists

The tooling landscape for building and running agent workflows has matured rapidly. There is no shortage of ways to build a GaaS product:

Orchestration frameworks. OpenAI Agents SDK, Claude Agent SDK, LangChain/LangGraph, CrewAI, Mercury, Temporal, AutoGen. These handle agent construction — tool use, handoffs, multi-agent coordination, memory, planning loops. The build layer works.

Observability platforms. LangSmith, Arize, Braintrust, Datadog LLM Observability, Weights & Biases Weave. These capture traces, log prompts and completions, track token costs, and visualize agent execution. The tracing layer works.

Evaluation frameworks. Braintrust, Promptfoo, DeepEval, RAGAS. These run offline evaluations against datasets, measuring accuracy, hallucination rates, and response quality. The eval layer works.

Compute infrastructure. CoreWeave, Lambda Labs, Together AI, and RunPod, plus the hyperscalers, provide the GPU capacity to run inference at scale. There is also NVIDIA's own OpenClaw stack. The compute layer works.

6+ major orchestration frameworks
5+ tracing / observability platforms
4+ eval frameworks

So what's missing?

Traces show what happened. Nobody asks whether it should have.

Every tool in the current stack answers the same category of question: what did the agent do?

LangSmith shows you the trace — every LLM call, tool invocation, and retrieval step. Arize shows you latency distributions and token costs. Braintrust runs evaluations against test datasets. All useful. All necessary.

None of them ask the questions that matter when agents are your product:

What current tools answer (✓ traces captured):

What LLM calls were made?
How many tokens were used?
What was the latency?
Did the tool call return 200?
What's the eval score on this dataset?

What GaaS companies need answered (✗ not addressed):

Did the agent contradict company policy?
Was that tool call authorized in this context?
Did the output match the source material?
Did the agent violate a regulatory constraint?
Was this failure mode seen before — and getting worse?

This is the difference between tracing and investigation. Tracing records events. Investigation understands whether those events were correct, authorized, and safe — scoped to what each specific customer cares about.
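To make the distinction concrete, here is a minimal sketch of one investigation-style check: was a tool call authorized in its context? Everything in it is a hypothetical illustration, not Galea's API or any tracing tool's schema. The `ToolCall` record, the `ALLOWED` policy table, and the tool names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    run_id: str
    tool: str
    environment: str  # e.g. "staging" or "production"
    status: int       # the part a trace already records


# Hypothetical per-customer policy: which tools a run may use in each environment.
ALLOWED = {
    "staging": {"read_file", "run_tests", "deploy"},
    "production": {"read_file"},
}


def unauthorized_calls(calls, allowed=ALLOWED):
    """Return calls whose tool is outside the allowed set for their environment."""
    return [c for c in calls if c.tool not in allowed.get(c.environment, set())]


calls = [
    ToolCall("run-1", "run_tests", "staging", 200),
    ToolCall("run-1", "deploy", "production", 200),
]
flagged = unauthorized_calls(calls)
for call in flagged:
    print(f"{call.run_id}: {call.tool} not permitted in {call.environment}")
```

Both calls returned 200, so a trace alone would show two healthy steps. Only the check against the policy table surfaces the production deploy as a problem.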

Why existing observability isn't enough

The current tools were built for a world where agents are experimental. Research projects. Internal tools. Prototypes. In that world, traces and evals are enough — you're learning about the technology.

GaaS companies don't have that luxury. When the agent is the product, you need answers to harder questions:

  1. Correctness against source material. Did Harvey's legal agent cite a case that actually exists? Did it quote the holding accurately? LangSmith shows you the retrieval step returned results. It doesn't tell you whether those results were faithfully represented in the output.
  2. Policy compliance in production. Did Decagon's support agent commit to a refund the company doesn't offer? Arize shows latency was 200ms. It doesn't check the response against the company's refund policy.
  3. Tool safety in context. Did Cursor's coding agent access production credentials when it was scoped to staging? Braintrust evaluates output quality on test datasets. It doesn't flag privilege escalation in live runs.
  4. Customer-specific priorities. Harvey cares about citation accuracy. Decagon cares about refund risk. A healthcare GaaS company cares about PHI exposure. Generic dashboards serve none of them well.
  5. Failure mode tracking over time. Is the same type of error happening more often? Is a specific tool call pattern correlated with failures? Are things getting worse since the last model update? This requires longitudinal analysis, not per-run traces.
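The fifth question is the easiest to sketch, because it is mostly bookkeeping across runs. The snippet below is an illustrative assumption, not a description of how any product does it: it takes hypothetical `(period, failure_mode)` pairs, one per run, and reports which failure modes became more frequent from one period to the next. The period labels and failure mode names are made up for the example.

```python
from collections import Counter


def failure_rates(runs):
    """runs: iterable of (period, failure_mode or None). Returns {period: {mode: rate}}."""
    totals = Counter()
    failures = Counter()
    for period, mode in runs:
        totals[period] += 1
        if mode is not None:
            failures[(period, mode)] += 1
    rates = {}
    for (period, mode), count in failures.items():
        rates.setdefault(period, {})[mode] = count / totals[period]
    return rates


def worsening(rates, before, after):
    """Failure modes whose rate is higher in `after` than in `before`."""
    prev, curr = rates.get(before, {}), rates.get(after, {})
    return sorted(mode for mode, rate in curr.items() if rate > prev.get(mode, 0.0))


# Hypothetical findings, one entry per run; None means no finding.
runs = [
    ("pre-update", None), ("pre-update", None),
    ("pre-update", "bad_citation"), ("pre-update", None),
    ("post-update", "bad_citation"), ("post-update", "bad_citation"),
    ("post-update", None), ("post-update", "policy_violation"),
]
rates = failure_rates(runs)
regressions = worsening(rates, "pre-update", "post-update")
```

A per-run trace view would show eight independent runs. Grouping the findings by period is what reveals that citation errors doubled after the update and that a new failure mode appeared.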

These aren't edge cases. For a GaaS company, these are the questions that determine whether the product works.

The missing layer

The GaaS stack has clear layers: compute at the bottom (NVIDIA, CoreWeave), orchestration in the middle (LangGraph, Agents SDK), and the application on top. Observability tools like LangSmith and Arize sit alongside the orchestration layer — they capture what the framework produces.

What's missing is an investigation layer that sits above all of it. Not a replacement for tracing — a consumer of traces that asks harder questions about them.

Layer 1: Compute. NVIDIA, CoreWeave, Lambda, hyperscalers. GPUs that run inference. Mature.
Layer 2: Orchestration. OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI. Frameworks that build agents. Mature.
Layer 3: Tracing & Evals. LangSmith, Arize, Braintrust, Datadog. Tools that capture what happened. Mature.
Layer 4: Investigation. Asks whether what happened was correct, authorized, and safe, scoped to each customer. Missing.

This is where Galea sits.

Investigation for the GaaS era

Galea is the investigation layer for agent workflows. It sits above any orchestration runtime and any tracing tool, and provides what GaaS companies need to ship agent products to enterprise customers:

Trace ingest. Framework-agnostic adapters for OpenAI Agents SDK, Claude Agent SDK, LangGraph, CrewAI, Mercury — or plug into existing LangSmith/Arize traces. No rewrite. Works with whatever you already use.
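As a rough picture of what "framework-agnostic adapters" can mean in general, the sketch below normalizes two differently shaped trace exports into one common `Step` record. The shapes `shape_a` and `shape_b`, their field names, and the `ingest` function are invented for illustration. They do not reflect the real export format of any framework named above, nor Galea's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    kind: str    # "llm" or "tool"
    name: str    # model or tool name
    output: str


# Hypothetical raw shapes; real exports from any given framework will differ.
def from_shape_a(raw: dict) -> list[Step]:
    return [Step(e["type"], e["name"], e["result"]) for e in raw["events"]]


def from_shape_b(raw: dict) -> list[Step]:
    steps = []
    for span in raw["spans"]:
        tool = span.get("tool")
        kind = "tool" if tool else "llm"
        steps.append(Step(kind, tool or span["model"], span["text"]))
    return steps


ADAPTERS: dict[str, Callable[[dict], list[Step]]] = {
    "shape_a": from_shape_a,
    "shape_b": from_shape_b,
}


def ingest(source: str, raw: dict) -> list[Step]:
    """Dispatch to the adapter registered for `source` and return normalized steps."""
    try:
        adapter = ADAPTERS[source]
    except KeyError:
        raise ValueError(f"no adapter registered for {source!r}") from None
    return adapter(raw)


a = ingest("shape_a", {"events": [{"type": "tool", "name": "lookup_order", "result": "shipped"}]})
b = ingest("shape_b", {"spans": [{"tool": "lookup_order", "text": "shipped"}]})
```

Once both sources land in the same record type, every downstream check can be written once against `Step` instead of once per framework.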

Investigation. An investigator walks every workflow run against company context and customer priorities. Not "did it complete?" but "was it correct, authorized, and efficient?" Findings are scoped to what each customer actually cares about — citation accuracy for legal, refund risk for support, PHI exposure for healthcare.

Optimization. Recommendations for durable fixes: evals to add, guardrails to implement, retrieval to improve, review requirements to set. Scoped to what the investigation found, not generic best practices.

Audit. Signed, immutable export over every workflow. When a customer, regulator, or legal team asks "what did the agent do and why?" — the answer exists, is complete, and is attributable.
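One common way to make a log tamper-evident is to chain each record to the previous one and sign the result. The sketch below shows that general technique using Python's standard `hmac`, `hashlib`, and `json` modules. It is an assumption about how such an export could work, not a description of Galea's format. The record fields are hypothetical, and a production system would more likely use asymmetric signatures with managed keys rather than a shared HMAC key.

```python
import hashlib
import hmac
import json


def _sign(prev: str, record: dict, key: bytes) -> str:
    # Canonical JSON (sorted keys) so the same record always signs the same way.
    payload = json.dumps({"prev": prev, "record": record}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(log: list, record: dict, key: bytes) -> None:
    """Append a record whose signature covers the previous entry's signature."""
    prev = log[-1]["signature"] if log else ""
    log.append({"prev": prev, "record": record, "signature": _sign(prev, record, key)})


def verify(log: list, key: bytes) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = ""
    for entry in log:
        expected = _sign(prev, entry["record"], key)
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["signature"]):
            return False
        prev = entry["signature"]
    return True


key = b"demo-key-do-not-use"  # placeholder; real keys belong in a secrets manager
log = []
append_record(log, {"run": "run-1", "step": "lookup_order", "finding": None}, key)
append_record(log, {"run": "run-1", "step": "issue_refund", "finding": "policy_violation"}, key)
print(verify(log, key))
```

Because each signature covers the one before it, altering an early record invalidates every entry after it, which is what lets a reviewer trust the export as a whole.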

The GaaS economy needs this

Jensen Huang is right. The transition from SaaS to GaaS is happening. The companies making that transition have the compute, the frameworks, the tracing tools, and the eval suites. What they don't have is the ability to tell an enterprise customer: we investigated every workflow your agents ran, here's what we found, here's what we fixed, and here's the audit trail.

That's not a monitoring problem. It's not a tracing problem. It's not an eval problem. It's an investigation problem — and it requires a layer purpose-built to answer questions about whether agent workflows were correct, safe, and aligned with what each customer cares about.

The GaaS era needs an investigation layer. That's Galea.