EY’s audit platform, called Canvas, processes 1.4 trillion lines of audit data every year. That data flows from 160,000 client engagements, across more than 150 countries, into the workflows of 130,000 EY professionals. As of April 2026, a meaningful share of that processing is run by AI agents, not humans.

This is what enterprise AI agents in production look like. Not a chatbot. Not a copilot. Software that takes an action, evaluates the result, takes the next action, and keeps going for hours or days, governed by rules the organization can audit.

The numbers around enterprise adoption have moved fast. As of early 2026, 79 percent of organizations have adopted AI agents in some form. By the end of the year, the projection is that 40 percent of all enterprise applications will have agents embedded directly. Two years ago, both numbers were under 10 percent. The shift is faster than most internal IT planning cycles can keep up with.

A useful way to understand the scale is to look at one specific deployment in detail.

Salesforce announced its internal use of Agentforce, its agent platform, paired with a new orchestration layer called Agent Fabric. The case study it released, called “Customer Zero,” documents the deployment at Reddit. The numbers are concrete: case resolution times dropped 84 percent, and annual operational savings exceeded 100 million dollars. The agents handle customer support inquiries that previously went to human staff, and the humans now handle escalations and the complex cases the agents flag.

Eighty-four percent is a big number, but how it gets achieved is the more interesting part.

Most enterprise AI deployments in 2024 and early 2025 looked like this: a copilot inside an existing tool. A salesperson opens Salesforce, the AI suggests next steps, and the salesperson decides. On average, the AI might save the salesperson 20 percent of their time. That’s real value, but it’s incremental.

What changed is the move from copilot to agent. Instead of suggesting a next step, the agent takes it. A case comes in. The agent reads it, retrieves the customer’s history, checks inventory, drafts a response, and sends it. If the agent isn’t confident, it routes the case to a human. If it is confident, it resolves the case end to end. That is the mechanism behind the 84 percent figure: the gain doesn’t come from shaving 20 percent off a human’s time, but from a large share of cases no longer needing a human at all.
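The confidence-gated loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Salesforce’s implementation: the `Case`, `Draft`, `Outcome`, and `handle_case` names, the callables standing in for the real CRM, inventory, and messaging systems, and the 0.85 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cut-off; would be tuned against labeled cases


@dataclass
class Case:
    case_id: str
    customer_id: str
    text: str


@dataclass
class Draft:
    reply: str
    confidence: float  # assumed to be a calibrated score in [0, 1]


@dataclass
class Outcome:
    case_id: str
    resolved_by: str      # "agent" or "human"
    reply: Optional[str]  # None when the case was escalated


def handle_case(
    case: Case,
    retrieve_history: Callable[[str], list[str]],
    check_inventory: Callable[[Case], dict[str, int]],
    draft_response: Callable[[Case, list[str], dict[str, int]], Draft],
    send: Callable[[str, str], None],
) -> Outcome:
    """Gather context, draft a reply, then either resolve the case or escalate it."""
    history = retrieve_history(case.customer_id)  # customer's prior interactions
    stock = check_inventory(case)                 # inventory relevant to this case
    draft = draft_response(case, history, stock)
    if draft.confidence >= CONFIDENCE_THRESHOLD:
        send(case.case_id, draft.reply)           # agent resolves end to end
        return Outcome(case.case_id, "agent", draft.reply)
    # Below the threshold: route to a human queue instead of sending anything.
    return Outcome(case.case_id, "human", None)


if __name__ == "__main__":
    sent: list[tuple[str, str]] = []
    result = handle_case(
        Case("c-1", "cust-9", "Is the blue kettle back in stock?"),
        retrieve_history=lambda customer_id: ["asked about kettle last month"],
        check_inventory=lambda case: {"blue-kettle": 12},
        draft_response=lambda case, history, stock: Draft("Yes, 12 are in stock.", 0.93),
        send=lambda case_id, reply: sent.append((case_id, reply)),
    )
    print(result.resolved_by, sent)
```

The threshold is the main lever in this design: raising it sends more cases to humans, lowering it lets the agent resolve more cases on its own at a higher risk of a wrong reply going out.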

This isn’t possible without a meaningful infrastructure investment underneath. The agent has to know what it can and can’t do. It has to remember what it already tried. It has to log its actions in a way the company’s auditors can review. It has to fail gracefully when it encounters something outside its training. Each of these requirements is solvable. Solving all of them at once, for an enterprise with regulatory requirements, is what separates production from pilot.

Most pilots stay pilots. That’s the discipline gap.

The pattern that’s becoming standard in serious deployments has four pieces, none of which are individually new but which together represent a different operational model. The agent has tool access, meaning it can call the real systems, not just simulate them. The agent has memory, meaning it can retain context across long-running tasks instead of starting fresh each time. The agent has guardrails, meaning the actions it can take are constrained by explicit policy. And the agent is observable, meaning a human can review what it did, why, and what happened next.

Without any one of those four, the deployment fails in production. Tool access without guardrails leads to incidents. Guardrails without memory lead to absurd repetition. Memory without observability leads to “we don’t know what the agent did last week.” Observability without tool access is just logging.
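To make the four pieces concrete, here is a minimal Python sketch that wires them into one object. It rests on simplifying assumptions: the tool registry stands in for real system integrations, the guardrail is a plain allowlist, memory is a set of calls already completed, and observability is an in-memory audit log. `GovernedAgent` and its fields are hypothetical names, not any vendor’s API.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class GovernedAgent:
    """Hypothetical wrapper combining tool access, memory, guardrails, and observability."""
    tools: dict[str, Callable[..., Any]]  # tool access: callables that reach real systems
    allowed_actions: set[str]             # guardrails: explicit allowlist set by policy
    memory: set[tuple[str, str]] = field(default_factory=set)      # calls already completed
    audit_log: list[dict[str, Any]] = field(default_factory=list)  # reviewable record

    def act(self, tool: str, reason: str, **kwargs: Any) -> Any:
        """Attempt one tool call; every attempt is logged, whatever the result."""
        entry: dict[str, Any] = {"ts": time.time(), "tool": tool, "args": kwargs, "reason": reason}
        key = (tool, json.dumps(kwargs, sort_keys=True, default=str))
        if tool not in self.allowed_actions:
            entry["result"] = "blocked_by_policy"  # guardrail stops the call
        elif key in self.memory:
            entry["result"] = "skipped_repeat"     # memory prevents redoing the same call
        elif tool not in self.tools:
            entry["result"] = "unknown_tool"
        else:
            try:
                entry["output"] = self.tools[tool](**kwargs)
                entry["result"] = "ok"
                self.memory.add(key)
            except Exception as exc:               # failures are logged, not hidden
                entry["result"] = f"error: {exc}"
        self.audit_log.append(entry)
        return entry.get("output")


if __name__ == "__main__":
    agent = GovernedAgent(
        tools={
            "lookup_order": lambda order_id: f"order {order_id}: shipped",
            "issue_refund": lambda amount: f"refunded {amount}",
        },
        allowed_actions={"lookup_order"},  # refunds would need a separate approval path
    )
    agent.act("lookup_order", "customer asked for status", order_id="123")
    agent.act("lookup_order", "customer asked again", order_id="123")
    agent.act("issue_refund", "customer requested refund", amount=50)
    for e in agent.audit_log:
        print(e["tool"], e["result"])
```

Each failure mode maps to one branch: a call outside the allowlist is blocked, an identical repeat is skipped, and every attempt, including errors, lands in a log a reviewer can read. The repeat rule here is deliberately blunt; a real system would decide per tool which calls are safe to retry.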

The stack that makes this work has changed quickly. A year ago, building an enterprise agent involved gluing together a model, a vector database, a custom orchestration layer, and a homegrown observability tool, all in code your team had to maintain. By April 2026, most teams use a hosted platform, a standardized protocol like the Model Context Protocol for tool access, and a managed agent service from Anthropic, Salesforce, or one of the cloud providers. The build-vs-buy calculation has shifted hard toward buy, for everything except the highest-stakes use cases.

There’s a quieter pattern in the EY example worth pulling out.

Canvas isn’t a single agent. It’s an orchestration of many specialist agents, each handling one slice of the audit work. One agent might check whether revenue recognition entries match contract terms. Another might compare inventory counts across systems. Another might flag unusual journal entries for human review. The agents talk to each other, hand off work, and escalate when they encounter cases that need human judgment.

This multi-agent pattern is what most large-scale deployments are converging on. A single big general-purpose model running everything is conceptually simple but operationally hard to govern. Multiple specialized agents, each with a narrow scope and clear handoffs, are harder to design but easier to audit. Audit is the word that matters in regulated industries, which is why the pattern has emerged most clearly in audit, banking, and healthcare deployments.
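A toy version of the multi-agent pattern, assuming audit items arrive as simple records, shows how narrow specialists and an orchestrator fit together. This is not how Canvas is built: the `Entry` and `Finding` fields, the three specialist functions, and the one-million limit for an “unusual” journal entry are hypothetical stand-ins for the checks described above.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional

Verdict = Literal["pass", "flag", "escalate"]


@dataclass
class Entry:
    entry_id: str
    kind: str                          # "revenue", "inventory", "journal", ...
    amount: float
    reference: Optional[float] = None  # contract value or count from a second system


@dataclass
class Finding:
    agent: str
    entry_id: str
    verdict: Verdict
    note: str


def revenue_agent(e: Entry) -> Finding:
    """Checks that a revenue entry matches the contract amount."""
    if e.reference is None:
        return Finding("revenue", e.entry_id, "escalate", "no contract terms found")
    matches = abs(e.amount - e.reference) < 0.01
    return Finding("revenue", e.entry_id, "pass" if matches else "flag", "compared to contract")


def inventory_agent(e: Entry) -> Finding:
    """Checks that two systems report the same inventory count."""
    if e.reference is None:
        return Finding("inventory", e.entry_id, "escalate", "second count missing")
    matches = e.amount == e.reference
    return Finding("inventory", e.entry_id, "pass" if matches else "flag", "cross-system count")


def journal_agent(e: Entry, limit: float = 1_000_000) -> Finding:
    """Flags journal entries at or above a hypothetical size limit."""
    unusual = e.amount >= limit
    return Finding("journal", e.entry_id, "flag" if unusual else "pass", f"limit {limit:,.0f}")


SPECIALISTS: dict[str, Callable[[Entry], Finding]] = {
    "revenue": revenue_agent,
    "inventory": inventory_agent,
    "journal": journal_agent,
}


def orchestrate(entries: list[Entry]) -> tuple[list[Finding], list[Finding]]:
    """Route each entry to its specialist and collect anything that needs a human."""
    findings: list[Finding] = []
    human_queue: list[Finding] = []
    for e in entries:
        specialist = SPECIALISTS.get(e.kind)
        if specialist is None:  # no agent owns this work: hand off rather than guess
            finding = Finding("orchestrator", e.entry_id, "escalate", f"no specialist for {e.kind!r}")
        else:
            finding = specialist(e)
        findings.append(finding)
        if finding.verdict != "pass":
            human_queue.append(finding)
    return findings, human_queue
```

Because each specialist returns a small, typed finding, any item in the human queue can be traced to the one agent and the one rule that raised it, which is the auditability argument expressed in code.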

The maturity test: If your company is “exploring AI agents,” you’re a year or more behind the deployment curve in your industry. If you’re “running pilots,” you’re at par. If you have agents in production handling defined workflows, with measurable resolution rates and observable failures, you’re in the leading 25 percent. The gap between the leaders and the laggards is widening, not narrowing.

A practical observation for anyone trying to move from pilot to production: the blocker is rarely the model. The blocker is the governance work, meaning who approves what the agent can do, how mistakes get caught, and how changes get rolled out. The teams that ship agents to production have either built or bought a governance layer. The teams that don’t, don’t ship.

The other practical observation: production agents fail in different ways than humans do. A human customer service rep makes mistakes proportional to fatigue and complexity. An agent makes mistakes proportional to distribution shift, meaning how far the inputs drift from what the agent was trained or evaluated on. Monitoring for distribution shift is a different skill from monitoring for human error. Most enterprises haven’t built that capability yet. The ones running successful agent deployments have. It’s one of the foundational pieces of AI infrastructure that doesn’t show up in vendor pitches but determines whether the deployment works.
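One common way to watch for distribution shift is to compare the mix of incoming inputs against the mix the agent was evaluated on. The sketch below uses the population stability index (PSI) over categorical labels such as support-case categories. The choice of PSI, the category names, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions; production monitoring would usually track several signals, including the agent’s own escalation and error rates.

```python
import math
from collections import Counter

DRIFT_THRESHOLD = 0.2  # assumed alert level; a common rule of thumb for PSI


def shares(labels: list[str], categories: list[str], eps: float = 1e-4) -> list[float]:
    """Fraction of labels in each category, floored at eps so log() stays finite."""
    if not labels:
        raise ValueError("need at least one label per window")
    counts = Counter(labels)
    return [max(counts[c] / len(labels), eps) for c in categories]


def psi(baseline: list[str], current: list[str]) -> float:
    """Population stability index: sum over categories of (a - e) * ln(a / e)."""
    categories = sorted(set(baseline) | set(current))
    expected = shares(baseline, categories)
    actual = shares(current, categories)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_alert(baseline: list[str], current: list[str]) -> bool:
    """True when the current window has drifted past the assumed threshold."""
    return psi(baseline, current) > DRIFT_THRESHOLD


if __name__ == "__main__":
    # Hypothetical case categories seen during evaluation vs. this week.
    evaluated = ["billing"] * 50 + ["shipping"] * 50
    this_week = ["billing"] * 10 + ["shipping"] * 10 + ["warranty"] * 80
    print(round(psi(evaluated, this_week), 2), drift_alert(evaluated, this_week))
```

A PSI alert does not say the agent is wrong, only that its inputs no longer look like the ones it was evaluated on, which is the cue to re-check its resolution and escalation behavior on the new mix.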

Frequently asked questions

What’s the right first agentic deployment for an enterprise?

A workflow that’s high-volume, well-defined, and has a clear escalation path to humans. Customer support is a common starting point because the volume is large, the categories are mostly predictable, and the escalation pattern is well understood. Avoid starting with workflows that touch revenue recognition, legal commitments, or anything that requires regulatory sign-off until the team has built operating muscle on lower-stakes work.

How much does an enterprise agent deployment cost?

The model cost is usually a small fraction of the total. The infrastructure, integration, governance, and ongoing operations are where most of the budget goes. A rough rule for a serious deployment in a mid-sized company is that the first year costs 1 to 5 million dollars, with the model itself accounting for less than 20 percent of that. Costs fall in year two as the platform investments amortize.

Can a small company do this without an enterprise platform?

Yes, with discipline. The same patterns work at smaller scale, and hosted platforms make the build cheaper because you don’t have to staff an internal team to maintain the infrastructure. The risk for small companies isn’t the technology cost. It’s underestimating the operational discipline required to run agents reliably without dedicated headcount.

Are these case studies durable, or will the technology change underneath them?

Some of both. The deployment patterns, covering governance, observability, and orchestration, are likely to hold for several years. The specific tools, platforms, and models will keep changing. Teams that build directly against a single vendor’s stack will need to migrate. Teams that build against open standards like MCP and structured agent protocols will migrate more easily.