This week had one theme: the cost of scale. Enterprises discovered what happens when AI usage goes from pilot to production, and the industry responded with specialized models, managed agent platforms, and strategic acquisitions of the toolchain. Four independent data points converged on the same conclusion: the token economics that worked for PoCs collapse at enterprise scale.
I spent the week redesigning the cost model for our Company Brain architecture. Eight layers, seven integration ports. The governance and orchestration layers cost more to run than the intelligence layer itself.
Here’s what you need to know.
Uber’s 5,000 engineers depleted their entire 2026 token budget by April. Not a department. Not a pilot. The whole engineering organization.
Blackstone’s Jon Gray reported a 15-fold year-over-year increase in Q1 token usage across their 270 portfolio companies. That’s not one company experimenting. That’s 270 companies all hitting the same cost wall simultaneously.
The broader data confirms the pattern. A survey of 500 engineering professionals found average monthly enterprise AI spend grew 36% to $85,000 between 2024 and 2025. Over 45% of organizations now spend $100,000 or more per month on AI, up from 20% in 2024. And 71% of companies exceeded their AI budgets in 2025 (per Exponential View, May 19).
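The survey figures imply a 2024 baseline that the text does not state. Here is a quick back-of-the-envelope check, assuming the 36% growth applies to the average monthly figure exactly as quoted. The variable names are illustrative.

```python
# Figures quoted from the survey above.
spend_2025_monthly = 85_000   # USD, average monthly enterprise AI spend in 2025
growth = 0.36                 # 36% growth from 2024 to 2025

# Back out the implied 2024 average: spend_2024 * (1 + growth) = spend_2025.
spend_2024_monthly = spend_2025_monthly / (1 + growth)

# Annualize the 2025 average to see what a typical yearly bill looks like.
spend_2025_annual = spend_2025_monthly * 12

print(f"Implied 2024 monthly average: ${spend_2024_monthly:,.0f}")
print(f"2025 annualized average:      ${spend_2025_annual:,.0f}")
```

Taken at face value, that is roughly $62,500 a month in 2024 and just over $1 million a year at the 2025 run rate.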
Then Gartner updated its forecast on May 19: $2.59 trillion in global AI spend for 2026, a 47% increase from $1.76 trillion in 2025. Yet 80%+ of enterprises cannot prove their AI projects create value. And per G-P’s 2026 “AI at Work Report,” nearly 70% of executives are prepared to cut AI budgets if business goals are not achieved this year.
Why it matters: The token budget problem is structural, not operational. Variable costs that scale with usage replace fixed infrastructure costs. Every engineer with a coding agent, every team with an orchestration layer, every process with an agentic workflow adds to a spend curve that finance teams never modeled. The organizations hitting this wall aren’t the ones doing AI badly. They’re the ones doing it at scale.
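To see why a usage-driven cost curve breaks an annual budget, here is a minimal sketch. It assumes spend compounds at a constant monthly rate, which is a simplification. The function name and all numbers are hypothetical and are not Uber’s or anyone else’s actual figures.

```python
def month_budget_runs_out(annual_budget, first_month_spend, monthly_growth):
    """Return the 1-based month in which cumulative spend first exceeds
    the budget, or None if spend stays within budget for all 12 months."""
    cumulative = 0.0
    spend = first_month_spend
    for month in range(1, 13):
        cumulative += spend
        if cumulative > annual_budget:
            return month
        # Usage-driven spend compounds: next month costs more than this one.
        spend *= 1 + monthly_growth
    return None


# Hypothetical plan: 12 units of budget, sized for a flat 1 unit per month.
print(month_budget_runs_out(12, 1, 0.0))   # flat usage, as finance modeled it
print(month_budget_runs_out(12, 1, 0.5))   # usage grows 50% per month
print(month_budget_runs_out(12, 1, 1.0))   # usage doubles every month
```

With flat usage the budget lasts the year. If usage doubles monthly, the same budget is gone in month 4, which is the shape of an April depletion.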
Google shipped an entire agent infrastructure stack in one week at I/O. Four announcements, one thesis: collapse the distance between idea and deployed agent.
Gemini 3.5 Flash became the default model in the Gemini app and AI Mode. Optimized for low-latency, high-throughput streaming, and explicitly designed as an agent-first foundation. This is Google making a bet: the default model should be built for agentic use, not adapted for it.
Managed Agents landed on the Gemini API. A single API call spins up a reasoning agent that executes code in isolated Linux sandbox environments. Agents retain state between calls. Configuration via markdown files. Sandbox computing is free during preview.
Antigravity 2.0 shipped as a rebuilt desktop app: multiple agents running in parallel on the same project, scheduled background tasks, one-click export from AI Studio. And Gemini Omni Flash arrived for video generation and editing from text prompts.
Why it matters: Google is selling the full stack. Not a model. Not an API. An infrastructure layer where agents are first-class primitives with sandboxed execution, persistent state, and managed lifecycle. For enterprise teams drowning in the cost problem from Story 1, this is Google saying: stop building the infrastructure. We’ll host it.
Cursor released Composer 2.5 this week. The headline: $0.07 per task on the Artificial Analysis Coding Agent Index. Claude Opus 4.7 costs $4.10 per task on the same benchmark. GPT-5.5 High Reasoning: $4.82.
That’s a cost gap of roughly 60x ($4.10 / $0.07 is about 59x against Opus 4.7, and about 69x against GPT-5.5). The performance difference: Composer 2.5 scores 62 versus Opus 4.7 at 66 on the same index. Four points for roughly 60x the cost.
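The ratios can be checked directly from the quoted figures. The sketch below uses only the numbers above. The cost-per-point measure is my own crude illustration and is not part of the Artificial Analysis index.

```python
# Per-task cost (USD) and index score, as quoted above.
# The GPT-5.5 score is not given in the text, so it is left out.
cost_per_task = {
    "Composer 2.5": 0.07,
    "Claude Opus 4.7": 4.10,
    "GPT-5.5 High Reasoning": 4.82,
}
score = {"Composer 2.5": 62, "Claude Opus 4.7": 66}


def cost_ratio(expensive, cheap):
    """How many times more the expensive model costs per task."""
    return cost_per_task[expensive] / cost_per_task[cheap]


def cost_per_point(model):
    """USD per task divided by index score: a crude efficiency measure."""
    return cost_per_task[model] / score[model]


for model in ("Claude Opus 4.7", "GPT-5.5 High Reasoning"):
    print(f"{model} vs Composer 2.5: {cost_ratio(model, 'Composer 2.5'):.1f}x")

for model in score:
    print(f"{model}: ${cost_per_point(model):.4f} per index point")
```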
Composer 2.5 is built on Moonshot AI’s open-weight Kimi K2.5 checkpoint, a mixture-of-experts architecture. Cursor spent 85% of their compute budget on their own reinforcement learning pipeline and post-training. They trained on 25x more synthetic coding tasks than the predecessor model. The result is a specialized coding agent that comes within a few points of frontier models on the tasks it’s designed for, at a fraction of the cost.
The pattern AlphaSignal identified (May 24): modern AI teams are moving toward hybrid architectures. Route 80-90% of coding work to efficient specialized models. Reserve frontier models for 10-20% of deeply complex tasks involving novel architecture or zero-shot edge cases.
Why it matters: This is how the token budget crisis from Story 1 gets solved. Not by spending less on AI. By routing intelligently. The 60x cost gap between specialized and frontier models is wide enough to change the economics of every agentic workflow. The teams that figure out routing first will run more AI workloads than their competitors while paying less.
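A rough sketch of what the hybrid split does to average cost. It assumes the benchmark per-task costs quoted earlier carry over to real workloads and that routing itself is free and always correct, and neither is guaranteed. The task tags and function names are hypothetical.

```python
# Per-task costs (USD) quoted earlier in this issue.
SPECIALIZED_COST = 0.07   # Composer 2.5
FRONTIER_COST = 4.10      # Claude Opus 4.7

# Hypothetical tags marking tasks that justify a frontier model.
FRONTIER_TAGS = {"novel_architecture", "zero_shot_edge_case"}


def route(task_tags):
    """Pick a tier: frontier only if the task carries a hard-task tag."""
    return "frontier" if FRONTIER_TAGS & set(task_tags) else "specialized"


def blended_cost(specialized_share):
    """Average cost per task when this share goes to the specialized model."""
    frontier_share = 1 - specialized_share
    return specialized_share * SPECIALIZED_COST + frontier_share * FRONTIER_COST


print(route(["refactor", "unit_tests"]))
print(route(["refactor", "novel_architecture"]))

for share in (0.80, 0.85, 0.90):
    avg = blended_cost(share)
    print(f"{share:.0%} specialized: ${avg:.2f}/task, "
          f"{FRONTIER_COST / avg:.1f}x cheaper than all-frontier")
```

Under these assumptions the blended cost lands between roughly $0.47 and $0.88 per task. The frontier share still dominates the bill, so classifying tasks correctly matters more than shaving the specialized model’s price.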
Anthropic acquired Stainless, the startup behind API SDK generation used by OpenAI, Google, and Anthropic itself. The deal is valued at over $300 million, roughly 2x its $150 million valuation from December 2025 (per The Information). The Stainless team is integrating fully into Anthropic to focus on Claude tool connectivity and the MCP ecosystem.
This is Anthropic’s fourth acquisition in approximately six months: Bun (JavaScript runtime), Vercept (monitoring), Coefficient Bio (biotech), and now Stainless (developer tooling). The pattern is clear. Anthropic is buying the developer infrastructure layer around its models, not just building better models.
Separately, Anthropic is projected to turn a profit this quarter, two years ahead of schedule. Q2 2026 projected revenue: $10.9 billion, with $559 million in operating profits (per Exponential View, May 23). For context, that quarterly revenue exceeds Anthropic’s entire lifetime revenue to date.
Why it matters: The Stainless acquisition signals where the value is shifting. The model is becoming a commodity. The toolchain around it (SDKs, protocols, monitoring, execution environments) is where the margin lives. Anthropic buying the SDK generator that its competitors also use is a supply-chain play. Control the pipes, not just the water.
• OpenAI Math Breakthrough: An OpenAI reasoning model autonomously disproved the Erdős unit distance conjecture, an 80-year-old open problem in discrete geometry. Will Sawin of Princeton distilled and validated the AI reasoning. First autonomous AI solution to a central open math problem.
• MCP Ecosystem at 97M Downloads: Digital Applied published an ecosystem map showing MCP at 97 million downloads, A2A at 50+ partners, and ACP/UCP emerging as alternatives. Agent protocol standardization is real infrastructure now.
• EU AI Act Draft Guidelines: European Commission published draft high-risk classification guidelines on May 19. AI in employment, biometrics, critical infrastructure, education, and law enforcement classified as high-risk. Consultation closes June 23. Fines up to EUR 15 million or 3% of global turnover.
Sources: Exponential View, AlphaSignal, The Rundown AI, Prohuman AI, The Batch (DeepLearning.AI), Tomasz Tunguz, Gartner, G-P “AI at Work Report,” The Information, OpenAI, Digital Applied, European Commission, Hunton Andrews Kurth, GitHub, Thinking Machines Lab, Carnegie Mellon/Stanford.
I write about Production AI, enterprise AI adoption, and building systems that actually work. Follow along if that’s your thing.