Tecknoworks Blog

AI This Week:
The Accountability Gap

Week of March 23-29, 2026

Every week I read AlphaSignal, The Batch, Exponential View, Tunguz, The Rundown, and about ten more AI newsletters. Most of them cover the same stories. This is where I pull the signal from the noise and write what actually matters for people building production systems.

This week had one theme: accountability. Who is responsible when AI tools fail? The vendors say “entertainment only.” The surveys say 91% of organizations haven’t crossed the production line. The pricing structures say subscriptions can’t sustain production workloads. And the org charts say the person who should be governing the agent doesn’t exist yet.

I spent part of this week building agent governance infrastructure. Every time I asked “who decides what this agent can do,” the answer was a long pause followed by “we haven’t figured that out yet.” After reading Deloitte’s 120,000-leader survey this week, I realized that answer is more common than I thought.


Here’s what you need to know.

THE BIG FOUR

1. Block’s Goose Agent Handles 90% of Code. Dorsey Says Managers Are Obsolete.

Jack Dorsey co-authored a post this week arguing that AI makes middle management obsolete. Block backed it with action: the company cut over 4,000 employees, more than 40% of its workforce, and reorganized around three roles. Builders. Problem-owners. Player-coaches.

The data behind the thesis is Goose, Block’s internal AI agent built on Model Context Protocol. According to Gradient Flow’s technical analysis, Goose handles approximately 90% of Block’s code submissions. Roughly 5,000 Block employees use it weekly, including non-developers. Block shares surged approximately 27% after hours following the restructuring announcement.

Dorsey’s argument: managers exist to route information up and down a chain. That function is digitized in a remote-first company where every decision, design, and plan already exists as a digital record. AI just needed to catch up.

Why it matters: This is the first major public case study of a tech company that deployed an internal AI agent at scale, restructured the org around it, and documented the rationale. Every CTO reading this is under pressure to answer: where in my org chart is a management layer routing information that AI could handle?

2. The Production Gap Now Has Numbers: 78% Pilots, 14% Production, 8.6% Deployed

A March 2026 survey of 650 enterprise technology leaders found that 78% have at least one AI agent pilot running. Only 14% have reached production scale. Deloitte surveyed 120,000 business leaders across 24 countries and confirmed: as of January 2026, only 8.6% of organizations have deployed AI agents to production. The other 91.4% are stuck between demo and deployment.

The failure modes are documented: reliability issues, context management breakdowns, cost overruns, integration failures, governance gaps. AlphaSignal’s April 5 issue identified the memory bottleneck as the leading technical cause. Long-context agents slow down, crash, or generate runaway API costs as context windows fill.
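The runaway-cost failure mode is mechanical: if an agent replays its full history on every call, per-call token counts grow without bound as the session lengthens. A minimal sketch of one common mitigation, a sliding-window context budget (the token estimate and budget below are illustrative assumptions, not any vendor's accounting):

```python
# Hypothetical sketch: without a context budget, every agent turn appends
# to history and per-call token counts grow without bound. A simple
# sliding-window trim caps the growth.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_context(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages that fit within a token budget."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-to-oldest
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = [f"turn {i}: " + "x" * 400 for i in range(100)]
trimmed = trim_context(history, budget=1000)
# Only the newest turns that fit the budget survive; older turns would
# need summarization or external storage, which is where the hard
# engineering (and the 91.4% gap) actually lives.
```

Real agent frameworks pair this with summarization or retrieval for the evicted turns; the trim alone just bounds the bleeding.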

Why it matters: The pilot-to-production gap is now confirmed at scale. Not anecdotal. Not a single vendor’s claim. Two surveys, 120,650 leaders, same conclusion. The question for every CTO isn’t whether to experiment with agents. 78% already are. The question is what it takes to cross the production line. Only 8.6% have.

3. Anthropic Splits the Market: Subscriptions for Exploration, API for Production

Starting April 4, Anthropic blocked Claude Pro and Max subscriptions from powering third-party agentic tools like OpenClaw. Users who built agent workflows on subscription pricing are looking at cost increases of up to 50 times their previous monthly outlay.

The reason is architectural. Third-party agent harnesses bypass Anthropic’s prompt cache optimization, the mechanism that makes flat-rate subscriptions economically viable. Boris Cherny, Anthropic’s head of Claude Code, stated directly: “Subscriptions weren’t built for the usage patterns of these third-party tools.”

Anthropic offered a bridge: a one-time credit equal to one month’s subscription cost through April 17, plus 30% off pre-purchased API bundles. The direction is clear. Production workloads belong on the API tier.
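The arithmetic behind a cost cliff like this is worth seeing once. The sketch below uses entirely hypothetical prices and token volumes (none of these are Anthropic's actual rates) to show how an always-on agent harness that replays full context each call can land an order of magnitude or more above a flat subscription:

```python
# Illustrative arithmetic only; every number below is a hypothetical
# placeholder, not Anthropic's actual pricing.

FLAT_SUBSCRIPTION_USD = 100.0   # hypothetical monthly flat rate
API_USD_PER_MTOK_IN = 3.0       # hypothetical $ per million input tokens
API_USD_PER_MTOK_OUT = 15.0     # hypothetical $ per million output tokens

def api_monthly_cost(input_mtok: float, output_mtok: float) -> float:
    """Metered cost for a month of usage, in millions of tokens."""
    return input_mtok * API_USD_PER_MTOK_IN + output_mtok * API_USD_PER_MTOK_OUT

# An interactive chat user might consume a few million tokens a month:
light = api_monthly_cost(input_mtok=5, output_mtok=1)

# An agent harness that resends growing context on every call can burn
# hundreds of millions of input tokens in the same month:
heavy = api_monthly_cost(input_mtok=1500, output_mtok=30)

multiple = heavy / FLAT_SUBSCRIPTION_USD  # the "Nx the subscription" cliff
```

Under these made-up numbers the light user costs a fraction of the flat rate while the heavy harness lands around 50x it, which is the shape of the cliff regardless of the exact prices.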

Why it matters: This is the market formally separating consumer AI from production AI at the billing layer. Any enterprise building agent systems on subscription economics is operating on borrowed time. The 50x cost cliff is just the invoice arriving for a decision that was made months ago.

4. Microsoft’s Own Terms Call Copilot “Entertainment Only.” 3.3% of Users Agreed.

While Microsoft charges $30/month per enterprise seat for Copilot and pitches it as an AI co-worker, its Terms of Use contain this sentence: “Copilot is for entertainment purposes only. Don’t rely on Copilot for important advice. Use Copilot at your own risk.”

The market responded accordingly. Out of approximately 450 million Microsoft 365 seats, about 15 million users (3.3%) are paying for Copilot Chat. Nearly half of lapsed users (44.2%) said they stopped because they didn’t trust the answers.

Important nuance: the “entertainment only” clause technically applies to consumer Copilot products, not the enterprise Microsoft 365 Copilot add-on. But the adoption and trust numbers don’t distinguish between them. Enterprise teams aren’t buying either way.

Why it matters: When the vendor won’t warranty its own outputs, enterprise teams need their own governance framework for AI outputs. The production-grade answer isn’t “trust the vendor.” It’s observability, human-in-the-loop checkpoints, and defined error tolerance. The 3.3% adoption rate signals that enterprise buyers are not satisfied with off-the-shelf AI tools, creating demand for production-grade implementations built for specific workflows.
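What a "defined error tolerance plus human-in-the-loop" checkpoint looks like in practice can be sketched in a few lines. This is a toy illustration of the pattern, not any vendor's API; the threshold, queue, and error estimate are all stand-ins you would replace with your own evaluation pipeline:

```python
# Minimal sketch of "don't trust the vendor" governance: every AI output
# passes an automated confidence check, and anything above a defined
# error tolerance is routed to a human reviewer instead of auto-shipping.

from dataclasses import dataclass, field

@dataclass
class OutputGate:
    error_tolerance: float = 0.10           # max acceptable estimated error rate
    review_queue: list = field(default_factory=list)

    def route(self, output: str, estimated_error: float) -> str:
        """Auto-approve confident outputs; escalate the rest to a human."""
        if estimated_error <= self.error_tolerance:
            return "approved"
        self.review_queue.append(output)    # human-in-the-loop checkpoint
        return "needs_review"

gate = OutputGate(error_tolerance=0.10)
assert gate.route("Q1 revenue summary", estimated_error=0.02) == "approved"
assert gate.route("contract clause rewrite", estimated_error=0.35) == "needs_review"
assert len(gate.review_queue) == 1
```

The hard part is producing `estimated_error` honestly (evals, self-consistency checks, retrieval grounding); the gate itself is trivial, which is the point.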

ALSO WORTH KNOWING

Karpathy’s LLM Knowledge Base: Andrej Karpathy published a three-stage architecture where LLMs maintain structured markdown wikis instead of vector databases. Dump raw content, have an LLM compile it into a structured wiki with backlinks, then have the LLM scan for inconsistencies. At his tested scale of 100 articles and 400,000 words, the approach is faster and more maintainable than RAG. Every claim is in a file a human can read and audit.
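The auditability claim is easy to verify because the substrate is just markdown. The sketch below is not Karpathy's code; it assumes `[[wiki-style]]` backlinks and reduces his stage-3 consistency scan to one trivially checkable property, links that point at no existing article:

```python
# Toy version of the "structured wiki" idea: articles are plain markdown
# strings with [[target]] backlinks, so the link graph is greppable and
# a human can audit every claim by reading the file.

import re

def extract_backlinks(markdown: str) -> list[str]:
    """Pull [[target]] links out of a markdown article."""
    return re.findall(r"\[\[([^\]]+)\]\]", markdown)

def dangling_links(wiki: dict[str, str]) -> dict[str, list[str]]:
    """Map each article title to backlinks that reference no known article."""
    return {
        title: [t for t in extract_backlinks(body) if t not in wiki]
        for title, body in wiki.items()
    }

wiki = {
    "agents": "Agents build on [[mcp]] and [[memory]].",
    "mcp": "Model Context Protocol, used by [[agents]].",
}
report = dangling_links(wiki)
# "memory" has no article yet, so it surfaces as a dangling link under
# "agents" -- exactly the kind of inconsistency the LLM pass would flag.
```

In Karpathy's actual pipeline the LLM does the compiling and the inconsistency scan; code like this only shows why flat files make that scan cheap to verify.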

Google Gemma 4 goes Apache 2.0: Google released Gemma 4 under Apache 2.0 for the first time, removing the legal barrier that blocked enterprise adoption. A 4-model family spanning edge devices to workstations with native multimodality and function calling. The same day, Arcee shipped Trinity-Large-Thinking, a 399B-parameter open model under the same license.

Microsoft Agent Framework 1.0: Microsoft shipped its production-ready SDK unifying Semantic Kernel and AutoGen. Graph-based multi-agent orchestration with connectors for OpenAI, Claude, Gemini, Bedrock, and Ollama. A2A and MCP protocol support. The orchestration layer just commoditized.

Futurum’s Agent Control Plane Framework: The first analyst-grade reference architecture for agent governance. Five layers separating intelligence from authority. Compliance mapping for SOC 2, HIPAA, PCI, and EU AI Act. Layer 0 is the execution environment. Without it, all governance above is advisory.
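The "without Layer 0, everything above is advisory" point is the crux. Futurum's framework is an architecture, not published code; this toy gate only illustrates the distinction, with hypothetical tool names, between a policy the agent is asked to follow and one the execution environment enforces:

```python
# Toy Layer-0 gate: the allowlist lives in the execution environment, so
# a disallowed tool call fails at dispatch regardless of what the agent
# "decides". Tool names here are hypothetical.

ALLOWED_TOOLS = {"read_file", "search_docs"}

def execute(tool: str) -> str:
    """Refuse any tool call outside the policy allowlist."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{tool}' not permitted by policy")
    return f"ran {tool}"          # stand-in for real tool dispatch

assert execute("read_file") == "ran read_file"

blocked = False
try:
    execute("delete_database")    # policy violation: raises, never runs
except PermissionError:
    blocked = True
```

A prompt that says "never delete the database" is advisory; a dispatcher that cannot reach `delete_database` is governance.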

Exponential View: the labs are rationing. OpenAI’s CFO confirmed they’re passing on compute opportunities. Codex went from 100,000 to 2 million developers in three months. Anthropic tightened limits, with 7% of users hitting session caps. H100 rental prices hit an 18-month high.

THE PATTERN

The accountability gap is structural. It runs through every layer of enterprise AI this week.

  • Vendor accountability: Microsoft disclaims responsibility for Copilot outputs in its own ToS
  • Organizational accountability: 91.4% of organizations haven’t crossed the production line (Deloitte, 120,000 leaders)
  • Economic accountability: Anthropic’s 50x cost cliff separates exploration from production at the billing layer
  • Architectural accountability: Block rebuilds the org around an AI agent handling 90% of code (Block.xyz/Gradient Flow)
  • Governance accountability: Futurum publishes the first analyst reference architecture for agent governance (5 layers)
  • Infrastructure accountability: The labs are rationing compute. H100 rental prices at 18-month highs (Exponential View #568)

The technology ships in weeks. The organizational change takes quarters. The governance frameworks are measured in years. That gap is where production failures live.

I’ve been building production systems for 25 years. The pattern is always the same: the tool arrives before the discipline to use it safely. The firms that close the accountability gap first don’t just survive the transition. They define the standard everyone else follows.

Sources: AlphaSignal, The Batch (DeepLearning.AI), Exponential View, Tomasz Tunguz (Theory Ventures), The Rundown AI, Prohuman AI, TechCrunch, VentureBeat, TheNextWeb, Microsoft Terms of Use, Deloitte State of AI 2026, Block.xyz, Gradient Flow, Futurum Group, Google DeepMind, Perplexity Research

I write about Production AI, enterprise AI adoption, and building systems that actually work. Follow along if that’s your thing.
