The Incident
On March 31, 2026, Anthropic accidentally included a debugging file in a routine software update for Claude Code.1 A configuration oversight exposed the complete source code: 512,000 lines across 1,906 files, fully readable. Within hours the code was copied, shared thousands of times, and dissected by the developer community worldwide.
Most of the analysis focused on features: codenames, a pet system, unreleased capabilities, internal tools. But a few analysts asked a different question: what is this thing, architecturally? What users assumed was a thin interface that passes prompts to an AI model turned out to be a 512,000-line orchestration platform making hundreds of invisible decisions per session. As Han HELOIR Yan put it: the harness,2 not the model, is the product.3
That claim raises two questions: First, is it true? Is the orchestration layer genuinely where the value lives, or is this just infrastructure that any competent team could replicate? Second, and this is the question this note is about: if the harness is the product, what kind of product is it? What does it mean when a Fortune 10 company ships an opaque operating system between the developer and the model, treats its configuration as a trade secret, and calls it a coding assistant?
The Harness Is the Product
Among the analyses that followed the leak, Yan’s stood out for asking the structural question rather than cataloguing features: if models are converging and available to anyone at commodity pricing, why does Claude Code alone generate $2.5 billion in annualized revenue? Yan’s answer is that the harness, not the model, is the product. Developers are paying for orchestration, not intelligence.
I think Yan is right, and not just from the leaked code. I use Claude Code daily, not as a coding assistant but as a knowledge work environment, research tool, and operational platform. I have used Cursor, Windsurf, and the API directly. The model is the same or similar across all of them. The experience is not. What makes Claude Code different is everything the harness does before the model sees your prompt and after it returns a response. The gap between using the API raw and using it through a well-built harness is the difference between having an engine and having a car.
The leaked source shows what that orchestration looks like at production scale. The resilience layer alone has five fallback stages for error recovery and five distinct strategies for compressing conversation history, each tuned to a different failure mode, because no single approach works well enough. Memory and context management runs deeper: twenty-eight layered instruction files are assembled dynamically based on user configuration and session state, a background system the code calls “dreaming” organizes project knowledge while the user is away, and conversation history is silently compressed or discarded to stay within the model’s working limits. On top of that sits a full orchestration layer: over sixty tools behind permission gates (eighteen invisible until the model searches for them), seven permission modes including a machine learning classifier that reads the conversation and decides what to allow, the ability to spawn parallel sub-agents, and a remote control system that lets the web interface operate the local application.
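To make the compression layer concrete, here is a minimal sketch of a tiered fallback chain in TypeScript: strategies ordered from least to most destructive, applied only until the history fits the budget. The strategy names, heuristics, and thresholds are my own illustration, not reconstructed from the leaked code.

```typescript
// A sketch of tiered context compression. All names and heuristics
// here are illustrative assumptions, not taken from the leak.

type Turn = { role: "user" | "assistant" | "tool"; text: string };

// Crude token estimate: roughly four characters per token.
const tokens = (turns: Turn[]): number =>
  Math.ceil(turns.reduce((n, t) => n + t.text.length, 0) / 4);

type Strategy = { name: string; apply: (h: Turn[]) => Turn[] };

const strategies: Strategy[] = [
  // Cheapest first: elide verbose tool output, which is usually reproducible.
  {
    name: "elide-tool-output",
    apply: (h) =>
      h.map((t) => (t.role === "tool" ? { ...t, text: "[tool output elided]" } : t)),
  },
  // Next: collapse the oldest half of the conversation into one summary turn.
  {
    name: "summarize-head",
    apply: (h) => {
      if (h.length <= 4) return h;
      const cut = Math.floor(h.length / 2);
      return [
        { role: "assistant", text: `[summary of ${cut} earlier turns]` },
        ...h.slice(cut),
      ];
    },
  },
  // Last resort: hard truncation, keeping only the most recent turns.
  { name: "truncate-head", apply: (h) => h.slice(-4) },
];

// Apply strategies in order until the history fits the budget.
function compress(history: Turn[], budget: number): Turn[] {
  let h = history;
  for (const s of strategies) {
    if (tokens(h) <= budget) break;
    h = s.apply(h);
  }
  return h;
}
```

The reason a production harness needs several stages rather than one is visible even in this toy version: each stage destroys information the later stages can no longer recover, so you want to pay the smallest price that gets the history under budget.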
This is not a wrapper around a model. Yan quotes a commenter who put it in terms the industry understood: the model is the dealer; the harness is the casino. Casinos are hard to build. The term “harness engineering” is already following “prompt engineering” and “context engineering” into the hype cycle, which is itself evidence of where practitioners see the value shifting.
If the Harness Is the Product
In one sentence: configuration should not be a trade secret in enterprise software.
In enterprise software, configuration is what the customer is entitled to know because it defines the behavior of the product they are paying for. Cloud access policies, infrastructure definitions, deployment manifests: all visible and auditable. The harness treats its configuration as a trade secret. That is the norm violation, and everything else follows from it.
The leaked code revealed what the harness adds between your prompt and the model: hidden instructions assembled from 15+ sections that shape every interaction.4 Decoy definitions injected into requests to prevent competitors from copying the model’s behavior, billed to the user. Frustration detection using pattern matching that adjusts behavior based on emotional state. Conversation history selectively compressed or discarded based on rules the user cannot see. Available tools filtered by vendor-controlled switches. A stealth mode that strips attribution when contributing to external projects. Invisible meta-messages injected to steer the model. The point is not whether any of these are justified. Each may have a reasonable rationale. The point is that none are disclosed.
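The fix implied by enterprise norms is not to forbid injection but to disclose it. Here is a sketch of what disclosure could look like: a machine-readable manifest a security team could audit the way they audit a deployment manifest. The schema and every entry below are hypothetical.

```typescript
// Hypothetical disclosure manifest for harness-injected content. Nothing
// like this exists in the leaked code; it is what enterprise norms would
// ask for. All entries and token counts are invented for illustration.

interface InjectionRecord {
  id: string;                     // stable identifier for the injected content
  purpose: string;                // vendor-stated rationale
  beneficiary: "user" | "vendor" | "both";
  approxTokensPerRequest: number; // billed overhead this injection adds
  userVisible: boolean;           // does it appear in the transcript?
  optOut: boolean;                // can the customer disable it?
}

const manifest: InjectionRecord[] = [
  {
    id: "tool-definitions",
    purpose: "expose permitted tools to the model",
    beneficiary: "user",
    approxTokensPerRequest: 3000,
    userVisible: true,
    optOut: false,
  },
  {
    id: "decoy-definitions",
    purpose: "anti-distillation countermeasure",
    beneficiary: "vendor",
    approxTokensPerRequest: 800,
    userVisible: false,
    optOut: false,
  },
];

// A procurement team could then total the vendor-serving overhead per request.
const vendorOverhead = manifest
  .filter((r) => r.beneficiary !== "user")
  .reduce((n, r) => n + r.approxTokensPerRequest, 0);
console.log(vendorOverhead); // billed tokens per request that serve the vendor
```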
The harness functions as an operating system: it manages resources (the AI’s working memory and usage budget), handles input and output (tool execution, file operations), enforces permissions (seven modes, with multiple approval systems racing to respond), manages processes (sub-agents, background tasks), provides a command interface, and schedules work (recurring tasks, sleep/wake cycles, push notifications). An operating system is expected to serve the user.5 Windows trained a generation to accept that the OS serves the OS maker: bundled Internet Explorer to kill Netscape, preinstalled Bing, telemetry everywhere, ads in the Start menu. The dangerous part was not that it happened but that it became normal.
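Taking the analogy literally, the harness's surface area maps onto OS responsibilities one for one. The interface below is my own sketch of that mapping, not the leaked code's actual structure.

```typescript
// The harness-as-OS analogy made literal: one method group per OS
// responsibility. Names and signatures are illustrative assumptions.

interface Harness {
  // Resource management: the model's working memory and the usage budget.
  contextBudget(): { usedTokens: number; maxTokens: number };
  usageBudget(): { spentUSD: number; capUSD: number };

  // I/O: tool execution and file operations, behind permission gates.
  executeTool(name: string, args: unknown): Promise<unknown>;

  // Permissions: the policy layer deciding what the model may do.
  authorize(action: string): Promise<"allow" | "deny" | "ask-user">;

  // Process management: parallel sub-agents and background tasks.
  spawnSubAgent(task: string): Promise<{ id: string; result: Promise<string> }>;

  // Scheduling: recurring tasks, sleep/wake cycles, notifications.
  schedule(cronExpr: string, task: string): void;
}
```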
The asymmetry is structural. The model is measurable through benchmarks, evaluations, and public scrutiny. The harness is not: no benchmarks exist for how well it manages conversation history, how efficiently it compresses context, or how much overhead its hidden instructions add. Usage costs are controlled by the harness, not the user; a harness optimized for user efficiency and one optimized for revenue look identical from the outside. A good harness can mask a declining model; a bad harness can make a great model look mediocre. And once workflows embed around a specific harness through customizations, automation rules, configuration files, plugins, and integrations, switching costs make it hard to escape.
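The benchmark gap is closable in principle. One candidate harness metric: per request, compare the input tokens a user can account for locally against the tokens the provider bills; the residual is harness overhead. A sketch, with hypothetical field names:

```typescript
// One possible harness benchmark: billed input tokens minus
// user-accountable input tokens, as a fraction of the bill.
// Field names are hypothetical; no such reporting exists today.

interface RequestAccounting {
  billedInputTokens: number;  // from the provider's usage reporting
  visibleInputTokens: number; // tokens the user can reconstruct locally
}

function overheadRatio(requests: RequestAccounting[]): number {
  const billed = requests.reduce((n, r) => n + r.billedInputTokens, 0);
  const visible = requests.reduce((n, r) => n + r.visibleInputTokens, 0);
  return billed === 0 ? 0 : (billed - visible) / billed;
}

// A ratio of 0.25 would mean a quarter of the input bill is
// harness-injected content the user never wrote or saw.
```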
The risk I am pointing at is not overt manipulation of enterprise customers, who usually have security teams, traffic inspection, and contractual audit rights. The subtler risk is the slow drift of defaults toward the business model, where every individual default is defensible as a “product improvement” or a “trade secret”.
The Pattern Spreads
The leak is not just Anthropic’s problem. It is now a reference architecture. Every AI tool builder (OpenAI, Google, Cursor, Windsurf, and the open source projects already rewriting the code in other languages) now has a detailed blueprint for one of the most sophisticated harnesses in production. Within days of the leak, developers had reverse-engineered the core patterns and begun replicating them.
The good patterns propagate alongside the questionable ones. Automatic error recovery, context compression, tool orchestration, and on-demand capability loading are genuine engineering advances that will make every harness better. But injecting decoy content to block competitors now has a production reference implementation. Frustration detection paired with undisclosed behavioral adjustment has a template. Stealth mode has a working example. These will not spread as scandals. They will spread as “industry standard practices,” normalized by the fact that the leading company in the space shipped them.
This is my main complaint: every hidden instruction, every decoy definition, every meta-message the harness injects into a request consumes tokens the customer pays for. At individual scale it’s noise. At enterprise scale (hundreds or thousands of seats, heavy daily usage) the overhead becomes a line item nobody budgeted for because nobody knew it was there. The customer is paying for tokens that serve the vendor’s interests (anti-competitive decoys, behavioral steering, attribution stripping) alongside the tokens that serve the customer’s work. There’s no way to distinguish the two on the invoice.
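Rough numbers make the point. Every constant below is an assumption chosen for round figures, not a measurement from the leak:

```typescript
// Back-of-the-envelope: hidden overhead cost at enterprise scale.
// All constants are assumptions for illustration.

const overheadTokensPerRequest = 1_000; // hidden instructions, decoys, meta-messages
const requestsPerDevPerDay = 50;
const seats = 500;
const usdPerMillionInputTokens = 3;
const workdaysPerYear = 250;

const annualOverheadTokens =
  overheadTokensPerRequest * requestsPerDevPerDay * seats * workdaysPerYear;
const annualOverheadUSD =
  (annualOverheadTokens / 1_000_000) * usdPerMillionInputTokens;

console.log(annualOverheadUSD); // 18750: a five-figure annual line item
```

Under these assumptions the overhead is $18,750 a year, and the issue is not the size of the number but that it appears nowhere on the invoice.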
The convergence risk is the real concern. If every harness converges on the same architectural patterns, including the provider-serving ones, users face the same invisible mediation regardless of which tool they choose. The competitive market that is supposed to discipline bad behavior instead standardizes it. You cannot switch away from opaque configuration if every alternative is equally opaque.
Where This Leads
Models are becoming infrastructure, priced toward zero margin the way cloud compute did before them. The harness is becoming the product layer: where brand, UX, lock-in, and value extraction live. The game is now about who controls the layer between the developer and the model, which is the browser wars, mobile OS wars, and cloud platform wars playing out again in a new medium. Open source harnesses will matter more, not as philosophy but as a practical check against invisible quality degradation. The regulatory gap between “enterprise software norms exist” and “enterprise software norms are enforced for AI tools” will not last.
Anthropic is at the center of this transition: valued at $380 billion (Feb 2026), with $14 billion in annualized revenue and a $30 billion funding round. That is Fortune 10 territory by valuation. The leaked codebase reflects this: brilliant orchestration engineering alongside undisclosed behavioral profiling, anti-competitive content injection billed to the user, and a gamification system complete with collectible virtual pets, rarity tiers, and role-playing game stats, all shipped to enterprise customers in the same package. Collectibles, engagement loops, weighted loot drops: these are consumer product techniques. They do not belong in enterprise tooling. No enterprise procurement team evaluates a tool and expects to find a companion pet system with collectible mechanics built in. The codebase reads as engineering-led product decisions without enterprise product management discipline: every feature answers “what would be cool to build” rather than “what should we ship to a customer paying us millions.”
AI-native companies like Anthropic and OpenAI followed the consumerization path into enterprise: individual developers adopted the product, teams followed, procurement formalized the relationship. The iPhone, Slack, and GitHub Copilot took the same route. But consumerization only covers the adoption path, not the accountability expectations. Once the enterprise contract is signed, the product lives under enterprise rules: data governance, configuration disclosure, auditability, compliance. The iPhone got managed by mobile device management. Slack got federal security certification. The internal culture of AI startups, however, has not made that transition. The product thinking remains consumer-grade (engagement, delight, cleverness) even as 80% of revenue comes from enterprise.
This is where incumbents like Microsoft and Google have a structural advantage: they already operate within enterprise norms. Their AI harnesses (Copilot, Gemini Code Assist) inherit decades of enterprise product discipline, compliance infrastructure, and procurement muscle memory. They don’t need to learn what a CISO expects; they already know. The leaked Claude Code codebase is what happens when a research lab’s product culture meets enterprise scale without that transition. The collectible pet system is the visible symptom. If the product culture considered gamification mechanics acceptable in enterprise tooling, what else passed the same filter? That is the question that starts the audit.
The leak itself changes the dynamic. Developers now know what to look for: invisible usage overhead, opaque context decisions, undisclosed behavioral modifications. That awareness does not go away, and it makes future degradation harder to execute undetected.
Conclusion
The leak accelerated a conversation about harnesses that was coming regardless. That part is good. The risk is what gets normalized alongside it: 512,000 lines of production code now serve as the reference implementation, and teams replicating the architecture will copy the opaque patterns alongside the brilliant ones. If every harness converges on the same undisclosed defaults, the market cannot correct what it cannot see. If a product culture considered collectible pets acceptable in enterprise tooling, what else passed the same filter? The leak answered the transparency question empirically. What the industry does with that answer is the only question left.
References
1. Claude Code’s source code appears to have leaked: here’s what we know, leak incident reporting (VentureBeat, 2026-03-31)
2. Everyone Analyzed Claude Code’s Features. Nobody Analyzed Its Architecture, architectural analysis arguing the harness is the product (Han HELOIR Yan, Medium, 2026-03-31)
3. Your AI Coding Assistant Has a Pet, on the Buddy companion pet system with gacha mechanics (Marco Kotrotsos, Medium, 2026-03-31)
4. Inside Claude Code’s Prompt Architecture, analysis of the 28 prompt files and layered system prompt architecture (Marco Kotrotsos, Medium, 2026-04-01)
5. The Claude Code Source Code Leak: Things I Learned, comprehensive walkthrough of the leaked codebase (Marco Kotrotsos, Medium, 2026-04-01)
6. Anthropic closes $30 billion funding round at $380 billion valuation, reporting $14B ARR, $2.5B Claude Code ARR, and 80% enterprise revenue (CNBC, 2026-02-12)
7. Agent SDK overview, Claude API documentation (Anthropic)
8. My AI Adoption Journey, where the term “harness” was coined for this context (Mitchell Hashimoto, 2026-02-05)
9. Harness Engineering, on harness engineering as a discipline (martinfowler.com)
Footnotes

1. See [1] for VentureBeat’s reporting on the leak incident.
2. The term “harness” was coined for this context by Mitchell Hashimoto in February 2026 [8] and has since entered wider use [9]. The equestrian analogy is tempting (model as horse, harness as reins) but misleading: a horse is useful without a harness. A better analogy is a car engine: powerful, but useless without the chassis, transmission, steering, and instruments that make it drivable. An AI model accessed through its raw interface is similarly impractical for real work, which is precisely why the harness layer is where the product value lives. A harness does not need to be opaque to provide that control surface; Anthropic’s, unfortunately, is.
3. See Yan’s analysis [2].
4. The hidden system prompt is not just a configuration issue; it is a principal inversion. The user assumes they are directing the agent, but the system prompt ensures the agent serves the maker first. See [4] for the prompt architecture details.
5. The exit option exists in theory: connect to the AI model directly and build your own harness. That eliminates the second layer of principal inversion (the harness serving the maker). The first layer (the model serving the maker via training) remains, but it is at least a known quantity. Note that Anthropic’s Agent SDK is not an exit; it is built on the same foundation as Claude Code and shares the same orchestration engine, tools, and permission system [7]. To truly escape the harness you would need the raw model interface, not the SDK (a minimal sketch follows below). In practice, replicating what Claude Code does is a 512,000-line engineering problem; the leak proved that. The exit option is real for sophisticated teams; it is not real for most users, which is precisely why the transparency norms matter.
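For concreteness, the raw interface referenced above is a single HTTP call. The endpoint and headers below are Anthropic’s documented Messages API; the model id and prompt are placeholders to substitute with current values.

```typescript
// The exit option in its entirety: a direct Messages API call, no harness.
// Every token sent is one you wrote; nothing is injected on your behalf.
// Requires ANTHROPIC_API_KEY in the environment; model id is illustrative.

const res = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": process.env.ANTHROPIC_API_KEY!,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Explain this stack trace: ..." }],
  }),
});

console.log(await res.json()); // full response, with usage counts you can audit
```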