Every interaction with a hosted LLM passes through the same structural layers: a use case (interactive or programmatic), a harness that mediates between the user and the model, and the LLM API itself. The harness is the layer that assembles prompts, dispatches tool calls, manages context, and enforces whatever behavioral constraints exist. Claude Desktop, Claude Code CLI, the Anthropic SDK, and raw API access are all instances of harnesses, each making different tradeoffs about how much the harness manages versus what the caller provides.
The initial temptation is to treat "configured by" and "relies on" as separate concerns, one about intent and the other about infrastructure. But this distinction collapses under inspection.
A system prompt is a file. MCP server definitions are JSON on disk. Permission boundaries are OS-level access controls. The harness consumes all of these as resources from whatever sits beneath it. The distinction between "configured by" and "relies on" is an artifact of thinking about intent rather than mechanism. At the mechanical level, everything the harness needs is a resource served by a provider.
This observation has a structural consequence: since the harness only needs its resource contract satisfied, the provider is substitutable. A full operating system, a gVisor sandbox, a container with mounted config volumes, or a minimal runtime that serves exactly the right files and network routes can all fill the role. The harness neither knows nor cares which one it got.
This substitutability is both a flexibility feature and an attack surface. A malicious provider can present a resource surface that looks correct while intercepting or modifying every resource the harness consumes. It is the same pattern as virtualization: the guest cannot determine whether the hypervisor is honest, and no amount of guest-level inspection resolves that without hardware-rooted attestation.
Reverse engineering of the Claude Desktop application reveals that Anthropic itself builds synthetic resource providers.2 The Desktop sandbox uses gVisor with 9P filesystem passthrough, constructing a virtual filesystem stitched together from host paths, mounted volumes, and sandboxed scratch space. It is not an OS in any conventional sense. It is a composite resource surface that satisfies the harness's contract just enough for it to function. The harness cannot tell whether the files it reads came from a real ext4 partition or a 9P passthrough from a container runtime, because the 9P protocol presents them identically.
Anthropic clearly understands that the provider layer is abstract. They built one.
The Anthropic legal and compliance terms for Claude Code impose specific restrictions on credential use. OAuth authentication for consumer plans (Free, Pro, Max) is restricted exclusively to Claude Code and Claude.ai. Using those OAuth tokens in any other product, tool, or service is prohibited. Third-party developers may not offer Claude.ai login or route requests through consumer plan credentials. Anthropic reserves the right to enforce these restrictions without prior notice.1
This is Anthropic asserting harness identity through legal restriction rather than technical attestation.3 The credential is supposed to mean "I am Claude Code," and using it from anything else violates the terms.
Three things could present the same OAuth token: the real Claude Code binary on a developer's machine, a wrapper that extracted the token and replays API calls with modified system prompts, or a hostile environment that satisfies the resource contract while intercepting every tool call. The API endpoint sees identical requests from all three. The missing primitive is something equivalent to TPM-rooted attestation, where the harness can produce a measurement of its own runtime that the API can verify before accepting the credential.
Given this analysis, consider a different design objective: keep the harness fixed. Do not replace it, do not fork it, do not wrap it. Instead, compose the world the harness sees. Claude Code reads ~/.claude/, resolves .mcp.json, walks the filesystem, shells out to whatever is on PATH, and reaches the network for API calls. Every one of those is a provider-level seam. You do not touch the binary; you shape what it lands on.
This is a clean inversion. Instead of configuring the harness (which has limited knobs), you configure the provider. The mechanism can be as lightweight as shell scripts that set up the environment before launching claude, or as structured as container definitions, nix shells, or direnv profiles that swap the resource surface per directory.
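A minimal sketch of that inversion, in Python rather than shell: compose the environment (CLAUDE_CONFIG_DIR for the global config root, PATH for the capability surface) before launching the untouched binary. The function and its defaults are illustrative, not Anthropic tooling.

```python
import os

def compose_env(config_dir, extra_path=None, base_env=None):
    """Build the environment a `claude` invocation would land on.

    CLAUDE_CONFIG_DIR and PATH are two of the provider-level seams named
    above; the harness binary itself is never touched.
    """
    env = dict(base_env if base_env is not None else os.environ)
    env["CLAUDE_CONFIG_DIR"] = str(config_dir)  # redirect the global config root
    if extra_path:
        # Prepend a directory to shape what "shells out to whatever is on PATH" finds.
        env["PATH"] = str(extra_path) + os.pathsep + env.get("PATH", "")
    return env

# Launching into the composed surface would then be, e.g.:
#   subprocess.run(["claude"], env=compose_env("/srv/agents/x/config"),
#                  cwd="/srv/agents/x/project")
```

The same effect can come from a wrapper shell script, a direnv profile, or a container entrypoint; the point is that the seam is the environment, not the binary.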
The Anthropic credentialing restrictions target a specific abuse pattern: extracting OAuth tokens from Claude Code and replaying them through a different product.4 The concern is someone building a competing client that freeloads on consumer plan rate limits.
Composing the resource surface underneath a legitimate harness invocation is not that pattern. Every developer who runs Claude Code in a Docker container, on a remote VM, in a Nix shell, or in a different project directory with a different .mcp.json is already doing a version of this. Project-scoped config, per-directory CLAUDE.md files, and different MCP servers per workspace are designed-in features of the harness, not exploits.
The line, as stated in the terms, is between using the harness and impersonating or bypassing it. Shaping the resource surface underneath a legitimate harness invocation falls on the "using it" side.
Claude Code resolves credentials through a priority chain: cloud provider environment variables first, then ANTHROPIC_AUTH_TOKEN, then ANTHROPIC_API_KEY, then an apiKeyHelper script, and finally subscription OAuth from /login. On macOS, credentials are stored in the encrypted Keychain. On Linux and Windows, they land in ~/.claude/.credentials.json.
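The chain is easy to model. The sketch below mirrors the documented priority order; the cloud-provider check is reduced to two environment flags (CLAUDE_CODE_USE_BEDROCK, CLAUDE_CODE_USE_VERTEX) and the helper and OAuth steps to booleans, so treat it as an illustration of the ordering, not the CLI's actual logic.

```python
def resolve_credential_source(env, has_helper=False, has_oauth=False):
    """Return which rung of the priority chain would supply the credential."""
    if env.get("CLAUDE_CODE_USE_BEDROCK") or env.get("CLAUDE_CODE_USE_VERTEX"):
        return "cloud-provider"   # cloud provider environment variables win
    if env.get("ANTHROPIC_AUTH_TOKEN"):
        return "auth-token"
    if env.get("ANTHROPIC_API_KEY"):
        return "api-key"
    if has_helper:
        return "apiKeyHelper"     # script output, e.g. fetched from a vault
    if has_oauth:
        return "oauth"            # subscription login, lowest priority
    return None
```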
The credential is the one resource that cannot be synthesized, substituted, or meaningfully composed. It is either an OAuth token bound to a subscription identity or an API key bound to a Console organization. Both are opaque bearer tokens that Anthropic's backend validates. You can move them (mount .credentials.json into a container, redirect via $CLAUDE_CONFIG_DIR, use apiKeyHelper to fetch from a vault), but you cannot fabricate them. The credential is the provider layer's one invariant, the root of trust that Anthropic holds the other end of.
Everything else in the provider surface is a projection you control.
With this model fully articulated, the design pattern becomes clear. The agent is not the harness. The agent is the recipe. The harness is the execution engine. What makes Agent X different from Agent Y is not different code or a different framework. It is a different filesystem image that the same binary lands on.
A recipe is a manifest that specifies: which CLAUDE.md (behavioral instructions), which .mcp.json (tool surface), which project files are visible (filesystem scope), which binaries are on PATH (capability surface), and which environment variables are set (runtime context). You assemble that into a user context and launch claude into it. The harness boots, reads its environment, and becomes that agent.
This solves several problems simultaneously. Agent specialization without writing agent code. Reproducibility because a recipe is a declarative manifest that can be version-controlled and diffed. Isolation because each composed provider is a separate filesystem context. Disposability because the provider is ephemeral: tear it down, rebuild from the recipe, and the harness does not know the difference.
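As a concrete illustration, a recipe can be as small as a dictionary with one entry per composable surface. The key names here are an assumption for the sketch, not a published schema.

```python
RECIPE = {
    "claude_md": "You are a release-notes agent. Summarize merged PRs.",  # behavioral instructions
    "mcp_json": {"mcpServers": {}},           # tool surface
    "mounts": ["./repo"],                     # which project files are visible
    "path": ["/opt/agent-bin"],               # which binaries are on PATH
    "env": {"AGENT_ROLE": "release-notes"},   # runtime context
}

REQUIRED = {"claude_md", "mcp_json", "mounts", "path", "env"}

def validate(recipe):
    """Reject a recipe that does not cover every composable surface."""
    missing = REQUIRED - recipe.keys()
    if missing:
        raise ValueError("recipe missing: " + ", ".join(sorted(missing)))
    return True
```

Because the recipe is plain data, it diffs, version-controls, and templates like any other infrastructure manifest.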
The Claude Code configuration hierarchy was designed with layered composition in mind. The harness resolves config from global (~/.claude/) and project (.claude/ in the working directory) scopes, merging them with project settings taking precedence. Every composable surface maps to a file the harness already knows how to find:
| Recipe component | File / directory | Scope |
|---|---|---|
| Behavioral persona | CLAUDE.md | Project root |
| Tool permissions | .claude/settings.json | Project |
| MCP servers | .mcp.json | Project |
| Subagents | .claude/agents/*.md | Project or global |
| Skills | .claude/skills/*/SKILL.md | Project or global |
| Custom commands | .claude/commands/*.md | Project or global |
| Path-scoped rules | .claude/rules/*.md | Project |
| Event hooks | .claude/hooks/ | Project |
| Credential | ~/.claude/.credentials.json | Global (fixed) |
The filesystem assembly step is: create a directory, populate it with the recipe's files, optionally mount data directories into scope, and cd into it before launching claude. The harness picks up everything through its normal resolution path. Additionally, CLAUDE_CONFIG_DIR can redirect the global config root, decoupling the global layer from the UID's actual home directory without requiring separate Unix accounts.
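A sketch of that assembly step, writing three of the project-scoped files from the table into a fresh directory (the recipe key names are assumed for illustration):

```python
import json
import tempfile
from pathlib import Path

def assemble(recipe, root):
    """Materialize a recipe as the files the harness already resolves."""
    root = Path(root)
    (root / ".claude").mkdir(parents=True, exist_ok=True)
    (root / "CLAUDE.md").write_text(recipe["claude_md"])                       # behavioral persona
    (root / ".mcp.json").write_text(json.dumps(recipe["mcp_json"], indent=2))  # MCP servers
    (root / ".claude" / "settings.json").write_text(
        json.dumps(recipe.get("settings", {}), indent=2))                      # tool permissions
    return root

# Assemble into scratch space; launching is then `cd` + `claude`,
# optionally with CLAUDE_CONFIG_DIR pointing at a composed global layer.
workdir = assemble(
    {"claude_md": "You are a changelog agent.", "mcp_json": {"mcpServers": {}}},
    tempfile.mkdtemp(prefix="agent-"),
)
```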
The conventional framing presents two paths to agent development. The first is the SDK path: build a custom harness using the Anthropic API or Agent SDK, write your own prompt assembly, tool dispatch, context management, and orchestration logic. The second is the CLI path: use Claude Code as an interactive terminal tool. The SDK path offers full composability at the cost of building everything from scratch. The CLI path offers convenience at the cost of rigidity.
The composable provider model reveals a third path that collapses this distinction. The CLI is not merely a convenience tool. It is a full-featured harness with hooks, skills, subagents, memory, worktree isolation, tool governance, and background execution. Under API key authentication, the behavioral constraints of consumer plans ("ordinary, individual usage") do not apply. What remains is a usage-billed harness that accepts arbitrary filesystem composition beneath it.
This reframes the SDK as unnecessary for most agent specialization use cases. But the equivalence between CLI and SDK has a boundary that must be stated precisely, because the two compose at fundamentally different integration surfaces.
The CLI composes via filesystem and subprocess. You assemble a directory, launch claude -p "do the thing", and get structured output back via stdout. Chaining tasks means writing an orchestrator that parses output between process invocations. The boundary between your logic and the agent is a process boundary: stdin, stdout, exit codes, files on disk. This works, and it works well for independent agents doing independent jobs, each defined by their filesystem surface.
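That process boundary looks like this in practice. The `binary` parameter exists only so the sketch can be exercised without Claude Code installed; with the real CLI it would be `claude`.

```python
import subprocess

def run_agent(workdir, prompt, binary="claude"):
    """One headless invocation across the process boundary: argv in, stdout out."""
    result = subprocess.run(
        [binary, "-p", prompt],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,  # a nonzero exit code surfaces as an exception
    )
    return result.stdout

# An orchestrator chains these: parse one invocation's stdout,
# feed the result into the next agent's prompt.
```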
The SDK composes via function calls and objects in memory. You import a library into your own Python or TypeScript process. You get an async generator yielding typed messages. You can branch on structured output, pass results between agents as in-memory data, implement conditional retry logic, and hold state across steps without serializing to disk. The boundary between your logic and the agent is a function call, not a pipe.
The Agent SDK is, internally, the CLI's agent loop extracted as a library: the same tools, the same context management, the same dispatch engine.6 Reaching for the SDK to build an agent is, in most cases, stripping out the CLI's composition infrastructure (CLAUDE.md resolution, .mcp.json discovery, hooks, skills, the settings hierarchy) and then rebuilding a subset of it in application code. For recipe-level composition, that is wasted effort.
The honest decomposition: the CLI is the right choice when composition is at the recipe level (different agents doing different independent jobs, each defined by their filesystem surface). The SDK is the right choice when composition is at the orchestration level (agents whose outputs feed into other agents' inputs within a single programmatic workflow, with branching logic that cannot be expressed as sequential process invocations). The recipe assembler plus CLI covers the first case. The SDK covers the second. Most people reaching for the SDK are paying the complexity tax of building a harness because they have not recognized that the harness they need already ships as a binary.
The practical consequence for recipe-level composition: under API key billing, the CLI becomes a composable agent engine. Fifty concurrent claude -p instances, each assembled from a different recipe, each working on a different task, each billed per token. No behavioral constraints, no "ordinary usage" questions, no attestation concerns. The only limit is the bill. The recipe assembler (the equivalent of terraform apply for agent definitions) is the missing tool, and it is a small one: read a manifest, assemble a directory, set environment variables, launch claude.
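A sketch of that assembler, under the same caveats as before: the manifest keys are assumed rather than any published schema, and `binary` is parameterized only so the sketch runs without Claude Code installed.

```python
import json
import os
import subprocess
import tempfile
from pathlib import Path

def apply_recipe(manifest_path, binary="claude", prompt=None):
    """Read a manifest, assemble an ephemeral directory, set env, launch."""
    recipe = json.loads(Path(manifest_path).read_text())
    workdir = Path(tempfile.mkdtemp(prefix="agent-"))               # disposable provider surface
    (workdir / "CLAUDE.md").write_text(recipe["claude_md"])
    (workdir / ".mcp.json").write_text(json.dumps(recipe.get("mcp_json", {})))
    env = {**os.environ, **recipe.get("env", {})}                   # runtime context from the recipe
    cmd = [binary] + (["-p", prompt] if prompt else [])
    return subprocess.run(cmd, cwd=workdir, env=env, capture_output=True, text=True)
```

Fifty concurrent agents are then fifty calls to this function with fifty manifests; the directories are disposable and the manifests are the source of truth.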
The stack decomposes into four layers: use case, harness, resource provider, and LLM API. Configuration is not a peer to infrastructure; it is a subset of the resources the provider serves. The provider is substitutable, which creates both architectural flexibility and an unresolved attestation gap. Anthropic addresses this gap through legal restriction on credential use rather than through technical enforcement.
Within these constraints, the composable provider model offers a practical path to agent specialization: define agent behavior as a filesystem recipe, assemble it on demand beneath the fixed Claude Code binary, and rely on the harness's native configuration resolution to pick it up. The credential is the one invariant, the root of trust Anthropic holds. Everything else is infrastructure-as-code applied to agent definition.
The result is agent-as-code by means of infrastructure-as-code, without writing a custom harness and without forking or wrapping the CLI. Under consumer plan credentials the scaling constraint is human: the model works at the rate a person can direct it, which is where the "ordinary, individual usage" boundary sits. Under API key credentials that constraint dissolves entirely, and the CLI becomes a composable SDK billed per token, with no behavioral restrictions beyond cost. The recipe assembler, a declarative manifest that describes an agent's filesystem surface and an apply step that assembles it, is the single missing primitive. It is a small one.