How to Ship a Batch of Integrations in Parallel with Claude Code

February 26, 2026 · 7 min read · by Aansh Shah
Claude Code · AI Coding · Developer Workflow · SDK Integration · Engineering Process



We needed to ship a batch of integrations — different protocols, different dependency surfaces, different specs — without the overhead that usually comes with that kind of scope.

The conventional approach: hire contractors, run a 6-month project, accumulate tech debt. The approach we actually took: treat Claude Code like a junior engineer who needs extremely precise instructions, and use Cowork as the pre-flight system that keeps those instructions honest.

This is a walkthrough of the method. Not what we built — how we structured the work so an AI coding agent could execute reliably at a level we'd actually merge to main.


Why Most AI Coding Workflows Fail

The tools are good enough. Claude Code can write an HTTP adapter with batching and exponential backoff. It can generate protocol-compliant payloads. It can implement a cache with TTL expiry.
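That first claim is concrete enough to sketch. Below is a minimal, hypothetical version of retry-on-5xx with exponential backoff, the kind of adapter logic Claude Code produces reliably. The `send` and `sleep` callables are injected so a test needs no network or real clock; none of these names come from our codebase.

```python
import time

def request_with_backoff(send, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Retry `send()` on 5xx responses with exponential backoff.

    `send` is a zero-argument callable returning an object with a
    `status_code` attribute; `sleep` is injectable for testing.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code < 500:
            return response
        if attempt < max_retries:
            # Backoff doubles each attempt: 0.5s, 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
    return response
```

Injecting the clock is what makes the gate-and-wait pattern described later cheap to enforce: the proof Claude pastes is a deterministic test run, not a flaky network call.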

What it can't do is manage itself.

We'd seen three failure modes kill previous attempts:

Premature completion. Claude declares "done" with 60% of the tests written. The model is optimized to be helpful, and "I've completed the implementation" is a very helpful-sounding thing to say.

Assumption drift. The prompt assumes the codebase is structured one way. The codebase is structured another way. Claude improvises a workaround instead of stopping. The workaround compiles but doesn't integrate with anything.

Merge conflicts from parallel sessions. Two Claude Code sessions both decide to create a utility function in the same file. Or both modify a shared interface. Now you're resolving conflicts in AI-generated code you haven't read.

We designed the entire workflow around preventing these three things.


The Four-Stage Framework

[Figure: Four-Stage Workflow]

The method works for any batch of integrations, adapters, or modules that share a common interface but have independent implementations.

Stage 0: Cowork Pre-Flight (read-only)

Before any Claude Code session touches the codebase, run Cowork pointed at the repo folder with a scoping prompt. Cowork has filesystem access. The prompt says: do not modify anything.

The scoping prompt validates every assumption your implementation prompts make against the actual code. Is the package named what you think? Does the function you're extending accept the arguments you expect? Is there already an abstraction you'd be duplicating?

For each assumption, Cowork reports MATCH or MISMATCH with the exact code reference. For every mismatch, it produces a text substitution you apply to your Claude Code prompts before running them.

Twenty minutes of read-only analysis. Saves hours of Claude Code stalling on wrong assumptions.
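You don't need Cowork to reproduce the report format in miniature. As an illustration of the same MATCH/MISMATCH idea, here is a tiny assumption checker that verifies "module X has function Y accepting parameters Z" with the stdlib's `inspect`. The assumptions below are examples run against `json`, not real scoping output.

```python
import importlib
import inspect

def check_assumption(module_name, func_name, expected_params):
    """Report MATCH or MISMATCH for one prompt assumption:
    'module X has function Y accepting parameters Z'."""
    try:
        mod = importlib.import_module(module_name)
        func = getattr(mod, func_name)
    except (ImportError, AttributeError) as exc:
        return f"MISMATCH: {module_name}.{func_name} not found ({exc})"
    actual = set(inspect.signature(func).parameters)
    missing = set(expected_params) - actual
    if missing:
        return f"MISMATCH: {module_name}.{func_name} lacks {sorted(missing)}"
    return f"MATCH: {module_name}.{func_name} accepts {list(expected_params)}"

# Example assumptions, checked against the stdlib:
print(check_assumption("json", "dumps", ["obj", "indent"]))
print(check_assumption("json", "dumps", ["pretty"]))
```

Every MISMATCH line becomes a text substitution in the implementation prompt before any session starts.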

Stage 1: Foundation (sequential, blocking)

One Claude Code session creates every shared abstraction that the parallel work depends on — base classes, interfaces, configuration wiring. This session runs with hard gates at every step. Nothing proceeds without pasted proof.

This merges to main. Everything after branches from the merged result. The foundation is non-negotiable as a sequential step. We tried skipping it once. Two parallel sessions both created their own version of a base class. Wasted an entire afternoon.

Stage 2: Parallel Tracks (two sessions, isolated branches)

Split the integrations into two groups where no file is touched by both. The grouping principle is simple but strict: zero file overlap after Stage 1. How you cluster within that constraint (by protocol, by dependency surface) is secondary.

Each track runs in its own Claude Code terminal on its own git branch. Each component within a track executes one at a time with a gate between them. The sessions can run simultaneously because they never touch the same files.
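The zero-overlap rule is mechanical enough to enforce with a few lines before launching the sessions. A sketch, with hypothetical track manifests:

```python
def assert_zero_overlap(track_a_files, track_b_files):
    """Fail fast if any file would be touched by both parallel tracks."""
    overlap = set(track_a_files) & set(track_b_files)
    if overlap:
        raise ValueError(f"Tracks overlap on: {sorted(overlap)}")

# Hypothetical manifests, one per Claude Code session:
track_a = ["integrations/slack.py", "integrations/jira.py"]
track_b = ["integrations/stripe.py", "integrations/hubspot.py"]
assert_zero_overlap(track_a, track_b)  # passes: the groups are disjoint
```

Running this before opening the terminals is cheaper than resolving a merge conflict in AI-generated code you haven't read.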

Stage 3: Post-Merge Validation

After both tracks complete and merge, run the full test suite plus an import smoke test that exercises every new module. This catches any interface mismatches between the tracks.
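The smoke test itself is a dozen lines. A sketch, using stdlib modules as stand-ins; in practice `NEW_MODULES` lists every module the two tracks added:

```python
import importlib

NEW_MODULES = ["json", "csv"]  # stand-ins; list your new integration modules

def import_smoke_test(module_names):
    """Import every new module so a missing dependency or syntax error
    surfaces now, not at first production use."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            failures[name] = repr(exc)
    return failures

assert import_smoke_test(NEW_MODULES) == {}
```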


The Anti-Hallucination System

[Figure: Gate-and-Wait Execution Loop]

The prompts matter more than the code. Here's what actually prevents the three failure modes:

Gate-and-wait pattern. Every component ends with "Stop. Awaiting gate verification." Claude must paste the test output and coverage report before the next component begins. Without this, Claude will bulldoze through five components and hand you a pile of untested code.

Named tests, not described tests. Instead of "write tests for error handling," the prompt says "write test_adapter_retry_on_5xx: mock 503 then 200, verify retry." When the test has a specific name and a specific assertion, Claude can't phone it in with assert True.
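Here is what that named test looks like in practice. The adapter and fakes below are hypothetical stand-ins so the example is self-contained; the point is the shape of the test, not the implementation under it.

```python
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

class FakeTransport:
    """Mock transport that returns a scripted sequence of status codes."""
    def __init__(self, *codes):
        self.codes = list(codes)
        self.calls = 0
    def send(self):
        self.calls += 1
        return FakeResponse(self.codes.pop(0))

def retry_on_5xx(transport, max_retries=3):
    """Hypothetical adapter behavior under test: retry while 5xx."""
    response = transport.send()
    for _ in range(max_retries):
        if response.status_code < 500:
            break
        response = transport.send()
    return response

def test_adapter_retry_on_5xx():
    transport = FakeTransport(503, 200)   # mock 503 then 200
    response = retry_on_5xx(transport)
    assert response.status_code == 200    # the retry succeeded
    assert transport.calls == 2           # exactly one retry, not `assert True`

test_adapter_retry_on_5xx()
```

The assertion on `transport.calls` is the part a vague prompt never gets you: it pins the behavior, not just the final status code.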

Forbidden phrases. The prompt explicitly bans the sentences Claude uses right before it cuts corners:

  • "This should work" — run it and prove it
  • "I'll add tests later" — add them now
  • "Coverage is approximately" — paste the exact number
  • "The remaining tests are similar" — write every test explicitly

These sound pedantic. They are the difference between code you can merge and code you throw away.

Coverage gates. 90% test coverage per new module, enforced by pytest --cov. Not a suggestion — a hard gate. If coverage is below threshold, Claude writes more tests before proceeding.
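One way to make the gate structural rather than prompt-enforced is coverage.py's own `fail_under` setting, which `pytest --cov` respects (the same threshold can be passed per run as `--cov-fail-under=90`). A hypothetical fragment:

```toml
# pyproject.toml
[tool.coverage.report]
fail_under = 90        # hard gate: below 90% fails the run
show_missing = true    # print uncovered lines so the next prompt can target them
```

With this in place, "coverage is approximately" stops being possible: the run either passes or it doesn't.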

Reality anchor. Both track prompts include: "If the codebase does not match expectations: STOP. Describe what you found. Do not improvise." This prevents assumption drift. Claude stops and asks instead of building on a wrong foundation.


What We Learned

Cowork as pre-flight is the highest-leverage step. The scoping report also generates folder instructions that every future session inherits. This compounds over time.

The gate-and-wait pattern changes Claude's behavior. When Claude knows it has to paste proof, it actually runs the tests. When it knows the human will inspect coverage, it writes real tests. The constraint shapes the output more than any instruction about quality.

Named tests beat described tests by a wide margin. A specific test name with a specific scenario produces a meaningful test. A vague description produces something that passes but doesn't validate what matters.

Sequential foundation is non-negotiable. The shared abstractions must exist before parallel work begins. Skipping this step wastes more time than it saves.

Opus for architecture, Sonnet for mechanics. The foundation and first component of each track benefit from Opus-level reasoning. After the pattern is established, later components follow the same structure and can run on a faster model.


The Prompt Stack

For anyone building a similar workflow:

  1. Cowork Pre-Flight — read-only codebase analysis, assumption validation, correction patches, folder instructions generation.

  2. Session 1: Foundation — shared abstractions, interface wiring, dependency groups. Sequential with hard gates.

  3. Sessions 2+3: Parallel Tracks — independent implementations on isolated branches. Each component has named tests, coverage gates, stop-and-wait points.

  4. Post-Merge Validation — full suite, coverage report, import smoke test.


The Real Lesson

You don't need a better AI model. You need a better management layer around the model you have.

Hard gates. Named tests. Forbidden phrases. Read-only pre-flight. Isolated branches. Proof at every checkpoint.

The engineering is in the spec, not the code. That spec took longer to write than any individual integration will take to implement. And that's the point.
