Scale AI-First SDLC: N Solo Modes Is Not Scale Mode

07/05/2026

By Marco Vargas, Cloud Engineering Solution Architect, Gorilla Logic

This is Part 2 of a two-part series. Part 1 covered Solo Mode: one human operator, a team of AI agents, and a disciplined loop. This post starts where that one ended: what breaks when you scale past a single operator and move into a Scale AI-first SDLC.

Recently I was talking to an engineer from a team that had gone deep on AI. Not dabbling. Real adoption. Agentic workflows in the IDE, CI, feature flags, observability, the whole thing. By every visible measure, they were doing AI-first the right way.

And still, PRs were taking weeks to merge.

That detail stuck with me because it names the real problem. The bottleneck was no longer writing code. It was approvals, integration, and convergence. The team had made individual engineers faster. They had not built the coordination layer that had to come next.

That lines up with a point from this MIT Sloan article: task-level AI gains do not automatically become organizational gains. The value leaks out in the handoffs.

A lot of the current AI-first conversation misses the mark. I have spent a lot of time looking at agentic frameworks, skill packs, memory systems, orchestration patterns, and context layers. Most are getting better at the same thing: making one operator stronger.

I wrote about that in my last post: Solo Mode. One person, a disciplined loop, a handful of agents, one contract, one filesystem, serious output.

But the missing dimension is scalability. Not compute scale. People scale.

And N Solo Modes running side by side is not Scale Mode.

That is Team Mode first: several engineers each running a strong solo loop, with the integration burden landing in PR review, shared code, conflicting assumptions, and trust. Scale Mode starts later, when the coordination layer becomes explicit and somebody owns it.

Teams keep looking for the right agent framework, as if a stronger tool for one person will solve a process problem for many people. Usually it does not. It just helps the next bottleneck show up faster.

This post is about those next bottlenecks. First in Team Mode, where they surface. Then in Scale Mode, where they stop being team habits and become org design.

AI-first SDLC: Team Mode starts when you add people to a Solo loop

One operator becomes one team. Solo Mode becomes Team Mode. Four patterns surface in teams that make this transition. Each has the same shape: a rational individual move that aggregates into a team-level failure. Each hides inside apparent progress until the team hits the merge bottleneck.

Trap 1: Engineers prefer their AI-minions over their peers

Once an engineer has a working agent setup, the pull is strong. Work with the agent, not with the team. The agent never pushes back on your approach. It never asks you to justify a design choice. There is no ego, no taste conflict, no calendar friction. It is always available.

What feels like velocity is actually coordination avoidance. The engineer ships more code, but the code is less integrated with what everyone else is shipping. The bottleneck moves from “can I write this?” to “can we merge this?”, and nobody owns the merge problem because the merge problem isn’t glamorous.

This is the single most important failure mode in Team Mode, and it is almost never named directly.

The fix is cultural: AI-aware pair and mob practices, and explicit team rituals that reward integration over solo output. Without that shift, engineers rationally optimize for the fast lane, and the team loses coherence one PR at a time.

Trap 2: Solos reinvent the wheel

When N engineers work solo with their AI agents, the codebase grows N near-identical implementations of the same concept. Three money formatters. Two date utilities. Four slightly different retry wrappers. Each made sense in the session that produced it.

The agents do not prevent this. A fresh session knows nothing about the codebase beyond what the engineer explicitly surfaces. The path of least resistance is to write the thing, not to search for it. AI is often faster at writing a new helper than at finding the one the team already has. Modern agents with codebase indexing narrow the gap, but rarely close it in the flow of a single session.

The result is duplication at scale, hidden by per-engineer velocity. Everyone ships fast. The codebase accumulates parallel implementations with slightly different behaviors. The next time someone needs to change how money is formatted, they find three files, decide which one is canonical, and migrate the rest.

The fix is upstream: shared ADRs, canonical utilities, and a signal-based context layer, where agents query the exact decision, helper, or contract they need instead of loading broad context and hoping it sticks. The answer is not “bigger context”. It is better context: structured handoffs, bounded retrieval, and canonical pointers. Without that, AI amplifies duplication at the same rate it amplifies velocity.
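As a minimal sketch of what "canonical pointers" and "bounded retrieval" could look like, the snippet below assumes a small shared registry that an agent queries before writing a new helper. The registry entries, import paths, and ADR ids are hypothetical, and a real context layer would likely use something richer than word overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalPointer:
    concept: str   # what the helper is for, e.g. "money formatting"
    location: str  # hypothetical import path of the canonical helper
    decision: str  # hypothetical ADR id that made it canonical


# Hypothetical registry; in practice this would live in the shared backbone.
REGISTRY = [
    CanonicalPointer("money formatting", "shared.money.format_money", "ADR-012"),
    CanonicalPointer("date parsing", "shared.dates.parse_date", "ADR-007"),
    CanonicalPointer("retry wrapper", "shared.net.with_retry", "ADR-019"),
]


def find_canonical(query, registry=REGISTRY, limit=3):
    """Return at most `limit` pointers whose concept shares a word with the query.

    The limit is the "bounded" part: the agent gets a few exact pointers,
    not the whole registry.
    """
    words = set(query.lower().split())
    hits = [p for p in registry if words & set(p.concept.split())]
    return hits[:limit]


if __name__ == "__main__":
    for pointer in find_canonical("format money for invoices"):
        print(pointer.location, pointer.decision)
```

The design choice worth noting is that the lookup returns a pointer and the decision behind it, so the session that finds the helper also learns why it is the canonical one.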

Trap 3: The bottleneck moves to integration, and nobody owns it

When individual velocity jumps, the rest of the system does not keep pace. Reviewers cannot read PRs as fast as engineers and agents produce them. Parallel sessions touch the same files. In a single week, a few developers can generate enough change to outrun the team’s ability to review, merge, and integrate.

At Solo scale, the operator was the integration layer. Add more humans without making that role explicit, and integration becomes nobody’s job. Merge conflicts compound. The same concept gets implemented three different ways across services. PRs age while the context behind them evaporates.

That is why teams say, “PRs take weeks to merge.” The real problem is not the PR. It is that PR review has become the place where the team hopes coordination will somehow happen.

In regulated environments, the bottleneck gets worse. AI can help write the change packet faster. It does not shrink the approval queue. If engineering output triples, the queue gets longer, not shorter.

The answer is not better PR review. It is a better integration model around it: explicit integration ownership, small reviewable units, short-lived branches, and hard gates upstream. Stacked diffs keep review load readable. Ownership rules make it clear whose review is required. Merge queues catch integration problems at the gate instead of after the fact.
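One way to make "whose review is required" explicit is a small rule table that maps changed paths to owning teams. This is only a sketch: the patterns, team names, and the last-match-wins convention are assumptions for illustration, not a description of any particular tool.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership rules: (glob pattern, owning team).
# Later rules override earlier ones for the same path.
OWNERSHIP_RULES = [
    ("*", "platform"),
    ("services/billing/*", "billing"),
    ("shared/*", "integration-owners"),
]


def required_reviewers(changed_paths, rules=OWNERSHIP_RULES):
    """Return the sorted set of teams that must review a change."""
    owners = set()
    for path in changed_paths:
        owner = None
        for pattern, team in rules:
            # fnmatchcase's "*" also matches "/" so a rule covers subfolders.
            if fnmatchcase(path, pattern):
                owner = team  # last matching rule wins
        if owner is not None:
            owners.add(owner)
    return sorted(owners)


if __name__ == "__main__":
    print(required_reviewers(["services/billing/api/invoice.py", "shared/money.py"]))
```

A change that touches shared code pulls in the integration owners automatically, which is the point: the requirement is decided by the rule table, not negotiated in the PR thread.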

PR review can validate a unit of change. It cannot carry the full burden of team coordination.

Trap 4: Artifacts lose trust because “that’s AI”

A PR lands. A developer wrote it with an agent. A critic agent reviewed it. The end-to-end tests are green. Every check passes.

And still, a human reviewer cannot bring themselves to merge it. The hold-up is not a bug. It is unfamiliarity. They need time to absorb what they are integrating. “It’s AI,” they think. “I need to read this myself.”

Meanwhile, that same reviewer uses AI to write their own code, summarize tickets, and polish docs. Their AI use is silent and trusted. Their teammate’s AI use is visible and suspect. “AI-generated” becomes shorthand for “not real work yet,” even when AI also helped review the artifact.

That asymmetry is what makes this trap specific to AI-First SDLC at team scale. Teams have always been uneasy with code they did not personally type: generated files, protobuf output, ORM migrations. AI does not create that pattern. It makes it constant.

AI-first SDLC: Scale Mode starts when teams stop containing the drift

One team becomes many teams. That is the moment Team Mode stops being enough. It is not yet Scale Mode, which begins only once coordination becomes explicit and somebody owns it.

In Team Mode, the four traps show up for the first time. One team can still absorb them through tight communication, rituals, and a few senior engineers holding the seams together. In Scale Mode, that stops working. The same traps compound across teams. Each team grows its own version, and the coordination habits that worked inside one team break at the boundary between teams.

The forcing function is concrete. Team A’s agent refactors a retry wrapper on Tuesday. Team B’s agent ships a near-identical one on Wednesday. Team C discovers the collision in production on Friday because both utilities handled the same edge case differently. That is the point where tight communication stops containing the drift. There is no single conversation that can hold everyone’s context anymore.

In Solo Mode, the operator held coordination together. In Team Mode, the team itself still could. In Scale Mode, coordination has to become explicit.

This is where I see orgs miss the turn. They reach for one of two extremes. They go high: a reorg, an AI innovation group, a parallel structure that ends up detached from delivery. Or they go low: chase the best agentic framework and hope Team Mode naturally turns into Scale Mode. Neither solves the actual problem.

The fix sits between those two moves: a shared backbone above any single team and below the org chart. Not a big theory. Just a few explicit decisions: who owns the backbone, what belongs in it, and how integration works across teams.

Intentional Architecture vs Emergent Design

AI makes scaffolding cheap. An MCP server to solve the problem someone hit yesterday. A small in-house framework because the team could stand it up in an afternoon. A new CI pipeline because the old one did not fit the agent loop. A hook for commit conventions. A utility library born in the middle of a refactor.

None of that is bad on its own. The problem starts when several teams do it at once.

Without architectural intent, people build these things in parallel and others adopt them locally. A few weeks later, the org has five agent wrappers, three overlapping MCP surfaces, two eval harnesses that report different numbers, and a CI path only one team understands. Same class of problem, many local answers, all slowly becoming infrastructure.

This is an old architecture tension with a new speed setting: Intentional Architecture vs Emergent Design. Emergent design is what teams produce while shipping. Intentional architecture is what keeps those local choices from fragmenting the whole system. You need both. When emergence runs faster than architectural intent, you get the hundred-ways problem.

That used to be easier to contain. AI changes the rate. Teams can now generate scaffolding, wrappers, hooks, and internal tooling faster than most orgs can decide what should become shared, what should stay local, and what should never exist twice.

That is why architecture matters more here, not less. At Scale, intentional architecture cannot be cleanup work after the fact. It has to be a named responsibility inside the SDLC. Because once teams multiply, coordination functions stop happening by accident. Cross-team integration, semantic consistency, shared context infrastructure, end-to-end validation, and AI spend ownership all default to nobody unless somebody owns them on purpose.

Shared-backbone drift is silent until two teams collide

The next deliberate choice is simple to name and easy to avoid: what stays shared across teams, and what each team is allowed to own locally.

Some things start drifting the moment you duplicate them. A merge queue run per team creates integration conflicts nobody can diagnose cleanly. An agent-memory layer run per team creates parallel architectural universes after a month. An eval harness run per team gives you quality numbers that cannot be compared.

Those are backbone choices. Merge and integration rules, the ADR system, shared context infrastructure, agent memory, evaluation harnesses, handoff contracts, and toolchain versioning all get more expensive and less trustworthy when every team forks its own version.

Other things should stay local. Feature delivery, stream-specific architecture choices, team review rituals, mentorship, standups. Those are supposed to reflect the work and the people closest to it.

There is no universal line for every org. Some teams keep ADRs local until the first cross-service collision and then centralize. Some standardize evals on day one. The important thing is not where you draw the line. It is that you draw it on purpose, item by item, before drift turns into collision.

The shared context layer is the quietest lever

Of all the shared-backbone choices, the most under-invested is the shared context layer: the mechanism that lets every engineer’s AI session see the same slice of institutional knowledge, internal docs, feature flags, observability, deploy history, and the existing codebase.

MCP is the best-known current example, but the principle is broader than any one protocol or platform. Whether the stack uses MCP, LangGraph, Bedrock, or something in-house, the leverage comes from the same place: a shared surface that serves institutional context to agents.

When every engineer runs a one-off context setup, Trap 2 becomes structural. When a team shares a curated context surface, the agent that writes code on Monday sees the same world as the agent that reviews it on Tuesday.

Low glamour, high return. Build this early.

No single integration model fits every org

There is no default integration model for AI-First SDLC at Scale. The right one depends on how the domain is partitioned. Some orgs converge through a merge queue on a shared trunk. Some integrate through service contracts. Some rely on a shared platform as the main integration surface. AI velocity exposes whichever integration layer is least funded, regardless of which model the org calls primary.

Most real orgs are a mix. The mistake is not mixing. The mistake is drifting between models without naming one as primary, so none of them has enough teeth to hold the system together. The model you name primary is where the named owner, the gates, and the evidence chain live.

Spend becomes an architecture decision at AI-First SDLC Scale

AI raises engineering velocity. It also raises the bill. The spend can get real fast, even before an org feels “big.” A public example is Affirm’s engineering write-up: they budgeted nearly $200,000 for one week of token usage across more than 800 engineers, then still had to watch usage daily and investigate outliers as the rollout unfolded. That is the point: AI spend becomes a design problem early, not just a finance problem later.

At small scale, that looks like usage. At Scale, it is architecture. The biggest cost drivers are structural:

Which model each role uses
Whether context and memory infrastructure are shared or duplicated
How often evals and CI fire
Whether agents retry expensive calls without a budget ceiling.

The fix is intentional spend, not expense policing. A few decisions do most of the work:

Use the expensive models where they matter, and route plumbing work like orchestration, summarization, and validation to cheaper or local models
Share context and memory infrastructure instead of rebuilding it per team
Cache reusable context so the same architecture docs are not re-billed every session
Put retry caps and output ceilings on agent loops
Budget evals and CI at the pipeline level
Make token use visible to the people whose decisions actually move the number.
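The retry-cap and ceiling items in the list above can be sketched as a small wrapper around an agent call. The shape of `call` (a function returning success, result, and tokens used) and the default limits are assumptions made for this example; real agent frameworks report usage in their own ways.

```python
class BudgetExceeded(Exception):
    """Raised when an agent loop runs out of attempts or tokens."""


def run_with_budget(call, max_attempts=3, token_ceiling=10_000):
    """Retry `call` until it succeeds, attempts run out, or tokens run out.

    `call` is a hypothetical callable returning (ok, result, tokens_used).
    The ceiling is checked after each call, so the final call may overshoot it.
    Returns (result, tokens_spent, attempts_used) on success.
    """
    spent = 0
    for attempt in range(1, max_attempts + 1):
        ok, result, tokens_used = call()
        spent += tokens_used
        if ok:
            return result, spent, attempt
        if spent >= token_ceiling:
            raise BudgetExceeded(f"spent {spent} tokens after {attempt} attempts")
    raise BudgetExceeded(f"no success after {max_attempts} attempts ({spent} tokens)")


if __name__ == "__main__":
    # Simulated agent call: fails twice, then succeeds, 1000 tokens each time.
    outcomes = iter([(False, None, 1000), (False, None, 1000), (True, "done", 1000)])
    print(run_with_budget(lambda: next(outcomes)))
```

Returning the tokens spent alongside the result is deliberate: it gives the caller a number to log, which is what makes usage visible to the people who can change it.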

Without a named owner, AI spend behaves like AWS spend did a decade ago. Every line item looks reasonable on its own. The total looks insane. And nobody has the mandate to kill the obvious waste. This coordination function has the same shape as merge or integration: someone owns it, or it owns everyone.

Product management becomes the real bottleneck

Fix the four traps and you still have a problem. Engineering can now ship faster than product can decide. When individual velocity jumps, the constraint moves to product decisions. The old cadence of “two-week sprint, client sees it, client has two weeks to decide” collapses when engineering can ship twice a day.

The fix is disciplined product management:

Fewer concurrent bets
Clearer kill criteria
Explicit “decide by” windows
A product function that holds the line against engineering’s appetite for more work in flight.

Concretely, this means calendaring a “decide by” date on every product question at the moment it is raised. Options stay open until the window closes; after that, the default option ships, so indefinite deliberation stops blocking engineering. It means explicit WIP limits on product bets, with ship and kill criteria defined before work starts, because AI produces features faster than product can steward them. And it means writing down what “this isn’t working” looks like before building, not after: AI reduces the cost of trying things, and that is only an advantage if trying cheaply includes being honest about what failed.

The decision queue has to be managed like any other queue: named owner, visible backlog, clear deadlines, and an explicit default when the window closes.
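A decision queue entry with a named owner, a deadline, and an explicit default could be modeled roughly as follows. The field names and the example question are hypothetical; the only behavior taken from the text is that the default ships once the window closes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProductDecision:
    question: str
    owner: str                    # named owner of the decision
    decide_by: date               # last day the question stays open
    default_option: str           # what ships if nobody decides in time
    chosen: Optional[str] = None  # set when the owner makes an explicit call


def resolve(decision, today):
    """Return the option engineering should build, or None if still open."""
    if decision.chosen is not None:
        return decision.chosen
    if today > decision.decide_by:
        return decision.default_option  # window closed: the default ships
    return None


if __name__ == "__main__":
    d = ProductDecision(
        question="Which checkout layout ships?",  # hypothetical example
        owner="product-lead",
        decide_by=date(2026, 7, 10),
        default_option="layout-a",
    )
    print(resolve(d, date(2026, 7, 9)))   # still open
    print(resolve(d, date(2026, 7, 11)))  # default applies
```

Because the default is recorded when the question is raised, a missed deadline produces a decision instead of a stall.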

Without product discipline, AI-First SDLC at Scale becomes a well-tooled factory producing the wrong output.

Conclusion

AI-First SDLC at scale is not a tooling upgrade. It is a coordination design. The question stops being “which framework do I adopt?” and becomes “which coordination functions in my org still do not have owners?”

Seven questions make the audit concrete:

Who owns cross-team merge and integration, by name?
Where do cross-team architecture decisions happen, and who convenes them?
Who owns the shared context layer, and how is it curated?
Is there a Signal Contract for agent and engineer handoffs, or is the medium still prose?
Does every artifact carry a visible validation chain, or does trust still depend on who produced it?
Does product run a visible decision queue with owners, deadlines, and defaults, or do engineering bets sit in deliberation until velocity outruns the roadmap?
Who owns AI spend, and is it budgeted across shared coordination functions or left to accumulate team by team?

The first five are inspectable in an afternoon. The last two take one honest conversation with product and finance.

If you cannot answer three of these in your org, you have found the coordination functions that still default to nobody. That is the work. Start there.
