How We Reduced AI Coding Agent Token Cost by 70% on Large Software Projects
AI coding agents did not make our engineering team faster at first. They made us faster at creating confusion.
The short version: the fix was not a better prompt. It was an operating model. We stopped treating coding agents as smart autocomplete and built structure around them — context maps, startup prompts in CLAUDE.md and AGENTS.md, impact analysis before coding, local test environments, and human design review. That cut our AI coding agent token cost by about 70% and made delivery faster on large, multi-repo projects. The rest of this article is how.
We use coding agents every day: mostly Claude, some Codex, some Antigravity, and some Cursor. Our projects are not small isolated repositories. They are large, interconnected systems with frontend repositories, backend repositories, shared databases, shared cloud resources, and multiple developers working on related parts of the same product.
In that environment, one missed design decision can affect screens, APIs, database state, background jobs, reports, infrastructure, and other developers' work.
The gains in cost and speed came only after we stopped treating coding agents as smart autocomplete and started treating them as junior execution agents that need context, constraints, test environments, design review, and shared project memory.
This is our internal experience, not a universal benchmark. The exact result depends on project size, team discipline, tooling, documentation quality, and how much repeated context your agents currently consume.
Table of Contents
- Why AI coding agents initially increased chaos instead of productivity
- Coding assistant vs coding agent: why the difference matters here
- What makes large software projects difficult for coding agents
- How we created project context before asking agents to code
- How CLAUDE.md, AGENTS.md, and startup prompts reduced token waste
- Why documentation became the agent's working memory
- How we changed day-to-day development with an AI-assisted protocol
- How do local test environments improve agent output?
- Why design review became more important, not less important
- When do parallel coding agents actually help?
- What changed after the operating model was in place
- The operating model we use now
- Common mistakes when using AI coding agents in large projects
- Checklist: before using coding agents on a large project
- FAQ and References
Why AI coding agents initially increased chaos instead of productivity
The first mistake was simple: we gave developers tools before we gave the team a coherent strategy.
Developers started using agents to write code, fix bugs, and modify features. But the agents did not understand the system. The developers often accepted generated solutions without a deep review. Some changes worked locally but broke related flows. Some code was pushed without proper testing. Some fixes solved the visible issue but created hidden impact elsewhere.
The result was predictable:
- Token costs increased.
- Testing time increased.
- Test quality dropped.
- Review burden increased.
- Developers trusted generated code too quickly.
- Cross-repo impact was missed.
- The team moved faster in isolated tasks but slower as a whole.
A coding agent can make one developer appear faster while making the engineering system slower. In large projects, local speed is not the same as team productivity.
Coding assistant vs coding agent: why the difference matters here
These two terms get used interchangeably, but the distinction is the whole point of this article.
| | Coding assistant | Coding agent |
|---|---|---|
| Scope | Completes the line or block you are typing | Acts across many files, runs commands, and iterates |
| Initiative | Suggests; you decide | Plans, executes, and can change state on its own |
| Failure mode | A bad suggestion you can ignore | A confident multi-file change with hidden blast radius |
| What it needs | A good local context window | An operating model: context maps, constraints, tests, review |
What makes large software projects difficult for coding agents?
Large projects are difficult for coding agents because the real context is not inside one file.
A feature may involve a frontend screen, reusable UI components, API calls, request payloads, response contracts, backend route handlers, business logic functions, database tables, background jobs, queues, cloud storage, permissions, reports, test data, deployment assumptions, and other teams' unfinished work.
The agent may see one part of the codebase and produce a plausible answer. But plausible is not enough.
In a large interconnected project, the important question is not:
"Can the agent write code?"
The important question is:
"Does the agent understand the blast radius of the change?"
Blast radius means the full set of systems, data, workflows, and people affected by a change. The higher the blast radius, the more dangerous blind AI-assisted coding becomes.
How we created project context before asking agents to code
We stopped asking agents to code first.
Instead, we asked them to understand.
This changed everything.
Pass 0: Give the agent the customer goal and existing documentation
We started with whatever project documentation we already had: end-to-end docs, feature notes, architecture notes, and rough internal explanations. Then we explained the project goal from the customer's perspective in detail.
This mattered because agents tend to optimize for the immediate instruction. If they do not understand what the customer is trying to achieve, they may produce technically correct but functionally wrong code.
The first layer of context was therefore not code. It was:
- Who is the user?
- What is the project trying to achieve?
- What are the major workflows?
- What does success look like for the customer?
- What parts of the system are business-critical?
- What should not break?
Pass 1: Map frontend screens, endpoints, payloads, and responses
We downloaded the relevant repositories for the project: frontend, backend, and other related repos. Then we asked the agent to start with the frontend.
The task was not to refactor or improve anything. The task was to map the system.
For every screen, the agent had to identify the screen name, route, components used, visible user purpose, API endpoints called, request payloads, response payloads, state changes, validations, error states, and likely business purpose.
This created the first useful artifact: a frontend-to-API map.
For the first time, the agent had a structured understanding of what the user sees and which backend calls support that experience.
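One way to keep such a map machine-readable is a small record per screen. The sketch below is only an illustration, not the format we used: the class names, field names, screens, and endpoints are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    method: str                      # HTTP method, e.g. "GET"
    path: str                        # endpoint path as called by the frontend
    request_fields: list = field(default_factory=list)
    response_fields: list = field(default_factory=list)


@dataclass
class ScreenMapEntry:
    screen: str                      # screen name
    route: str                       # frontend route
    components: list                 # components used on the screen
    purpose: str                     # visible user purpose
    api_calls: list = field(default_factory=list)
    error_states: list = field(default_factory=list)


def screens_calling(entries, method, path):
    """Return the names of screens that call the given endpoint."""
    return sorted(
        e.screen
        for e in entries
        if any(c.method == method and c.path == path for c in e.api_calls)
    )


# Hypothetical entries: screen names, routes, and endpoints are invented.
frontend_map = [
    ScreenMapEntry(
        screen="InvoiceList",
        route="/invoices",
        components=["InvoiceTable", "DateFilter"],
        purpose="Browse and filter invoices",
        api_calls=[ApiCall("GET", "/api/invoices", ["from", "to"], ["items", "total"])],
        error_states=["empty result", "request timeout"],
    ),
    ScreenMapEntry(
        screen="Dashboard",
        route="/",
        components=["RevenueCard"],
        purpose="Show headline numbers",
        api_calls=[ApiCall("GET", "/api/invoices", ["from", "to"], ["total"])],
    ),
]

print(screens_calling(frontend_map, "GET", "/api/invoices"))
# prints ['Dashboard', 'InvoiceList']
```

Even a flat structure like this lets a developer or an agent answer "which screens depend on this endpoint?" without rereading the frontend code.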
Pass 2: Link backend endpoints to scripts, functions, tables, and cloud resources
Once the frontend map existed, we extended the documentation from the backend side.
For every endpoint, the agent had to trace the route definition, controller or handler, service functions, reusable utilities, database tables used, read/write behavior, cloud resources touched, files or storage used, background jobs triggered, and functionality delivered.
This created the second useful artifact: a frontend-to-backend-to-database-to-cloud map.
This was the first real context breakthrough. Earlier, agents were solving one file at a time. Now they could see how one frontend behavior connected to backend code, shared data, and infrastructure.
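Once the map records what depends on what, the blast radius of a change can be estimated by walking it. The following is a minimal sketch under assumptions: the node names and the `DEPENDENTS` table are invented, and a real map would be generated from the documentation rather than written by hand.

```python
from collections import deque

# Hypothetical "is used by" edges taken from a system map.
# Read each entry as: changing the key may affect every item in the list.
DEPENDENTS = {
    "table:invoices": ["fn:list_invoices", "fn:monthly_report_job"],
    "fn:list_invoices": ["endpoint:GET /api/invoices"],
    "endpoint:GET /api/invoices": ["screen:InvoiceList"],
    "fn:monthly_report_job": ["report:monthly_revenue"],
}


def blast_radius(changed, dependents):
    """Return everything reachable from `changed` by following dependents."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(changed)  # guard against cycles leading back to the start
    return sorted(seen)


for item in blast_radius("table:invoices", DEPENDENTS):
    print(item)
```

In this invented example, a change to the `invoices` table reaches two functions, one endpoint, one screen, and one report. That is the kind of list a reviewer wants to see before any code is written.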
Pass 3: Build component-specific design documents
The first pass gave breadth. It did not give enough depth.
So we asked the agent to use the first set of documents and perform a deeper pass component by component.
For each major component, it had to create a detailed design document covering what the component does, which workflows it supports, which APIs it depends on, which tables it reads and writes, which cloud resources it touches, what assumptions it makes, what can break, and what other parts of the project depend on it.
This created a second layer of documentation: component-specific design docs.
The difference was important. The summary map helped the agent start fast. The component docs helped the agent go deep only when needed.
How CLAUDE.md, AGENTS.md, and startup prompts reduced token waste
The next problem was token cost.
If every developer had to paste the entire project context into every session, cost would remain high and context quality would remain inconsistent.
So we created a startup model.
We updated CLAUDE.md and AGENTS.md with instructions telling agents where to begin, which docs to read first, and when to go deeper.
The key principle was:
Do not load the whole project into the prompt. Give the agent a map of where the context lives.
The startup context included the project goal, folders of all repositories, role of each repo, documentation location for each repo, dependency relationships, frontend-backend mapping location, component design document location, test commands, development protocol, and review expectations.
Then we created a start_session prompt. This prompt gave the agent enough context to orient itself without forcing us to paste all documentation repeatedly. The agent could start from the summary, then open deeper documentation only for the relevant component. A generalized, copy-pasteable version of this is published as a free artifact: the SESSION_START multi-repo onboarding template.
This solved two problems at once:
- Token cost went down because we stopped repeatedly loading unnecessary context.
- Output quality improved because every agent started with the same project understanding.
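To make the "map, not payload" idea concrete, here is a rough sketch of a helper that renders a short startup context pointing at documentation instead of inlining it. The function name, repository folders, and document paths are hypothetical and only illustrate the shape of such an index.

```python
def build_context_index(project_goal, repos, summary_map, protocol_doc):
    """Render a short startup context that points at docs instead of inlining them."""
    lines = [
        f"Project goal: {project_goal}",
        "",
        "Repositories:",
    ]
    for repo in repos:
        lines.append(f"- {repo['folder']}: {repo['role']} (docs: {repo['docs']})")
    lines += [
        "",
        f"Read first: {summary_map}",
        f"Development protocol: {protocol_doc}",
        "Open a component design doc only when the task touches that component.",
    ]
    return "\n".join(lines)


# Hypothetical project layout, for illustration only.
index = build_context_index(
    project_goal="Let finance teams issue and reconcile invoices",
    repos=[
        {"folder": "web-app/", "role": "frontend", "docs": "web-app/docs/"},
        {"folder": "billing-api/", "role": "backend", "docs": "billing-api/docs/"},
    ],
    summary_map="docs/frontend_backend_map.md",
    protocol_doc="docs/dev_protocol.md",
)
print(index)
```

The rendered index stays a few lines long no matter how large the underlying documentation grows, which is the property that keeps the per-session cost low.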
Why documentation became the agent's working memory
The next improvement came from a simple realization: agents do not only need documentation to understand the system. They need documentation to decide, track status, checkpoint work, and avoid reopening the same design questions again and again.
Earlier, a lot of project knowledge lived in conversations, temporary prompts, developer memory, or unfinished implementation notes. That does not work when multiple people and multiple agents are working across interconnected repositories.
So we made documentation part of the workflow, not a separate cleanup activity.
For every meaningful module or feature, the agent was asked to maintain a small set of working files:
| Document | Purpose |
|---|---|
| TODO.md or TODO_ | The active working list: open items, status, pending tasks, and next steps |
| DESIGN_ | The current design as it exists now, including changes made during development |
| DEFERRED_ | Design decisions that were consciously deferred, with the reason for deferring them |
| CHANGELOG.md | Chronological record of what changed, why it changed, and what was tested |
| MEMORY.md or project-level memory file | Durable instructions for how agents should update and use the documentation |
Instead of every new session starting from memory loss, the agent could read the current design, active TODOs, deferred decisions, and changelog. If a developer forgot to update a document, the agent often caught it because the project-level memory instructed it to update the right files as part of its workflow.
The team was also asked to follow the same discipline. That mattered because the documentation was no longer "for the agent" or "for the team." It became the shared operating memory of the project.
The rule became:
If a decision affects future work, document it where the next agent and the next developer will find it.
This reduced repeated context loading, avoided stale assumptions, and helped agents build the right thing instead of rediscovering the same project structure again and again. This is also where the distinction between AI agent memory and state became practical rather than theoretical.
The most important documents were not long architecture documents. They were live working documents that answer: What are we building now? What did we defer? Why did we defer it? What design is currently true? What changed recently? What should the next session read first? What should the agent never assume?
Once this documentation layer existed, agents became more reliable because they were no longer depending only on the prompt window. They had a project memory system.
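A session can begin by checking which working files exist for the module at hand. The sketch below assumes one directory per module and uses placeholder file names; neither is a description of our actual layout.

```python
import tempfile
from pathlib import Path

# Hypothetical file names for one module; adapt to your own naming scheme.
WORKING_FILES = ["DESIGN.md", "TODO.md", "DEFERRED.md", "CHANGELOG.md"]


def session_reading_list(module_dir, file_names):
    """Return (present, missing) working files, keeping the given order."""
    base = Path(module_dir)
    present = [name for name in file_names if (base / name).is_file()]
    missing = [name for name in file_names if name not in present]
    return present, missing


# Demo against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "DESIGN.md").write_text("Current design notes")
    (Path(tmp) / "CHANGELOG.md").write_text("Initial entry")
    present, missing = session_reading_list(tmp, WORKING_FILES)

print("read:", present)        # read: ['DESIGN.md', 'CHANGELOG.md']
print("missing:", missing)     # missing: ['TODO.md', 'DEFERRED.md']
```

A missing file is useful information in itself: it tells the agent, or the developer, that part of the project memory has not been written down yet.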
How we changed day-to-day development with an AI-assisted protocol
Documentation solved the startup context problem. It did not solve day-to-day work.
The next issue was process. Developers needed a protocol for when to use agents, what to ask, what to review, and when to implement.
These protocols build on the broader habits in working effectively with coding agents — here we focus on the two that mattered most for large, interconnected repos. We created separate protocols for new requirements and updates.
Protocol for new requirements
For a new requirement, the team did not start with code. The sequence became:
- I gave the requirement in detail.
- The developer understood it functionally.
- The developer asked the agent to perform end-to-end impact analysis.
- The agent reviewed frontend, backend, database, and cloud impact.
- I reviewed the impact analysis.
- We corrected the design where needed.
- The developer implemented.
- The developer tested locally.
- The agent helped iterate if tests failed or edge cases appeared.
The agent had to answer: Which screens are affected? Which APIs are affected? Which database tables are affected? Which existing workflows may break? Which tests should be run? Which edge cases should be checked? Which developer or repo might be impacted?
This made AI useful before code generation.
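The questions above can be turned into a reusable prompt so that every developer asks for the same analysis. This is a sketch only: the wrapper wording and the function name are assumptions, while the seven questions are the ones listed in this section.

```python
IMPACT_QUESTIONS = [
    "Which screens are affected?",
    "Which APIs are affected?",
    "Which database tables are affected?",
    "Which existing workflows may break?",
    "Which tests should be run?",
    "Which edge cases should be checked?",
    "Which developer or repo might be impacted?",
]


def impact_analysis_prompt(requirement):
    """Build a prompt that asks for analysis only, with no code."""
    numbered = "\n".join(
        f"{i}. {question}" for i, question in enumerate(IMPACT_QUESTIONS, start=1)
    )
    return (
        "Do not write code yet. Perform an end-to-end impact analysis.\n"
        f"Requirement: {requirement}\n"
        "Answer each question and name the document or file you relied on:\n"
        f"{numbered}"
    )


# Hypothetical requirement, for illustration only.
print(impact_analysis_prompt("Add a due-date filter to the invoice list"))
```

Keeping the questions in one place also makes the reviewer's job easier, because every impact analysis arrives in the same order.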
Protocol for bug fixes and updates
For bug fixes, the protocol was different. The sequence became:
- Root cause analysis (RCA) first.
- Identify the failing workflow.
- Identify the root cause, not just the visible symptom.
- Propose the fix.
- Perform end-to-end impact analysis.
- Include frontend impact if relevant.
- Review the design.
- Implement.
- Test.
- Document the change if behavior changed.
The rule became:
No fix before RCA. No implementation before impact analysis.
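A rule like this is easier to follow when it is checked mechanically, for example in a pull request template or a small script. The sketch below is one hypothetical way to express the gate; the `ChangeRequest` fields are assumptions and do not describe tooling we run.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    kind: str                          # "feature" or "bugfix"
    rca_done: bool = False             # only required for bug fixes
    impact_analysis_done: bool = False
    design_reviewed: bool = False


def blockers(change):
    """Return the reasons this change may not be implemented yet."""
    reasons = []
    if change.kind == "bugfix" and not change.rca_done:
        reasons.append("RCA missing")
    if not change.impact_analysis_done:
        reasons.append("impact analysis missing")
    if not change.design_reviewed:
        reasons.append("design review missing")
    return reasons


print(blockers(ChangeRequest(kind="bugfix", rca_done=True)))
# prints ['impact analysis missing', 'design review missing']
```

An empty list means the change has cleared the protocol and implementation can start.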
How do local test environments improve agent output?
The next bottleneck was speed.
The process reduced errors, but I had become the review bottleneck. Every design, fix, and implementation needed oversight. The way out was not to remove oversight. The way out was to make agent execution more testable.
Agents work better when given a clear goal, success criteria, iteration constraints, a working local environment, and test conditions.
So the teams set up the required local test environments.
Instead of saying:
"Fix this issue."
They started giving instructions like:
"The goal is X. Success means Y. Do not change Z. Run these tests. If the test fails, iterate up to N times. Stop and report if the fix requires changing the data model or API contract."
This changed the quality of output. The agent could now create, test, observe failure, and iterate.
The developer's role shifted from "copy code from AI" to "define the operating boundary, review the approach, and verify the result."
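The "iterate up to N times, then stop and report" instruction maps onto a simple bounded loop. Below is a minimal sketch, assuming the agent's fix step is represented by a callback; `attempt_fix` and the stand-in test runner are hypothetical and exist only to make the example runnable.

```python
import subprocess


def run_tests(command):
    """Run a test command (a list of arguments); return (passed, output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def iterate_until_green(run, attempt_fix, max_fixes):
    """Run tests, allowing at most max_fixes fix attempts.

    run() returns (passed, output). attempt_fix(output) returns False when
    the fix would cross a boundary (for example the data model or an API
    contract), which stops the loop so a human can review.
    """
    passed, output = run()
    fixes = 0
    while not passed and fixes < max_fixes:
        if not attempt_fix(output):
            return {"status": "stopped_for_review", "fixes": fixes}
        fixes += 1
        passed, output = run()
    return {"status": "passed" if passed else "limit_reached", "fixes": fixes}


# Stand-in test runner for the example: fails twice, then passes.
results = iter([(False, "2 failed"), (False, "1 failed"), (True, "all passed")])
outcome = iterate_until_green(lambda: next(results), lambda output: True, max_fixes=3)
print(outcome)  # {'status': 'passed', 'fixes': 2}
```

In real use, `run` would wrap the project's own test command through `run_tests`, whatever that command is. The three possible statuses correspond to the three outcomes a developer needs to tell apart: done, out of attempts, or stopped because the fix needs a design decision.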
Why design review became more important, not less important
The biggest lesson was counterintuitive.
Using coding agents did not reduce the importance of design review. It increased it.
Agents can brainstorm designs. They can compare implementation options. They can trace dependencies. They can produce detailed documentation. But they do not automatically understand business constraints, customer history, cost limits, team capability, delivery pressure, or political reality.
So for complex design tasks, we used agents differently. The sequence became:
- Ask the agent to brainstorm possible designs.
- Review the options.
- Add real-world constraints the agent does not know.
- Ask the agent to revise.
- Challenge edge cases.
- Ask for a final detailed design.
- Ask for an engineering review of the design.
- Document the approved design.
- Implement only after the design review is complete.
That is not inefficiency. That is the correct shift.
AI reduces the cost of producing code. It does not reduce the cost of being wrong. When code becomes cheaper, design judgment becomes more valuable.
This is also why we are strict about a related principle: in agent systems, tool output is not instruction. The agent can read, propose, and execute, but it should not silently promote what it retrieves into a decision.
When do parallel coding agents actually help?
Once the documentation, startup prompts, protocols, and test environments were in place, the remaining constraint was agent speed.
We then started running parallel agents where the work could be split safely. This is important: we did not parallelize first. We parallelized after we had control.
In some cases, independent tasks were assigned to different agents across different tools: Claude Code, Claude in VS Code, Codex, Codex in VS Code, Antigravity, and Cursor.
Each agent was told what task it owned, what other agents were working on, what files or workflows not to touch, which design document to follow, what success criteria applied, and how to report conflicts.
This increased cost per hour because multiple agents were running. But it reduced total wasted work because each agent consumed its own focused context and worked on a bounded task. The result was better than one overloaded session trying to do everything.
Parallel agents are dangerous when the work is poorly defined. They are useful when tasks are independent, context is documented, and ownership boundaries are clear. We go deeper into this in running parallel Claude Code agents without losing control.
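Ownership boundaries can be checked before the agents start. The following sketch assumes each agent is assigned a list of glob patterns; the agent names, patterns, and paths are invented for illustration.

```python
from fnmatch import fnmatchcase


def owners_of(path, ownership):
    """Return the agents whose patterns match the given path."""
    return sorted(
        agent
        for agent, patterns in ownership.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )


def find_conflicts(paths, ownership):
    """Return {path: owners} for paths claimed by no agent or by several."""
    conflicts = {}
    for path in paths:
        owners = owners_of(path, ownership)
        if len(owners) != 1:
            conflicts[path] = owners
    return conflicts


# Hypothetical assignment: two agents with one overlapping file.
ownership = {
    "agent_a": ["frontend/invoices/*"],
    "agent_b": ["backend/billing/*", "frontend/invoices/api.ts"],
}
planned_paths = [
    "frontend/invoices/List.tsx",
    "frontend/invoices/api.ts",
    "infra/queue.tf",
]

print(find_conflicts(planned_paths, ownership))
# {'frontend/invoices/api.ts': ['agent_a', 'agent_b'], 'infra/queue.tf': []}
```

A path with two owners is a merge conflict waiting to happen. A path with no owner is work that nobody has been told to protect. Both are worth resolving before any agent runs.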
What changed after the operating model was in place
Today, our AI-assisted development workflow is very different from where we started.
We are saving significant development time. Token costs have gone down by about 70% based on our internal usage pattern. Work quality has improved because the team is no longer using coding agents as uncontrolled code generators.
The main changes were not tool changes. The main changes were operating changes:
- We documented system context.
- We mapped frontend, backend, database, and cloud dependencies.
- We created startup prompts.
- We used CLAUDE.md, AGENTS.md, and project memory files to standardize agent behavior.
- We required impact analysis before implementation.
- We required RCA before fixes.
- We defined success criteria before asking agents to build.
- We gave agents local test environments.
- We maintained TODO, DESIGN, DEFERRED, CHANGELOG, and MEMORY files.
- We used design review before implementation.
- We ran parallel agents only when tasks were bounded.
Where this has brought us
This has brought us to an interesting place.
We are much faster now. But, ironically, becoming faster has made us want more good people, not fewer people.
Earlier, the constraint was coding speed. Now the constraint is review speed, context switching, design quality, and the ability to manage multiple streams of AI-assisted work without losing architectural control.
That has also clarified the kind of people we should hire. We do not only need people who can write code. We need people who can understand context, define success criteria, review AI output, test seriously, spot blast radius, document decisions, and coordinate work across repositories.
AI changed the bottleneck. It moved the bottleneck from typing code to exercising judgment.
The operating model we use now
Here is the model in one view.
| Stage | What happens | Why it matters |
|---|---|---|
| Project context | Explain customer goal, repos, workflows, dependencies | Prevents agents from optimizing for isolated code |
| System mapping | Map screens, APIs, payloads, backend functions, tables, cloud resources | Creates blast-radius visibility |
| Component documentation | Create detailed docs for major components | Allows deeper context only when needed |
| Startup prompt | Give agents repo roles, doc locations, dependencies, and protocols | Reduces repeated token use |
| New requirement protocol | Requirement to impact analysis to design review to build to test | Prevents blind implementation |
| Bug fix protocol | RCA to fix proposal to impact analysis to review to test | Prevents symptom-level patches |
| Test environment | Agents can run, observe, and iterate locally | Improves output quality |
| Design review | Agents brainstorm; humans constrain and approve | Keeps accountability with humans |
| Parallel agents | Independent tasks run in parallel with clear boundaries | Improves throughput without uncontrolled chaos |
| Documentation update | Final designs and changes are documented | Improves future agent sessions |
AI coding agents need an engineering operating system around them.
Without that operating system, they create output. With that operating system, they create leverage.
Common mistakes when using AI coding agents in large projects
1. Treating prompts as a substitute for project context
A good prompt cannot replace missing architecture knowledge. If the agent does not understand the project structure, shared database behavior, API contracts, and customer workflows, it will produce local answers that may create system-level problems.
2. Allowing developers to accept AI output without review
AI-generated code can look clean and still be wrong. The review should check business behavior, architecture impact, data impact, security impact, test coverage, and downstream dependencies.
3. Asking agents to code before asking them to analyze impact
For large projects, impact analysis should come before implementation. The agent should first explain what will be affected. Only then should it be allowed to propose a design or write code.
4. Loading too much context into every session
Dumping all documentation into every session is expensive and often ineffective. A better approach is to give agents a context index: where the docs are, which docs matter first, and when to open deeper component-specific files.
5. Running parallel agents without ownership boundaries
Parallel agents can multiply productivity or multiply conflict. They should be used only when tasks are independent, files are bounded, interfaces are clear, and one human or lead agent is responsible for integration review.
6. Confusing faster coding with faster delivery
AI can reduce coding time while increasing total delivery time if testing, review, and integration are weak. The real measure is not how fast code is generated. The real measure is how fast safe, tested, integrated work reaches production.
Checklist: before using coding agents on a large project
Use this checklist before giving AI coding agents meaningful work on a complex project.
Project context
- Is the customer goal documented?
- Are the major workflows documented?
- Are all relevant repositories listed?
- Is each repository's role clear?
- Are shared database and cloud dependencies documented?
- Are frontend screens mapped to API calls?
- Are API calls mapped to backend handlers?
- Are backend handlers mapped to functions, tables, and cloud resources?
- Are reusable components documented?
- Are high-blast-radius areas clearly marked?
- Does CLAUDE.md, AGENTS.md, or equivalent project guidance exist?
- Does the startup prompt point agents to documentation instead of pasting everything?
- Are coding standards and test commands included?
- Are review and approval expectations clear?
- Are agents told when to stop and ask for human review?
- Is the requirement functionally understood before coding?
- Has the agent produced an impact analysis?
- Has the design been reviewed?
- Are success criteria defined?
- Are test conditions clear?
- Has RCA been done?
- Is the root cause different from the visible symptom?
- Has the fix impact been analyzed?
- Are related frontend and backend flows checked?
- Has regression risk been considered?
- Can the agent run local tests?
- Are iteration limits defined?
- Is there a human reviewer for design decisions?
- Is the final change documented?
- Are integration risks reviewed before merge?
Frequently Asked Questions About AI Coding Agents for Large Software Projects
What are AI coding agents?
AI coding agents are tools that can understand a software task, inspect files, modify code, run commands, and sometimes test or iterate on their own. They are different from simple autocomplete because they can act across multiple files and steps.
Why do AI coding agents fail on large projects?
They usually fail because they do not have enough project context. In large projects, the correct answer depends on screens, APIs, database behavior, cloud resources, business rules, and other developers' work.
How can teams reduce token cost when using coding agents?
Teams can reduce token cost by creating reusable project documentation, startup prompts, and agent instruction files instead of repeatedly pasting the same context. The agent should receive a map of where context lives and open deeper documents only when needed.
What is the role of CLAUDE.md or AGENTS.md?
Files like CLAUDE.md and AGENTS.md provide persistent project instructions for coding agents. They can describe repository structure, build commands, coding standards, testing expectations, architecture notes, and workflow rules.
Should AI coding agents be allowed to implement without human review?
Not on high-blast-radius changes. Agents can propose, implement, and test, but human review should remain responsible for architecture, production risk, customer commitments, security, and business tradeoffs.
Can multiple coding agents work on the same project?
Yes, but only with clear task boundaries. Parallel agents work best when tasks are independent, context is documented, file ownership is clear, and integration review is handled carefully.
What should teams ask agents before asking them to code?
Before coding, ask the agent for impact analysis. It should identify affected screens, APIs, backend functions, database tables, cloud resources, tests, and possible regression risks.
Do AI coding agents reduce the need for developers?
No. In complex projects, they change the developer's role. Developers spend less time writing routine code and more time defining goals, reviewing design, managing context, testing, and controlling integration risk.
Key Takeaways
- AI coding agents initially increased our chaos because the team had tools but no operating model.
- Large projects need project context, dependency maps, and design documentation before agents can be useful.
- Token cost dropped when we stopped pasting full context and started using startup prompts, CLAUDE.md, AGENTS.md, and documentation maps.
- Impact analysis before coding became one of the most important workflow changes.
- Local test environments improved agent output because agents could create, test, fail, and iterate.
- Design review became more important because AI makes code cheaper but does not make wrong decisions cheaper.
- Parallel agents helped only after task boundaries, documentation, and review protocols were in place.
You do not need to build the whole operating model at once. Start small: pick one active module, write its DESIGN, TODO, and DEFERRED files, point your CLAUDE.md at them, and require one impact analysis before the next change. That single loop is enough to feel the difference in your next agent session.
