How We Reduced AI Coding Agent Token Cost by 70% on Large Software Projects

By Aakash Ahuja · June 23, 2026 · 24 min read

AI coding agents did not make our engineering team faster at first. They made us faster at creating confusion.

The short version: the fix was not a better prompt. It was an operating model. We stopped treating coding agents as smart autocomplete and built structure around them — context maps, startup prompts in CLAUDE.md and AGENTS.md, impact analysis before coding, local test environments, and human design review. That cut our AI coding agent token cost by about 70% and made delivery faster on large, multi-repo projects. The rest of this article is how.

We use coding agents every day: mostly Claude, some Codex, some Antigravity, and some Cursor. Our projects are not small isolated repositories. They are large, interconnected systems with frontend repositories, backend repositories, shared databases, shared cloud resources, and multiple developers working on related parts of the same product.

In that environment, one missed design decision can affect screens, APIs, database state, background jobs, reports, infrastructure, and other developers' work.

The turning point was not a better prompt. It was an operating model.

We reduced token cost by about 70% and improved development speed only after we stopped treating coding agents as smart autocomplete and started treating them as junior execution agents that need context, constraints, test environments, design review, and shared project memory.

This is our internal experience, not a universal benchmark. The exact result depends on project size, team discipline, tooling, documentation quality, and how much repeated context your agents currently consume.

Why AI coding agents initially increased chaos instead of productivity
Coding assistant vs coding agent: why the difference matters here
What makes large software projects difficult for coding agents?
How we created project context before asking agents to code
How CLAUDE.md, AGENTS.md, and startup prompts reduced token waste
Why documentation became the agent's working memory
How we changed day-to-day development with an AI-assisted protocol
How do local test environments improve agent output?
Why design review became more important, not less important
When do parallel coding agents actually help?
What changed after the operating model was in place
The operating model we use now
Common mistakes when using AI coding agents in large projects
Checklist: before using coding agents on a large project
FAQ and References

---

Why AI coding agents initially increased chaos instead of productivity

The first mistake was simple: we gave developers tools before we gave the team a coherent strategy.

Developers started using agents to write code, fix bugs, and modify features. But the agents did not understand the system. The developers often accepted generated solutions without a deep review. Some changes worked locally but broke related flows. Some code was pushed without proper testing. Some fixes solved the visible issue but created hidden impact elsewhere.

The result was predictable:

Token costs increased.
Testing time increased.
Test quality dropped.
Review burden increased.
Developers trusted generated code too quickly.
Cross-repo impact was missed.
The team moved faster in isolated tasks but slower as a whole.

This is an important distinction.

A coding agent can make one developer appear faster while making the engineering system slower. In large projects, local speed is not the same as team productivity.

Coding assistant vs coding agent: why the difference matters here

These two terms get used interchangeably, but the distinction is the whole point of this article.

|  | Coding assistant | Coding agent |
| --- | --- | --- |
| Scope | Completes the line or block you are typing | Acts across many files, runs commands, and iterates |
| Initiative | Suggests; you decide | Plans, executes, and can change state on its own |
| Failure mode | A bad suggestion you can ignore | A confident multi-file change with hidden blast radius |
| What it needs | A good local context window | An operating model: context maps, constraints, tests, review |

A coding assistant is mostly safe because the human stays in the loop on every keystroke. A coding agent removes the human from most of those small decisions, which is exactly why it needs explicit context, boundaries, and review. Treating an agent like an assistant — assuming the human will catch everything — is how teams get plausible code that breaks systems they never looked at.

What makes large software projects difficult for coding agents?

Large projects are difficult for coding agents because the real context is not inside one file.

A feature may involve a frontend screen, reusable UI components, API calls, request payloads, response contracts, backend route handlers, business logic functions, database tables, background jobs, queues, cloud storage, permissions, reports, test data, deployment assumptions, and other teams' unfinished work.

The agent may see one part of the codebase and produce a plausible answer. But plausible is not enough.

In a large interconnected project, the important question is not:

"Can the agent write code?"

The important question is:

"Does the agent understand the blast radius of the change?"

Blast radius means the full set of systems, data, workflows, and people affected by a change. The higher the blast radius, the more dangerous blind AI-assisted coding becomes.

How we created project context before asking agents to code

We stopped asking agents to code first.

Instead, we asked them to understand.

This changed everything.

Pass 0: Give the agent the customer goal and existing documentation

We started with whatever project documentation we already had: end-to-end docs, feature notes, architecture notes, and rough internal explanations. Then we explained the project goal from the customer's perspective in detail.

This mattered because agents tend to optimize for the immediate instruction. If they do not understand what the customer is trying to achieve, they may produce technically correct but functionally wrong code.

The first layer of context was therefore not code. It was:

Who is the user?
What is the project trying to achieve?
What are the major workflows?
What does success look like for the customer?
What parts of the system are business-critical?
What should not break?

Only after this did we start repo-level analysis.

Pass 1: Map frontend screens, endpoints, payloads, and responses

We downloaded the relevant repositories for the project: frontend, backend, and other related repos. Then we asked the agent to start with the frontend.

The task was not to refactor or improve anything. The task was to map the system.

For every screen, the agent had to identify the screen name, route, components used, visible user purpose, API endpoints called, request payloads, response payloads, state changes, validations, error states, and likely business purpose.

This created the first useful artifact: a frontend-to-API map.

For the first time, the agent had a structured understanding of what the user sees and which backend calls support that experience.
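
To make the shape of that map concrete, here is a minimal sketch of one way to store it as structured data. It is not our actual format: the class names, field names, screens, and endpoint paths below are all hypothetical and only illustrate the kind of record each screen produces.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    """One API call made by a screen (all field names are illustrative)."""
    method: str
    path: str
    request_fields: list[str] = field(default_factory=list)
    response_fields: list[str] = field(default_factory=list)


@dataclass
class ScreenEntry:
    """One row of a frontend-to-API map."""
    name: str
    route: str
    purpose: str
    components: list[str] = field(default_factory=list)
    api_calls: list[ApiCall] = field(default_factory=list)


def screens_calling(screens: list[ScreenEntry], path: str) -> list[str]:
    """Return the names of screens that call the given endpoint path."""
    return [s.name for s in screens if any(c.path == path for c in s.api_calls)]


# Hypothetical example data, not taken from a real project.
SCREENS = [
    ScreenEntry(
        name="OrderList",
        route="/orders",
        purpose="Let a user browse their orders",
        components=["OrderTable", "Pagination"],
        api_calls=[ApiCall("GET", "/api/orders", [], ["id", "status", "total"])],
    ),
    ScreenEntry(
        name="OrderDetail",
        route="/orders/:id",
        purpose="Show one order and allow cancellation",
        components=["OrderSummary", "CancelButton"],
        api_calls=[
            ApiCall("GET", "/api/orders/{id}", [], ["id", "status", "items"]),
            ApiCall("POST", "/api/orders/{id}/cancel", ["reason"], ["status"]),
        ],
    ),
]

print(screens_calling(SCREENS, "/api/orders"))  # ['OrderList']
```

Whether the map lives in Markdown, JSON, or code matters less than having one record per screen that an agent can look up instead of rediscovering.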

Pass 2: Link backend endpoints to scripts, functions, tables, and cloud resources

Once the frontend map existed, we extended the documentation from the backend side.

For every endpoint, the agent had to trace the route definition, controller or handler, service functions, reusable utilities, database tables used, read/write behavior, cloud resources touched, files or storage used, background jobs triggered, and functionality delivered.

This created the second useful artifact: a frontend-to-backend-to-database-to-cloud map.

This was the first real context breakthrough. Earlier, agents were solving one file at a time. Now they could see how one frontend behavior connected to backend code, shared data, and infrastructure.
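
Once endpoints are linked to tables and screens, a blast-radius question becomes a lookup. The sketch below assumes a hypothetical `EndpointTrace` record and made-up endpoints, tables, and screens. It shows the idea, not the format we actually use.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointTrace:
    """Backend trace for one endpoint (field names are illustrative)."""
    path: str
    handler: str
    tables_read: set[str] = field(default_factory=set)
    tables_written: set[str] = field(default_factory=set)
    cloud_resources: set[str] = field(default_factory=set)
    screens: set[str] = field(default_factory=set)  # taken from the frontend map


def blast_radius(traces: list[EndpointTrace], table: str) -> dict[str, list[str]]:
    """List the endpoints and screens that read or write the given table."""
    hits = [t for t in traces if table in t.tables_read | t.tables_written]
    return {
        "endpoints": sorted(t.path for t in hits),
        "screens": sorted({s for t in hits for s in t.screens}),
    }


# Hypothetical traces for illustration only.
TRACES = [
    EndpointTrace("/api/orders", "orders.list_orders",
                  tables_read={"orders"}, screens={"OrderList"}),
    EndpointTrace("/api/orders/{id}/cancel", "orders.cancel_order",
                  tables_read={"orders"}, tables_written={"orders", "refunds"},
                  cloud_resources={"refund-queue"}, screens={"OrderDetail"}),
    EndpointTrace("/api/reports/revenue", "reports.revenue",
                  tables_read={"orders", "refunds"}, screens={"RevenueReport"}),
]

print(blast_radius(TRACES, "refunds"))
```

In this made-up data, a change to the `refunds` table touches the cancel endpoint and the revenue report, which is exactly the kind of cross-feature link a single-file view hides.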

Pass 3: Build component-specific design documents

The mapping passes gave breadth. They did not give enough depth.

So we asked the agent to use the first set of documents and perform a deeper pass component by component.

For each major component, it had to create a detailed design document covering what the component does, which workflows it supports, which APIs it depends on, which tables it reads and writes, which cloud resources it touches, what assumptions it makes, what can break, and what other parts of the project depend on it.

This created a second layer of documentation: component-specific design docs.

The difference was important. The summary map helped the agent start fast. The component docs helped the agent go deep only when needed.

How CLAUDE.md, AGENTS.md, and startup prompts reduced token waste

The next problem was token cost.

If every developer had to paste the entire project context into every session, cost would remain high and context quality would remain inconsistent.

So we created a startup model.

We updated CLAUDE.md and AGENTS.md with instructions telling agents where to begin, which docs to read first, and when to go deeper.

The key principle was:

Do not load the whole project into the prompt. Give the agent a map of where the context lives.

The startup context included the project goal, folders of all repositories, role of each repo, documentation location for each repo, dependency relationships, frontend-backend mapping location, component design document location, test commands, development protocol, and review expectations.
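
As an illustration of "a map, not the whole project", the sketch below renders a small startup index that could be placed in a file such as CLAUDE.md or AGENTS.md. The function name, repository names, doc paths, and test commands are assumptions made for this example. Neither tool requires this layout.

```python
def render_context_index(project_goal: str,
                         repos: dict[str, dict[str, str]],
                         test_commands: list[str]) -> str:
    """Build a short startup file that points to docs instead of inlining them."""
    lines = ["# Project context index", "", f"Goal: {project_goal}", "",
             "## Repositories"]
    for name, info in repos.items():
        lines.append(f"- {name}: {info['role']} (docs: {info['docs']})")
    lines += ["", "## Test commands"]
    lines += [f"- {cmd}" for cmd in test_commands]
    lines += [
        "",
        "## Reading order",
        "1. Read the summary map first.",
        "2. Open a component design doc only for the component you are changing.",
        "3. Stop and ask for review before changing a data model or API contract.",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical repositories and paths, for illustration only.
REPOS = {
    "web-app": {"role": "customer-facing frontend", "docs": "docs/web-app/SUMMARY.md"},
    "api-server": {"role": "backend API and jobs", "docs": "docs/api-server/SUMMARY.md"},
}

index_text = render_context_index(
    "Let customers place and track orders", REPOS, ["npm test", "pytest -q"]
)
print(index_text)
```

The output stays a few dozen lines long no matter how large the underlying documentation grows, which is the property that keeps startup cost flat.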

Then we created a start_session prompt. This prompt gave the agent enough context to orient itself without forcing us to paste all documentation repeatedly. The agent could start from the summary, then open deeper documentation only for the relevant component. A generalized, copy-pasteable version of this is published as a free artifact: the SESSION_START multi-repo onboarding template.

This solved two problems at once:

Token cost went down because we stopped repeatedly loading unnecessary context.
Output quality improved because every agent started with the same project understanding.
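
The arithmetic behind the first point can be sketched with made-up numbers. Every token count below is hypothetical and chosen only to show the calculation. None of them is a measurement from our projects, and the resulting percentage should not be read as our result.

```python
def full_paste_tokens(doc_tokens: dict[str, int]) -> int:
    """Tokens consumed when every document is pasted into the session."""
    return sum(doc_tokens.values())


def indexed_session_tokens(index_tokens: int,
                           doc_tokens: dict[str, int],
                           opened: list[str]) -> int:
    """Tokens consumed when the session loads an index plus only the opened docs."""
    return index_tokens + sum(doc_tokens[name] for name in opened)


# Hypothetical document sizes in tokens.
DOC_TOKENS = {
    "frontend_map": 12_000,
    "backend_map": 18_000,
    "orders_design": 6_000,
    "billing_design": 7_000,
    "reports_design": 5_000,
}
INDEX_TOKENS = 1_500  # hypothetical size of the startup index

full = full_paste_tokens(DOC_TOKENS)
indexed = indexed_session_tokens(INDEX_TOKENS, DOC_TOKENS, ["orders_design"])
saving = 1 - indexed / full

print(f"full paste: {full}, indexed: {indexed}, saving: {saving:.0%}")
# full paste: 48000, indexed: 7500, saving: 84%
```

The real saving depends on how many deeper documents a task needs to open, so a session that touches several components will save less than this single-component example.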

If your sessions feel sluggish for unrelated reasons, that is a separate problem — we cover it in why Claude Code gets slow and how to fix it.

Why documentation became the agent's working memory

The next improvement came from a simple realization: agents do not only need documentation to understand the system. They need documentation to decide, track status, checkpoint work, and avoid reopening the same design questions again and again.

Earlier, a lot of project knowledge lived in conversations, temporary prompts, developer memory, or unfinished implementation notes. That does not work when multiple people and multiple agents are working across interconnected repositories.

So we made documentation part of the workflow, not a separate cleanup activity.

For every meaningful module or feature, the agent was asked to maintain a small set of working files:

| Document | Purpose |
| --- | --- |
| `TODO.md` or `TODO_.md` | The active working list: open items, status, pending tasks, and next steps |
| `DESIGN_.md` | The current design as it exists now, including changes made during development |
| `DEFERRED_.md` | Design decisions that were consciously deferred, with the reason for deferring them |
| `CHANGELOG.md` | Chronological record of what changed, why it changed, and what was tested |
| `MEMORY.md` or project-level memory file | Durable instructions for how agents should update and use the documentation |
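
A small scaffold script can make these files cheap to create for a new module. This is a sketch under assumptions: the table does not spell out what follows the underscore in the file names, so the module-name suffix used here is a guess, and the section headings inside each file are placeholders.

```python
import tempfile
from pathlib import Path

# Assumption: a module-name suffix after the underscore. Headings are placeholders.
TEMPLATES = {
    "TODO_{m}.md": "# TODO: {m}\n\n## Open items\n\n## Next steps\n",
    "DESIGN_{m}.md": "# Design: {m}\n\n## Current design\n\n## Assumptions\n",
    "DEFERRED_{m}.md": "# Deferred decisions: {m}\n\n## Decision and reason deferred\n",
    "CHANGELOG.md": "# Changelog\n",
}


def scaffold_working_files(module_dir: Path, module: str) -> list[Path]:
    """Create any missing working files and never overwrite existing ones."""
    module_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name_template, body_template in TEMPLATES.items():
        path = module_dir / name_template.format(m=module)
        if not path.exists():
            path.write_text(body_template.format(m=module), encoding="utf-8")
            created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is written into a real repo.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_working_files(Path(tmp) / "orders", "orders"):
            print(path.name)
```

Running it twice on the same directory creates nothing the second time, so it is safe to call at the start of any session. The project-level memory file is left out on purpose because it is shared across modules.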

This changed agent behavior.

Instead of every new session starting from memory loss, the agent could read the current design, active TODOs, deferred decisions, and changelog. If a developer forgot to update a document, the agent often caught it because the project-level memory instructed it to update the right files as part of its workflow.

The team was also asked to follow the same discipline. That mattered because the documentation was no longer "for the agent" or "for the team." It became the shared operating memory of the project.

The rule became:

If a decision affects future work, document it where the next agent and the next developer will find it.

This reduced repeated context loading, avoided stale assumptions, and helped agents build the right thing instead of rediscovering the same project structure again and again. This is also where the distinction between AI agent memory and state became practical rather than theoretical.

The most important documents were not long architecture documents. They were live working documents that answer: What are we building now? What did we defer? Why did we defer it? What design is currently true? What changed recently? What should the next session read first? What should the agent never assume?

Once this documentation layer existed, agents became more reliable because they were no longer depending only on the prompt window. They had a project memory system.

How we changed day-to-day development with an AI-assisted protocol

Documentation solved the startup context problem. It did not solve day-to-day work.

The next issue was process. Developers needed a protocol for when to use agents, what to ask, what to review, and when to implement.

These protocols build on the broader habits in working effectively with coding agents — here we focus on the two that mattered most for large, interconnected repos. We created separate protocols for new requirements and updates.

Protocol for new requirements

For a new requirement, the team did not start with code. The sequence became:

1. I gave the requirement in detail.
2. The developer understood it functionally.
3. The developer asked the agent to perform end-to-end impact analysis.
4. The agent reviewed frontend, backend, database, and cloud impact.
5. I reviewed the impact analysis.
6. We corrected the design where needed.
7. The developer implemented.
8. The developer tested locally.
9. The agent helped iterate if tests failed or edge cases appeared.

This reduced blind implementation. The most important step was impact analysis before coding.

The agent had to answer: Which screens are affected? Which APIs are affected? Which database tables are affected? Which existing workflows may break? Which tests should be run? Which edge cases should be checked? Which developer or repo might be impacted?

This made AI useful before code generation.

Protocol for bug fixes and updates

For bug fixes, the protocol was different. The sequence became:

1. Perform root cause analysis (RCA) first.
2. Identify the failing workflow.
3. Identify the root cause, not just the visible symptom.
4. Propose the fix.
5. Perform end-to-end impact analysis.
6. Include frontend impact if relevant.
7. Review the design.
8. Implement.
9. Test.
10. Document the change if behavior changed.

This mattered because agents are often good at producing a local fix. But a local fix is dangerous when the root cause is misunderstood.

The rule became:

No fix before RCA. No implementation before impact analysis.
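
The rule is easy to state and easy to skip under deadline pressure, so it can help to encode it as an explicit gate. The sketch below is one hypothetical way to do that. The `ChangeRequest` fields and step names are illustrative and are not part of any tool we use.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    NEW_REQUIREMENT = "new_requirement"
    BUG_FIX = "bug_fix"


@dataclass
class ChangeRequest:
    """Tracks which protocol steps are complete for one change (illustrative)."""
    title: str
    kind: ChangeKind
    rca_done: bool = False
    impact_analysis_done: bool = False
    design_reviewed: bool = False


def missing_steps(change: ChangeRequest) -> list[str]:
    """Return the protocol steps still missing before implementation may start."""
    missing = []
    if change.kind is ChangeKind.BUG_FIX and not change.rca_done:
        missing.append("root cause analysis")
    if not change.impact_analysis_done:
        missing.append("impact analysis")
    if not change.design_reviewed:
        missing.append("design review")
    return missing


def can_implement(change: ChangeRequest) -> bool:
    """True only when every required step for this kind of change is done."""
    return not missing_steps(change)


bug = ChangeRequest("Refund total is wrong", ChangeKind.BUG_FIX,
                    impact_analysis_done=True)
print(can_implement(bug), missing_steps(bug))
```

For the example bug, the gate stays closed and reports that root cause analysis and design review are still missing, even though the impact analysis is done.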

How do local test environments improve agent output?

The next bottleneck was speed.

The process reduced errors, but I had become the review bottleneck. Every design, fix, and implementation needed oversight. The way out was not to remove oversight. The way out was to make agent execution more testable.

Agents work better when given a clear goal, success criteria, iteration constraints, a working local environment, and test conditions.

So the teams set up the required local test environments.

Instead of saying:

"Fix this issue."

They started giving instructions like:

"The goal is X. Success means Y. Do not change Z. Run these tests. If the test fails, iterate up to N times. Stop and report if the fix requires changing the data model or API contract."

This changed the quality of output. The agent could now create, test, observe failure, and iterate.
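
The "iterate up to N times, then stop and report" instruction amounts to a bounded loop with an early exit. The following is a minimal sketch of that control flow, not how any particular agent implements it. The callables stand in for the agent's fix attempt and the project's test command, and both are simulated here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IterationResult:
    passed: bool
    attempts: int
    stopped_reason: str


def iterate_until_green(apply_fix: Callable[[int], bool],
                        run_tests: Callable[[], bool],
                        max_attempts: int) -> IterationResult:
    """Try a fix and run the tests, up to max_attempts times.

    apply_fix returns False when the fix would need a change outside the
    allowed boundary (for example the data model or an API contract).
    """
    for attempt in range(1, max_attempts + 1):
        if not apply_fix(attempt):
            return IterationResult(False, attempt,
                                   "stopped: change is outside the allowed boundary")
        if run_tests():
            return IterationResult(True, attempt, "tests passed")
    return IterationResult(False, max_attempts, "iteration limit reached")


def demo(max_attempts: int) -> IterationResult:
    """Simulated run: the hypothetical fix only works on the third attempt."""
    state = {"attempt": 0}

    def apply_fix(attempt: int) -> bool:
        state["attempt"] = attempt
        return True  # this simulation never needs an out-of-bounds change

    def run_tests() -> bool:
        return state["attempt"] >= 3

    return iterate_until_green(apply_fix, run_tests, max_attempts)


print(demo(5))
print(demo(2))
```

In a real setup `run_tests` would invoke the project's own test command. The useful part is that both failure exits produce a report a human can act on instead of an open-ended retry loop.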

The developer's role shifted from "copy code from AI" to "define the operating boundary, review the approach, and verify the result."

Why design review became more important, not less important

The biggest lesson was counterintuitive.

Using coding agents did not reduce the importance of design review. It increased it.

Agents can brainstorm designs. They can compare implementation options. They can trace dependencies. They can produce detailed documentation. But they do not automatically understand business constraints, customer history, cost limits, team capability, delivery pressure, or political reality.

So for complex design tasks, we used agents differently. The sequence became:

1. Ask the agent to brainstorm possible designs.
2. Review the options.
3. Add real-world constraints the agent does not know.
4. Ask the agent to revise.
5. Challenge edge cases.
6. Ask for a final detailed design.
7. Ask for an engineering review of the design.
8. Document the approved design.
9. Implement only after the design review is complete.

This took time. For many complex requirements, the design phase took a few hours and the build that followed took less.

That is not inefficiency. That is the correct shift.

AI reduces the cost of producing code. It does not reduce the cost of being wrong. When code becomes cheaper, design judgment becomes more valuable.

This is also why we are strict about a related principle: in agent systems, tool output is not instruction. The agent can read, propose, and execute, but it should not silently promote what it retrieves into a decision.

When do parallel coding agents actually help?

Once the documentation, startup prompts, protocols, and test environments were in place, the remaining constraint was agent speed.

We then started running parallel agents where the work could be split safely. This is important: we did not parallelize first. We parallelized after we had control.

In some cases, independent tasks were assigned to different agents across different tools: Claude Code, Claude in VS Code, Codex, Codex in VS Code, Antigravity, and Cursor.

Each agent was told what task it owned, what other agents were working on, what files or workflows not to touch, which design document to follow, what success criteria applied, and how to report conflicts.

This increased cost per hour because multiple agents were running. But it reduced total wasted work because each agent consumed its own focused context and worked on a bounded task. The result was better than one overloaded session trying to do everything.

Parallel agents are dangerous when the work is poorly defined. They are useful when tasks are independent, context is documented, and ownership boundaries are clear. We go deeper into this in running parallel Claude Code agents without losing control.
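
One cheap check before starting parallel sessions is to confirm that no two agents have been assigned the same files. The sketch below assumes ownership is written down as exact file paths per agent. The agent names and paths are hypothetical, and directory-level ownership would need prefix matching that this example does not do.

```python
from itertools import combinations


def find_ownership_conflicts(
    assignments: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Return paths claimed by more than one agent, keyed by the agent pair."""
    conflicts = {}
    for first, second in combinations(sorted(assignments), 2):
        shared = assignments[first] & assignments[second]
        if shared:
            conflicts[(first, second)] = shared
    return conflicts


# Hypothetical assignments for illustration only.
ASSIGNMENTS = {
    "agent-a": {"frontend/orders/OrderDetail.tsx", "backend/orders/cancel.py"},
    "agent-b": {"backend/reports/revenue.py"},
    "agent-c": {"backend/orders/cancel.py", "backend/refunds/queue.py"},
}

print(find_ownership_conflicts(ASSIGNMENTS))
# {('agent-a', 'agent-c'): {'backend/orders/cancel.py'}}
```

An empty result does not prove the tasks are independent, since two agents can still collide through a shared table or API contract, but a non-empty result is a clear signal to redraw the boundaries first.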

What changed after the operating model was in place

Today, our AI-assisted development workflow is very different from where we started.

We are saving significant development time. Token costs have gone down by about 70% based on our internal usage pattern. Work quality has improved because the team is no longer using coding agents as uncontrolled code generators.

The main changes were not tool changes. The main changes were operating changes:

We documented system context.
We mapped frontend, backend, database, and cloud dependencies.
We created startup prompts.
We used CLAUDE.md, AGENTS.md, and project memory files to standardize agent behavior.
We required impact analysis before implementation.
We required RCA before fixes.
We defined success criteria before asking agents to build.
We gave agents local test environments.
We maintained TODO, DESIGN, DEFERRED, CHANGELOG, and MEMORY files.
We used design review before implementation.
We ran parallel agents only when tasks were bounded.

The result was not "AI replaced developers." The result was that developers became better coordinators of AI-assisted work.

Where this has brought us

This has brought us to an interesting place.

We are much faster now. But, ironically, becoming faster has made us want more good people, not fewer people.

Earlier, the constraint was coding speed. Now the constraint is review speed, context switching, design quality, and the ability to manage multiple streams of AI-assisted work without losing architectural control.

That has also clarified the kind of people we should hire. We do not only need people who can write code. We need people who can understand context, define success criteria, review AI output, test seriously, spot blast radius, document decisions, and coordinate work across repositories.

AI changed the bottleneck. It moved the bottleneck from typing code to exercising judgment.

The operating model we use now

Here is the model in one view.

| Stage | What happens | Why it matters |
| --- | --- | --- |
| Project context | Explain customer goal, repos, workflows, dependencies | Prevents agents from optimizing for isolated code |
| System mapping | Map screens, APIs, payloads, backend functions, tables, cloud resources | Creates blast-radius visibility |
| Component documentation | Create detailed docs for major components | Allows deeper context only when needed |
| Startup prompt | Give agents repo roles, doc locations, dependencies, and protocols | Reduces repeated token use |
| New requirement protocol | Requirement → impact analysis → design review → build → test | Prevents blind implementation |
| Bug fix protocol | RCA → fix proposal → impact analysis → review → test | Prevents symptom-level patches |
| Test environment | Agents can run, observe, and iterate locally | Improves output quality |
| Design review | Agents brainstorm; humans constrain and approve | Keeps accountability with humans |
| Parallel agents | Independent tasks run in parallel with clear boundaries | Improves throughput without uncontrolled chaos |
| Documentation update | Final designs and changes are documented | Improves future agent sessions |

This is the real lesson:

AI coding agents need an engineering operating system around them.

Without that operating system, they create output. With that operating system, they create leverage.

Common mistakes when using AI coding agents in large projects

1. Treating prompts as a substitute for project context

A good prompt cannot replace missing architecture knowledge. If the agent does not understand the project structure, shared database behavior, API contracts, and customer workflows, it will produce local answers that may create system-level problems.

2. Allowing developers to accept AI output without review

AI-generated code can look clean and still be wrong. The review should check business behavior, architecture impact, data impact, security impact, test coverage, and downstream dependencies.

3. Asking agents to code before asking them to analyze impact

For large projects, impact analysis should come before implementation. The agent should first explain what will be affected. Only then should it be allowed to propose a design or write code.

4. Loading too much context into every session

Dumping all documentation into every session is expensive and often ineffective. A better approach is to give agents a context index: where the docs are, which docs matter first, and when to open deeper component-specific files.

5. Running parallel agents without ownership boundaries

Parallel agents can multiply productivity or multiply conflict. They should be used only when tasks are independent, files are bounded, interfaces are clear, and one human or lead agent is responsible for integration review.

6. Confusing faster coding with faster delivery

AI can reduce coding time while increasing total delivery time if testing, review, and integration are weak. The real measure is not how fast code is generated. The real measure is how fast safe, tested, integrated work reaches production.

Checklist: before using coding agents on a large project

Use this checklist before giving AI coding agents meaningful work on a complex project.

Project context

Is the customer goal documented?
Are the major workflows documented?
Are all relevant repositories listed?
Is each repository's role clear?
Are shared database and cloud dependencies documented?

Architecture and dependency mapping

Are frontend screens mapped to API calls?
Are API calls mapped to backend handlers?
Are backend handlers mapped to functions, tables, and cloud resources?
Are reusable components documented?
Are high-blast-radius areas clearly marked?

Agent startup

Does CLAUDE.md, AGENTS.md, or equivalent project guidance exist?
Does the startup prompt point agents to documentation instead of pasting everything?
Are coding standards and test commands included?
Are review and approval expectations clear?
Are agents told when to stop and ask for human review?

Requirement workflow

Is the requirement functionally understood before coding?
Has the agent produced an impact analysis?
Has the design been reviewed?
Are success criteria defined?
Are test conditions clear?

Bug-fix workflow

Has RCA been done?
Has the root cause been separated from the visible symptom?
Has the fix impact been analyzed?
Are related frontend and backend flows checked?
Has regression risk been considered?

Test and review

Can the agent run local tests?
Are iteration limits defined?
Is there a human reviewer for design decisions?
Is the final change documented?
Are integration risks reviewed before merge?

---

Frequently Asked Questions About AI Coding Agents for Large Software Projects

What are AI coding agents?

AI coding agents are tools that can understand a software task, inspect files, modify code, run commands, and sometimes test or iterate on their own. They are different from simple autocomplete because they can act across multiple files and steps.

Why do AI coding agents fail on large projects?

They usually fail because they do not have enough project context. In large projects, the correct answer depends on screens, APIs, database behavior, cloud resources, business rules, and other developers' work.

How can teams reduce token cost when using coding agents?

Teams can reduce token cost by creating reusable project documentation, startup prompts, and agent instruction files instead of repeatedly pasting the same context. The agent should receive a map of where context lives and open deeper documents only when needed.

What is the role of CLAUDE.md or AGENTS.md?

Files like CLAUDE.md and AGENTS.md provide persistent project instructions for coding agents. They can describe repository structure, build commands, coding standards, testing expectations, architecture notes, and workflow rules.

Should AI coding agents be allowed to implement without human review?

Not on high-blast-radius changes. Agents can propose, implement, and test, but human review should remain responsible for architecture, production risk, customer commitments, security, and business tradeoffs.

Can multiple coding agents work on the same project?

Yes, but only with clear task boundaries. Parallel agents work best when tasks are independent, context is documented, file ownership is clear, and integration review is handled carefully.

What should teams ask agents before asking them to code?

Before coding, ask the agent for impact analysis. It should identify affected screens, APIs, backend functions, database tables, cloud resources, tests, and possible regression risks.

Do AI coding agents reduce the need for developers?

No. In complex projects, they change the developer's role. Developers spend less time writing routine code and more time defining goals, reviewing design, managing context, testing, and controlling integration risk.

Key Takeaways

AI coding agents initially increased our chaos because the team had tools but no operating model.
Large projects need project context, dependency maps, and design documentation before agents can be useful.
Token cost fell when we stopped pasting full context and started using startup prompts, CLAUDE.md, AGENTS.md, and documentation maps.
Impact analysis before coding became one of the most important workflow changes.
Local test environments improved agent output because agents could create, test, fail, and iterate.
Design review became more important because AI makes code cheaper but does not make wrong decisions cheaper.
Parallel agents helped only after task boundaries, documentation, and review protocols were in place.

If you are using coding agents on a large engineering project, do not start by asking, "Which AI tool is best?" Start with a harder question: does our engineering system have enough context, documentation, testing, and review discipline for AI-generated code to be safe?

You do not need to build the whole operating model at once. Start small: pick one active module, write its DESIGN, TODO, and DEFERRED files, point your CLAUDE.md at them, and require one impact analysis before the next change. That single loop is enough to feel the difference in your next agent session.
