Why Claude Code Gets Slower the Longer You Use It — and How to Fix It
Claude Code is slow — but not randomly. There are two distinct failure modes, and most developers are solving the wrong one.
A single vague "fix this bug" instruction can trigger 15 exploratory tool calls and 12 seconds of wait time — not because Claude is slow, but because you didn't tell it where to look. By turn 30 of a session, you might be processing 400,000 tokens of context just to answer a simple question. That is why Claude Code often feels fine at the start of a session and progressively sluggish by the end.
The first slowness cause is tool call latency: Claude Code makes sequential HTTP round trips to read files, run commands, and process results. Each step takes 300–800ms. Twenty steps in a row means 10–15 seconds of wait time before Claude writes a single line of code — and that's not a network problem, it's an architecture problem caused by how you prompted it.
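The compounding effect is pure arithmetic. A minimal sketch, using the 300–800ms per-call range quoted above (the helper name and figures are illustrative, not measured):

```python
# Back-of-envelope model of sequential tool-call latency.
# Per-call figures are the article's estimates, not measurements.
def total_wait_ms(num_calls: int, per_call_ms: int) -> int:
    """Sequential calls: total wait is the sum of the round trips."""
    return num_calls * per_call_ms

# Twenty sequential calls at the low and high ends of the range:
print(total_wait_ms(20, 300))  # 6000 ms
print(total_wait_ms(20, 800))  # 16000 ms
```

Nothing about the model changed between those two numbers; only the number and cost of round trips did.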
The second is context accumulation: every file read, shell output, and conversation turn gets appended to the transcript, and that entire transcript is re-processed on every subsequent call.
Understanding which problem you have changes everything about how you fix it.
Quick answer: Claude Code slowness has two root causes — too many sequential tool calls (an architecture problem, fixed by better prompts and CLAUDE.md) and a bloated context window (a session hygiene problem, fixed by /compact or /clear). Switching models helps with cost but only partially with speed. The real gains come from reducing tool call chains and keeping context lean.
Table of Contents
- The short answer: two different problems that feel like one
- Why tool call latency is the real culprit, not token processing
- Why Claude Code gets slower as a session goes on
- How to diagnose what is actually causing the slowness
- Six fixes you can apply right now
- Claude Code vs Cursor vs Codex — where slowness comes from in each
- Common mistakes that make Claude Code slower over time
- Frequently Asked Questions
- Key Takeaways
The short answer: two different problems that feel like one
Developers searching "why is Claude Code slow" usually mean one of these:
- Slow from the first message — the model takes a long time to respond even on simple tasks early in a session
- Slow after a while — responses were fast initially but are now grinding, and nothing changed about the task complexity
The table below separates them clearly.
| Symptom | Likely cause | Primary fix |
|---|---|---|
| Slow on turn 1, short session | Model response latency, network, or exploratory tool calls | Scope the task more precisely in your prompt |
| Fast at start, slow by turn 20+ | Context window accumulation | /compact or /clear |
| Slow on every multi-file task | Sequential tool call chains | CLAUDE.md + explicit file scoping |
| Consistent lag across all sessions | Model tier selected | Switch to Sonnet or Haiku for simpler tasks |
| Occasional spike in latency | Prompt cache miss after idle gap | Work in tighter bursts |
Why tool call latency is the real culprit, not token processing
Most developers assume Claude Code is slow because it is "thinking." In agentic sessions, the bigger delay is usually waiting — not for model inference, but for tool calls to complete.
What a tool call actually is
When Claude Code reads a file, runs a shell command, searches a directory, or calls an external API, each of those is a tool call: a discrete HTTP request that leaves Claude, executes on your machine or a remote server, and returns a result before Claude can continue. According to the Anthropic Claude Code documentation, Claude Code operates as an agentic loop where the model decides which tool to invoke, waits for the result, and then decides the next step.
That wait is synchronous by default. Claude Code does not run tool calls in parallel unless you explicitly structure your work that way (more on this below).
How sequential tool calls compound into visible lag
Here is what a typical "fix this bug" instruction actually triggers:
- Read the file mentioned — 400ms
- Read a related import — 400ms
- Search for usages of the function — 600ms
- Read another file found in the search — 400ms
- Run the test suite — 2,000ms
- Read the error output — 200ms
- Write the fix — 300ms
- Run tests again — 2,000ms
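Summing the step timings above shows where the wait goes — a quick sketch using the article's own illustrative figures:

```python
# Timings from the eight-step "fix this bug" chain above, in milliseconds.
steps_ms = [400, 400, 600, 400, 2000, 200, 300, 2000]
total_ms = sum(steps_ms)

print(total_ms)         # 6300 ms of wait before the fix is verified
print(total_ms / 1000)  # 6.3 seconds
```

Over six seconds of wall-clock time, and this is the well-scoped case where Claude already knows exactly which files to touch.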
A vague prompt makes this significantly worse, because Claude cannot skip the exploration phase.
How vague prompts trigger unnecessary tool call chains
The instruction "fix the auth bug" tells Claude nothing about where the bug is. Claude must explore: read the auth module, check the router, search for session handling, read middleware, look for tests. That exploration is 10–15 extra tool calls before any fix is written.
The instruction "in src/auth/session.ts, the cookie expiry is hardcoded to 3600 seconds — change it to use the SESSION_TTL env variable" eliminates the exploration entirely. Claude reads one file and writes one change.
Specificity is the single highest-impact speed optimisation for early-session slowness. It costs you 30 extra seconds of thinking time and saves 30–90 seconds of agent time.
Why Claude Code gets slower as a session goes on
Tool call latency explains early-session slowness. Context accumulation explains why sessions degrade over time.
Context window processing overhead grows with every turn
Every turn in a Claude Code session appends to the transcript:
- Your message
- Claude's response
- Every tool call made
- Every tool result returned (file contents, shell output, grep results, test logs)
A turn-5 session might carry 20,000 tokens of context. A turn-35 session after reading several large files and running a build might carry 400,000 tokens. The model processes all of it before writing a single output token. That is not the same as reading it from cache — it is active processing, and it takes time proportional to context size.
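Because the full transcript is reprocessed on every call, the total tokens the model reads over a session grow roughly quadratically with turn count. A minimal sketch, assuming a constant and purely illustrative 4,000 tokens appended per turn:

```python
# Sketch: cumulative input tokens processed when the whole transcript
# is re-read every turn. The per-turn growth figure is illustrative;
# real sessions grow unevenly (a large file read adds far more).
def tokens_processed(turns: int, tokens_added_per_turn: int) -> int:
    """Each turn re-processes everything appended so far."""
    context = 0
    total = 0
    for _ in range(turns):
        context += tokens_added_per_turn
        total += context  # the model reads the full transcript again
    return total

print(tokens_processed(5, 4000))   # 60000 tokens processed in total
print(tokens_processed(35, 4000))  # 2520000 — roughly quadratic growth
```

A session seven times longer costs roughly forty times the cumulative processing, which is why the slowdown feels disproportionate to the work being done.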
The difference between slow at turn 1 and slow at turn 30
This is the diagnostic that saves the most time:
Slow at turn 1: The session is fresh, context is minimal. If Claude Code is slow here, the cause is model response latency, network conditions, or a prompt that immediately triggers a long chain of exploratory tool calls. Fix: write a more specific prompt. Optionally, switch to a faster model tier.
Slow at turn 30: You have been in the session for a while. Files have been read, commands have been run, errors have been processed. Context is large. Each new turn pays the full processing overhead of the entire history. Fix: /compact to summarise and continue, or /clear and restart with fresh context.
How to diagnose what is actually causing the slowness
Before applying any fix, confirm what you are actually dealing with.
Use --verbose to see tool call chains
Running Claude Code with the --verbose flag prints each tool call as it happens, showing you what Claude is doing and how long each step takes. If you see 15 file reads before any output, you have a tool call problem. If the first tool call response takes 8 seconds, you have a model latency or context problem.
```shell
claude --verbose
```

Watch the output. Count the tool calls. Look for unexpected file reads that suggest Claude is exploring the codebase rather than executing a scoped task.
Signs Claude is exploring instead of executing
| Signal | What it means |
|---|---|
| Claude reads files you did not mention | Prompt is under-specified |
| Claude searches for function usages across the repo | Claude does not know where the relevant code lives |
| Claude reads test files before touching source | Claude is building its own context map |
| Claude opens the same file twice | Context was not retained or the task was ambiguous |
Diagnostic table: which fix applies to your situation
| Observation | Diagnosis | Fix |
|---|---|---|
| Slow from message 1, short session | Exploratory tool calls from vague prompt | Rewrite prompt with file paths and explicit scope |
| Slow from message 1, long session | Context bloat on first turn | /compact and restart |
| Fast early, slow by turn 20+ | Context accumulation | /compact mid-session |
| Consistent slow on all multi-file tasks | No CLAUDE.md, Claude navigates blind | Write a CLAUDE.md with file structure |
| Random latency spikes | Cache miss after idle | Work in tighter bursts |
Six fixes you can apply right now
For more context on how to structure your work across multiple sessions, see 25 battle-tested practices for teams using coding agents.
1. Scope the files explicitly in your prompt
Give Claude the file path, the function name, and the expected change. Remove the need to explore.
Before: "Fix the bug in the payment flow"
After: "In src/payments/checkout.ts, the calculateTax() function returns undefined when the country code is missing. Add a fallback to 'US' if countryCode is null or undefined."
The second prompt eliminates at minimum 5–10 tool calls. On a 400ms-per-call basis, that is 2–4 seconds of latency removed before Claude touches any code.
2. Use CLAUDE.md to prevent codebase re-exploration every session
CLAUDE.md is loaded at the start of every Claude Code session. If it contains a clear map of your repo — where the auth lives, what the main entry points are, which files Claude should never read — Claude starts every session already oriented. It does not need to explore.
A CLAUDE.md entry as simple as this eliminates a category of exploratory tool calls:
```markdown
# Structure
- Auth: src/auth/ — session handling in session.ts, middleware in auth.middleware.ts
- Payments: src/payments/ — do not read stripe-legacy/ unless explicitly asked
- Tests: tests/ — mirror structure of src/
```
This is a speed optimisation, not just a context optimisation. Every session that does not start with exploratory file reads is a faster session. For a comprehensive guide on writing effective CLAUDE.md files, see our guide to designing AI agents.
3. Run /compact instead of continuing a bloated session
When a session is long but you need continuity — you're mid-feature, you've made architectural decisions Claude should remember — /compact is the right tool. It asks Claude to summarise the full conversation into a short working memory, replacing hundreds of thousands of tokens of transcript with a few thousand tokens of essential context.
```shell
/compact
```

Use /compact when: the session is slow but you genuinely need the model to remember prior decisions.
Use /clear when: the task has changed or the session context is no longer relevant. Starting fresh is faster than compacting stale information.
For a detailed breakdown of cost implications of both commands, see the guide on how to cut Claude Code cost.
4. Switch models by task type — speed is not only a cost decision
Model selection affects response latency independently of cost. Claude Haiku responds significantly faster than Claude Sonnet in wall clock time, not just in tokens-per-dollar. For tasks that do not require deep reasoning — renaming a variable, formatting a file, generating a repetitive pattern — Haiku is faster and cheaper.
Switch models in Claude Code with:
```shell
/model haiku
/model sonnet
```

A practical heuristic:
| Task | Recommended model | Reason |
|---|---|---|
| Rename, formatting, repetitive edits | Haiku | Fast, cheap, sufficient |
| Single-file logic, debugging | Sonnet | Balanced speed and quality |
| Cross-file architecture, hard bugs | Sonnet with extended thinking | Best quality-per-dollar |
| Novel algorithm, complex planning | Opus | Use sparingly |
5. Run parallel Claude Code sessions for independent workstreams
Claude Code is single-threaded within a session — each tool call waits for the previous one to complete. But you can run multiple Claude Code instances simultaneously in separate terminal windows or tmux panes.
If you have two unrelated tasks — writing tests for module A while refactoring module B — running them in parallel halves the wall clock time. Each session carries only its own context, with no cross-contamination.
```shell
# Terminal 1
claude   # Working on auth tests

# Terminal 2
claude   # Working on payment refactor
```
This is one of the most underused speed strategies available to Claude Code users. It requires no configuration and costs the same as running the tasks sequentially.
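The wall-clock arithmetic behind this is simple: two independent streams running side by side finish in the time of the longer one, not the sum of both. A trivial sketch with illustrative durations:

```python
# Sequential vs parallel wall-clock time for two independent workstreams.
# Durations are illustrative, not measured.
task_a_min, task_b_min = 25, 20

sequential = task_a_min + task_b_min    # one session, tasks back to back
parallel = max(task_a_min, task_b_min)  # two sessions, side by side

print(sequential)  # 45 minutes
print(parallel)    # 25 minutes
```

The token cost is identical either way; only the waiting changes, which is why the tasks must be genuinely independent — shared files reintroduce coordination overhead.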
6. Clear and restart when context is stale
If the session has drifted — you fixed one bug, then another, then started a new feature — the accumulated context is more noise than signal. A /clear and a precise new prompt will almost always be faster than continuing a heavy session.
```shell
/clear
```

The test: if you had to explain the current session context to a new colleague in two sentences, and you cannot — the session is too stale to be useful. Start fresh.
Claude Code vs Cursor vs Codex — where slowness comes from in each
The tool call architecture is common across all three tools. The differences are in packaging and where the bottlenecks appear.
| Tool | Primary speed bottleneck | What helps |
|---|---|---|
| Claude Code | Sequential tool calls + context accumulation | /compact, CLAUDE.md, explicit file scoping |
| Cursor | "Max mode" uses frontier models with long reasoning | Disable Max mode for routine tasks; use standard mode |
| Codex (OpenAI) | Reasoning tokens on o-series models are hidden but billed | Disable extended thinking for simple tasks; use GPT-4.1 for speed |
Disabling Max mode in Cursor is the rough equivalent of /model sonnet in Claude Code. For Codex, the equivalent is disabling reasoning mode on o-series models when reasoning is not needed. Codex cache TTL follows the same ~5-minute window as Anthropic's prompt cache — working in bursts keeps the cache warm across all three tools.
Common mistakes that make Claude Code slower over time
Keeping one session alive all day. Context accumulates, gaps cause cache misses, and the session becomes progressively more expensive and slower to respond. A clean session each morning — or each major task switch — is almost always faster.
Using the agent as a file browser. Asking "what does this function do?" inside a long session makes Claude read and process files it may have already read, re-billing that context into an already heavy transcript. For exploratory questions, open a fresh short session.
Letting tool calls run unbounded. Instructions like "look through the codebase and find everywhere this pattern is used" can trigger 30+ file reads. If you need that analysis, scope it: "search only in src/components/ for uses of the useAuth hook."
Ignoring the --verbose output. Most developers never look at what Claude is actually doing between their prompt and the response. Running --verbose once per session type teaches you where the tool calls are going and where the time is being spent.
Switching models for speed without fixing the underlying prompt. A faster model running 20 unnecessary tool calls is still slower than the right model running 3 targeted tool calls.
Frequently Asked Questions About Why Claude Code Is Slow
Why does Claude Code get slower during a long session?
Every tool result and conversation turn is appended to the context window, which is re-processed in full on every subsequent call. A session at turn 5 might carry 20,000 tokens; the same session at turn 35 can carry 400,000 tokens or more. Processing time scales with context size, which is why sessions feel fast early and sluggish later. Running /compact mid-session reduces this overhead significantly.
What is a tool call and why does it cause latency?
A tool call is a discrete action Claude takes — reading a file, running a shell command, searching a directory. Each tool call is a synchronous HTTP round trip: Claude requests the action, waits for the result, and only then decides the next step. A task requiring 20 tool calls at 400ms each adds 8 seconds of pure wait time before Claude writes any output.
Does switching to a smaller model fix Claude Code slowness?
Partially. Smaller models like Haiku respond faster in wall clock time, so model switching helps. But if the root cause is tool call chains or context accumulation, a faster model running the same number of tool calls on the same bloated context will still be slow. Fix the tool call problem with better prompts and CLAUDE.md first, then consider model selection.
What does --verbose do in Claude Code?
The --verbose flag prints each tool call as it executes, showing what Claude is reading, running, or searching — and when. It is the primary diagnostic tool for understanding whether slowness is coming from exploratory file reads, long-running shell commands, or model inference time. Run it once on a typical session to understand your personal bottleneck.
How is Claude Code slowness different from Cursor slowness?
The underlying cause is the same — sequential tool calls and context accumulation — but the controls differ. In Cursor, disabling Max mode switches from frontier models to faster mid-tier models. In Claude Code, /model sonnet or /model haiku achieves the same effect. Context management in both tools benefits from starting fresh sessions for unrelated tasks.
When should I use /compact versus /clear?
Use /compact when you need the model to remember prior decisions in the session — architectural choices, debugging context, in-progress feature logic — but the session has grown slow. /compact summarises the transcript into a short working memory. Use /clear when the task has changed completely and prior context is irrelevant. Starting fresh is faster than carrying stale context.
Does CLAUDE.md actually improve speed?
Yes, directly. CLAUDE.md is loaded at session start and tells Claude where things live in your codebase. Without it, Claude often reads multiple files to orient itself before doing any work. A well-structured CLAUDE.md eliminates that exploration phase, reducing tool calls on the first turn of every session. It is both a speed and a quality improvement.
Key Takeaways
- Claude Code slowness has two distinct root causes: tool call latency (early session, architecture problem) and context accumulation (late session, hygiene problem). Treat them differently.
- A vague prompt can trigger 15+ unnecessary tool calls. Specific prompts with file paths and function names eliminate the exploration phase entirely.
- CLAUDE.md is a speed tool. A repo map at session start prevents Claude from navigating your codebase blind on every session.
- Use --verbose to see exactly what Claude is doing between your prompt and the response. Most slowness becomes obvious immediately.
- Run /compact for long sessions where context matters; /clear when the task has changed and context is stale.
- Parallel Claude Code sessions in separate terminals halve wall clock time for independent workstreams — no configuration required.
- Model switching helps at the margin. Fixing prompt specificity and context hygiene helps more.
If you have not read the companion post on how to cut Claude Code cost, start there — speed and cost are the same root problem viewed from different angles. The session hygiene and model selection guidance there applies directly to the fixes in this post.
Coming next: a full CLAUDE.md optimisation guide and a practical breakdown of running parallel Claude Code sessions.
References
- Anthropic — Claude Code documentation
- Anthropic — Prompt caching documentation
- Anthropic — Model overview and pricing
