Why Claude Code Gets Slower the Longer You Use It — and How to Fix It

Claude Code is slow, but not randomly. There are two distinct failure modes, and most developers are solving the wrong one.

A single vague "fix this bug" instruction can trigger 15 exploratory tool calls and 12 seconds of wait time — not because Claude is slow, but because you didn't tell it where to look. By turn 30 of a session, you might be processing 400,000 tokens of context just to answer a simple question. That is why Claude Code often feels fine at the start of a session and progressively sluggish by the end.

The first cause of slowness is tool call latency: Claude Code makes sequential HTTP round trips to read files, run commands, and process results. Each step takes 300–800ms. Twenty steps in a row means 10–15 seconds of wait time before Claude writes a single line of code. That is not a network problem; it is an architecture problem caused by how you prompted it.

The second is context accumulation: every file read, shell output, and conversation turn gets appended to the transcript, and that entire transcript is re-processed on every subsequent call.

Understanding which problem you have changes everything about how you fix it.

Quick answer: Claude Code slowness has two root causes — too many sequential tool calls (an architecture problem, fixed by better prompts and CLAUDE.md) and a bloated context window (a session hygiene problem, fixed by /compact or /clear). Switching models helps with cost but only partially with speed. The real gains come from reducing tool call chains and keeping context lean.

Table of Contents

  • The short answer: two different problems that feel like one
  • Why tool call latency is the real culprit, not token processing
  • Why Claude Code gets slower as a session goes on
  • How to diagnose what is actually causing the slowness
  • Six fixes you can apply right now
  • Claude Code vs Cursor vs Codex — where slowness comes from in each
  • Common mistakes that make Claude Code slower over time
  • Frequently Asked Questions
  • Key Takeaways

---

The short answer: two different problems that feel like one

Developers searching "why is Claude Code slow" usually mean one of these:

  • Slow from the first message — the model takes a long time to respond even on simple tasks early in a session
  • Slow after a while — responses were fast initially but are now grinding, and nothing changed about the task complexity

These have different causes and different fixes. Conflating them leads to wasted effort — switching models when the real problem is context bloat, or clearing the session when the real problem is a vague prompt triggering 25 file reads.

The table below separates them clearly.

| Symptom | Likely cause | Primary fix |
|---|---|---|
| Slow on turn 1, short session | Model response latency, network, or exploratory tool calls | Scope the task more precisely in your prompt |
| Fast at start, slow by turn 20+ | Context window accumulation | /compact or /clear |
| Slow on every multi-file task | Sequential tool call chains | CLAUDE.md + explicit file scoping |
| Consistent lag across all sessions | Model tier selected | Switch to Sonnet or Haiku for simpler tasks |
| Occasional spike in latency | Prompt cache miss after idle gap | Work in tighter bursts |

---

Why tool call latency is the real culprit, not token processing

Most developers assume Claude Code is slow because it is "thinking." In agentic sessions, the bigger delay is usually waiting — not for model inference, but for tool calls to complete.

What a tool call actually is

When Claude Code reads a file, runs a shell command, searches a directory, or calls an external API, each of those is a tool call: a discrete HTTP request that leaves Claude, executes on your machine or a remote server, and returns a result before Claude can continue. According to the Anthropic Claude Code documentation, Claude Code operates as an agentic loop where the model decides which tool to invoke, waits for the result, and then decides the next step.

That wait is synchronous by default. Claude Code does not run tool calls in parallel unless you explicitly structure your work that way (more on this below).

How sequential tool calls compound into visible lag

Here is what a typical "fix this bug" instruction actually triggers:

  • Read the file mentioned — 400ms
  • Read a related import — 400ms
  • Search for usages of the function — 600ms
  • Read another file found in the search — 400ms
  • Run the test suite — 2,000ms
  • Read the error output — 200ms
  • Write the fix — 300ms
  • Run tests again — 2,000ms

That is 8 tool calls and roughly 6.3 seconds of wall clock time, and Claude has not done anything unusual. It has done exactly what a careful developer would do. The problem is that every step is sequential, and every step pays a network round trip.
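A quick back-of-envelope check, using the step costs from the list above (illustrative numbers, not measurements):

```bash
# Sum of the eight sequential steps listed above, in milliseconds
echo $(( 400 + 400 + 600 + 400 + 2000 + 200 + 300 + 2000 ))   # prints 6300
```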

A vague prompt makes this significantly worse, because Claude cannot skip the exploration phase.

How vague prompts trigger unnecessary tool call chains

The instruction "fix the auth bug" tells Claude nothing about where the bug is. Claude must explore: read the auth module, check the router, search for session handling, read middleware, look for tests. That exploration is 10–15 extra tool calls before any fix is written.

The instruction "in src/auth/session.ts, the cookie expiry is hardcoded to 3600 seconds — change it to use the SESSION_TTL env variable" eliminates the exploration entirely. Claude reads one file and writes one change.

Specificity is the single highest-impact speed optimisation for early-session slowness. It costs you 30 extra seconds of thinking time and saves 30–90 seconds of agent time.


Why Claude Code gets slower as a session goes on

Tool call latency explains early-session slowness. Context accumulation explains why sessions degrade over time.

Context window processing overhead grows with every turn

Every turn in a Claude Code session appends to the transcript:

  • Your message
  • Claude's response
  • Every tool call made
  • Every tool result returned (file contents, shell output, grep results, test logs)

This entire transcript is re-sent to the model on every single turn. The model does not maintain a running state — it re-reads the full conversation each time to understand where it is.

A turn-5 session might carry 20,000 tokens of context. A turn-35 session after reading several large files and running a build might carry 400,000 tokens. The model processes all of it before writing a single output token. Prompt caching softens the cost of an unchanged prefix, but processing time still grows with context size.

The difference between slow at turn 1 and slow at turn 30

This is the diagnostic that saves the most time:

Slow at turn 1: The session is fresh, context is minimal. If Claude Code is slow here, the cause is model response latency, network conditions, or a prompt that immediately triggers a long chain of exploratory tool calls. Fix: write a more specific prompt. Optionally, switch to a faster model tier.

Slow at turn 30: You have been in the session for a while. Files have been read, commands have been run, errors have been processed. Context is large. Each new turn pays the full processing overhead of the entire history. Fix: /compact to summarise and continue, or /clear and restart with fresh context.


How to diagnose what is actually causing the slowness

Before applying any fix, confirm what you are actually dealing with.

Use --verbose to see tool call chains

Running Claude Code with the --verbose flag prints each tool call as it happens, showing you what Claude is doing and how long each step takes. If you see 15 file reads before any output, you have a tool call problem. If the first tool call response takes 8 seconds, you have a model latency or context problem.

claude --verbose

Watch the output. Count the tool calls. Look for unexpected file reads that suggest Claude is exploring the codebase rather than executing a scoped task.
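If you want an actual count rather than eyeballing the stream, a one-shot run can be piped through grep. This is a hedged sketch: it assumes the `-p` print mode and `stream-json` output format described in the Claude Code CLI docs, where tool invocations appear as `tool_use` content blocks; check `claude --help` on your version.

```bash
# Count tool invocations in a single non-interactive run.
# Assumes tool calls appear as "tool_use" blocks in stream-json output;
# adjust the grep pattern if your CLI version formats events differently.
claude -p "fix the auth bug" --verbose --output-format stream-json \
  | grep -o '"type":"tool_use"' | wc -l
```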

Signs Claude is exploring instead of executing

| Signal | What it means |
|---|---|
| Claude reads files you did not mention | Prompt is under-specified |
| Claude searches for function usages across the repo | Claude does not know where the relevant code lives |
| Claude reads test files before touching source | Claude is building its own context map |
| Claude opens the same file twice | Context was not retained or the task was ambiguous |

All of these are fixable with better prompts or a well-structured CLAUDE.md.

Diagnostic table: which fix applies to your situation

| Observation | Diagnosis | Fix |
|---|---|---|
| Slow from message 1, short session | Exploratory tool calls from vague prompt | Rewrite prompt with file paths and explicit scope |
| Slow from message 1, long session | Context bloat on first turn | /compact and restart |
| Fast early, slow by turn 20+ | Context accumulation | /compact mid-session |
| Consistently slow on all multi-file tasks | No CLAUDE.md, Claude navigates blind | Write a CLAUDE.md with file structure |
| Random latency spikes | Cache miss after idle | Work in tighter bursts |

---

Six fixes you can apply right now

For more context on how to structure your work across multiple sessions, see 25 battle-tested practices for teams using coding agents.

1. Scope the files explicitly in your prompt

Give Claude the file path, the function name, and the expected change. Remove the need to explore.

Before: "Fix the bug in the payment flow"

After: "In src/payments/checkout.ts, the calculateTax() function returns undefined when the country code is missing. Add a fallback to 'US' if countryCode is null or undefined."

The second prompt eliminates at least 5–10 tool calls. On a 400ms-per-call basis, that is 2–4 seconds of latency removed before Claude touches any code.
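When the task is fully specified like this, it can even run as a one-shot command. A minimal sketch, assuming the `-p` print mode flag from the Claude Code CLI docs; the file path and bug are the example from above:

```bash
# Fully scoped one-shot task: no exploration phase for Claude to run
claude -p "In src/payments/checkout.ts, calculateTax() returns undefined \
when the country code is missing. Add a fallback to 'US' if countryCode \
is null or undefined."
```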

2. Use CLAUDE.md to prevent codebase re-exploration every session

CLAUDE.md is loaded at the start of every Claude Code session. If it contains a clear map of your repo — where the auth lives, what the main entry points are, which files Claude should never read — Claude starts every session already oriented. It does not need to explore.

A CLAUDE.md entry as simple as this eliminates a category of exploratory tool calls:

```markdown
# Structure
- Auth: src/auth/ — session handling in session.ts, middleware in auth.middleware.ts
- Payments: src/payments/ — do not read stripe-legacy/ unless explicitly asked
- Tests: tests/ — mirror structure of src/
```

This is a speed optimisation, not just a context optimisation. Every session that does not start with exploratory file reads is a faster session. For a comprehensive guide on writing effective CLAUDE.md files, see our guide to designing AI agents.
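Placement matters too. A hedged sketch of where Claude Code picks up memory files, per the Anthropic docs; verify the paths against your installed version:

```bash
# Project-level map, committed with the repo and loaded every session:
#   ./CLAUDE.md
# User-level preferences, loaded across all your projects:
#   ~/.claude/CLAUDE.md
cat CLAUDE.md   # confirm what Claude will see at session start
```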

3. Run /compact instead of continuing a bloated session

When a session is long but you need continuity — you're mid-feature, you've made architectural decisions Claude should remember — /compact is the right tool. It asks Claude to summarise the full conversation into a short working memory, replacing hundreds of thousands of tokens of transcript with a few thousand tokens of essential context.

/compact

Use /compact when: the session is slow but you genuinely need the model to remember prior decisions.

Use /clear when: the task has changed or the session context is no longer relevant. Starting fresh is faster than compacting stale information.
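Recent Claude Code versions also accept optional guidance after /compact that steers what the summary keeps. A hedged example; check /help on your version:

/compact Keep the architectural decisions about session handling; drop old test logs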

For a detailed breakdown of cost implications of both commands, see the guide on how to cut Claude Code cost.

4. Switch models by task type — speed is not only a cost decision

Model selection affects response latency independently of cost. Claude Haiku responds significantly faster than Claude Sonnet in wall clock time, not just in tokens-per-dollar. For tasks that do not require deep reasoning — renaming a variable, formatting a file, generating a repetitive pattern — Haiku is faster and cheaper.

Switch models in Claude Code with:

/model haiku
/model sonnet

A practical heuristic:

| Task | Recommended model | Reason |
|---|---|---|
| Rename, formatting, repetitive edits | Haiku | Fast, cheap, sufficient |
| Single-file logic, debugging | Sonnet | Balanced speed and quality |
| Cross-file architecture, hard bugs | Sonnet with extended thinking | Best quality-per-dollar |
| Novel algorithm, complex planning | Opus | Use sparingly |
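You can also pick the tier at launch rather than mid-session. A sketch assuming the `--model` flag and alias names from the Claude Code docs; exact aliases may vary by version, and the rename in the second command is a hypothetical example task:

```bash
# Start the session on a faster tier for mechanical edits
claude --model haiku

# Or pin a one-shot task to Sonnet directly from the shell (hypothetical task)
claude --model sonnet -p "Rename getUser to fetchUser across src/api/"
```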

5. Run parallel Claude Code sessions for independent workstreams

Claude Code is single-threaded within a session — each tool call waits for the previous one to complete. But you can run multiple Claude Code instances simultaneously in separate terminal windows or tmux panes.

If you have two unrelated tasks — writing tests for module A while refactoring module B — running them in parallel halves the wall clock time. Each session carries only its own context, with no cross-contamination.

```bash
# Terminal 1
claude  # Working on auth tests

# Terminal 2
claude  # Working on payment refactor
```

This is one of the most underused speed strategies available to Claude Code users. It requires no configuration and costs the same as running the tasks sequentially.
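If you already work in tmux, as mentioned above, the same setup is scriptable. A minimal sketch; the session name and pane assignments are illustrative:

```bash
# Launch two independent Claude Code sessions side by side in tmux
tmux new-session -d -s agents 'claude'     # pane 1: auth tests
tmux split-window -h -t agents 'claude'    # pane 2: payment refactor
tmux attach -t agents
```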

6. Clear and restart when context is stale

If the session has drifted — you fixed one bug, then another, then started a new feature — the accumulated context is more noise than signal. A /clear and a precise new prompt will almost always be faster than continuing a heavy session.

/clear

The test: if you had to explain the current session context to a new colleague in two sentences, and you cannot — the session is too stale to be useful. Start fresh.
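And if you clear too eagerly, a session is not gone for good. A hedged sketch assuming the resume flags documented for the Claude Code CLI:

```bash
claude --continue   # reopen the most recent session in this directory
claude --resume     # pick an older session from an interactive list
```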


Claude Code vs Cursor vs Codex — where slowness comes from in each

The tool call architecture is common across all three tools. The differences are in packaging and where the bottlenecks appear.

| Tool | Primary speed bottleneck | What helps |
|---|---|---|
| Claude Code | Sequential tool calls + context accumulation | /compact, CLAUDE.md, explicit file scoping |
| Cursor | "Max mode" uses frontier models with long reasoning | Disable Max mode for routine tasks; use standard mode |
| Codex (OpenAI) | Reasoning tokens on o-series models are hidden but billed | Disable extended thinking for simple tasks; use GPT-4.1 for speed |

For deeper context on working effectively with these coding agents, see working effectively with coding agents. Developers asking how to turn off Cursor's Max mode are asking this same question from the Cursor side. In Cursor, Max mode routes requests to frontier models (Claude Opus, GPT-5) with fewer constraints; turning it off for routine tasks switches to a faster mid-tier model. The mechanics are identical to switching /model sonnet in Claude Code.

For Codex, the equivalent is disabling reasoning mode on o-series models when reasoning is not needed. Codex cache TTL follows the same ~5-minute window as Anthropic's prompt cache — working in bursts keeps the cache warm across all three tools.


Common mistakes that make Claude Code slower over time

Keeping one session alive all day. Context accumulates, gaps cause cache misses, and the session becomes progressively more expensive and slower to respond. A clean session each morning — or each major task switch — is almost always faster.

Using the agent as a file browser. Asking "what does this function do?" inside a long session makes Claude read and process files it may have already read, re-billing that context into an already heavy transcript. For exploratory questions, open a fresh short session.

Letting tool calls run unbounded. Instructions like "look through the codebase and find everywhere this pattern is used" can trigger 30+ file reads. If you need that analysis, scope it: "search only in src/components/ for uses of the useAuth hook."
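Better still, run the broad search yourself and hand Claude the exact hits; a local grep is cheaper than a chain of agent tool calls:

```bash
# Pre-scope the search so Claude starts from the answers, not the hunt
grep -rn "useAuth" src/components/ | head -20
```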

Ignoring the --verbose output. Most developers never look at what Claude is actually doing between their prompt and the response. Running --verbose once per session type teaches you where the tool calls are going and where the time is being spent.

Switching models for speed without fixing the underlying prompt. A faster model running 20 unnecessary tool calls is still slower than the right model running 3 targeted tool calls.


Frequently Asked Questions About Why Claude Code Is Slow

Why does Claude Code get slower during a long session?

Every tool result and conversation turn is appended to the context window, which is re-processed in full on every subsequent call. A session at turn 5 might carry 20,000 tokens; the same session at turn 35 can carry 400,000 tokens or more. Processing time scales with context size, which is why sessions feel fast early and sluggish later. Running /compact mid-session reduces this overhead significantly.

What is a tool call and why does it cause latency?

A tool call is a discrete action Claude takes — reading a file, running a shell command, searching a directory. Each tool call is a synchronous HTTP round trip: Claude requests the action, waits for the result, and only then decides the next step. A task requiring 20 tool calls at 400ms each adds 8 seconds of pure wait time before Claude writes any output.

Does switching to a smaller model fix Claude Code slowness?

Partially. Smaller models like Haiku respond faster in wall clock time, so model switching helps. But if the root cause is tool call chains or context accumulation, a faster model running the same number of tool calls on the same bloated context will still be slow. Fix the tool call problem with better prompts and CLAUDE.md first, then consider model selection.

What does --verbose do in Claude Code?

The --verbose flag prints each tool call as it executes, showing what Claude is reading, running, or searching — and when. It is the primary diagnostic tool for understanding whether slowness is coming from exploratory file reads, long-running shell commands, or model inference time. Run it once on a typical session to understand your personal bottleneck.

How is Claude Code slowness different from Cursor slowness?

The underlying cause is the same — sequential tool calls and context accumulation — but the controls differ. In Cursor, disabling Max mode switches from frontier models to faster mid-tier models. In Claude Code, /model sonnet or /model haiku achieves the same effect. Context management in both tools benefits from starting fresh sessions for unrelated tasks.

When should I use /compact versus /clear?

Use /compact when you need the model to remember prior decisions in the session — architectural choices, debugging context, in-progress feature logic — but the session has grown slow. /compact summarises the transcript into a short working memory. Use /clear when the task has changed completely and prior context is irrelevant. Starting fresh is faster than carrying stale context.

Does CLAUDE.md actually improve speed?

Yes, directly. CLAUDE.md is loaded at session start and tells Claude where things live in your codebase. Without it, Claude often reads multiple files to orient itself before doing any work. A well-structured CLAUDE.md eliminates that exploration phase, reducing tool calls on the first turn of every session. It is both a speed and a quality improvement.


Key Takeaways

  • Claude Code slowness has two distinct root causes: tool call latency (early session, architecture problem) and context accumulation (late session, hygiene problem). Treat them differently.
  • A vague prompt can trigger 15+ unnecessary tool calls. Specific prompts with file paths and function names eliminate the exploration phase entirely.
  • CLAUDE.md is a speed tool. A repo map at session start prevents Claude from navigating your codebase blind on every session.
  • Use --verbose to see exactly what Claude is doing between your prompt and the response. Most slowness becomes obvious immediately.
  • /compact for long sessions where context matters. /clear when the task has changed and context is stale.
  • Parallel Claude Code sessions in separate terminals halve wall clock time for independent workstreams — no configuration required.
  • Model switching helps at the margin. Fixing prompt specificity and context hygiene helps more.
---


If you have not read the companion post on how to cut Claude Code cost, start there: speed and cost are the same root problem viewed from different angles. The session hygiene and model selection guidance there applies directly to the fixes in this post.

Coming next: a full CLAUDE.md optimisation guide and a practical breakdown of running parallel Claude Code sessions.



About the Author

Aakash builds systems, platforms, and teams that scale (without breaking… usually). He's worked across 15+ industries, led global teams, and delivered multi-million-dollar projects—while still getting his hands dirty in code. He also teaches AI, Big Data, and Reinforcement Learning at top institutes in India.