Complete Visual Guide · 2026

How Claude's Memory Works

A plain-English, visual walkthrough of Claude's 200K token context window and 11 strategies to use it smarter.

TL;DR — 5 things to know right now

  • Claude has a 200,000 token "working memory" — big, but not infinite
  • Every conversation starts completely fresh — Claude remembers nothing from last time
  • You can fake long-term memory using CLAUDE.md files and Skills
  • Loading too much context at once makes Claude worse, not better
  • The 11 strategies below help you stay under the limit and get sharper answers
Foundations

What is Claude really doing?

Imagine a brilliant contractor who has read every programming book, forum post, and open-source project ever written. They show up knowing everything from training. But:

🧠

Cannot Learn

Cannot update its knowledge after training. What you tell it today disappears when the session ends.

📋

Limited Desk Space

Has a fixed "desk" of 200K tokens. Load too much and the oldest info falls off the edge.

🔍

Pattern Matcher

Doesn't "think" — it recognizes patterns from billions of training examples and predicts the best next word.

🔑
Key Insight

Claude doesn't understand your code the way a senior engineer does. It recognizes patterns similar to what it saw during training. That's still incredibly powerful — but it explains why context management matters so much.

Comparison

Your Brain vs Claude

Feature 🧠 Human Brain 🤖 Claude
Working Memory ~7 items at once 200K tokens (~150,000 words)
Long-term Storage Unlimited — grows over lifetime Fixed at training cutoff
Memory Between Sessions Remembers you forever Forgets everything on session end
Learning Continuous — every experience updates you None after training
Creativity True novel ideas from first principles Combines existing patterns (very well)
Speed Slow for new complex problems Instant pattern recall
Availability Tired, emotional, needs sleep 24/7, always consistent

What does "200K tokens" actually mean?

1
token ≈ 4 characters
("the " = 1 token)
750
words per 1,000 tokens
500
pages of text in 200K tokens
1K
tokens per average code file
Capabilities

What Claude Can (and Can't) Do

❌ Cannot Do

Learn from your conversation in real time
Remember previous conversations
Guarantee mathematical accuracy
Run and verify code (without tools)
Browse the web (without tools)
Understand causation (only correlation)
Hold more than ~200K tokens at once
Update its knowledge after training cutoff

✅ Excels At

Pattern recognition at massive scale
Code generation and completion
Refactoring and language translation
Documentation and explanation
Summarization of large texts
Consistent coding standards everywhere
Rapid prototyping and boilerplate
Objective code reviews (no ego)
Token Budget

Your 200K Token Budget — Visualized

Think of your context window like RAM. Load too much and everything slows down. Here's the recommended split:

🟢 Project Context
CLAUDE.md, Tech Stack, Goals
20K
CLAUDE.md • Sprint goals • Architecture overview
🔵 Active Work
Files, Tests, Recent Changes
50K
Files you're editing • Related tests • Recent diffs
🟣 Conversation
Questions & Replies
80K
Your questions • Claude's replies • Generated code
🔴 Reserve Buffer
Tools & Errors
50K
Tool outputs • Error messages • Search results
TOTAL: 200K TOKENS
20K
50K
80K
50K
🟢 Project 🔵 Active Work 🟣 Conversation 🔴 Reserve
⚠️
Watch Out

When context fills up, Claude starts "forgetting" earlier parts of your conversation. Signs: Claude repeats itself, gives vague answers, or ignores things you mentioned earlier.

The Playbook

11 Strategies — Overview

Each strategy solves a specific Claude limitation. Here's the complete map:

📝
Strategy 01

Markdown Files

Write project info once. Claude reads it whenever needed. Works like your notebook.

Memory
📌
Strategy 02

CLAUDE.md

Auto-loaded every session. Tech stack, conventions, current focus — always in context.

Must Have
🏃
Strategy 03

Skills

Document procedures once. Reference by name. No re-explaining deploy steps ever again.

Efficiency
🛠️
Strategy 04

Built-in Tools

Read → Grep → Edit → Bash. Claude uses real file data instead of guessing.

Accuracy
📡
Strategy 05

MCP Servers

Plug in GitHub, Slack, databases. On-demand access — no pasting schemas into chat.

Integrations
💾
Strategy 06

Prompt Caching

Cache stable context. Pay 10% the normal cost when the same block is reused.

Cost Saving
⚙️
Strategy 07

Hooks

Auto-lint on edit. Auto-notify on finish. Reliable automation — no "remember to…" needed.

Automation
🧩
Strategy 08

Extended Thinking

Claude reasons silently before answering. Better results for complex, high-stakes problems.

Deep Work
👥
Strategy 09

Sub-Agents

Parallel workers with separate 200K contexts. Divide big tasks. Keep main context clean.

Scale
🗺️
Strategy 10

Plan vs Edit Mode

Blueprints before construction (Plan). Touch-ups on existing walls (Edit). Right tool, right job.

Process
🔎
Strategy 11

RAG

Search first, load only what's relevant. 5K tokens instead of 60K. See diagram below.

Large Codebases
Strategy 11 — Deep Dive

How RAG Works

RAG = Retrieval-Augmented Generation. Instead of loading your entire codebase, Claude searches for only the relevant files — then loads just those.

The analogy: A library with 10,000 books. You could carry all 10,000 to your desk. Or search the catalog, find 5 books, carry just those. RAG is the catalog search.

💬 You ask:
"How does authentication work?"
🔢 Your question becomes a vector
(a mathematical fingerprint)
🗄️ Vector database scans all files
finds the 5 closest matches to your question
📂 Loads only those 5 files
5K tokens instead of 60K
✅ Accurate, focused answer
195K tokens still available

When do you need RAG?

Project Size Use RAG?
Small (< 20 files) No — fits easily
Medium (20–100 files) Yes — saves context
Large (100+ files) Essential
Documentation search Yes
Legacy codebase Essential

Popular RAG tools

ChromaDB
Open-source, runs locally
Start Here
Pinecone
Managed cloud, easy scaling
Production
LlamaIndex
Framework to connect RAG to Claude
Framework
Strategy 09 — Deep Dive

Sub-Agents: Divide and Conquer

Claude can spawn independent "helper agents" — each with their own fresh 200K context — to work on parts of a big task in parallel. The messy exploratory work stays inside each sub-agent's isolated context, keeping your main conversation clean.

🤖 Main Agent (orchestrator)
🟢 Sub-Agent 1
"Audit auth module for security issues"
🔵 Sub-Agent 2
"Audit payments module for security issues"
🟣 Sub-Agent 3
"Audit API layer for security issues"
📊 Main Agent synthesizes all findings

Why this matters for context

❌ Without Sub-Agents

All exploratory searching, file reading, and intermediate results fill up your main context window — leaving less space for the actual answers.

✅ With Sub-Agents

Each sub-agent's messy exploratory work stays in its own isolated 200K window. Your main conversation stays clean and focused.

Best use cases

Auditing multiple modules at the same time
Running independent analyses in parallel
Delegating well-defined research tasks
Processing multiple documents simultaneously
🎯
Quick Action

Next time you have "review all 5 modules for issues," ask Claude to use a sub-agent per module. Faster results, cleaner conversation.

Strategy 10 — Deep Dive

Plan Mode vs Edit Mode

Two different ways Claude approaches a task. Think: blueprints before construction (Plan) vs painting an existing wall (Edit).

🗺️

Plan Mode

Think before you type

  • 1 Claude reads your codebase holistically
  • 2 Creates implementation_plan.md
  • 3 Lists all risks and dependencies
  • 4 Waits for your approval
  • 5 Implements phase by phase
Use for:
New features · Refactoring · Architecture · Cross-file changes
⏱ Slower, deliberate (System 2 thinking)
VS
✏️

Edit Mode

Direct, targeted changes

  • 1 Claude reads the specific file
  • 2 Makes the precise edit
  • 3 Done
Use for:
Bug fixes · Small refactors · Doc updates · Known implementations
⚡ Fast, automatic (System 1 thinking)
⚠️
Watch Out

Never use Edit Mode for architecture-level changes. If the change touches more than 3 files or affects how different parts of your system connect, use Plan Mode first.

Real-World Example

Context Done Wrong vs Right

Building an authentication system. Same goal — completely different results depending on context management.

❌ The Wrong Way — Context Overload

"Build me an authentication system"
All user files (20 files)
30K
All API endpoints (15 files)
20K
All test files (25 files)
25K
Database models (10 files)
10K
Documentation (5 files)
15K
Result:
75 files = 100K tokens consumed upfront
Only 100K left for actual conversation
→ Messy, scattered, incomplete responses

✅ The Right Way — Structured Context

📋 Conversation 1: Planning — 20K tokens
Plan Mode → create implementation_plan.md
🗄️ Conversation 2: Models & DB — 40K tokens
Implement Phase 1 from the plan
🔐 Conversation 3: Auth Service — 35K tokens
JWT tokens, password hashing, endpoints
🧪 Conversation 4: Testing — 30K tokens
Run tests, verify everything works
Result:
✅ Each conversation uses <50% context
✅ Clear documentation trail
✅ Fully working auth system with tests
Which Model to Use

Choosing the Right Claude

All Claude models share 200K token context. Pick based on the task complexity, not the context size.

🐇 Haiku 4.5
Fast · Cheap
High-volume
Quick bug fix · Boilerplate · Simple questions · Automation
⚡ Sonnet 4.6
Balanced
Everyday
Feature work · Code review · Refactoring · Daily coding
🧠 Opus 4.7
Most capable
Extended Thinking
Complex architecture · Security analysis · Hard decisions · Extended Thinking
🐇
Haiku 4.5
Quick tasks
High-volume automation
Sonnet 4.6
Feature development
Balanced performance
🧠
Opus 4.7
Hard decisions
Extended Thinking
Best Practices

The Golden Rules

✅ DO These Things

Keep CLAUDE.md updated at project root
Use Plan Mode for complex or architectural work
Provide clear, scoped context in requests
Use Hooks for automation you need every time
Verify all AI-generated code before shipping
Break large tasks into focused conversations
Use Sub-Agents for parallel, independent work
Cache large, stable context when using the API

❌ DON'T Do These Things

Dump entire codebases — use RAG instead
Skip planning for big architectural changes
Mix multiple unrelated concerns in one chat
Trust AI output blindly without testing
Let decisions live only in chat history
Use Edit Mode for multi-file architecture changes
Ask Claude to "remember" things for next time
Let AI make final architectural decisions
Working Together

The Human–Claude Partnership

The best results happen when you understand who does what. This is not about replacing you — it's about multiplying what you can do in a day.

🧠

You Excel At

  • Strategic thinking and vision
  • Understanding stakeholder needs
  • Creative problem-solving
  • Recognizing subtle domain bugs
  • Final verification and sign-off
  • Accountability
+
🤖

Claude Excels At

  • Pattern recognition at scale
  • Code generation and refactoring
  • Consistency across large codebases
  • Rapid prototyping
  • Recall from vast training data
  • 24/7 availability
You provide the vision and the verification.
Claude provides the velocity and the consistency.
Start Today

Your Action Plan — Week by Week

You don't need all 11 strategies today. Start with the highest-impact ones and build up gradually.

🟢 Week 1 — Foundation (30 min)
Create CLAUDE.md at your project root with tech stack, conventions, and current focus
Document one procedure you've explained more than once as a Skill in .claude/skills/
🔵 Week 2 — Automation (1 hr)
Try Plan Mode for your next feature — "Analyze and create a plan first before writing any code"
Set up one Hook in .claude/settings.json — start with auto-lint on file edit
🟣 Week 3 — Scale (2-3 hrs)
Set up ChromaDB (RAG) if you have 50+ files in your project
Enable Prompt Caching in your API calls if you send the same system prompt repeatedly

Quick Reference — When to Use What

Task Strategy to Use
New feature or big refactorPlan Mode → Edit Mode
Quick bug fixEdit Mode directly
Repeated procedure (deploy, test, etc.)Skill file in .claude/skills/
Large codebase navigationRAG + selective loading
Live external data (DB schema, APIs)MCP Server
Complex reasoning, architectureExtended Thinking (Opus 4.7)
Multiple independent analysesSub-Agents
Automation after every editHooks
Repeated large context in APIPrompt Caching
Context overloaded or confused"Summarize to PROGRESS.md → fresh chat"
"The best code is not written — it's orchestrated."
You're the conductor. Claude is your orchestra.
Give it the right score, and the music writes itself.