Complete Visual Guide · 2026

How Claude's Memory Works

A plain-English, visual walkthrough of Claude's 200K token context window and 11 strategies to use it smarter.

TL;DR — 5 things to know right now

Claude has a 200,000 token "working memory" — big, but not infinite
Every conversation starts completely fresh — Claude remembers nothing from last time
You can fake long-term memory using CLAUDE.md files and Skills
Loading too much context at once makes Claude worse, not better
The 11 strategies below help you stay under the limit and get sharper answers

Foundations

What is Claude really doing?

Imagine a brilliant contractor who has read every programming book, forum post, and open-source project ever written. They show up knowing everything from training. But:

🧠

Cannot Learn

Cannot update its knowledge after training. What you tell it today disappears when the session ends.

📋

Limited Desk Space

Has a fixed "desk" of 200K tokens. Load too much and the oldest info falls off the edge.

🔍

Pattern Matcher

Doesn't "think" — it recognizes patterns from billions of training examples and predicts the best next word.

🔑

Key Insight

Claude doesn't understand your code the way a senior engineer does. It recognizes patterns similar to what it saw during training. That's still incredibly powerful — but it explains why context management matters so much.

Comparison

Your Brain vs Claude

Feature	🧠 Human Brain	🤖 Claude
Working Memory	~7 items at once	200K tokens (~150,000 words)
Long-term Storage	Unlimited — grows over lifetime	Fixed at training cutoff
Memory Between Sessions	Remembers you forever	Forgets everything on session end
Learning	Continuous — every experience updates you	None after training
Creativity	True novel ideas from first principles	Combines existing patterns (very well)
Speed	Slow for new complex problems	Instant pattern recall
Availability	Tired, emotional, needs sleep	24/7, always consistent

What does "200K tokens" actually mean?

token ≈ 4 characters
("the " = 1 token)

750

words per 1,000 tokens

500

pages of text in 200K tokens

tokens per average code file

Capabilities

What Claude Can (and Can't) Do

❌ Cannot Do

✗ Learn from your conversation in real time

✗ Remember previous conversations

✗ Guarantee mathematical accuracy

✗ Run and verify code (without tools)

✗ Browse the web (without tools)

✗ Understand causation (only correlation)

✗ Hold more than ~200K tokens at once

✗ Update its knowledge after training cutoff

✅ Excels At

✓ Pattern recognition at massive scale

✓ Code generation and completion

✓ Refactoring and language translation

✓ Documentation and explanation

✓ Summarization of large texts

✓ Consistent coding standards everywhere

✓ Rapid prototyping and boilerplate

✓ Objective code reviews (no ego)

Token Budget

Your 200K Token Budget — Visualized

Think of your context window like RAM. Load too much and everything slows down. Here's the recommended split:

🟢 Project Context

CLAUDE.md, Tech Stack, Goals

20K

CLAUDE.md • Sprint goals • Architecture overview

🔵 Active Work

Files, Tests, Recent Changes

50K

Files you're editing • Related tests • Recent diffs

🟣 Conversation

Questions & Replies

80K

Your questions • Claude's replies • Generated code

🔴 Reserve Buffer

Tools & Errors

50K

Tool outputs • Error messages • Search results

TOTAL: 200K TOKENS

20K

50K

80K

50K

🟢 Project 🔵 Active Work 🟣 Conversation 🔴 Reserve

⚠️

Watch Out

When context fills up, Claude starts "forgetting" earlier parts of your conversation. Signs: Claude repeats itself, gives vague answers, or ignores things you mentioned earlier.

The Playbook

11 Strategies — Overview

Each strategy solves a specific Claude limitation. Here's the complete map:

📝

Strategy 01

Markdown Files

Write project info once. Claude reads it whenever needed. Works like your notebook.

Memory

📌

Strategy 02

CLAUDE.md

Auto-loaded every session. Tech stack, conventions, current focus — always in context.

Must Have

🏃

Strategy 03

Skills

Document procedures once. Reference by name. No re-explaining deploy steps ever again.

Efficiency

🛠️

Strategy 04

Built-in Tools

Read → Grep → Edit → Bash. Claude uses real file data instead of guessing.

Accuracy

📡

Strategy 05

MCP Servers

Plug in GitHub, Slack, databases. On-demand access — no pasting schemas into chat.

Integrations

💾

Strategy 06

Prompt Caching

Cache stable context. Pay 10% the normal cost when the same block is reused.

Cost Saving

⚙️

Strategy 07

Hooks

Auto-lint on edit. Auto-notify on finish. Reliable automation — no "remember to…" needed.

Automation

🧩

Strategy 08

Extended Thinking

Claude reasons silently before answering. Better results for complex, high-stakes problems.

Deep Work

👥

Strategy 09

Sub-Agents

Parallel workers with separate 200K contexts. Divide big tasks. Keep main context clean.

Scale

🗺️

Strategy 10

Plan vs Edit Mode

Blueprints before construction (Plan). Touch-ups on existing walls (Edit). Right tool, right job.

Process

🔎

Strategy 11

RAG

Search first, load only what's relevant. 5K tokens instead of 60K. See diagram below.

Large Codebases

Strategy 11 — Deep Dive

How RAG Works

RAG = Retrieval-Augmented Generation. Instead of loading your entire codebase, Claude searches for only the relevant files — then loads just those.

The analogy: A library with 10,000 books. You could carry all 10,000 to your desk. Or search the catalog, find 5 books, carry just those. RAG is the catalog search.

💬 You ask:
"How does authentication work?"

↓

🔢 Your question becomes a vector
(a mathematical fingerprint)

↓

🗄️ Vector database scans all files
finds the 5 closest matches to your question

↓

📂 Loads only those 5 files
5K tokens instead of 60K

↓

✅ Accurate, focused answer
195K tokens still available

When do you need RAG?

Project Size	Use RAG?
Small (< 20 files)	No — fits easily
Medium (20–100 files)	Yes — saves context
Large (100+ files)	Essential
Documentation search	Yes
Legacy codebase	Essential

Popular RAG tools

ChromaDB
Open-source, runs locally

Start Here

Pinecone
Managed cloud, easy scaling

Production

LlamaIndex
Framework to connect RAG to Claude

Framework

Strategy 09 — Deep Dive

Sub-Agents: Divide and Conquer

Claude can spawn independent "helper agents" — each with their own fresh 200K context — to work on parts of a big task in parallel. The messy exploratory work stays inside each sub-agent's isolated context, keeping your main conversation clean.

🤖 Main Agent (orchestrator)

🟢 Sub-Agent 1
"Audit auth module for security issues"

🔵 Sub-Agent 2
"Audit payments module for security issues"

🟣 Sub-Agent 3
"Audit API layer for security issues"

📊 Main Agent synthesizes all findings

Why this matters for context

❌ Without Sub-Agents

All exploratory searching, file reading, and intermediate results fill up your main context window — leaving less space for the actual answers.

✅ With Sub-Agents

Each sub-agent's messy exploratory work stays in its own isolated 200K window. Your main conversation stays clean and focused.

Best use cases

✓ Auditing multiple modules at the same time

✓ Running independent analyses in parallel

✓ Delegating well-defined research tasks

✓ Processing multiple documents simultaneously

🎯

Quick Action

Next time you have "review all 5 modules for issues," ask Claude to use a sub-agent per module. Faster results, cleaner conversation.

Strategy 10 — Deep Dive

Plan Mode vs Edit Mode

Two different ways Claude approaches a task. Think: blueprints before construction (Plan) vs painting an existing wall (Edit).

🗺️

Plan Mode

Think before you type

1 Claude reads your codebase holistically
2 Creates implementation_plan.md
3 Lists all risks and dependencies
4 Waits for your approval
5 Implements phase by phase

Use for:
New features · Refactoring · Architecture · Cross-file changes

⏱ Slower, deliberate (System 2 thinking)

✏️

Edit Mode

Direct, targeted changes

1 Claude reads the specific file
2 Makes the precise edit
3 Done

Use for:
Bug fixes · Small refactors · Doc updates · Known implementations

⚡ Fast, automatic (System 1 thinking)

⚠️

Watch Out

Never use Edit Mode for architecture-level changes. If the change touches more than 3 files or affects how different parts of your system connect, use Plan Mode first.

Real-World Example

Context Done Wrong vs Right

Building an authentication system. Same goal — completely different results depending on context management.

❌ The Wrong Way — Context Overload

"Build me an authentication system"

All user files (20 files)

30K

All API endpoints (15 files)

20K

All test files (25 files)

25K

Database models (10 files)

10K

Documentation (5 files)

15K

Result:

75 files = 100K tokens consumed upfront

Only 100K left for actual conversation

→ Messy, scattered, incomplete responses

✅ The Right Way — Structured Context

📋 Conversation 1: Planning — 20K tokens
Plan Mode → create implementation_plan.md

↓

🗄️ Conversation 2: Models & DB — 40K tokens
Implement Phase 1 from the plan

↓

🔐 Conversation 3: Auth Service — 35K tokens
JWT tokens, password hashing, endpoints

↓

🧪 Conversation 4: Testing — 30K tokens
Run tests, verify everything works

Result:

✅ Each conversation uses <50% context

✅ Clear documentation trail

✅ Fully working auth system with tests

Which Model to Use

Choosing the Right Claude

All Claude models share 200K token context. Pick based on the task complexity, not the context size.

🐇 Haiku 4.5

Fast · Cheap
High-volume

Quick bug fix · Boilerplate · Simple questions · Automation

⚡ Sonnet 4.6

Balanced
Everyday

Feature work · Code review · Refactoring · Daily coding

🧠 Opus 4.7

Most capable
Extended Thinking

Complex architecture · Security analysis · Hard decisions · Extended Thinking

🐇

Haiku 4.5

Quick tasks
High-volume automation

⚡

Sonnet 4.6

Feature development
Balanced performance

🧠

Opus 4.7

Hard decisions
Extended Thinking

Best Practices

The Golden Rules

✅ DO These Things

✓ Keep CLAUDE.md updated at project root

✓ Use Plan Mode for complex or architectural work

✓ Provide clear, scoped context in requests

✓ Use Hooks for automation you need every time

✓ Verify all AI-generated code before shipping

✓ Break large tasks into focused conversations

✓ Use Sub-Agents for parallel, independent work

✓ Cache large, stable context when using the API

❌ DON'T Do These Things

✗ Dump entire codebases — use RAG instead

✗ Skip planning for big architectural changes

✗ Mix multiple unrelated concerns in one chat

✗ Trust AI output blindly without testing

✗ Let decisions live only in chat history

✗ Use Edit Mode for multi-file architecture changes

✗ Ask Claude to "remember" things for next time

✗ Let AI make final architectural decisions

Working Together

The Human–Claude Partnership

The best results happen when you understand who does what. This is not about replacing you — it's about multiplying what you can do in a day.

🧠

You Excel At

→ Strategic thinking and vision
→ Understanding stakeholder needs
→ Creative problem-solving
→ Recognizing subtle domain bugs
→ Final verification and sign-off
→ Accountability

🤖

Claude Excels At

→ Pattern recognition at scale
→ Code generation and refactoring
→ Consistency across large codebases
→ Rapid prototyping
→ Recall from vast training data
→ 24/7 availability

You provide the vision and the verification.
Claude provides the velocity and the consistency.

Start Today

Your Action Plan — Week by Week

You don't need all 11 strategies today. Start with the highest-impact ones and build up gradually.

🟢 Week 1 — Foundation (30 min)

Create CLAUDE.md at your project root with tech stack, conventions, and current focus

Document one procedure you've explained more than once as a Skill in .claude/skills/

🔵 Week 2 — Automation (1 hr)

Try Plan Mode for your next feature — "Analyze and create a plan first before writing any code"

Set up one Hook in .claude/settings.json — start with auto-lint on file edit

🟣 Week 3 — Scale (2-3 hrs)

Set up ChromaDB (RAG) if you have 50+ files in your project

Enable Prompt Caching in your API calls if you send the same system prompt repeatedly

Quick Reference — When to Use What

Task	Strategy to Use
New feature or big refactor	Plan Mode → Edit Mode
Quick bug fix	Edit Mode directly
Repeated procedure (deploy, test, etc.)	Skill file in .claude/skills/
Large codebase navigation	RAG + selective loading
Live external data (DB schema, APIs)	MCP Server
Complex reasoning, architecture	Extended Thinking (Opus 4.7)
Multiple independent analyses	Sub-Agents
Automation after every edit	Hooks
Repeated large context in API	Prompt Caching
Context overloaded or confused	"Summarize to PROGRESS.md → fresh chat"