Mastering Context Management

Your Brain's New Power Tool

Claude's 200K Token Window

A comprehensive guide to working smarter with AI


Your Brain vs Claude

🧠

Human Brain

Working Memory: 7±2 items
Unlimited long-term storage
Creates new connections
Learns continuously
True creative thinking
VS
🤖

Claude (LLM)

Context: 200K tokens
Fixed training knowledge
Pattern recognition at scale
No session memory
Probabilistic output

What is a Large Language Model?

Think of it as...

📚 A collaborator who has processed millions of books and repositories

🎯 Trained to predict the next token based on patterns

🔍 Recognizes patterns from vast training data

⚡ Responds instantly but doesn't truly "understand" in a human sense

🎭 Generates plausible output through statistical probability

Key Insight: It's not thinking — it's pattern matching at massive scale

What LLMs Cannot Do

Formal proofs or deterministic logic
Real-time learning during conversation
Form lasting memories across sessions
Verify code executes correctly (without tools)
Browse the web (without tools)
Guarantee mathematical accuracy
Remember previous conversations
Update base knowledge after training
Understand causation (only correlation)
Hold more than ~200K tokens in context
Note: Claude 4.x with Extended Thinking significantly improves multi-step reasoning — covered later in this deck

Overcoming Limitations with Context Engineering

| Limitation | Context Engineering Solution |
| --- | --- |
| ❌ Probabilistic reasoning | ✅ Extended Thinking — enable reasoning tokens for complex problems |
| ❌ No real-time learning | ✅ Maintain CLAUDE.md — update project context after each session |
| ❌ No lasting memory | ✅ Skills & Markdown docs — document decisions and lessons in persistent files |
| ❌ Can't verify execution | ✅ Bash tool — execute and verify code immediately |
| ❌ No browsing (standalone) | ✅ MCP Servers / WebFetch — integrate real-time external data |
| ❌ Context window limits | ✅ RAG + Sub-Agents — load only relevant files; isolate large tasks |
| ❌ No persistent state | ✅ Session summaries — end with "save to PROGRESS.md", start new chats with that context |
| ❌ Expensive repeated context | ✅ Prompt Caching — cache stable context at ~10% of normal token cost |

What LLMs Excel At

Pattern recognition at scale
Code generation & completion
Refactoring & language translation
Documentation & explanation
Summarization
Consistency enforcement across codebases
Rapid prototyping
Multi-language proficiency
24/7 availability
Objective code reviews

Context Windows: Model Comparison

| Model | Context Window | Best For |
| --- | --- | --- |
| Claude Opus 4.7 | 200K tokens | Complex reasoning, architecture decisions |
| Claude Sonnet 4.6 | 200K tokens | Balanced everyday coding tasks |
| Claude Haiku 4.5 | 200K tokens | Fast, lightweight, high-volume automation |
| GPT-4o | 128K tokens | General purpose, multimodal |
| Gemini 1.5 Pro | 1M+ tokens | Extremely large documents |
Pick the right Claude model: Quick tasks → Haiku  |  Feature work → Sonnet  |  Complex decisions → Opus

Your 200K Token Budget

Project Context (20K)
CLAUDE.md, Sprint Goals, Tech Stack
Active Work (50K)
Files Being Modified, Tests, Recent Changes
Conversation (80K)
Q&A, Generated Code, Debugging History
Reserve (50K)
Tool Outputs, Errors, Search Results, Buffer
Smart allocation = Better, more coherent responses
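The allocation above can be sanity-checked with a rough rule of thumb: English text averages about 4 characters per token. This sketch mirrors the budget split on this slide; `estimate_tokens` is only an approximation, not a real tokenizer.

```python
# Rough context-budget check using the common ~4 chars/token heuristic.
# The split below mirrors the slide's allocation; it is a sketch, not a
# real tokenizer.

BUDGET = {
    "project_context": 20_000,
    "active_work": 50_000,
    "conversation": 80_000,
    "reserve": 50_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_budget(text: str, category: str) -> bool:
    """Check whether a chunk of text fits its allocated slice."""
    return estimate_tokens(text) <= BUDGET[category]

total = sum(BUDGET.values())  # the full 200K window
```

For precise counts, use the provider's token-counting endpoint; the heuristic is only for quick planning.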

Strategy #1: Markdown Files

Your Long-Term Memory

README.md — Architecture overview

CONVENTIONS.md — Coding standards

DECISIONS.md — Why we chose X

TROUBLESHOOTING.md — Common fixes

Benefits

Human & AI readable
Version controlled with git
Searchable with Grep
Universal format — works everywhere
Brain Analogy: Notebooks you reference when needed — not carried everywhere

Strategy #2: CLAUDE.md (Working Memory) UPDATED

Placed at project root — Claude Code reads it automatically every session

Also supports ~/.claude/CLAUDE.md for global preferences

Tech stack & conventions
Common commands (dev, test, migrate)
Current sprint goals & active context
Important rules (never commit .env, etc.)
# Project: MyApp

## Tech Stack
Backend: FastAPI + PostgreSQL
Frontend: React + TypeScript

## Conventions
- Use async/await for DB ops
- Follow PEP 8
- Functional React components

## Common Commands
- Dev: `npm run dev`
- Tests: `pytest -v`
- Migrate: `alembic upgrade head`

## Active Context
Working on: backend/auth/service.py
Next: Implement refresh token logic

Strategy #3: Skills (Procedural Memory) UPDATED

What are Skills?

Step-by-step documented procedures
Reusable across conversations
Located in .claude/skills/
Reference by name — no re-explaining
Correct path: .claude/skills/[skill-name]/SKILL.md

Examples

deploy-to-production/SKILL.md
run-test-suite/SKILL.md
debug-workflow/SKILL.md
create-feature/SKILL.md
security-review/SKILL.md
Brain Analogy: Muscle memory — execute without cluttering working memory

Strategy #4: Tools (Actual Claude Code Tools) UPDATED

| Tool | Purpose | Example Use |
| --- | --- | --- |
| Read | Read file contents | Read a source file before editing |
| Edit | Precise string replacement | Fix a specific bug in a file |
| Write | Create or overwrite files | Create a new component |
| Bash | Run shell commands | Run tests, git operations |
| Grep | Search file contents with regex | Find all usages of a function |
| Glob | Find files by pattern | List all *.test.ts files |
| WebFetch / WebSearch | Fetch URLs or search the web | Read live API documentation |
Golden Rule: Use tools to gather current state, let Claude do the analysis
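To build intuition for what Grep and Glob do, here is a plain-Python approximation of the same two steps: find files by pattern, then search their contents with a regex. The temp directory and file names are illustrative.

```python
# Sketch of Glob + Grep behavior in plain Python: pattern-match file
# paths, then regex-search their contents. File names are illustrative.
import pathlib
import re
import tempfile

def glob_files(root: str, pattern: str) -> list[pathlib.Path]:
    """Glob-style: list files under root matching a filename pattern."""
    return sorted(pathlib.Path(root).rglob(pattern))

def grep(paths, regex: str) -> list[tuple[str, int, str]]:
    """Grep-style: (file, line number, line) for each regex match."""
    matcher = re.compile(regex)
    hits = []
    for path in paths:
        for n, line in enumerate(path.read_text().splitlines(), 1):
            if matcher.search(line):
                hits.append((str(path), n, line))
    return hits

# Demo on a throwaway file
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "example.test.ts").write_text("describe('auth')\nit('logs in')\n")
files = glob_files(tmp, "*.test.ts")
hits = grep(files, r"describe")
```

The real tools return results straight into Claude's context, which is why targeted patterns beat reading whole directories.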

Strategy #5: MCP Servers

💾

Database MCP

Query schemas, run SQL, analyze query performance

🐙

GitHub MCP

Create issues/PRs, search repos, read commit history

💬

Slack MCP

Send messages, read channels, create notifications

📁

Google Drive MCP

Read/write documents, search files

🏢

Custom Domain MCP

Company APIs, internal tools, legacy systems

💡

Context Benefit

On-demand access — no need to paste entire schemas into context

Brain Analogy: Experts on speed dial — call them when needed

Strategy #6: Prompt Caching NEW

How It Works

Mark stable context as cacheable
First request: Claude processes and caches the block (5-minute TTL, refreshed on each cache hit)
Subsequent requests reuse the cache — only the new parts are processed
Cached reads cost ~10% of the normal input price (cache writes cost ~25% extra)
Example: a 100K-token context reused 10× costs ~215K token-equivalents instead of 1M, roughly a 5× saving

What to Cache

System prompt + conventions (sent every message)
Large codebase loaded for repeated analysis
Documentation blocks referenced across turns
Don't cache: per-message user input
Use "cache_control": {"type": "ephemeral"} in API calls to mark cacheable blocks
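A minimal sketch of what that looks like in a Messages API-style request body. No network call is made here; we only build the payload, and the model id and conventions text are placeholders.

```python
# Sketch: marking a stable system block as cacheable in an Anthropic
# Messages API-style request. Payload only, no network call; the model
# id and conventions text are placeholders.

STABLE_CONVENTIONS = "Follow PEP 8. Use async/await for DB ops."  # placeholder

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_CONVENTIONS,
                # Stable block: cached after the first request
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Per-message input: never mark this cacheable
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Review auth/service.py")
```

The cacheable block must come before the changing parts of the prompt, since caching works on a stable prefix.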

Strategy #7: Hooks — Automated Workflows NEW

What Are Hooks?

Shell commands that run automatically in response to Claude Code events — configured in .claude/settings.json

PreToolUse — before Claude calls a tool
PostToolUse — after a tool call completes
Stop — when Claude finishes a response
Notification — when Claude sends a notification
Key rule: If you want something to happen every time, use a hook — not a CLAUDE.md note

Practical Examples

Auto-lint / format after every Edit
Run tests after code changes
Desktop notification when Claude finishes
Log all tool calls for auditing
Block dangerous commands via PreToolUse
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit",
      "hooks": [{"type": "command", "command": "npm run lint -- --fix"}]
    }]
  }
}
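The "block dangerous commands" example can be sketched as a small PreToolUse script. In a real hook, Claude Code passes the event as JSON on stdin and an exit code of 2 blocks the tool call; here we show only the decision logic, and the blocklist patterns are illustrative.

```python
# Sketch of PreToolUse decision logic for blocking dangerous Bash
# commands. The pattern list is illustrative, not a complete policy.

DANGEROUS = ("rm -rf /", "git push --force", "DROP TABLE")

def is_dangerous(event: dict) -> bool:
    """True if a Bash tool call contains a blocked pattern."""
    if event.get("tool_name") != "Bash":
        return False
    command = event.get("tool_input", {}).get("command", "")
    return any(pattern in command for pattern in DANGEROUS)

# A real hook script would end with roughly:
#   event = json.load(sys.stdin)
#   if is_dangerous(event):
#       sys.exit(2)   # exit code 2 tells Claude Code to block the call
sample = {"tool_name": "Bash", "tool_input": {"command": "rm -rf / --no-preserve-root"}}
blocked = is_dangerous(sample)
```

Substring matching is a blunt instrument; a production hook would parse the command more carefully, but the shape of the script is the same.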

Strategy #8: Extended Thinking NEW

What Is It?

Dedicated reasoning tokens Claude uses before producing its final answer — enabling deeper, more reliable multi-step analysis

Complex debugging with multiple interacting components
Architectural decisions with many tradeoffs
Security analysis of intricate systems
Algorithm design and optimization
Best model: Claude Opus 4.7
Tradeoff: Uses more tokens and takes longer — use for high-stakes decisions

How to Request It

In conversation:

"Think carefully about this before responding: what are the architectural tradeoffs between approach A and B?"

Via API:

thinking={ "type": "enabled", "budget_tokens": 10000 }
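As a sketch, the full request shape might look like this; we build the payload only, with a placeholder model id. Note that max_tokens must exceed the thinking budget, since the budget is carved out of it.

```python
# Sketch: enabling extended thinking in a Messages API-style request.
# Payload only, no network call; the model id is a placeholder.

def build_thinking_request(question: str, budget: int = 10_000) -> dict:
    return {
        "model": "claude-opus-4-5",  # placeholder model id
        "max_tokens": 16_000,        # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "messages": [{"role": "user", "content": question}],
    }

req = build_thinking_request("Compare the architectural tradeoffs of A vs B")
```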

Strategy #9: Sub-Agents — Parallel Workers NEW

What Are Sub-Agents?

Claude spawns independent agents with their own isolated context windows to handle parallel or isolated subtasks

Each sub-agent has its own 200K context
Prevents exploratory work polluting main context
Run multiple analyses in parallel
Main agent synthesizes results
Context benefit: Large file searches & reads stay in sub-agent context — your main conversation stays clean

Example Pattern

Main Agent
├── Sub-agent 1: "Audit auth module for security issues"
├── Sub-agent 2: "Audit payments module for security issues"
└── Sub-agent 3: "Audit API layer for security issues"
        ↓
Main Agent synthesizes all three findings
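The fan-out/synthesize pattern above can be sketched in a few lines. Here run_agent is a stub standing in for a real sub-agent call; in Claude Code each sub-agent would run with its own isolated context.

```python
# Toy sketch of the fan-out/synthesize sub-agent pattern. run_agent is
# a stub; a real sub-agent would do its work in an isolated context.
from concurrent.futures import ThreadPoolExecutor

MODULES = ["auth", "payments", "api"]

def run_agent(module: str) -> str:
    """Stub sub-agent: audit one module and return findings."""
    return f"{module}: no critical issues found"  # placeholder result

def synthesize(findings: list[str]) -> str:
    """Main agent combines the isolated results into one report."""
    return "\n".join(findings)

with ThreadPoolExecutor(max_workers=3) as pool:
    findings = list(pool.map(run_agent, MODULES))

report = synthesize(findings)
```

The key property is that only the short findings return to the main conversation; the bulk of each audit's reads and searches stays in the sub-agent.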

Strategy #10: Plan vs Edit Mode

Plan Mode

When: New features, refactoring, architecture

What: Analysis → Plan → Approval → Code

Analyzes codebase holistically
Creates implementation_plan.md
Identifies risks & dependencies
Waits for your approval

Speed: Slower, deliberate (System 2)

Edit Mode

When: Bug fixes, small refactors, docs updates

What: Immediate precise file edits

Fast iterations
Direct targeted edits
Single-file focus
Immediate changes

Speed: Fast, automatic (System 1)

Strategy #11: RAG (Retrieval-Augmented Generation)

Your Question: "How does auth work?"
Vector Search finds 5 most relevant files
Only 5K tokens loaded (not 60K!)
Claude analyzes with focused, minimal context
Comprehensive answer — 195K tokens preserved

When to Use RAG?

| Project Size | Use RAG? | Reason |
| --- | --- | --- |
| Small (<20 files) | No | Everything fits comfortably in context |
| Medium (20–100 files) | Yes | Selective loading saves significant context |
| Large (100+ files) | Essential | Only efficient path forward |
| Documentation search | Yes | Find answers without reading everything |
| Legacy codebase | Essential | Navigate unfamiliar code quickly |
Tools: Pinecone, Weaviate, ChromaDB, FAISS  |  Frameworks: LangChain, LlamaIndex
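The retrieval step can be illustrated without any of those tools: score documents against the question and load only the top matches. This toy uses bag-of-words cosine similarity on an illustrative corpus; real setups use embeddings and a vector store.

```python
# Toy sketch of RAG retrieval: rank documents against a question with
# bag-of-words cosine similarity, then keep only the top k. Real
# systems use embeddings + a vector store; the corpus is illustrative.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the k document names most similar to the question."""
    q = Counter(question.lower().split())
    scores = {name: cosine(q, Counter(text.lower().split()))
              for name, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = {  # illustrative corpus
    "auth.py": "jwt token auth login password hash",
    "billing.py": "invoice payment stripe charge",
    "ui.tsx": "react component render button",
}
relevant = top_k("how does jwt auth login work", docs)
```

Only the retrieved files then go into Claude's context, which is the whole point: the question decides what gets loaded.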

The Golden Rules UPDATED

DO's

✓ Keep CLAUDE.md updated at project root

✓ Use Plan Mode for complex & architectural work

✓ Provide clear, scoped context in requests

✓ Use Hooks for automated repeatable actions

✓ Verify all AI-generated code before shipping

✓ Break large tasks into focused conversations

✓ Use Sub-Agents for parallel analysis

✓ Cache large, stable context to reduce cost

DON'Ts

✗ Dump entire codebases — use RAG instead

✗ Skip planning for big architectural changes

✗ Mix multiple unrelated concerns in one chat

✗ Trust AI output blindly — always test

✗ Rely on conversation history as documentation

✗ Use Edit Mode for multi-file architecture changes

✗ Ask Claude to "remember" things — use hooks or CLAUDE.md

✗ Let AI make final architectural decisions

Inefficient Approach (Context Overload)

Request: "Build me an authentication system"

❌ Loads ALL user files (20 files, 30K tokens)

❌ Loads ALL API endpoints (15 files, 20K tokens)

❌ Loads ALL tests (25 files, 25K tokens)

❌ Loads database models (10 files, 10K tokens)

❌ Loads documentation (5 files, 15K tokens)

Total: 75 files, 100K tokens consumed upfront

Only 100K left for actual conversation

Result: Messy, scattered, incomplete responses

Efficient Approach (Structured Context)

Conversation 1: Planning (20K)

Create implementation_plan.md in Plan Mode

Conversation 2: Models & DB (40K)

Implement Phase 1 from plan

Conversation 3: Auth Service (35K)

JWT, password hashing, endpoints

Conversation 4: Testing (30K)

Verify everything works end-to-end

✅ Each conversation uses <50% context

✅ Clear documentation trail — easy to review and debug

✅ Fully working auth system delivered

The Human-AI Partnership

🧠

You Excel At:

Strategic thinking & vision
Business context & stakeholder needs
Creative problem-solving
Recognizing subtle domain bugs
Final verification & accountability
🤖

Claude Excels At:

Pattern recognition at scale
Code generation & refactoring
Consistency across large codebases
Rapid prototyping
Recall from vast training data
You provide the vision & verification
Claude provides the velocity & consistency

Code Smarter, Not Harder

"The best code is not written — it's orchestrated."

You're the conductor; Claude is your orchestra

Your Next Steps:

✅ Create CLAUDE.md at your project root using the template
✅ Document one common procedure as a skill in .claude/skills/
✅ Try Plan Mode for your next feature
✅ Configure at least one hook (e.g. auto-lint on edit)
✅ Set up RAG if you have 50+ files