Mastering Context Management

Your Brain's New Power Tool

Claude's 200K Token Window

A comprehensive guide to working smarter with AI


Your Brain vs Claude

🧠

Human Brain

Working Memory: 7±2 items
Unlimited long-term storage
Creates new connections
Learns continuously
True creative thinking
VS
🤖

Claude (LLM)

Context: 200K tokens
Fixed training knowledge
Pattern recognition at scale
No session memory
Probabilistic output

What is a Large Language Model?

Think of it as...

📚 A collaborator who has processed millions of books and repositories

🎯 Trained to predict the next token based on patterns

🔍 Recognizes patterns from vast training data

⚡ Responds instantly but doesn't truly "understand" in a human sense

🎭 Generates plausible output through statistical probability

Key Insight: It's not thinking — it's pattern matching at massive scale

What LLMs Cannot Do

Formal proofs or deterministic logic
Real-time learning during conversation
Form lasting memories across sessions
Verify code executes correctly (without tools)
Browse the web (without tools)
Guarantee mathematical accuracy
Remember previous conversations
Update base knowledge after training
Understand causation (only correlation)
Hold more than ~200K tokens in context
Note: Claude 4.x with Extended Thinking significantly improves multi-step reasoning — covered later in this deck

Overcoming Limitations with Context Engineering

| Limitation | Context Engineering Solution |
| --- | --- |
| ❌ Probabilistic reasoning | ✅ Extended Thinking — enable reasoning tokens for complex problems |
| ❌ No real-time learning | ✅ Maintain CLAUDE.md — update project context after each session |
| ❌ No lasting memory | ✅ Skills & Markdown docs — document decisions and lessons in persistent files |
| ❌ Can't verify execution | ✅ Bash tool — execute and verify code immediately |
| ❌ No browsing (standalone) | ✅ MCP Servers / WebFetch — integrate real-time external data |
| ❌ Context window limits | ✅ RAG + Sub-Agents — load only relevant files; isolate large tasks |
| ❌ No persistent state | ✅ Session summaries — end with "save to PROGRESS.md", start new chats with that context |
| ❌ Expensive repeated context | ✅ Prompt Caching — cache stable context at ~10% of normal token cost |

What LLMs Excel At

Pattern recognition at scale
Code generation & completion
Refactoring & language translation
Documentation & explanation
Summarization
Consistency enforcement across codebases
Rapid prototyping
Multi-language proficiency
24/7 availability
Objective code reviews

Context Windows: Model Comparison

| Model | Context Window | Best For |
| --- | --- | --- |
| Claude Opus 4.7 | 200K tokens | Complex reasoning, architecture decisions |
| Claude Sonnet 4.6 | 200K tokens | Balanced everyday coding tasks |
| Claude Haiku 4.5 | 200K tokens | Fast, lightweight, high-volume automation |
| GPT-4o | 128K tokens | General purpose, multimodal |
| Gemini 1.5 Pro | 1M+ tokens | Extremely large documents |
Pick the right Claude model: Quick tasks → Haiku  |  Feature work → Sonnet  |  Complex decisions → Opus

Your 200K Token Budget

Project Context (20K)
CLAUDE.md, Sprint Goals, Tech Stack
Active Work (50K)
Files Being Modified, Tests, Recent Changes
Conversation (80K)
Q&A, Generated Code, Debugging History
Reserve (50K)
Tool Outputs, Errors, Search Results, Buffer
Smart allocation = Better, more coherent responses
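The allocation above can be sanity-checked with a rough rule of thumb: English text averages about 4 characters per token. This sketch mirrors the budget split on this slide; `estimate_tokens` is only an approximation, not a real tokenizer.

```python
# Rough context-budget check using the common ~4 chars/token heuristic.
# The split below mirrors the slide's allocation; it is a sketch, not a
# real tokenizer.

BUDGET = {
    "project_context": 20_000,
    "active_work": 50_000,
    "conversation": 80_000,
    "reserve": 50_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_budget(text: str, category: str) -> bool:
    """Check whether a chunk of text fits its allocated slice."""
    return estimate_tokens(text) <= BUDGET[category]

total = sum(BUDGET.values())  # the full 200K window
```

For precise counts, use the provider's token-counting endpoint; the heuristic is only for quick planning.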

Strategy #1: Markdown Files

Your Long-Term Memory

README.md — Architecture overview

CONVENTIONS.md — Coding standards

DECISIONS.md — Why we chose X

TROUBLESHOOTING.md — Common fixes

Benefits

Human & AI readable
Version controlled with git
Searchable with Grep
Universal format — works everywhere
Brain Analogy: Notebooks you reference when needed — not carried everywhere

Strategy #2: CLAUDE.md (Working Memory) UPDATED

Placed at project root — Claude Code reads it automatically every session

Also supports ~/.claude/CLAUDE.md for global preferences

Tech stack & conventions
Common commands (dev, test, migrate)
Current sprint goals & active context
Important rules (never commit .env, etc.)
# Project: MyApp

## Tech Stack
Backend: FastAPI + PostgreSQL
Frontend: React + TypeScript

## Conventions
- Use async/await for DB ops
- Follow PEP 8
- Functional React components

## Common Commands
- Dev: `npm run dev`
- Tests: `pytest -v`
- Migrate: `alembic upgrade head`

## Active Context
Working on: backend/auth/service.py
Next: Implement refresh token logic

Strategy #3: Skills (Procedural Memory) UPDATED

What are Skills?

Step-by-step documented procedures
Reusable across conversations
Located in .claude/skills/
Reference by name — no re-explaining
Correct path: .claude/skills/[skill-name]/SKILL.md

Examples

deploy-to-production/SKILL.md
run-test-suite/SKILL.md
debug-workflow/SKILL.md
create-feature/SKILL.md
security-review/SKILL.md
Brain Analogy: Muscle memory — execute without cluttering working memory

Strategy #4: Tools (Actual Claude Code Tools) UPDATED

| Tool | Purpose | Example Use |
| --- | --- | --- |
| Read | Read file contents | Read a source file before editing |
| Edit | Precise string replacement | Fix a specific bug in a file |
| Write | Create or overwrite files | Create a new component |
| Bash | Run shell commands | Run tests, git operations |
| Grep | Search file contents with regex | Find all usages of a function |
| Glob | Find files by pattern | List all *.test.ts files |
| WebFetch / WebSearch | Fetch URLs or search the web | Read live API documentation |
Golden Rule: Use tools to gather current state, let Claude do the analysis
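To build intuition for what Grep and Glob do, here is a plain-Python approximation of the same two steps: find files by pattern, then search their contents with a regex. The temp directory and file names are illustrative.

```python
# Sketch of Glob + Grep behavior in plain Python: pattern-match file
# paths, then regex-search their contents. File names are illustrative.
import pathlib
import re
import tempfile

def glob_files(root: str, pattern: str) -> list[pathlib.Path]:
    """Glob-style: list files under root matching a filename pattern."""
    return sorted(pathlib.Path(root).rglob(pattern))

def grep(paths, regex: str) -> list[tuple[str, int, str]]:
    """Grep-style: (file, line number, line) for each regex match."""
    matcher = re.compile(regex)
    hits = []
    for path in paths:
        for n, line in enumerate(path.read_text().splitlines(), 1):
            if matcher.search(line):
                hits.append((str(path), n, line))
    return hits

# Demo on a throwaway file
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "example.test.ts").write_text("describe('auth')\nit('logs in')\n")
files = glob_files(tmp, "*.test.ts")
hits = grep(files, r"describe")
```

The real tools return results straight into Claude's context, which is why targeted patterns beat reading whole directories.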

Strategy #5: MCP Servers

💾

Database MCP

Query schemas, run SQL, analyze query performance

🐙

GitHub MCP

Create issues/PRs, search repos, read commit history

💬

Slack MCP

Send messages, read channels, create notifications

📁

Google Drive MCP

Read/write documents, search files

🏢

Custom Domain MCP

Company APIs, internal tools, legacy systems

💡

Context Benefit

On-demand access — no need to paste entire schemas into context

Brain Analogy: Experts on speed dial — call them when needed

Strategy #6: Prompt Caching NEW

How It Works

Mark stable context as cacheable
First request: Claude processes and caches the block (5-minute TTL, refreshed on each cache hit)
Subsequent requests reuse the cache — only the new parts are processed
Cached reads cost ~10% of the normal input price (cache writes cost ~25% extra)
Example: a 100K-token context reused 10× costs ~215K token-equivalents instead of 1M, roughly a 5× saving

What to Cache

System prompt + conventions (sent every message)
Large codebase loaded for repeated analysis
Documentation blocks referenced across turns
Don't cache: per-message user input
Use "cache_control": {"type": "ephemeral"} in API calls to mark cacheable blocks
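A minimal sketch of what that looks like in a Messages API-style request body. No network call is made here; we only build the payload, and the model id and conventions text are placeholders.

```python
# Sketch: marking a stable system block as cacheable in an Anthropic
# Messages API-style request. Payload only, no network call; the model
# id and conventions text are placeholders.

STABLE_CONVENTIONS = "Follow PEP 8. Use async/await for DB ops."  # placeholder

def build_request(user_message: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_CONVENTIONS,
                # Stable block: cached after the first request
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Per-message input: never mark this cacheable
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Review auth/service.py")
```

The cacheable block must come before the changing parts of the prompt, since caching works on a stable prefix.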

Strategy #7: Hooks — Automated Workflows NEW

What Are Hooks?

Shell commands that run automatically in response to Claude Code events — configured in .claude/settings.json

PreToolUse — before Claude calls a tool
PostToolUse — after a tool call completes
Stop — when Claude finishes a response
Notification — when Claude sends a notification
Key rule: If you want something to happen every time, use a hook — not a CLAUDE.md note

Practical Examples

Auto-lint / format after every Edit
Run tests after code changes
Desktop notification when Claude finishes
Log all tool calls for auditing
Block dangerous commands via PreToolUse
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit",
      "hooks": [{"type": "command", "command": "npm run lint -- --fix"}]
    }]
  }
}
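The "block dangerous commands" example can be sketched as a small PreToolUse script. In a real hook, Claude Code passes the event as JSON on stdin and an exit code of 2 blocks the tool call; here we show only the decision logic, and the blocklist patterns are illustrative.

```python
# Sketch of PreToolUse decision logic for blocking dangerous Bash
# commands. The pattern list is illustrative, not a complete policy.

DANGEROUS = ("rm -rf /", "git push --force", "DROP TABLE")

def is_dangerous(event: dict) -> bool:
    """True if a Bash tool call contains a blocked pattern."""
    if event.get("tool_name") != "Bash":
        return False
    command = event.get("tool_input", {}).get("command", "")
    return any(pattern in command for pattern in DANGEROUS)

# A real hook script would end with roughly:
#   event = json.load(sys.stdin)
#   if is_dangerous(event):
#       sys.exit(2)   # exit code 2 tells Claude Code to block the call
sample = {"tool_name": "Bash", "tool_input": {"command": "rm -rf / --no-preserve-root"}}
blocked = is_dangerous(sample)
```

Substring matching is a blunt instrument; a production hook would parse the command more carefully, but the shape of the script is the same.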

Strategy #8: Extended Thinking NEW

What Is It?

Dedicated reasoning tokens Claude uses before producing its final answer — enabling deeper, more reliable multi-step analysis

Complex debugging with multiple interacting components
Architectural decisions with many tradeoffs
Security analysis of intricate systems
Algorithm design and optimization
Best model: Claude Opus 4.7
Tradeoff: Uses more tokens and takes longer — use for high-stakes decisions

How to Request It

In conversation:

"Think carefully about this before responding: what are the architectural tradeoffs between approach A and B?"

Via API:

thinking={ "type": "enabled", "budget_tokens": 10000 }
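As a sketch, the full request shape might look like this; we build the payload only, with a placeholder model id. Note that max_tokens must exceed the thinking budget, since the budget is carved out of it.

```python
# Sketch: enabling extended thinking in a Messages API-style request.
# Payload only, no network call; the model id is a placeholder.

def build_thinking_request(question: str, budget: int = 10_000) -> dict:
    return {
        "model": "claude-opus-4-5",  # placeholder model id
        "max_tokens": 16_000,        # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "messages": [{"role": "user", "content": question}],
    }

req = build_thinking_request("Compare the architectural tradeoffs of A vs B")
```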

Strategy #9: Sub-Agents — Parallel Workers NEW

What Are Sub-Agents?

Claude spawns independent agents with their own isolated context windows to handle parallel or isolated subtasks

Each sub-agent has its own 200K context
Prevents exploratory work polluting main context
Run multiple analyses in parallel
Main agent synthesizes results
Context benefit: Large file searches & reads stay in sub-agent context — your main conversation stays clean

Example Pattern

Main Agent
├── Sub-agent 1: "Audit auth module for security issues"
├── Sub-agent 2: "Audit payments module for security issues"
└── Sub-agent 3: "Audit API layer for security issues"
        ↓
Main Agent synthesizes all three findings
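The fan-out/synthesize pattern above can be sketched in a few lines. Here run_agent is a stub standing in for a real sub-agent call; in Claude Code each sub-agent would run with its own isolated context.

```python
# Toy sketch of the fan-out/synthesize sub-agent pattern. run_agent is
# a stub; a real sub-agent would do its work in an isolated context.
from concurrent.futures import ThreadPoolExecutor

MODULES = ["auth", "payments", "api"]

def run_agent(module: str) -> str:
    """Stub sub-agent: audit one module and return findings."""
    return f"{module}: no critical issues found"  # placeholder result

def synthesize(findings: list[str]) -> str:
    """Main agent combines the isolated results into one report."""
    return "\n".join(findings)

with ThreadPoolExecutor(max_workers=3) as pool:
    findings = list(pool.map(run_agent, MODULES))

report = synthesize(findings)
```

The key property is that only the short findings return to the main conversation; the bulk of each audit's reads and searches stays in the sub-agent.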

Strategy #10: Plan vs Edit Mode

Plan Mode

When: New features, refactoring, architecture

What: Analysis → Plan → Approval → Code

Analyzes codebase holistically
Creates implementation_plan.md
Identifies risks & dependencies
Waits for your approval

Speed: Slower, deliberate (System 2)

Edit Mode

When: Bug fixes, small refactors, docs updates

What: Immediate precise file edits

Fast iterations
Direct targeted edits
Single-file focus
Immediate changes

Speed: Fast, automatic (System 1)

Strategy #11: RAG (Retrieval-Augmented Generation)

Your Question: "How does auth work?"
Vector Search finds 5 most relevant files
Only 5K tokens loaded (not 60K!)
Claude analyzes with focused, minimal context
Comprehensive answer — 195K tokens preserved

When to Use RAG?

| Project Size | Use RAG? | Reason |
| --- | --- | --- |
| Small (<20 files) | No | Everything fits comfortably in context |
| Medium (20–100 files) | Yes | Selective loading saves significant context |
| Large (100+ files) | Essential | Only efficient path forward |
| Documentation search | Yes | Find answers without reading everything |
| Legacy codebase | Essential | Navigate unfamiliar code quickly |
Tools: Pinecone, Weaviate, ChromaDB, FAISS  |  Frameworks: LangChain, LlamaIndex
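The retrieval step can be illustrated without any of those tools: score documents against the question and load only the top matches. This toy uses bag-of-words cosine similarity on an illustrative corpus; real setups use embeddings and a vector store.

```python
# Toy sketch of RAG retrieval: rank documents against a question with
# bag-of-words cosine similarity, then keep only the top k. Real
# systems use embeddings + a vector store; the corpus is illustrative.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(question: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return the k document names most similar to the question."""
    q = Counter(question.lower().split())
    scores = {name: cosine(q, Counter(text.lower().split()))
              for name, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

docs = {  # illustrative corpus
    "auth.py": "jwt token auth login password hash",
    "billing.py": "invoice payment stripe charge",
    "ui.tsx": "react component render button",
}
relevant = top_k("how does jwt auth login work", docs)
```

Only the retrieved files then go into Claude's context, which is the whole point: the question decides what gets loaded.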

The Golden Rules UPDATED

DO's

✓ Keep CLAUDE.md updated at project root

✓ Use Plan Mode for complex & architectural work

✓ Provide clear, scoped context in requests

✓ Use Hooks for automated repeatable actions

✓ Verify all AI-generated code before shipping

✓ Break large tasks into focused conversations

✓ Use Sub-Agents for parallel analysis

✓ Cache large, stable context to reduce cost

DON'Ts

✗ Dump entire codebases — use RAG instead

✗ Skip planning for big architectural changes

✗ Mix multiple unrelated concerns in one chat

✗ Trust AI output blindly — always test

✗ Rely on conversation history as documentation

✗ Use Edit Mode for multi-file architecture changes

✗ Ask Claude to "remember" things — use hooks or CLAUDE.md

✗ Let AI make final architectural decisions

Inefficient Approach (Context Overload)

Request: "Build me an authentication system"

❌ Loads ALL user files (20 files, 30K tokens)

❌ Loads ALL API endpoints (15 files, 20K tokens)

❌ Loads ALL tests (25 files, 25K tokens)

❌ Loads database models (10 files, 10K tokens)

❌ Loads documentation (5 files, 15K tokens)

Total: 75 files, 100K tokens consumed upfront

Only 100K left for actual conversation

Result: Messy, scattered, incomplete responses

Efficient Approach (Structured Context)

Conversation 1: Planning (20K)

Create implementation_plan.md in Plan Mode

Conversation 2: Models & DB (40K)

Implement Phase 1 from plan

Conversation 3: Auth Service (35K)

JWT, password hashing, endpoints

Conversation 4: Testing (30K)

Verify everything works end-to-end

✅ Each conversation uses <50% context

✅ Clear documentation trail — easy to review and debug

✅ Fully working auth system delivered

The Human-AI Partnership

🧠

You Excel At:

Strategic thinking & vision
Business context & stakeholder needs
Creative problem-solving
Recognizing subtle domain bugs
Final verification & accountability
🤖

Claude Excels At:

Pattern recognition at scale
Code generation & refactoring
Consistency across large codebases
Rapid prototyping
Recall from vast training data
You provide the vision & verification
Claude provides the velocity & consistency

Code Smarter, Not Harder

"The best code is not written — it's orchestrated."

You're the conductor; Claude is your orchestra

Your Next Steps:

✅ Create CLAUDE.md at your project root using the template
✅ Document one common procedure as a skill in .claude/skills/
✅ Try Plan Mode for your next feature
✅ Configure at least one hook (e.g. auto-lint on edit)
✅ Set up RAG if you have 50+ files