Think of it as a senior engineer who never sleeps, reads 10x faster than you, and has very strong opinions about your code.
day one · the spark
Everyone starts with a prompt.
~ / my-project
$ claude
You: "fix the login bug"
Claude: *reads 47 files, writes 3, runs tests*
Claude: "Done. The issue was..."
You: "...holy shit."
That first moment when the machine actually understands your code.
week one · the honeymoon
The honeymoon phase.
10x
"I wrote a whole feature in 20 minutes"
Magic
"It fixed a bug I didn't even know about"
Vibes
"I just describe what I want and it appears"
"Why would anyone need a framework for this?"
Ask me again on Day 30 · when the plateaus hit, rules drift, and teammates each build their own rig. Those are the plagues coming up.
the problem space · what goes wrong
Five things that will break your workflow if you don't see them coming.
01 · CONTEXT DEATH
200K tokens sounds infinite. It's not. Your agent forgets your name by tool call 150.
02 · AGENT AMNESIA
New session, who dis? Everything you taught it yesterday is gone.
03 · SCOPE CREEP
You asked for a bug fix. It rewrote your architecture. The yak shaving is real.
04 · QUALITY EROSION
Tool call 1: brilliant. Tool call 200: introduces the bug it was hired to fix.
05 · "WORKS ON MY AI"
Your teammate's Claude produces completely different code from the same prompt.
plague 01 · context death
Your context window is a hotel room, not a house.
CONTEXT BUDGET · 200K TOKENS
System        ~30K
CLAUDE.md     ~20K
File reads    ~70K
Your work     ~50K
Tool results  ~30K
TOTAL: 200K · COMPACTION TRIGGERED
WHAT IS COMPACTION?
When the context window fills up, Claude summarizes the conversation to free space. Like cramming for an exam – you remember the gist, but lose the details.
What compaction loses
The "why" behind decisions
Failed approaches already tried
Subtle constraints you mentioned once
That one detail from file read #3
REAL EXAMPLE
We built multi-VM infrastructure in one session. By the time we needed to save what we learned about Docker DNS, the context had already compacted away the details. Lesson: save memory at milestones, not at the end.
plague 02 · agent amnesia
Every session starts at zero.
SESSION 1 · TUESDAY
✓ Learned the codebase
✓ Found the bug
✓ Understood the constraints
✓ Knew your preferences
✓ Had full context
→
SESSION 2 · WEDNESDAY
✗ "What project is this?"
✗ Re-reads same 47 files
✗ Tries approach you rejected
✗ Ignores your naming convention
✗ Repeats yesterday's mistake
// Where your time actually goes:
first_30_min = re-establishing context
next_20_min  = re-learning your preferences
remaining    = actual productive work
imagine onboarding a new hire every single morning. that's what you're doing.
plague 03 · the yak shaving spiral
You said "fix the button". It heard "redesign the application".
1. Fix button color · what you asked for
2. "While I'm here, let me refactor the CSS"
3. "The component structure could be cleaner"
4. "I've introduced a design system"
5. "Tests are failing, let me fix them"
× Build broken · 47 files changed · button still wrong color
the fix: "don't add features, refactor code, or make improvements beyond what was asked"
plague 04 · quality erosion
Lint passes. Tests pass. Production breaks.
real code that passed every check
local result=$(validate_input "$path")
if [[ $? -ne 0 ]]; then
    handle_error  # never runs
fi
The local keyword masks the exit code. $? is always 0. The error handler never executes. And every automated check says "all good."
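The fix is to declare the local first and assign second, so `$?` reflects the command substitution rather than `local` itself. A minimal repro sketch (`validate_input` here is a stand-in that always fails):

```shell
#!/bin/bash
# Hypothetical validator that signals failure via exit status.
validate_input() { return 1; }

broken() {
    # `local` is itself a command; its exit status (0) clobbers validate_input's
    local result=$(validate_input "/bad/path")
    if [[ $? -ne 0 ]]; then
        echo "handler ran"      # never reached
    fi
}

fixed() {
    # Declare first, assign second: $? now reflects the command substitution
    local result
    result=$(validate_input "/bad/path")
    if [[ $? -ne 0 ]]; then
        echo "handler ran"      # runs as expected
    fi
}
```

Calling `broken` prints nothing; `fixed` reaches the handler.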
AUTOMATED CHECKS
✓ bash -n – syntax OK
✓ shellcheck – no warnings
✓ unit tests – passing
✓ integration tests – passing
✗ production – silent data loss
THE INSIGHT
There are 6 classes of bugs that no linter can catch. Semantic correctness requires adversarial review, not just green checkmarks.
plague 05 · the real version
One codebase. Four tools. Four sets of rules.
9 AM · Claude Code
reads CLAUDE.md
✗ processPayment() – team uses snake_case
✗ import moment – banned, use date-fns
11 AM · Cursor
reads .cursorrules
✗ try/catch pattern – team uses Result<T,E>
✗ class AuthError – shared errors exist
✗ raw SQL execute – must use query builder
✗ migration_001.sql – must use YYYYMMDD
4 tools · 4 rule files · 4 standards · 8 cleanup PRs this sprint. Every tool reads its own instruction file. None of them agree. Your senior devs become AI janitors.
before we go further · the mechanics
What one turn actually costs.
A turn is one user message + one model response. Every turn re-sends the entire payload to the model. Your leverage is making that payload smaller or cacheable.
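One way to feel that cost: a rough per-file token estimate, using the common ~4 characters per token heuristic (an approximation, not the real tokenizer):

```shell
#!/bin/bash
# Back-of-envelope token count for any file in the payload,
# using the rough heuristic of ~4 characters per token.
estimate_tokens() {
    local chars
    chars=$(wc -c < "$1")
    echo $(( chars / 4 ))
}

# usage:
# echo "CLAUDE.md ≈ $(estimate_tokens CLAUDE.md) tokens, resent every turn"
```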
THE PUNCHLINE ·
every word in CLAUDE.md is paid on every turn ·
every subagent opens a separate turn budget ·
every tool schema taxes every conversation
prompt → hope → revert
→
rules → plan → verify → ship
PART II
The Harness
What if, instead of hoping the AI does the right thing, you could teach it what right looks like?
spoiler: you can. and it's just files.
the solution · harness-based development
What is an agent harness?
"A harness is the set of files, conventions, and automation
that turn a general-purpose LLM into a reliable teammate
that remembers, follows rules, and gets things done."
No new tools. No plugins. Just a CLAUDE.md file, a settings.json, and some conventions.
the foundation · CLAUDE.md
Your CLAUDE.md is the constitution. Everything else is legislation.
~/.claude/CLAUDE.md
Global · your personal preferences across all projects
YOU
project/CLAUDE.md
Project · team standards, checked into git. Everyone gets these.
TEAM
.claude/settings.json
Settings · permissions, hooks, behavior controls
CONFIG
MEMORY.md + PLAN.md
State · survives between sessions. The agent's long-term memory.
STATE
START WITH THESE 5 LINES
1. Your language/framework conventions
2. "Don't add features I didn't ask for"
3. Commit message format
4. Testing requirements
5. Things that have gone wrong before
CLAUDE.md · from a real project (ours)
# Shell Standards
- Shebang: #!/bin/bash
- Use $() not backticks
- Double-quote all variables
- grep -E not egrep
- command -v not which
# Commit Protocol
- One commit per logical unit
- Format: VERSION | Description
- No Co-Authored-By lines
# Common Anti-Patterns
- NEVER local var=$(...)
- NEVER cd without || exit
- NEVER bare cp/mv/rm
# ...300 lines of hard-won rules
in practice · what you actually type
Four commands. Entire lifecycle.
/r-spec
You: "Add rate limiting to the API with per-tenant quotas" Planner agent researches codebase, brainstorms approaches, writes spec → Reviewer challenges assumptions before a single line of code
Output: docs/specs/rate-limiting.md|Catches scope issues before planning
/r-plan
Reads spec + codebase → decomposes into numbered phases with dependencies → scope-classifies each phase (focused / cross-cutting / sensitive) → challenge review
Output: PLAN.md with 5 phases|Vertical slices, not horizontal layers
/r-build
Dispatcher reads PLAN.md → dispatches engineer agents to git worktrees (parallel when safe) → each phase: TDD red/green/refactor → QA verifies → Sentinel reviews
Output: committed, tested code|Scope-derived quality gates per phase
/r-ship
Preflight checks → changelog generation → version bump → release prep (RPM/DEB/tar) → final QA + reviewer pass → tag & publish
Output: tagged release + packages|Same pipeline every time, zero variance
The framework makes the decision tree deterministic. Same input, same process, same quality – whether it's a one-line fix or a 20-phase refactor.
the state layer · why it remembers
Four files give the agent a persistent brain.
~/my-project/ · what the agent reads on startup
~/my-project/
  CLAUDE.md – the rules (checked into git)
  PLAN.md – current work state
  .claude/settings.json – hooks & perms
~/.claude/
  CLAUDE.md – your personal rules
  projects/*/memory/
    MEMORY.md – learned context
    user_role.md, feedback_*.md, ...
THE KEY INSIGHT
No database. No cloud service. No setup wizard. It's just files. The agent reads them every session, automatically.
That's how it remembers your rules, your preferences, and where you left off.
SESSION 1 · TUESDAY
✓ Reads CLAUDE.md – learns your standards
✓ Reads MEMORY.md – knows your role
✓ You correct it: "use snake_case"
✎ Writes feedback_style.md – saves the lesson
↓ files persist on disk ↓
SESSION 2 · WEDNESDAY
✓ Reads CLAUDE.md – same standards
✓ Reads MEMORY.md – sees feedback_style.md
✓ Already knows snake_case preference
✓ Reads PLAN.md – picks up where you left off
CLAUDE.md
solves Plague 05 "works on my AI"
MEMORY.md
solves Plague 02 agent amnesia
PLAN.md
solves Plague 03 scope creep
settings.json
solves Plague 04 quality erosion
PART III
The Toolbox
Four primitives that solve the five plagues.
Memory. Hooks. Plans. Agents.
all of these are built into Claude Code. no extensions required.
the primitive map · four questions, first yes wins
Four primitives. Ask these four questions.
Ask them in order. First yes wins. Nine times out of ten you land your answer before reaching MCP.
01
Need to enforce a policy the model can't bypass?
block bad commits · redact secrets · rewrite tool input
# When invoked
1. Run `git diff --staged`
2. Scan for the 10 classes in reference.md
3. Output findings in CWE format
PORTABLE
Same skill works in Claude.ai, Claude Code, and API. Subagents are Code-only. Skills win on reach.
RULE OF THUMB ·
if guidance is over 100 lines or team-shared, it belongs in a skill, not CLAUDE.md. Anthropic shipped ~40 first-party skills in April 2026. /skills lists them.
primitive 02 · hooks
Hooks: the control surface nobody talks about.
.claude/settings.json · real config from our setup
Code that runs automatically when Claude does something. Like git hooks, but for AI tool calls. Two types: command (runs a script) and prompt (injects instructions).
PreToolUse
Validate before any tool runs. Block destructive commands.
PostToolUse
Auto-lint after every edit. Catch issues before they compound.
PreCompact
Save work-in-progress before context compaction. The safety net.
SubagentStop
Cleanup and verify when a sub-agent finishes its work.
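A sketch of wiring these events in `.claude/settings.json` – the matcher/command structure follows Claude Code's documented hook schema, but the script paths are hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/block-destructive.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/auto-lint.sh" }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/save-wip.sh" }
        ]
      }
    ]
  }
}
```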
"permissions": {
  "allow": [
    "Bash(git *)",
    "Bash(make *)",
    "Read(~/.claude)"
  ]
}
# no more permission prompts for safe commands
CUSTOM STATUS BAR
"statusLine": {
  "type": "command",
  "command": "context-bar.sh"
}
# live data in the status line – tokens, branch, anything
DID YOU KNOW?
/fast – same Opus model, 2.5x faster (not a downgrade)
Shift+Tab – cycle permission modes in the prompt
Esc – interrupt mid-generation (not just Ctrl+C)
prompt hooks – inject instructions at specific events
ENVIRONMENT VARIABLES
DISABLE_AUTO_COMPACT=1 – you control when
MAX_THINKING_TOKENS=20000 – thinking budget
BASH_MAX_OUTPUT_LENGTH – bash output cap
CLAUDE_CODE_GLOB_HIDDEN=1 – see dotfiles
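A launch-wrapper sketch using these knobs – the variable names are as listed above; the specific values are illustrative, not recommendations:

```shell
#!/bin/bash
# Launch wrapper: set the environment knobs, then start the CLI.
export DISABLE_AUTO_COMPACT=1          # compaction only when you run /compact
export MAX_THINKING_TOKENS=20000       # raise the thinking budget
export BASH_MAX_OUTPUT_LENGTH=50000    # cap captured bash output (example value)
# exec claude                          # uncomment where the claude CLI is installed
```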
the compiler · CLAUDE.md craft · elite vs bloated
The difference between a documentation dump and a compiler.
BLOATED · 2000 LINES
# Project History
## 2022: Founded by...
## 2023: Pivot to...

# Detailed Architecture
Our system uses a microservices...
[15 paragraphs of prose]

# Code Examples
```typescript
[200 lines duplicated from src/]
```

# Every Config Variable
[catalog of 60 env vars]
Loads every turn. 40k tokens burned per session. Drifts silently. Nobody reads it.
ELITE · ~200 LINES
# Stack
5 lines: languages, frameworks, runtime

# Commit protocol
8 lines: message format, tagging, staging rules
Opus 4.7 · $5/M input · $25/M output
cache writes 1.25× base · cache reads 0.1×
ProjectDiscovery reported 59–70% real-world savings. Your numbers will vary with CLAUDE.md size and session length. Every bloated CLAUDE.md line also burns cache writes.
cache writes cost 1.25× base input · cache reads cost 0.1× · break-even after two turns · break-bank after ten
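The arithmetic behind that break-even claim, sketched with this slide's numbers (a 10K-token CLAUDE.md at $5/M input; one cache write at 1.25×, then reads at 0.1×):

```shell
#!/bin/bash
# Cache break-even sketch using this slide's pricing:
# $5/M input tokens, cache writes 1.25x base, cache reads 0.1x base.
cache_costs=$(awk 'BEGIN {
  tokens = 10000; base = 5 / 1000000          # dollars per input token
  for (n = 1; n <= 3; n++) {
    uncached = n * tokens * base                              # resend full prompt every turn
    cached   = tokens*base*1.25 + (n - 1)*tokens*base*0.1     # one write, then cheap reads
    printf "turns=%d uncached=$%.4f cached=$%.4f\n", n, uncached, cached
  }
}')
echo "$cache_costs"
```

At one turn, caching costs more ($0.0625 vs $0.0500); by turn two it is already cheaper ($0.0675 vs $0.1000), and the gap widens every turn after.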
fixture 01 · CLAUDE.md starter
Paste this. Start shipping.
CLAUDE.md
# Project conventions
## Stack
- Language: <your stack>
- Formatter: <prettier / black / gofmt>
- Tests: run before every commit
## Commit protocol
- One commit per logical unit
- Stage files by name · never git add -A
- No Co-Authored-By lines
## Anti-patterns
- NEVER add features I didn't ask for
- NEVER commit with failing tests
- NEVER assume · read the code
## Memory
- Durable rules: MEMORY.md
- Current phase: PLAN.md
PROVEN OUTCOME
Kills most day-one scope creep. The "features I didn't ask for" line alone is the single most-cited rule across Willison, Ronacher, and Horthy (2026).
AUTO-LOADS
Claude reads this at session start · no re-explaining, ever. 3% of context budget.
GROWS NATURALLY
Starts at 18 lines · caps at ~200. Every new rule came from a real moment. Trim ruthlessly.
same file works as AGENTS.md for Cursor / Codex / Aider · one source of truth, every tool
Most teams are at L1. The ones shipping consistently are at L2-L3. L4 is where the tooling disappears and it just works. You don't need L4 to be productive – L2 is a 30-minute investment that changes everything.
the team play · from one IC's rig to team defaults
The Day-30 answer: promote, don't mandate.
The Day-30 failure mode is fragmented adoption – three ICs, three CLAUDE.md files, three review styles, zero shared muscle. The fix isn't a committee. It's a promotion pipeline.
01 · DISCOVER
One IC builds .claude/commands/lint.md. It solves a real pain point. Others try it locally.
→
02 · PROMOTE
IC moves it to team/shared/.claude/ in a shared repo. Others symlink or copy. Feedback accumulates.
→
03 · HARDEN
Two rounds of real use. Bugs found, edges sanded. Added to the team's canonical-source repo with tests.
→
04 · DEPLOY
Ships via generate / sync. All engineers get it on next session. Drift detectable via grep.
PERSONAL (~/.claude/)
Experiments. Your current workflow. Nothing stable yet.
TEAM (shared repo)
Promoted from personal after proof. Reviewed. Versioned. Grepable drift detection.
ORGANIZATION (canonical)
Locked. Audited. Deployed via generator. Audit-trail for regulated orgs.
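The "drift detectable" step can be sketched as a plain diff sweep against the canonical copy (the repo layout here is hypothetical; a grep-based variant works the same way):

```shell
#!/bin/bash
# Drift check sketch: flag repos whose CLAUDE.md differs from the canonical copy.
check_drift() {
    local canonical=$1 repo f
    for repo in "${@:2}"; do
        f="$repo/CLAUDE.md"
        [[ -f $f ]] || continue
        diff -q "$canonical" "$f" >/dev/null || echo "drift: $f"
    done
}

# usage (paths are illustrative):
# check_drift team/shared/.claude/CLAUDE.md repos/*/
```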
One IC's good workflow becomes the team's default via the same git workflow you already have.
The mechanism doesn't matter – the discipline does.
philosophy · the hard-won rules
Six principles that actually matter.
LINT IS NECESSARY, NOT SUFFICIENT
Automated checks catch syntax. They can't catch semantic bugs, wrong assumptions, or compatibility issues. Green checkmarks are a floor, not a ceiling.
TRUST IS EARNED WITH EVIDENCE
"I checked" is not evidence. Show the grep output, the test result, the commit hash. Verification must be independent of the implementer.
ADVERSARIAL REVIEW BEFORE MERGE
At least one adversarial challenge per change. Issues caught at spec review cost 10 minutes. The same issues post-impl cost hours.
REGRESSION IS FIRST-CLASS
Every behavior change needs a test for the old behavior too. Separate from new feature tests. The scariest bugs are the ones you create while fixing others.
CHALLENGES DEMAND RESPONSES
When the reviewer raises a concern, silence is not an option. Fix it or explain with evidence why it's not real. This applies to AI reviewers too.
FIX CONTRADICTIONS AT THE SOURCE
When a spec contradicts the code, fix the spec first. Silent resolution in the plan creates a gap future readers can't trace.
actionable · steal these today
Where to start, based on where you are.
JUST GETTING STARTED
Create a CLAUDE.md
Even 5 lines: your language, your style, "don't add features I didn't ask for"
5 minutes
Use /compact proactively
Don't wait for auto-compaction. Clear context when you shift tasks.
0 seconds
Set effort level to high
CLAUDE_CODE_EFFORT_LEVEL=high
1 minute
USING IT DAILY
Add a PostToolUse hook
Auto-lint after every file edit. Catches problems before they compound.
10 minutes
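A minimal PostToolUse lint-hook sketch – Claude Code passes hook input as JSON on stdin; a production hook should parse it with jq, the bash regex here just keeps the sketch dependency-free:

```shell
#!/bin/bash
# PostToolUse hook sketch: auto-lint shell files after Edit/Write.
# Hook input arrives as JSON on stdin; field names follow Claude Code's hook docs.
lint_hook() {
    local input file re
    input=$(cat)
    re='"file_path"[[:space:]]*:[[:space:]]*"([^"]*)"'
    if [[ $input =~ $re ]]; then
        file="${BASH_REMATCH[1]}"
    fi
    case "$file" in
        *.sh)
            # exit/return 2 reports a blocking error back to Claude
            shellcheck "$file" || { echo "lint failed: $file" >&2; return 2; }
            ;;
    esac
    return 0
}

# the real hook script would end by invoking: lint_hook
```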
Start a MEMORY.md
Record your role, preferences, and "don't repeat this mistake" entries.
5 minutes
Add permission allow-lists
Stop clicking "allow" on git, make, and test commands 50x/day.
5 minutes
POWER USER
Add a PreCompact hook
Inject "save your work" before context compaction. The safety net.
10 minutes
Build an anti-pattern catalog
Every time something goes wrong, add a NEVER rule. Your future self thanks you.
ongoing
Custom status bar & skills
Live context in the status line. Custom slash commands for your workflow.
30 minutes
summary · the three truths
What we've learned.
01
The model is capable. The instructions are the product. Your CLAUDE.md is more important than your prompt.
02
Every rule is written in blood. Don't skip governance because you haven't been burned yet.
03
The gap between vibing and engineering is 30 minutes. Start with CLAUDE.md + memory. Grow from there.
Prompt → instructions → governance → pipeline
you are here → go here
Prompts to Pipelines
Stop vibing. Start engineering.
Ryan MacDonald
ryan@rfxn.com
|
rfxn.com
R-fx Networks
|
github.com/rfxn
APF · BFD · LMD · RDF
this deck was built by claude code, with claude code, about claude code
using the exact harness patterns described in this presentation
no vibes were harmed in the making of this deck