Think of it as a senior engineer who never sleeps, reads 10x faster than you, and has very strong opinions about your code.
day one · the spark
Everyone starts with a prompt.
~ / my-project
$ claude
You: "fix the login bug"
Claude: *reads 47 files, writes 3, runs tests*
Claude: "Done. The issue was..."
You: "...holy shit."
That first moment when the machine actually understands your code.
week one · the honeymoon
The honeymoon phase.
10x
"I wrote a whole feature in 20 minutes"
Magic
"It fixed a bug I didn't even know about"
Vibes
"I just describe what I want and it appears"
"Why would anyone need a framework for this?"
claude · session 47
You: fix the typo on line 42
Claude: I'll fix that typo. But first, let me
→ refactor the entire module
→ add TypeScript types you didn't ask for
→ "improve" the error handling
→ create 3 new helper functions
→ update tests for code I just changed
→ break the build
You: ...the typo was "teh" → "the"
week two · scope creep is not a bug, it's a feature (that you didn't ask for)
Every team hits the same walls. The only question is whether you hit them at week two or month two.
the problem space · what goes wrong
Five things that will break your workflow if you don't see them coming.
01 · CONTEXT DEATH
200K tokens sounds infinite. It's not. Your agent forgets your name by tool call 150.
02 · AGENT AMNESIA
New session, who dis? Everything you taught it yesterday is gone.
03 · SCOPE CREEP
You asked for a bug fix. It rewrote your architecture. The yak shaving is real.
04 · QUALITY EROSION
Tool call 1: brilliant. Tool call 200: introduces the bug it was hired to fix.
05 · "WORKS ON MY AI"
Your teammate's Claude produces completely different code from the same prompt.
plague 01 · context death
Your context window is a hotel room, not a house.
CONTEXT BUDGET · 200K TOKENS
System        ~30K
CLAUDE.md     ~20K
File reads    ~70K
Your work     ~50K
Tool results  ~30K
TOTAL: 200K · COMPACTION TRIGGERED
WHAT IS COMPACTION?
When the context window fills up, Claude summarizes the conversation to free space. Like cramming for an exam – you remember the gist, but lose the details.
What compaction loses
The "why" behind decisions
Failed approaches already tried
Subtle constraints you mentioned once
That one detail from file read #3
REAL EXAMPLE
We built multi-VM infrastructure in one session. By the time we needed to save what we learned about Docker DNS, the context had already compacted away the details. Lesson: save memory at milestones, not at the end.
plague 02 · agent amnesia
Every session starts at zero.
SESSION 1 · TUESDAY
✓ Learned the codebase
✓ Found the bug
✓ Understood the constraints
✓ Knew your preferences
✓ Had full context
→
SESSION 2 · WEDNESDAY
✗ "What project is this?"
✗ Re-reads same 47 files
✗ Tries approach you rejected
✗ Ignores your naming convention
✗ Repeats yesterday's mistake
// Where your time actually goes:
first_30_min = re-establishing context
next_20_min  = re-learning your preferences
remaining    = actual productive work
imagine onboarding a new hire every single morning. that's what you're doing.
plague 03 · the yak shaving spiral
You said "fix the button". It heard "redesign the application".
1 · Fix button color – what you asked for
2 · "While I'm here, let me refactor the CSS"
3 · "The component structure could be cleaner"
4 · "I've introduced a design system"
5 · "Tests are failing, let me fix them"
× · Build broken · 47 files changed · button still wrong color
the fix: "don't add features, refactor code, or make improvements beyond what was asked"
plague 04 · quality erosion
Lint passes. Tests pass. Production breaks.
real code that passed every check
local result=$(validate_input "$path")
if [[ $? -ne 0 ]]; then
    handle_error   # never runs
fi
The local keyword masks the exit code. $? is always 0. The error handler never executes. And every automated check says "all good."
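A minimal, runnable reproduction of the masking bug (the function names here are illustrative):

```shell
#!/bin/bash
# 'local var=$(cmd)' – the declaration's exit status (0) overwrites cmd's.
fails() { return 1; }

broken() {
    local result=$(fails)     # $? is now local's status: always 0
    echo "broken sees: $?"
}

fixed() {
    local result              # declare first...
    result=$(fails)           # ...assign second: $? is fails' status
    echo "fixed sees: $?"
}

broken    # prints "broken sees: 0" – the failure is invisible
fixed     # prints "fixed sees: 1" – the failure is caught
```

Splitting the declaration from the assignment is the entire fix.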
AUTOMATED CHECKS
✓ bash -n – syntax OK
✓ shellcheck – no warnings
✓ unit tests – passing
✓ integration tests – passing
✗ production – silent data loss
THE INSIGHT
There are 6 classes of bugs that no linter can catch. Semantic correctness requires adversarial review, not just green checkmarks.
monday-standup.sh
# Monday morning standup
dev_1: "Claude rewrote my auth module overnight"
dev_2: "Same prompt, completely different output"
dev_3: "It keeps adding emojis to my commits"
lead: "Who approved the React migration?"
everyone: "Nobody. Claude just... did it."
# git log --oneline -1
a1b2c3d 🚀 Refactored everything for maximum vibes
plague 05 · "works on my AI" – when your team's Claudes aren't aligned
plague 05 · the real version
One codebase. Four tools. Four sets of rules.
9 AM · Claude Code
reads CLAUDE.md
✗ processPayment() – team uses snake_case
✗ import moment – banned, use date-fns
11 AM · Cursor
reads .cursorrules
✗ try/catch pattern – team uses Result<T,E>
✗ class AuthError – shared errors exist
✗ raw SQL execute – must use query builder
✗ migration_001.sql – must use YYYYMMDD
4 tools · 4 rule files · 4 standards · 8 cleanup PRs this sprint. Every tool reads its own instruction file. None of them agree. Your senior devs become AI janitors.
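One pragmatic mitigation – a team convention, not a feature of any of these tools – is a single canonical rules file that every per-tool config points at:

```shell
#!/bin/bash
# Sketch: symlink per-tool rule files to one source of truth.
# (.cursorrules is a real Cursor convention; the setup itself is an assumption.)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '# team rules\n' > CLAUDE.md
ln -sf CLAUDE.md .cursorrules    # Cursor now reads the same rules
readlink .cursorrules            # prints "CLAUDE.md"
```

One file to update, every tool agreeing by construction.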
the real cost · industry data
Velocity without governance produces rework, not results.
1.7x
more issues per AI PR · CodeRabbit 2025
2x
code churn – revised within 2 weeks · GitClear 2025
+41%
tech debt increase after AI adoption · Ox Security
96%
don't fully trust AI output – but only 48% verify it · Sonar 2026
The agents aren't broken. The instructions are missing. "Shipping 50% more, but half of that is cleaning up last week's slop." – Dexter Horthy, Coding Agents Conference 2026
prompt → hope → revert
→
rules → plan → verify → ship
PART II
The Harness
What if, instead of hoping the AI does the right thing, you could teach it what right looks like?
spoiler: you can. and it's just files.
the solution · harness-based development
What is an agent harness?
"A harness is the set of files, conventions, and automation
that turn a general-purpose LLM into a reliable teammate
that remembers, follows rules, and gets things done."
No new tools. No plugins. Just a CLAUDE.md file, a settings.json, and some conventions.
the foundation · CLAUDE.md
Your CLAUDE.md is the constitution. Everything else is legislation.
~/.claude/CLAUDE.md
Global · your personal preferences across all projects
YOU
project/CLAUDE.md
Project · team standards, checked into git. Everyone gets these.
TEAM
.claude/settings.json
Settings · permissions, hooks, behavior controls
CONFIG
MEMORY.md + PLAN.md
State · survives between sessions. The agent's long-term memory.
STATE
START WITH THESE 5 LINES
1. Your language/framework conventions
2. "Don't add features I didn't ask for"
3. Commit message format
4. Testing requirements
5. Things that have gone wrong before
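Made concrete, those five lines might look like this – the stack and rules here are illustrative, swap in your own:

```markdown
# CLAUDE.md (minimal starter)
- Python 3.12, ruff for lint + format, type hints required
- Don't add features, refactor, or "improve" beyond what was asked
- Commits: imperative mood, subject under 72 chars
- Run the test suite before every commit; new behavior needs a test
- Known trap: never `local var=$(cmd)` in bash – it masks exit codes
```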
CLAUDE.md · from a real project (ours)
# Shell Standards
- Shebang: #!/bin/bash
- Use $() not backticks
- Double-quote all variables
- grep -E not egrep
- command -v not which
# Commit Protocol
- One commit per logical unit
- Format: VERSION | Description
- No Co-Authored-By lines
# Common Anti-Patterns
- NEVER local var=$(...)
- NEVER cd without || exit
- NEVER bare cp/mv/rm
# ...300 lines of hard-won rules
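Why "NEVER cd without || exit" earns its place: without the guard, a failed cd leaves every following command – including a bare rm – running wherever you happened to be. A hypothetical sketch of the guarded pattern:

```shell
#!/bin/bash
# The guard bails out instead of carrying on in the wrong directory.
safe_enter() {
    cd "$1" 2>/dev/null || return 1
    pwd
}
safe_enter /tmp                              # prints /tmp
safe_enter /no/such/dir || echo "refused"    # guard fires, nothing ran there
```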
frameworks · what the community built
You're not the only one who hit the wall.
GSD
Nick Dobos · late 2025
Task lists with acceptance criteria. Mandatory planning. Scratchpad for state. Most-forked CLAUDE.md template on GitHub.
Plan → Execute → Verify → Checkpoint
QRSPI
Dexter Horthy · HumanLayer · 2026
Evolution of RPI. Split mega-prompts into focused stages. "Don't use prompts for control flow – use control flow for control flow." Instruction budget < 200.
Questions → Research → Structure → Plan → Implement
RDF
R-fx Networks · 2025–2026
Full pipeline orchestration. 6 specialized agents with worktree isolation. 11 governance profiles auto-detected from codebase.
Spec → Plan → Build → Ship
Memory Bank
Community pattern · 2025
Structured context files surviving session boundaries. Project briefs, decision logs, active context tracking across conversations.
Read → Work → Save → Resume
The common thread: every framework was built because someone got burned.
They all converge on the same insight – the model is capable; the instructions are the product.
emerging consensus · what the builders agree on
Four rules the entire industry is converging on.
DO NOT OUTSOURCE THE THINKING
The engineer must own decisions. AI implements, you architect. "Going 10x faster doesn't matter if you throw it all away in 6 months. Shoot for 2–3x with quality."
– Dexter Horthy, Coding Agents Conference 2026
MIND YOUR INSTRUCTION BUDGET
LLMs reliably follow ~150–200 instructions. Beyond that, you're rolling dice. An 85-instruction mega-prompt plus CLAUDE.md plus system prompt plus MCP tools = half-attended chaos.
– Horthy, citing Kyle's arXiv paper research
PLEASE READ THE CODE
"We tried not reading the code for six months. We had to rip out and replace large parts of the system." Don't review plans. Review code. The plan and the implementation will diverge.
– Horthy, reversing his own 2025 advice
THE TRUST PARADOX
96% of developers don't fully trust AI output. But only 48% actually verify it. That gap – between distrust and verification – is where every bug in this deck was born.
– Aikido Security / RulePlane research 2026
THE FOUR STAGES OF CLAUDE.MD ENLIGHTENMENT
😐
No CLAUDE.md
"Just prompt it, how hard can it be?"
🤔
10-line CLAUDE.md
"Use TypeScript. Don't add emojis. Run tests before committing."
🧐
200-line CLAUDE.md with anti-patterns
"NEVER use \t in stat -c. NEVER trust grep -c exit code. NEVER cd without || exit..."
Full agent pipeline
"The reviewer found 3 issues. QA verified independently. The dispatcher merged automatically."
every line in your CLAUDE.md was written in blood (or at least a reverted commit)
case study · the rdf pipeline
What a mature pipeline actually looks like.
/r-spec
Requirements. Challenge review.
→
/r-plan
Phases. Dependencies. Scope classification.
→
/r-build
TDD. Worktrees. Parallel dispatch.
→
/r-ship
Changelog. Release. Package & push.
Planner
Research & design
Engineer
TDD implementation
QA
Verification gate
Reviewer
Adversarial sentinel
Dispatcher
Orchestration
Governing 16 projects · 2,302 commits · 6 shared libraries · 8 OS targets · every rule learned the hard way
in practice · what you actually type
Four commands. Entire lifecycle.
/r-spec
You: "Add rate limiting to the API with per-tenant quotas"
Planner agent researches codebase, brainstorms approaches, writes spec
→ Reviewer challenges assumptions before a single line of code
Output: docs/specs/rate-limiting.md · Catches scope issues before planning
/r-plan
Reads spec + codebase → decomposes into numbered phases with dependencies → scope-classifies each phase (focused / cross-cutting / sensitive) → challenge review
Output: PLAN.md with 5 phases · Vertical slices, not horizontal layers
/r-build
Dispatcher reads PLAN.md → dispatches engineer agents to git worktrees (parallel when safe) → each phase: TDD red/green/refactor → QA verifies → Sentinel reviews
Output: committed, tested code · Scope-derived quality gates per phase
/r-ship
Preflight checks → changelog generation → version bump → release prep (RPM/DEB/tar) → final QA + reviewer pass → tag & publish
Output: tagged release + packages · Same pipeline every time, zero variance
The framework makes the decision tree deterministic. Same input, same process, same quality – whether it's a one-line fix or a 20-phase refactor.
the state layer · why it remembers
Four files give the agent a persistent brain.
~/my-project/ · what the agent reads on startup
CLAUDE.md – the rules (checked into git)
PLAN.md – current work state
.claude/settings.json – hooks & perms
~/.claude/
  CLAUDE.md – your personal rules
  projects/*/memory/
    MEMORY.md – learned context
    user_role.md, feedback_*.md, ...
THE KEY INSIGHT
No database. No cloud service. No setup wizard. It's just files. The agent reads them every session, automatically.
That's how it remembers your rules, your preferences, and where you left off.
SESSION 1 · TUESDAY
✓ Reads CLAUDE.md – learns your standards
✓ Reads MEMORY.md – knows your role
✓ You correct it: "use snake_case"
✎ Writes feedback_style.md – saves the lesson
↓ files persist on disk ↓
SESSION 2 · WEDNESDAY
✓ Reads CLAUDE.md – same standards
✓ Reads MEMORY.md – sees feedback_style.md
✓ Already knows snake_case preference
✓ Reads PLAN.md – picks up where you left off
CLAUDE.md
solves Plague 05 "works on my AI"
MEMORY.md
solves Plague 02 agent amnesia
PLAN.md
solves Plague 03 scope creep
settings.json
solves Plague 04 quality erosion
PART III
The Toolbox
Four primitives that solve the five plagues.
Memory. Hooks. Plans. Agents.
all of these are built into Claude Code. no extensions required.
Hooks: code that runs automatically when Claude does something. Like git hooks, but for AI tool calls. Two types: command (runs a script) and prompt (injects instructions).
PreToolUse
Validate before any tool runs. Block destructive commands.
PostToolUse
Auto-lint after every edit. Catch issues before they compound.
PreCompact
Save work-in-progress before context compaction. The safety net.
SubagentStop
Cleanup and verify when a sub-agent finishes its work.
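As a sketch, a PostToolUse lint hook in .claude/settings.json has roughly this shape – the matcher and script name are illustrative; check the current hooks documentation for the exact schema:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./lint-changed.sh" }
        ]
      }
    ]
  }
}
```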
"permissions": {
  "allow": [
    "Bash(git *)",
    "Bash(make *)",
    "Read(~/.claude)"
  ]
}
# no more permission prompts for safe commands
CUSTOM STATUS BAR
"statusLine": {
  "type": "command",
  "command": "context-bar.sh"
}
# live data in the status line – tokens, branch, anything
DID YOU KNOW?
/fast – same Opus model, 2.5x faster (not a downgrade)
Shift+Tab – cycle permission modes in the prompt
Esc – interrupt mid-generation (not just Ctrl+C)
prompt hooks – inject instructions at specific events
ENVIRONMENT VARIABLES
DISABLE_AUTO_COMPACT=1 – you control when
MAX_THINKING_TOKENS=20000 – thinking budget
BASH_MAX_OUTPUT_LENGTH – bash output cap
CLAUDE_CODE_GLOB_HIDDEN=1 – see dotfiles
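Assuming these are read from the environment at startup, export them before launching (e.g. in your shell profile):

```shell
#!/bin/bash
# Illustrative: export before starting claude so the session picks them up.
export DISABLE_AUTO_COMPACT=1       # you decide when to /compact
export MAX_THINKING_TOKENS=20000    # raise the thinking budget
```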
THE TWO PATHS
🙅
"Claude, just vibe with it"
prompt → hope → ctrl+z → repeat
Most teams are at L1. The ones shipping consistently are at L2-L3. L4 is where the tooling disappears and it just works. You don't need L4 to be productive – L2 is a 30-minute investment that changes everything.
philosophy · the hard-won rules
Six principles that actually matter.
LINT IS NECESSARY, NOT SUFFICIENT
Automated checks catch syntax. They can't catch semantic bugs, wrong assumptions, or compatibility issues. Green checkmarks are a floor, not a ceiling.
TRUST IS EARNED WITH EVIDENCE
"I checked" is not evidence. Show the grep output, the test result, the commit hash. Verification must be independent of the implementer.
ADVERSARIAL REVIEW BEFORE MERGE
At least one adversarial challenge per change. Issues caught at spec review cost 10 minutes. The same issues post-impl cost hours.
REGRESSION IS FIRST-CLASS
Every behavior change needs a test for the old behavior too. Separate from new feature tests. The scariest bugs are the ones you create while fixing others.
CHALLENGES DEMAND RESPONSES
When the reviewer raises a concern, silence is not an option. Fix it or explain with evidence why it's not real. This applies to AI reviewers too.
FIX CONTRADICTIONS AT THE SOURCE
When a spec contradicts the code, fix the spec first. Silent resolution in the plan creates a gap future readers can't trace.
actionable · steal these today
Where to start, based on where you are.
JUST GETTING STARTED
Create a CLAUDE.md
Even 5 lines: your language, your style, "don't add features I didn't ask for"
5 minutes
Use /compact proactively
Don't wait for auto-compaction. Clear context when you shift tasks.
0 seconds
Set effort level to high
CLAUDE_CODE_EFFORT_LEVEL=high
1 minute
USING IT DAILY
Add a PostToolUse hook
Auto-lint after every file edit. Catches problems before they compound.
10 minutes
Start a MEMORY.md
Record your role, preferences, and "don't repeat this mistake" entries.
5 minutes
Add permission allow-lists
Stop clicking "allow" on git, make, and test commands 50x/day.
5 minutes
POWER USER
Add a PreCompact hook
Inject "save your work" before context compaction. The safety net.
10 minutes
Build an anti-pattern catalog
Every time something goes wrong, add a NEVER rule. Your future self thanks you.
ongoing
Custom status bar & skills
Live context in the status line. Custom slash commands for your workflow.
30 minutes
the-future.sh
🤖
"Look at me. I'm the CI/CD now."
– Claude, after being given a PLAN.md, 5 specialized agents, and worktree isolation
the difference between a toy and a tool is the instructions it ships with
the optimistic thesis
2026
This is the worst AI-generated code will ever be.
Models
Better every quarter. Deeper reasoning. Longer context.
Tools
Hooks, memory, agents, worktrees, MCP – all shipping fast.
Your Harness
Compounds over time. Every rule you write makes every session better.
Today's quality floor is not the ceiling. It is the starting line. The teams investing in harness infrastructure now will compound that advantage every quarter.
summary · the three truths
What we've learned.
01
The model is capable. The instructions are the product. Your CLAUDE.md is more important than your prompt.
02
Every rule is written in blood. Don't skip governance because you haven't been burned yet.
03
The gap between vibing and engineering is 30 minutes. Start with CLAUDE.md + memory. Grow from there.
prompt → instructions → governance → pipeline
you are here → go here
Prompts to Pipelines
Stop vibing. Start engineering.
Ryan MacDonald
ryan@rfxn.com
|
rfxn.com
R-fx Networks
|
github.com/rfxn
APF · BFD · LMD · RDF
this deck was built by claude code, with claude code, about claude code
using the exact harness patterns described in this presentation
no vibes were harmed in the making of this deck