$ claude --dangerously-skip-permissions

best practices · agentic development · april 2026

Prompts to Pipelines

How to stop vibing and start engineering
with Claude Code

Opus 4.7 · 1M context · xhigh effort · cache reads at 0.1× base rate

Ryan MacDonald · rmacdonald@nexcess.net · ryan@rfxn.com

for the uninitiated · 60-second primer

Claude Code is an AI agent that lives in your terminal.

terminal
$ claude
Claude Code OPUS 4.7 · /help for commands
You: Add rate limiting to the API
• Reads your codebase
• Writes code, runs tests
• Creates commits, opens PRs
• Spawns sub-agents for complex work
It reads
Files, git history, docs, tests – the full codebase
It writes
Code, tests, configs, documentation – real files
It runs
Shell commands, test suites, linters, builds – validates its own work
It orchestrates
Sub-agents, parallel workers, git worktrees – scales itself

Think of it as a senior engineer who never sleeps, reads 10x faster than you, and has very strong opinions about your code.

day one · the spark

Everyone starts with a prompt.

~/my-project
$ claude
You: "fix the login bug"
Claude: *reads 47 files, writes 3, runs tests*
Claude: "Done. The issue was..."

You: "...holy shit."

That first moment when the machine actually understands your code.

week one · the honeymoon

The honeymoon phase.

10x
"I wrote a whole feature in 20 minutes"
Magic
"It fixed a bug I didn't even know about"
Vibes
"I just describe what I want and it appears"

"Why would anyone need a framework for this?"

Ask me again on Day 30 · when the plateaus hit, rules drift, and teammates each build their own rig. Those are the plagues ahead.

the problem space · what goes wrong

Five things that will break your workflow
if you don't see them coming.

01 · CONTEXT DEATH
200K tokens sounds infinite. It's not. Your agent forgets your name by tool call 150.
02 · AGENT AMNESIA
New session, who dis? Everything you taught it yesterday is gone.
03 · SCOPE CREEP
You asked for a bug fix. It rewrote your architecture. The yak shaving is real.
04 · QUALITY EROSION
Tool call 1: brilliant. Tool call 200: introduces the bug it was hired to fix.
05 · "WORKS ON MY AI"
Your teammate's Claude produces completely different code from the same prompt.

plague 01 · context death

Your context window is a hotel room, not a house.

CONTEXT BUDGET · 200K TOKENS
System ~30K
CLAUDE.md ~20K
File reads ~70K
Your work ~50K
Tool results ~30K
TOTAL: 200K · COMPACTION TRIGGERED
WHAT IS COMPACTION?
When the context window fills up, Claude summarizes the conversation to free space. Like cramming for an exam – you remember the gist, but lose the details.
What compaction loses
  • The "why" behind decisions
  • Failed approaches already tried
  • Subtle constraints you mentioned once
  • That one detail from file read #3
REAL EXAMPLE
We built multi-VM infrastructure in one session. By the time we needed to save what we learned about Docker DNS, the context had already compacted away the details. Lesson: save memory at milestones, not at the end.

plague 02 · agent amnesia

Every session starts at zero.

SESSION 1 · TUESDAY
✓ Learned the codebase
✓ Found the bug
✓ Understood the constraints
✓ Knew your preferences
✓ Had full context
SESSION 2 · WEDNESDAY
✗ "What project is this?"
✗ Re-reads same 47 files
✗ Tries approach you rejected
✗ Ignores your naming convention
✗ Repeats yesterday's mistake
// Where your time actually goes:
first_30_min = re-establishing context
next_20_min = re-learning your preferences
remaining = actual productive work

imagine onboarding a new hire every single morning. that's what you're doing.

plague 03 · the yak shaving spiral

You said "fix the button".
It heard "redesign the application".

1 · Fix button color · what you asked for
2 · "While I'm here, let me refactor the CSS"
3 · "The component structure could be cleaner"
4 · "I've introduced a design system"
5 · "Tests are failing, let me fix them"
× · Build broken · 47 files changed · button still wrong color

the fix: "don't add features, refactor code, or make improvements beyond what was asked"

plague 04 · quality erosion

Lint passes. Tests pass.
Production breaks.

real code that passed every check
local result=$(validate_input "$path")
if [[ $? -ne 0 ]]; then
  handle_error # never runs
fi
The local keyword masks the exit code. $? is always 0. The error handler never executes. And every automated check says "all good."
AUTOMATED CHECKS
bash -n – syntax OK
shellcheck – no warnings
unit tests – passing
integration tests – passing
production – silent data loss
THE INSIGHT
There are 6 classes of bugs that no linter can catch. Semantic correctness requires adversarial review, not just green checkmarks.
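The fix for this class of bug is mechanical: declare and assign in separate statements so `$?` reflects the substituted command rather than `local`. A minimal sketch — `validate_input` here is a hypothetical stand-in for the real validator:

```shell
#!/bin/bash
# Hypothetical validator: succeed (and echo the path) only for absolute paths.
validate_input() { [[ "$1" == /* ]] && printf '%s' "$1"; }

check_path() {
  local result                     # declare first: local's own exit status is discarded
  result=$(validate_input "$1")    # now $? is validate_input's exit status
  if [[ $? -ne 0 ]]; then
    echo "invalid"
    return 1
  fi
  echo "ok: $result"
}
```

The even tighter idiom — `if ! result=$(validate_input "$1"); then` — avoids `$?` entirely, which is why shellcheck-friendly codebases prefer it.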

plague 05 · the real version

One codebase. Four tools. Four sets of rules.

9 AM · Claude Code
reads CLAUDE.md
processPayment()
team uses snake_case
import moment
banned, use date-fns
11 AM · Cursor
reads .cursorrules
try/catch pattern
team uses Result<T,E>
class AuthError
shared errors exist
2 PM · Copilot
reads copilot-instructions.md
lodash import
banned external util
const user: any
strict typing enforced
4 PM · Gemini CLI
reads GEMINI.md
raw SQL execute
must use query builder
migration_001.sql
must use YYYYMMDD
4 tools · 4 rule files · 4 standards · 8 cleanup PRs this sprint.
Every tool reads its own instruction file. None of them agree. Your senior devs become AI janitors.

before we go further · the mechanics

What one turn actually costs.

A turn is one user message + one model response. Every turn re-sends the entire payload to the model. Your leverage is making that payload smaller or cacheable.

ONE TURN · WHAT THE MODEL RECEIVES · ~35k tokens typical
SYSTEM ~3k · built-in
CLAUDE.md ~5k · global + workspace + project
MEM ~2k · 200-line cap
TOOL SCHEMAS ~10k · grows with every MCP server · Read + Edit + Bash + Grep + MCPs...
CONVERSATION + TOOL RESULTS ~13k · unbounded until /compact · file reads + grep output dominate
MSG ~2k
LEVERS · CACHED at 0.1× after turn 1 · ENABLE_TOOL_SEARCH defers schemas until needed · /compact or a SUBAGENT to summarize or isolate research
UNCACHED · 10 TURNS · 35k × 10 × $5/M = $1.75 · every byte, full price, every turn
WITH PROMPT CACHING · $0.29 · −85% · break-even after 2 turns · free after 10
THE PUNCHLINE · every word in CLAUDE.md is paid on every turn · every subagent opens a separate turn budget · every tool schema taxes every conversation
prompt → hope → revert
rules → plan → verify → ship
PART II

The Harness

What if, instead of hoping the AI does the right thing,
you could teach it what right looks like?

spoiler: you can. and it's just files.

the solution · harness-based development

What is an agent harness?

"A harness is the set of files, conventions, and automation
that turn a general-purpose LLM into a reliable teammate
that remembers, follows rules, and gets things done."
WITHOUT HARNESS
Prompt → hope → review → revert → re-prompt → hope harder
WITH HARNESS
Rules → plan → execute → verify → checkpoint → ship

No new tools. No plugins. Just a CLAUDE.md file, a settings.json, and some conventions.

the foundation · CLAUDE.md

Your CLAUDE.md is the constitution.
Everything else is legislation.

~/.claude/CLAUDE.md
Global · your personal preferences across all projects
YOU
project/CLAUDE.md
Project · team standards, checked into git. Everyone gets these.
TEAM
.claude/settings.json
Settings · permissions, hooks, behavior controls
CONFIG
MEMORY.md + PLAN.md
State · survives between sessions. The agent's long-term memory.
STATE
START WITH THESE 5 LINES
1. Your language/framework conventions
2. "Don't add features I didn't ask for"
3. Commit message format
4. Testing requirements
5. Things that have gone wrong before
CLAUDE.md · from a real project (ours)
# Shell Standards
- Shebang: #!/bin/bash
- Use $() not backticks
- Double-quote all variables
- grep -E not egrep
- command -v not which

# Commit Protocol
- One commit per logical unit
- Format: VERSION | Description
- No Co-Authored-By lines

# Common Anti-Patterns
- NEVER local var=$(...)
- NEVER cd without || exit
- NEVER bare cp/mv/rm

# ...300 lines of hard-won rules

in practice · what you actually type

Four commands. Entire lifecycle.

/r-spec
You: "Add rate limiting to the API with per-tenant quotas"
Planner agent researches codebase, brainstorms approaches, writes spec → Reviewer challenges assumptions before a single line of code
Output: docs/specs/rate-limiting.md | Catches scope issues before planning
/r-plan
Reads spec + codebase → decomposes into numbered phases with dependencies → scope-classifies each phase (focused / cross-cutting / sensitive) → challenge review
Output: PLAN.md with 5 phases | Vertical slices, not horizontal layers
/r-build
Dispatcher reads PLAN.md → dispatches engineer agents to git worktrees (parallel when safe) → each phase: TDD red/green/refactor → QA verifies → Sentinel reviews
Output: committed, tested code | Scope-derived quality gates per phase
/r-ship
Preflight checks → changelog generation → version bump → release prep (RPM/DEB/tar) → final QA + reviewer pass → tag & publish
Output: tagged release + packages | Same pipeline every time, zero variance
The framework makes the decision tree deterministic. Same input, same process, same quality – whether it's a one-line fix or a 20-phase refactor.

the state layer · why it remembers

Four files give the agent a persistent brain.

~/my-project/ · what the agent reads on startup
CLAUDE.md – the rules (checked into git)
PLAN.md – current work state
.claude/settings.json – hooks & perms
~/.claude/
  CLAUDE.md – your personal rules
  projects/*/memory/
    MEMORY.md – learned context
    user_role.md, feedback_*.md, ...
THE KEY INSIGHT
No database. No cloud service. No setup wizard.
It's just files. The agent reads them every session, automatically. That's how it remembers your rules, your preferences, and where you left off.
SESSION 1 · TUESDAY
Reads CLAUDE.md – learns your standards
Reads MEMORY.md – knows your role
You correct it: "use snake_case"
Writes feedback_style.md – saves the lesson
↓  files persist on disk  ↓
SESSION 2 · WEDNESDAY
Reads CLAUDE.md – same standards
Reads MEMORY.md – sees feedback_style.md
Already knows snake_case preference
Reads PLAN.md – picks up where you left off
CLAUDE.md
solves Plague 05
"works on my AI"
MEMORY.md
solves Plague 02
agent amnesia
PLAN.md
solves Plague 03
scope creep
settings.json
solves Plague 04
quality erosion
PART III

The Toolbox

Four primitives that solve the five plagues.
Memory. Hooks. Plans. Agents.

all of these are built into Claude Code. no extensions required.

the primitive map · four questions, first yes wins

Four primitives. Ask these four questions.

Ask them in order. First yes wins. Nine times out of ten you land your answer before reaching MCP.

01
Need to enforce a policy the model can't bypass?
block bad commits · redact secrets · rewrite tool input
HOOK
PreToolUse / PostToolUse · shell command · deterministic
02
Teaching a repeatable procedure the model should invoke itself?
security review · release prep · log triage · common workflows
SKILL
.claude/skills/<name>/SKILL.md · loads on description match
03
Will this task pollute main context with research you won't re-read?
grepping 50 files · reading large PDFs · parallel multi-file edits
SUBAGENT
.claude/agents/<role>.md · context dies on return
04
Do you need new capability from an external system?
Datadog · Sentry · GitHub · proprietary APIs · live data
MCP
server exposes tools · curate hard · 15-tool cap
They compose. hook enforces policy · skill encodes procedure · subagent isolates context · MCP exposes capability · a real workflow uses all four

primitive 01 · memory

Memory: teaching your agent to remember.

MEMORY.md · auto-loaded every session
# Project Memory Index

- [User role](user_role.md) – senior eng, Bash
- [Testing](feedback_testing.md) – always run Rocky 9
- [Style](feedback_style.md) – no trailing summaries
- [Project](project_auth.md) – auth rewrite, compliance
feedback_testing.md
---
type: feedback
---
Integration tests must hit a real DB.
**Why:** mock/prod divergence broke migration
**How:** never mock DB in test suite
FOUR MEMORY TYPES
user – who you are, your expertise
feedback – corrections & confirmations
project – goals, deadlines, context
reference – where to find things
FOR BEGINNERS
MEMORY.md loads automatically at the start of every session. You don't have to do anything – Claude reads it and remembers what you've taught it.
200-LINE CAP
Lines past 200 are truncated. Memory bloat defeats the purpose. Keep the index lean – full content lives in separate files.
PRO TIP
Save memory at milestones, not at session end. By the time you're done, the context may have already compacted away the details.

the newest primitive · skills · on-demand guidance

CLAUDE.md always loads. Skills load on demand.

TOKEN ECONOMICS · 500 LINES EACH
CLAUDE.md ~8,000 tokens × EVERY turn
Loads on session start. Paid every turn. Session-long tax.
SKILL.md ~200 bytes until invoked
Just name + description in context. Full file loads once when model invokes it.
10–50× cheaper per session on average. The more skills you have, the bigger the win.
.claude/skills/security-review/SKILL.md
---
name: security-review
description: Review diffs for OWASP top-10,
  SQL injection, XSS
allowed-tools: Bash(git diff:*), Read, Grep
disable-model-invocation: false
---

# When invoked
1. Run `git diff --staged`
2. Scan for the 10 classes in reference.md
3. Output findings in CWE format
PORTABLE
Same skill works in Claude.ai, Claude Code, and API. Subagents are Code-only. Skills win on reach.
RULE OF THUMB · if guidance is over 100 lines or team-shared, it belongs in a skill, not CLAUDE.md. Anthropic shipped ~40 first-party skills in April 2026. /skills lists them.

primitive 02 · hooks

Hooks: the control surface nobody talks about.

.claude/settings.json · real config from our setup
"hooks": {
  "PreToolUse": [{
    "matcher": "Bash",
    "hooks": [{
      "type": "command",
      "command": "validate.sh"
    }]
  }],
  "PostToolUse": [{
    "matcher": "Edit|Write",
    "hooks": [{
      "type": "command",
      "command": "post-edit-lint.sh"
    }]
  }],
  "PreCompact": [{
    "hooks": [{
      "type": "prompt",
      "prompt": "Save work state!"
    }]
  }]
}
WHAT ARE HOOKS?
Code that runs automatically when Claude does something. Like git hooks, but for AI tool calls. Two types: command (runs a script) and prompt (injects instructions).
PreToolUse
Validate before any tool runs. Block destructive commands.
PostToolUse
Auto-lint after every edit. Catch issues before they compound.
PreCompact
Save work-in-progress before context compaction. The safety net.
SubagentStop
Cleanup and verify when a sub-agent finishes its work.
+ 15 more events
SessionStart, Notification, Stop, FileChanged, WorktreeCreate...
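The validate.sh wired into PreToolUse above is ours, but the pattern is tiny. A hedged sketch — the deny patterns are illustrative, and a real hook would parse the stdin JSON with jq rather than handle a raw string:

```shell
#!/bin/bash
# PreToolUse contract: the tool call arrives as JSON on stdin;
# exit 0 allows it, exit 2 blocks it and feeds stderr back to Claude.
deny='rm -rf /|git push --force|git reset --hard'

check_command() {
  if printf '%s' "$1" | grep -Eq "$deny"; then
    echo "blocked by policy: $1" >&2
    return 2
  fi
  return 0
}

# A real hook body would be roughly:
#   cmd=$(jq -r '.tool_input.command // empty')
#   check_command "$cmd"; exit $?
```

Deterministic by construction: the model cannot talk its way past a shell exit code.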

primitive 03 · subagents

Subagents: divide, conquer, and verify.

DISPATCHER
reads plan · dispatches phases · enforces quality gates
WORKTREE A
Phase 1 · isolated git branch
WORKTREE B
Phase 2 · isolated git branch
MAIN
QA gate · sentinel review
Self-contained payloads – subagents can't prompt the user or access parent state
Commit hash = delivery – no hash in the result means the work didn't land
One file owner per file – two agents writing the same file = guaranteed merge conflict
Verify with git log – never trust the agent's summary; check the actual commits
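"Commit hash = delivery" and "verify with git log" collapse into one helper. A sketch — the function name and policy are ours, the git plumbing is standard:

```shell
#!/bin/bash
# verify_delivery: accept a subagent's claimed commit hash only if the
# commit object actually exists AND is reachable from the current HEAD.
verify_delivery() {
  local hash="$1"
  git cat-file -e "${hash}^{commit}" 2>/dev/null || return 1  # object exists?
  git merge-base --is-ancestor "$hash" HEAD                   # reachable from HEAD?
}
```

If this fails, the dispatcher treats the phase as not done, regardless of how confident the agent's summary sounds.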

primitive 04 · settings & secrets

The settings.json nobody reads.

EFFORT LEVEL
"env": {
  "CLAUDE_CODE_EFFORT_LEVEL": "high"
}
# low/medium/high/xhigh – controls reasoning depth
KILL THE ATTRIBUTION
"attribution": {
  "commit": "",
  "pr": ""
}
# empty = no "Co-Authored-By: Claude" lines
PERMISSION ALLOW-LISTS
"permissions": {
  "allow": [
    "Bash(git *)",
    "Bash(make *)",
    "Read(~/.claude)"
  ]
}
# no more permission prompts for safe commands
CUSTOM STATUS BAR
"statusLine": {
  "type": "command",
  "command": "context-bar.sh"
}
# live data in the status line – tokens, branch, anything
DID YOU KNOW?
/fast – same Opus model, 2.5x faster (not a downgrade)
Shift+Tab – cycle permission modes in the prompt
Esc – interrupt mid-generation (not just Ctrl+C)
prompt hooks – inject instructions at specific events
ENVIRONMENT VARIABLES
DISABLE_AUTO_COMPACT=1 – you control when
MAX_THINKING_TOKENS=20000 – thinking budget
BASH_MAX_OUTPUT_LENGTH – bash output cap
CLAUDE_CODE_GLOB_HIDDEN=1 – see dotfiles

the compiler · CLAUDE.md craft · elite vs bloated

The difference between a documentation dump and a compiler.

BLOATED · 2000 LINES
# Project History
## 2022: Founded by...
## 2023: Pivot to...

# Detailed Architecture
Our system uses a microservices...
[15 paragraphs of prose]

# Code Examples
```typescript
[200 lines duplicated from src/]
```

# Every Config Variable
[catalog of 60 env vars]
Loads every turn. 40k tokens burned per session. Drifts silently. Nobody reads it.
ELITE · ~200 LINES
# Stack
5 lines: languages, frameworks, runtime

# Commit protocol
8 lines: message format, tagging,
staging rules

# Path-scoped rules
src/api/**: require zod
src/components/**: Tailwind + aria

# Anti-patterns
1. NEVER add features not asked for
2. NEVER commit with failing tests
[13 numbered NEVER rules]

# Memory
Durable: MEMORY.md
Loads every turn. ~2k tokens. Every line is load-bearing. Source of truth is the code itself.
✓ Under 500 lines
✓ No prose catalogues
✓ Path-scoped rules
✓ Anti-pattern ledger
✓ No volatile data

Your CLAUDE.md is a compiler for your agent. Every bloated line taxes every prompt.

putting it together · one task, all levers

Fix the login bug. End to end.

01
/r-spec login-bug
2 MIN
Plan mode + ultrathink. Claude reads codebase, writes spec to docs/specs/login-bug.md, runs challenge pass. Primitives: plan mode, MEMORY.md context, spec skill.
02
/r-plan docs/specs/login-bug.md
3 MIN
Decomposes into 3 phases: reproduce, fix, regression test. Writes PLAN.md. Primitives: PLAN.md scope, path-scoped CLAUDE.md rules.
03
/r-build 1-3 --worktree
12 MIN
Dispatcher spawns engineer + qa + reviewer subagents in parallel worktrees. Each phase: engineer writes code, qa runs tests, reviewer adversarial check. Hooks gate every commit. Primitives: subagents, worktrees, hooks, disallowedTools.
04
/r-review
2 MIN
Sentinel pass: four-stage adversarial review across full diff. Catches anything the per-phase reviewer missed. Primitives: reviewer subagent, PostToolUse lint hooks.
05
/r-ship
1 MIN
Changelog, tag, PR. Attribution headers stripped. Version bump follows commit protocol. Primitives: commit hooks, CLAUDE.md rules.
20 minutes, one IC, five primitives composed. Zero yak-shaving. Zero context drift. Everything auditable.

the economics · why caching matters

Prompt caching takes a $2.00 session to $0.29.

CACHEABLE REGIONS · per turn
system prompt
~3k tokens · stable, always cached
CLAUDE.md (global + ws + proj)
~5k tokens · stable
MEMORY.md
~2k tokens · index + ~5 auto-loaded · stable
codebase map
~10k tokens · stable across session
conversation tail
variable · not cached
The first four stack up to ~20k stable tokens that replay every turn. Cache writes pay for themselves in two.
UNCACHED · 10 TURNS
~$2.00
10 × ~20k tokens × $5/M  =  $1.00 in
+ ~$1.00 out  =  $2.00
CACHED · 10 TURNS
~$0.29
cache reads (0.1×)  =  $0.10 in
+ $0.19 cache write
+ output unchanged  =  $0.29
~85%
savings on input cost over a long session
Opus 4.7 · $5/M input · $25/M output
cache writes 1.25× base · cache reads 0.1×
ProjectDiscovery reported 59–70% real-world savings. Your numbers will vary with CLAUDE.md size and session length. Every bloated CLAUDE.md line also burns cache writes.

cache writes cost 1.25× base input · cache reads cost 0.1× · break-even after two turns · break-bank after ten
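The break-even claim falls straight out of the two multipliers: a 0.25× write premium paid once, against a 0.9× saving on every subsequent turn. A sketch of the arithmetic, using the rates quoted above:

```shell
#!/bin/bash
# Rates from the slide: cache writes 1.25x base input, cache reads 0.1x.
breakeven() {
  awk 'BEGIN {
    write_premium = 1.25 - 1.0   # extra cost paid once, on turn 1
    per_turn_save = 1.0 - 0.1    # saving on every cached turn after that
    printf "break-even after %.2f turns\n", 1 + write_premium / per_turn_save
  }'
}
breakeven
```

The premium is repaid partway through the second turn, which is where "break-even after two turns" comes from.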

fixture 01 · CLAUDE.md starter

Paste this. Start shipping.

CLAUDE.md
# Project conventions

## Stack
- Language: <your stack>
- Formatter: <prettier / black / gofmt>
- Tests: run before every commit

## Commit protocol
- One commit per logical unit
- Stage files by name · never git add -A
- No Co-Authored-By lines

## Anti-patterns
- NEVER add features I didn't ask for
- NEVER commit with failing tests
- NEVER assume · read the code

## Memory
- Durable rules: MEMORY.md
- Current phase: PLAN.md
PROVEN OUTCOME
Kills most day-one scope creep. The "features I didn't ask for" line alone is the single most-cited rule across Willison, Ronacher, Horthy 2026.
AUTO-LOADS
Claude reads this at session start · no re-explaining, ever. 3% of context budget.
GROWS NATURALLY
Starts at 18 lines · caps at ~200. Every new rule came from a real moment. Trim ruthlessly.

same file works as AGENTS.md for Cursor / Codex / Aider · one source of truth, every tool

fixture 02 · settings.json starter

Five env vars. One hook. That's it.

.claude/settings.json
{
  "env": {
    "ENABLE_TOOL_SEARCH": "true",
    "CLAUDE_CODE_EFFORT_LEVEL": "xhigh",
    "MAX_THINKING_TOKENS": "31999",
    "BASH_MAX_OUTPUT_LENGTH": "128000",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  },
  "permissions": { "allow": [
    "Bash(git status:*)", "Bash(git diff:*)",
    "Bash(make test:*)", "Bash(pytest:*)"
  ] },
  "hooks": {
    "PreToolUse": [{ "matcher": "Bash",
      "hooks": [{"type": "command",
        "command": "$HOME/.claude/scripts/pre-commit.sh"}]
    }],
    "PreCompact": [{ "hooks": [{
      "type": "prompt",
      "prompt": "Compaction imminent. Save WIP. Verify MEMORY.md."
    }] }]
  }
}
THE FIVE ENV VARS
Tool-search reduces MCP bloat. Effort dials reasoning depth. Thinking tokens expand the scratchpad. Bash output raises the truncation ceiling. Attribution header strips the Claude signature.
PreToolUse GATE
Hook exits 2 → tool call blocked. Caught 40+ broken commits on rfxn projects in the last month.
PreCompact PROMPT
One injected prompt before compaction fires. WIP never lost to a summarization event again.
ALLOWLIST
Ends the permission-prompt treadmill. Most users click "allow" 40–50×/day without this.

fixture 03 · adversarial reviewer

A reviewer that cannot write code.

.claude/agents/reviewer.md
---
name: reviewer
description: adversarial code review · four-pass gate
model: opus
allowed-tools: [Read, Grep, Glob, Bash(git diff:*)]
disallowedTools: [Edit, Write]
---

You are an adversarial reviewer. You cannot
modify the code you are judging. Run four passes:

PASS 1 · semantic bugs
  masked exit codes, wrong assumptions,
  lint-clean + runtime-broken patterns.

PASS 2 · scope drift
  diff vs PLAN.md · flag anything off-plan.

PASS 3 · regression surface
  behavior changes without a test for the
  old behavior are blockers.

PASS 4 · evidence
  cite file:line for every finding.

Output: APPROVE or CHANGES NEEDED + list.
THE KEY INVARIANT
disallowedTools is the safety. The reviewer can READ everything, WRITE nothing, and cannot amend its own findings.
ISOLATED CONTEXT
Subagent runs in its own context window. Parent's assumptions don't leak in. Findings come back clean.
USE IT
You: "use the reviewer to check this auth module"
Claude: dispatches reviewer · reports back in ~30s
COMPOSES
Drop into any project. Chain with qa + engineer subagents. The dispatcher in /r-build fans these out in parallel worktrees.

war stories · rapid fire

The Greatest Hits

💀
Named a function head() – shadowed /usr/bin/head across entire codebase APF
💥
Moved run() into test namespace – 1,185 of 1,599 tests instantly failed BFD
📦
RPM/DEB packages missing 9 runtime libraries – would ship a non-functional app BFD
👻
Audited functions that don't exist – hallucinated duplicate code bodies APF
🔀
Asked to fix path traversal RCE – fixed a different bug, declared success LMD
Signatures "passed validation" but used syntax LMD silently rejects – dead on arrival Sigforge
🤖
Used -i flag in bats-assert – flag doesn't exist, was testing for literal string "-i" APF

all real. all committed. all caught before release – by governance, not by luck.

the transformation · before / after

What day 1 looks like vs month 6.

DAY 1 · NO HARNESS
~/my-project/
  src/
  tests/
  README.md

No CLAUDE.md. No memory. No hooks.
No plan. No rules. Just vibes.


Every session starts from zero.
Every preference re-explained.
Every mistake repeated.
MONTH 6 · FULL HARNESS
~/my-project/
  src/
  tests/
  CLAUDE.md – 300 lines, team standards
  .claude/settings.json – hooks, perms
~/.claude/
  CLAUDE.md – personal global conventions
  scripts/ – 10 hook scripts
  agents/ – specialized agent definitions
  commands/ – custom slash commands
  projects/*/memory/ – per-project memory
891
sessions in 37 days
148
permission rules
17
custom commands
6
governance profiles
0
ungoverned commits

the proof · real telemetry from one workspace

891 sessions. One developer.

399,863
total messages
87,087
lines added
2,302
git commits
4,597
test cases
Quality
75.6% fully achieved · 53.2% essential
74 specs designed · 52 plans executed
Infrastructure
16 projects · 6 shared libraries
27 AI agent roles · 8 OS targets
2,302 commits. Zero ungoverned. Every commit through the same pipeline: spec, plan, build, ship.

at scale · when the harness pays off

When the agent builds the factory,
not just the product.

THE CHALLENGE
Generate malware detection signatures at scale. Not one – thousands. 9-stage pipeline: collect, import, dedup, classify, regen, export, validate, distribute, stats. 7 signature formats. 4 validation gates.
CURRENT SIGNATURE STOCK
41,427 MD5 hashes
39,378 SHA-256 hashes
3,706 YARA rules
2,297 hex patterns
INCIDENT RESPONSE: POLYSHELL
Active Magento exploitation campaign (IC-5727). Claude helped build real-time modsec rules, signature packages, and blocklists during the incident.
348K+
servers protected
20+
years of signatures
6h
update cycle
prompts → instructions → governance → pipelines
PART V

Growing Up

Where are you on the journey?
Where do you want to be?

the journey · four levels

The Agentic Maturity Model.

L1
Prompts
Ask and pray. No instructions, no memory, no plan.
day 1
"fix the bug"
L2
Instructions
CLAUDE.md with standards. Memory for state. Permission allow-lists.
week 2
GSD-level
L3
Governance
Plans with phases. Hooks for automation. Anti-pattern catalogs. Review gates.
month 2
team-ready
L4
Pipelines
Multi-agent orchestration. Worktree isolation. Automated spec → ship.
month 6+
production-grade
Most teams are at L1. The ones shipping consistently are at L2-L3. L4 is where the tooling disappears and it just works. You don't need L4 to be productive – L2 is a 30-minute investment that changes everything.

the team play · from one IC's rig to team defaults

The Day-30 answer: promote, don't mandate.

The Day-30 failure mode is fragmented adoption – three ICs, three CLAUDE.md files, three review styles, zero shared muscle. The fix isn't a committee. It's a promotion pipeline.

01 · DISCOVER
One IC builds .claude/commands/lint.md. It solves a real pain point. Others try it locally.
02 · PROMOTE
IC moves it to team/shared/.claude/ in a shared repo. Others symlink or copy. Feedback accumulates.
03 · HARDEN
Two rounds of real use. Bugs found, edges sanded. Added to the team's canonical-source repo with tests.
04 · DEPLOY
Ships via generate / sync. All engineers get it on next session. Drift detectable via grep.
PERSONAL (~/.claude/)
Experiments. Your current workflow. Nothing stable yet.
TEAM (shared repo)
Promoted from personal after proof. Reviewed. Versioned. Grepable drift detection.
ORGANIZATION (canonical)
Locked. Audited. Deployed via generator. Audit-trail for regulated orgs.
One IC's good workflow becomes the team's default via the same git workflow you already have. The mechanism doesn't matter – the discipline does.
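"Drift detectable via grep" is literal: the canonical copy lives in one repo, and anything that diverges is a plain file comparison away. A sketch with hypothetical paths:

```shell
#!/bin/bash
# check_drift: compare each deployed CLAUDE.md against the canonical copy.
# The paths in the usage comment are illustrative; the mechanism is diff in a loop.
check_drift() {
  local canonical="$1"; shift
  local f rc=0
  for f in "$@"; do
    if ! diff -q "$canonical" "$f" >/dev/null 2>&1; then
      echo "drift: $f"
      rc=1
    fi
  done
  return $rc
}
# e.g. check_drift team/shared/.claude/CLAUDE.md ~/repos/*/CLAUDE.md
```

Run it in CI or a cron job and fragmented adoption announces itself instead of festering.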

philosophy · the hard-won rules

Six principles that actually matter.

LINT IS NECESSARY, NOT SUFFICIENT
Automated checks catch syntax. They can't catch semantic bugs, wrong assumptions, or compatibility issues. Green checkmarks are a floor, not a ceiling.
TRUST IS EARNED WITH EVIDENCE
"I checked" is not evidence. Show the grep output, the test result, the commit hash. Verification must be independent of the implementer.
ADVERSARIAL REVIEW BEFORE MERGE
At least one adversarial challenge per change. Issues caught at spec review cost 10 minutes. The same issues post-impl cost hours.
REGRESSION IS FIRST-CLASS
Every behavior change needs a test for the old behavior too. Separate from new feature tests. The scariest bugs are the ones you create while fixing others.
CHALLENGES DEMAND RESPONSES
When the reviewer raises a concern, silence is not an option. Fix it or explain with evidence why it's not real. This applies to AI reviewers too.
FIX CONTRADICTIONS AT THE SOURCE
When a spec contradicts the code, fix the spec first. Silent resolution in the plan creates a gap future readers can't trace.

actionable · steal these today

Where to start, based on where you are.

JUST GETTING STARTED
Create a CLAUDE.md
Even 5 lines: your language, your style, "don't add features I didn't ask for"
5 minutes
Use /compact proactively
Don't wait for auto-compaction. Clear context when you shift tasks.
0 seconds
Set effort level to high
CLAUDE_CODE_EFFORT_LEVEL=high
1 minute
USING IT DAILY
Add a PostToolUse hook
Auto-lint after every file edit. Catches problems before they compound.
10 minutes
Start a MEMORY.md
Record your role, preferences, and "don't repeat this mistake" entries.
5 minutes
Add permission allow-lists
Stop clicking "allow" on git, make, and test commands 50x/day.
5 minutes
POWER USER
Add a PreCompact hook
Inject "save your work" before context compaction. The safety net.
10 minutes
Build an anti-pattern catalog
Every time something goes wrong, add a NEVER rule. Your future self thanks you.
ongoing
Custom status bar & skills
Live context in the status line. Custom slash commands for your workflow.
30 minutes

summary · the three truths

What we've learned.

01 The model is capable. The instructions are the product.
Your CLAUDE.md is more important than your prompt.
02 Every rule is written in blood.
Don't skip governance because you haven't been burned yet.
03 The gap between vibing and engineering is 30 minutes.
Start with CLAUDE.md + memory. Grow from there.
Prompt → instructions → governance → pipeline
you are here                   →              go here

Prompts to Pipelines

Stop vibing. Start engineering.

Ryan MacDonald
ryan@rfxn.com
|
rfxn.com
R-fx Networks
|
github.com/rfxn
APF · BFD · LMD · RDF

this deck was built by claude code, with claude code, about claude code
using the exact harness patterns described in this presentation
no vibes were harmed in the making of this deck
