Think of it as a senior engineer who never sleeps, reads 10x faster than you, and has very strong opinions about your code.
day one · the spark
Everyone starts with a prompt.
~ / my-project
$ claude
You: "fix the login bug"
Claude: *reads 47 files, writes 3, runs tests*
Claude: "Done. The issue was..."
You: "...holy shit."
That first moment when the machine actually understands your code.
week one · the honeymoon
The honeymoon phase.
10x
"I wrote a whole feature in 20 minutes"
Magic
"It fixed a bug I didn't even know about"
Vibes
"I just describe what I want and it appears"
"Why would anyone need a framework for this?"
Ask me again on Day 30 · when the plateaus hit, rules drift, and teammates each build their own rig. Those are the plagues coming up.
the problem space · what goes wrong
Five things that will break your workflow if you don't see them coming.
01 · CONTEXT DEATH
200K tokens sounds infinite. It's not. Your agent forgets your name by tool call 150.
02 · AGENT AMNESIA
New session, who dis? Everything you taught it yesterday is gone.
03 · SCOPE CREEP
You asked for a bug fix. It rewrote your architecture. The yak shaving is real.
04 · QUALITY EROSION
Tool call 1: brilliant. Tool call 200: introduces the bug it was hired to fix.
05 · "WORKS ON MY AI"
Your teammate's Claude produces completely different code from the same prompt.
plague 01 · context death
Your context window is a hotel room, not a house.
CONTEXT BUDGET · 200K TOKENS
System        ~30K
CLAUDE.md     ~20K
File reads    ~70K
Your work     ~50K
Tool results  ~30K
TOTAL: 200K · COMPACTION TRIGGERED
WHAT IS COMPACTION?
When the context window fills up, Claude summarizes the conversation to free space. Like cramming for an exam – you remember the gist, but lose the details.
What compaction loses
The "why" behind decisions
Failed approaches already tried
Subtle constraints you mentioned once
That one detail from file read #3
REAL EXAMPLE
We built multi-VM infrastructure in one session. By the time we needed to save what we learned about Docker DNS, the context had already compacted away the details. Lesson: save memory at milestones, not at the end.
plague 02 · agent amnesia
Every session starts at zero.
SESSION 1 · TUESDAY
✓ Learned the codebase
✓ Found the bug
✓ Understood the constraints
✓ Knew your preferences
✓ Had full context
→
SESSION 2 · WEDNESDAY
✗ "What project is this?"
✗ Re-reads same 47 files
✗ Tries approach you rejected
✗ Ignores your naming convention
✗ Repeats yesterday's mistake
// Where your time actually goes:
first_30_min = re-establishing context
next_20_min  = re-learning your preferences
remaining    = actual productive work
imagine onboarding a new hire every single morning. that's what you're doing.
plague 03 · the yak shaving spiral
You said "fix the button". It heard "redesign the application".
1. Fix button color · what you asked for
2. "While I'm here, let me refactor the CSS"
3. "The component structure could be cleaner"
4. "I've introduced a design system"
5. "Tests are failing, let me fix them"
× Build broken · 47 files changed · button still wrong color
the fix: "don't add features, refactor code, or make improvements beyond what was asked"
plague 04 · quality erosion
Lint passes. Tests pass. Production breaks.
real code that passed every check
local result=$(validate_input "$path")
if [[ $? -ne 0 ]]; then
    handle_error  # never runs
fi
The local keyword masks the exit code. $? is always 0. The error handler never executes. And every automated check says "all good."
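The fix is to declare the local first and assign second, so `$?` reflects the command substitution rather than `local` itself. A minimal repro sketch (`validate_input` here is a stand-in that always fails):

```shell
#!/bin/bash
# Hypothetical validator that signals failure via exit status.
validate_input() { return 1; }

broken() {
    # `local` is itself a command; its exit status (0) clobbers validate_input's
    local result=$(validate_input "/bad/path")
    if [[ $? -ne 0 ]]; then
        echo "handler ran"      # never reached
    fi
}

fixed() {
    # Declare first, assign second: $? now reflects the command substitution
    local result
    result=$(validate_input "/bad/path")
    if [[ $? -ne 0 ]]; then
        echo "handler ran"      # runs as expected
    fi
}
```

Calling `broken` prints nothing; `fixed` reaches the handler.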
AUTOMATED CHECKS
✓ bash -n – syntax OK
✓ shellcheck – no warnings
✓ unit tests – passing
✓ integration tests – passing
✗ production – silent data loss
THE INSIGHT
There are 6 classes of bugs that no linter can catch. Semantic correctness requires adversarial review, not just green checkmarks.
plague 05 · the real version
One codebase. Four tools. Four sets of rules.
9 AM · Claude Code
reads CLAUDE.md
✗ processPayment() – team uses snake_case
✗ import moment – banned, use date-fns
11 AM · Cursor
reads .cursorrules
✗ try/catch pattern – team uses Result<T,E>
✗ class AuthError – shared errors exist
✗ raw SQL execute – must use query builder
✗ migration_001.sql – must use YYYYMMDD
4 tools · 4 rule files · 4 standards · 8 cleanup PRs this sprint. Every tool reads its own instruction file. None of them agree. Your senior devs become AI janitors.
before we go further · the mechanics
What one turn actually costs.
A turn is one user message + one model response. Every turn re-sends the entire payload to the model. Your leverage is making that payload smaller or cacheable.
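One way to feel that cost: a rough per-file token estimate, using the common ~4 characters per token heuristic (an approximation, not the real tokenizer):

```shell
#!/bin/bash
# Back-of-envelope token count for any file in the payload,
# using the rough heuristic of ~4 characters per token.
estimate_tokens() {
    local chars
    chars=$(wc -c < "$1")
    echo $(( chars / 4 ))
}

# usage:
# echo "CLAUDE.md ≈ $(estimate_tokens CLAUDE.md) tokens, resent every turn"
```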
THE PUNCHLINE ·
every word in CLAUDE.md is paid on every turn ·
every subagent opens a separate turn budget ·
every tool schema taxes every conversation
prompt → hope → revert
→
rules → plan → verify → ship
PART II
The Harness
What if, instead of hoping the AI does the right thing, you could teach it what right looks like?
spoiler: you can. and it's just files.
the solution · harness-based development
What is an agent harness?
"A harness is the set of files, conventions, and automation
that turn a general-purpose LLM into a reliable teammate
that remembers, follows rules, and gets things done."
No new tools. No plugins. Just a CLAUDE.md file, a settings.json, and some conventions.
the foundation · CLAUDE.md
Your CLAUDE.md is the constitution. Everything else is legislation.
~/.claude/CLAUDE.md
Global · your personal preferences across all projects
YOU
project/CLAUDE.md
Project · team standards, checked into git. Everyone gets these.
TEAM
.claude/settings.json
Settings · permissions, hooks, behavior controls
CONFIG
MEMORY.md + PLAN.md
State · survives between sessions. The agent's long-term memory.
STATE
START WITH THESE 5 LINES
1. Your language/framework conventions
2. "Don't add features I didn't ask for"
3. Commit message format
4. Testing requirements
5. Things that have gone wrong before
CLAUDE.md · from a real project (ours)
# Shell Standards
- Shebang: #!/bin/bash
- Use $() not backticks
- Double-quote all variables
- grep -E not egrep
- command -v not which
# Commit Protocol
- One commit per logical unit
- Format: VERSION | Description
- No Co-Authored-By lines
# Common Anti-Patterns
- NEVER local var=$(...)
- NEVER cd without || exit
- NEVER bare cp/mv/rm
# ...300 lines of hard-won rules
in practice · what you actually type
Four commands. Entire lifecycle.
/r-spec
You: "Add rate limiting to the API with per-tenant quotas" Planner agent researches codebase, brainstorms approaches, writes spec → Reviewer challenges assumptions before a single line of code
Output: docs/specs/rate-limiting.md|Catches scope issues before planning
/r-plan
Reads spec + codebase → decomposes into numbered phases with dependencies → scope-classifies each phase (focused / cross-cutting / sensitive) → challenge review
Output: PLAN.md with 5 phases|Vertical slices, not horizontal layers
/r-build
Dispatcher reads PLAN.md → dispatches engineer agents to git worktrees (parallel when safe) → each phase: TDD red/green/refactor → QA verifies → Sentinel reviews
Output: committed, tested code|Scope-derived quality gates per phase
/r-ship
Preflight checks → changelog generation → version bump → release prep (RPM/DEB/tar) → final QA + reviewer pass → tag & publish
Output: tagged release + packages|Same pipeline every time, zero variance
The framework makes the decision tree deterministic. Same input, same process, same quality – whether it's a one-line fix or a 20-phase refactor.
the state layer · why it remembers
Four files give the agent a persistent brain.
~/my-project/ · what the agent reads on startup
~/my-project/
  CLAUDE.md – the rules (checked into git)
  PLAN.md – current work state
  .claude/settings.json – hooks & perms
~/.claude/
  CLAUDE.md – your personal rules
  projects/*/memory/
    MEMORY.md – learned context
    user_role.md, feedback_*.md, ...
THE KEY INSIGHT
No database. No cloud service. No setup wizard. It's just files. The agent reads them every session, automatically.
That's how it remembers your rules, your preferences, and where you left off.
SESSION 1 · TUESDAY
✓ Reads CLAUDE.md – learns your standards
✓ Reads MEMORY.md – knows your role
✓ You correct it: "use snake_case"
✎ Writes feedback_style.md – saves the lesson
↓ files persist on disk ↓
SESSION 2 · WEDNESDAY
✓ Reads CLAUDE.md – same standards
✓ Reads MEMORY.md – sees feedback_style.md
✓ Already knows snake_case preference
✓ Reads PLAN.md – picks up where you left off
CLAUDE.md
solves Plague 05 "works on my AI"
MEMORY.md
solves Plague 02 agent amnesia
PLAN.md
solves Plague 03 scope creep
settings.json
solves Plague 04 quality erosion
PART III
The Toolbox
Four primitives that solve the five plagues.
Memory. Hooks. Plans. Agents.
all of these are built into Claude Code. no extensions required.
the primitive map · four questions, first yes wins
Four primitives. Ask these four questions.
Ask them in order. First yes wins. Nine times out of ten you land your answer before reaching MCP.
01
Need to enforce a policy the model can't bypass?
block bad commits · redact secrets · rewrite tool input
# When invoked
1. Run `git diff --staged`
2. Scan for the 10 classes in reference.md
3. Output findings in CWE format
PORTABLE
Same skill works in Claude.ai, Claude Code, and API. Subagents are Code-only. Skills win on reach.
RULE OF THUMB ·
if guidance is over 100 lines or team-shared, it belongs in a skill, not CLAUDE.md. Anthropic shipped ~40 first-party skills in April 2026. /skills lists them.
primitive 02 · hooks
Hooks: the control surface nobody talks about.
.claude/settings.json · real config from our setup
Code that runs automatically when Claude does something. Like git hooks, but for AI tool calls. Two types: command (runs a script) and prompt (injects instructions).
PreToolUse
Validate before any tool runs. Block destructive commands.
PostToolUse
Auto-lint after every edit. Catch issues before they compound.
PreCompact
Save work-in-progress before context compaction. The safety net.
SubagentStop
Cleanup and verify when a sub-agent finishes its work.
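A sketch of wiring these events in `.claude/settings.json` – the matcher/command structure follows Claude Code's documented hook schema, but the script paths are hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/block-destructive.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/auto-lint.sh" }
        ]
      }
    ],
    "PreCompact": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/save-wip.sh" }
        ]
      }
    ]
  }
}
```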
"permissions": {
  "allow": [
    "Bash(git *)",
    "Bash(make *)",
    "Read(~/.claude)"
  ]
}
# no more permission prompts for safe commands
CUSTOM STATUS BAR
"statusLine": {
  "type": "command",
  "command": "context-bar.sh"
}
# live data in the status line – tokens, branch, anything
DID YOU KNOW?
/fast – same Opus model, 2.5x faster (not a downgrade)
Shift+Tab – cycle permission modes in the prompt
Esc – interrupt mid-generation (not just Ctrl+C)
prompt hooks – inject instructions at specific events
ENVIRONMENT VARIABLES
DISABLE_AUTO_COMPACT=1 – you control when
MAX_THINKING_TOKENS=20000 – thinking budget
BASH_MAX_OUTPUT_LENGTH – bash output cap
CLAUDE_CODE_GLOB_HIDDEN=1 – see dotfiles
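A launch-wrapper sketch using these knobs – the variable names are as listed above; the specific values are illustrative, not recommendations:

```shell
#!/bin/bash
# Launch wrapper: set the environment knobs, then start the CLI.
export DISABLE_AUTO_COMPACT=1          # compaction only when you run /compact
export MAX_THINKING_TOKENS=20000       # raise the thinking budget
export BASH_MAX_OUTPUT_LENGTH=50000    # cap captured bash output (example value)
# exec claude                          # uncomment where the claude CLI is installed
```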
the compiler · CLAUDE.md craft · elite vs bloated
The difference between a documentation dump and a compiler.
BLOATED · 2000 LINES
# Project History
## 2022: Founded by...
## 2023: Pivot to...

# Detailed Architecture
Our system uses a microservices...
[15 paragraphs of prose]

# Code Examples
```typescript
[200 lines duplicated from src/]
```

# Every Config Variable
[catalog of 60 env vars]
Loads every turn. 40k tokens burned per session. Drifts silently. Nobody reads it.
ELITE · ~200 LINES
# Stack
5 lines: languages, frameworks, runtime

# Commit protocol
8 lines: message format, tagging, staging rules
Opus 4.7 · $5/M input · $25/M output
cache writes 1.25× base · cache reads 0.1×
ProjectDiscovery reported 59–70% real-world savings. Your numbers will vary with CLAUDE.md size and session length. Every bloated CLAUDE.md line also burns cache writes.
cache writes cost 1.25× base input · cache reads cost 0.1× · break-even after two turns · break-bank after ten
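The arithmetic behind that break-even claim, sketched with this slide's numbers (a 10K-token CLAUDE.md at $5/M input; one cache write at 1.25×, then reads at 0.1×):

```shell
#!/bin/bash
# Cache break-even sketch using this slide's pricing:
# $5/M input tokens, cache writes 1.25x base, cache reads 0.1x base.
cache_costs=$(awk 'BEGIN {
  tokens = 10000; base = 5 / 1000000          # dollars per input token
  for (n = 1; n <= 3; n++) {
    uncached = n * tokens * base                              # resend full prompt every turn
    cached   = tokens*base*1.25 + (n - 1)*tokens*base*0.1     # one write, then cheap reads
    printf "turns=%d uncached=$%.4f cached=$%.4f\n", n, uncached, cached
  }
}')
echo "$cache_costs"
```

At one turn, caching costs more ($0.0625 vs $0.0500); by turn two it is already cheaper ($0.0675 vs $0.1000), and the gap widens every turn after.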
fixture 01 · CLAUDE.md starter
Paste this. Start shipping.
CLAUDE.md
# Project conventions
## Stack
- Language: <your stack>
- Formatter: <prettier / black / gofmt>
- Tests: run before every commit
## Commit protocol
- One commit per logical unit
- Stage files by name · never git add -A
- No Co-Authored-By lines
## Anti-patterns
- NEVER add features I didn't ask for
- NEVER commit with failing tests
- NEVER assume · read the code
## Memory
- Durable rules: MEMORY.md
- Current phase: PLAN.md
PROVEN OUTCOME
Kills most day-one scope creep. The "features I didn't ask for" line alone is the single most-cited rule across Willison, Ronacher, and Horthy (2026).
AUTO-LOADS
Claude reads this at session start · no re-explaining, ever. 3% of context budget.
GROWS NATURALLY
Starts at 18 lines · caps at ~200. Every new rule came from a real moment. Trim ruthlessly.
same file works as AGENTS.md for Cursor / Codex / Aider · one source of truth, every tool
Most teams are at L1. The ones shipping consistently are at L2-L3. L4 is where the tooling disappears and it just works. You don't need L4 to be productive – L2 is a 30-minute investment that changes everything.
the team play · from one IC's rig to team defaults
The Day-30 answer: promote, don't mandate.
The Day-30 failure mode is fragmented adoption – three ICs, three CLAUDE.md files, three review styles, zero shared muscle. The fix isn't a committee. It's a promotion pipeline.
01 · DISCOVER
One IC builds .claude/commands/lint.md. It solves a real pain point. Others try it locally.
→
02 · PROMOTE
IC moves it to team/shared/.claude/ in a shared repo. Others symlink or copy. Feedback accumulates.
→
03 · HARDEN
Two rounds of real use. Bugs found, edges sanded. Added to the team's canonical-source repo with tests.
→
04 · DEPLOY
Ships via generate / sync. All engineers get it on next session. Drift detectable via grep.
PERSONAL (~/.claude/)
Experiments. Your current workflow. Nothing stable yet.
TEAM (shared repo)
Promoted from personal after proof. Reviewed. Versioned. Grepable drift detection.
ORGANIZATION (canonical)
Locked. Audited. Deployed via generator. Audit-trail for regulated orgs.
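The "drift detectable" step can be sketched as a plain diff sweep against the canonical copy (the repo layout here is hypothetical; a grep-based variant works the same way):

```shell
#!/bin/bash
# Drift check sketch: flag repos whose CLAUDE.md differs from the canonical copy.
check_drift() {
    local canonical=$1 repo f
    for repo in "${@:2}"; do
        f="$repo/CLAUDE.md"
        [[ -f $f ]] || continue
        diff -q "$canonical" "$f" >/dev/null || echo "drift: $f"
    done
}

# usage (paths are illustrative):
# check_drift team/shared/.claude/CLAUDE.md repos/*/
```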
One IC's good workflow becomes the team's default via the same git workflow you already have.
The mechanism doesn't matter – the discipline does.
philosophy · the hard-won rules
Six principles that actually matter.
LINT IS NECESSARY, NOT SUFFICIENT
Automated checks catch syntax. They can't catch semantic bugs, wrong assumptions, or compatibility issues. Green checkmarks are a floor, not a ceiling.
TRUST IS EARNED WITH EVIDENCE
"I checked" is not evidence. Show the grep output, the test result, the commit hash. Verification must be independent of the implementer.
ADVERSARIAL REVIEW BEFORE MERGE
At least one adversarial challenge per change. Issues caught at spec review cost 10 minutes. The same issues post-impl cost hours.
REGRESSION IS FIRST-CLASS
Every behavior change needs a test for the old behavior too. Separate from new feature tests. The scariest bugs are the ones you create while fixing others.
CHALLENGES DEMAND RESPONSES
When the reviewer raises a concern, silence is not an option. Fix it or explain with evidence why it's not real. This applies to AI reviewers too.
FIX CONTRADICTIONS AT THE SOURCE
When a spec contradicts the code, fix the spec first. Silent resolution in the plan creates a gap future readers can't trace.
actionable · steal these today
Where to start, based on where you are.
JUST GETTING STARTED
Create a CLAUDE.md
Even 5 lines: your language, your style, "don't add features I didn't ask for"
5 minutes
Use /compact proactively
Don't wait for auto-compaction. Clear context when you shift tasks.
0 seconds
Set effort level to high
CLAUDE_CODE_EFFORT_LEVEL=high
1 minute
USING IT DAILY
Add a PostToolUse hook
Auto-lint after every file edit. Catches problems before they compound.
10 minutes
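A minimal PostToolUse lint-hook sketch – Claude Code passes hook input as JSON on stdin; a production hook should parse it with jq, the bash regex here just keeps the sketch dependency-free:

```shell
#!/bin/bash
# PostToolUse hook sketch: auto-lint shell files after Edit/Write.
# Hook input arrives as JSON on stdin; field names follow Claude Code's hook docs.
lint_hook() {
    local input file re
    input=$(cat)
    re='"file_path"[[:space:]]*:[[:space:]]*"([^"]*)"'
    if [[ $input =~ $re ]]; then
        file="${BASH_REMATCH[1]}"
    fi
    case "$file" in
        *.sh)
            # exit/return 2 reports a blocking error back to Claude
            shellcheck "$file" || { echo "lint failed: $file" >&2; return 2; }
            ;;
    esac
    return 0
}

# the real hook script would end by invoking: lint_hook
```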
Start a MEMORY.md
Record your role, preferences, and "don't repeat this mistake" entries.
5 minutes
Add permission allow-lists
Stop clicking "allow" on git, make, and test commands 50x/day.
5 minutes
POWER USER
Add a PreCompact hook
Inject "save your work" before context compaction. The safety net.
10 minutes
Build an anti-pattern catalog
Every time something goes wrong, add a NEVER rule. Your future self thanks you.
ongoing
Custom status bar & skills
Live context in the status line. Custom slash commands for your workflow.
30 minutes
summary · the three truths
What we've learned.
01
The model is capable. The instructions are the product. Your CLAUDE.md is more important than your prompt.
02
Every rule is written in blood. Don't skip governance because you haven't been burned yet.
03
The gap between vibing and engineering is 30 minutes. Start with CLAUDE.md + memory. Grow from there.
Prompt → instructions → governance → pipeline
you are here → go here
Prompts to Pipelines
Stop vibing. Start engineering.
Ryan MacDonald
ryan@rfxn.com
|
rfxn.com
R-fx Networks
|
github.com/rfxn
APF · BFD · LMD · RDF
this deck was built by claude code, with claude code, about claude code
using the exact harness patterns described in this presentation
no vibes were harmed in the making of this deck