Action Tiers & Safety Gates

The agent classifies. The Runner enforces. Trust is layered, never single-source.

Every action blacklight takes is classified into one of five tiers. The tier determines gate behavior: auto-execute, suggested-with-diff-confirm, destructive-with-explicit-yes, or denied-by-default. Tier is authored by the agent and written into bl-case/pending/<step-id>.json as action_tier: auto|suggested|destructive. The Runner enforces the gate based on this field plus the verb class. Trust from the agent is never single-source.

The five tiers

Tier	Examples	Gate behavior
Read-only	`observe`, `consult`, `case show/log/list`	Auto-execute; no confirm; no audit write beyond standard case ledger
Reversible, low-risk	`defend firewall <ip>` (new block), `defend sig` (after corpus-FP-pass)	Auto-execute + Slack/stdout notification + 15-minute operator veto window (via `bl defend firewall --remove <ip>`); ledger entry created
Reversible, high-impact	`defend modsec` (new rule)	Suggest → operator reviews diff → explicit `bl run --yes` to apply; `apachectl configtest` pre-flight mandatory
Destructive	`clean htaccess`, `clean cron`, `clean proc`, `clean file`, `defend modsec --remove`	Diff shown (for file edits) or capture-then-kill (for proc); explicit `--yes` per-operation required; no batch auto-confirm; backup written before apply
Unknown	Any bash command the agent proposes that does not map to a known verb	Deny by default; operator must invoke `bl run <step-id> --unsafe --yes` explicitly; discouraged
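
The table above can be read as a lookup from tier to gate. A minimal sketch, assuming a hypothetical Gate record; the field names are illustrative and are not blacklight's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read-only"
    REVERSIBLE_LOW = "reversible-low-risk"
    REVERSIBLE_HIGH = "reversible-high-impact"
    DESTRUCTIVE = "destructive"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Gate:
    # Hypothetical fields, named for this sketch only.
    auto_execute: bool   # runs without an operator prompt
    needs_yes: bool      # explicit --yes required
    needs_unsafe: bool   # --unsafe required in addition to --yes
    backup_first: bool   # backup written before apply
    veto_minutes: int    # post-apply veto window; 0 means none


GATES = {
    Tier.READ_ONLY:       Gate(True,  False, False, False, 0),
    Tier.REVERSIBLE_LOW:  Gate(True,  False, False, False, 15),
    Tier.REVERSIBLE_HIGH: Gate(False, True,  False, False, 0),
    Tier.DESTRUCTIVE:     Gate(False, True,  False, True,  0),
    Tier.UNKNOWN:         Gate(False, True,  True,  False, 0),
}
```

Only the first two tiers run without a prompt, and only the second carries a veto window.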

Read-only: auto-runs

bl observe, bl consult, and bl case show/log/list are auto-tier. They do not write to host state. They emit to the case ledger and write evidence records to bl-case/<case>/evidence/. There is no operator confirmation prompt. The Runner does not pause.

This matters because the curator iterates on observation steps frequently: a typical case has 8–15 observation steps before the first defense or clean step. If each one needed a confirmation, the operator's hand would be on y for the whole case.

Reversible, low-risk: apply-and-notify

A new firewall block on a fresh IP is reversible (bl defend firewall --remove <ip>), low-impact (only that one IP is affected), and high-value (every minute the block isn't in place is a minute the attacker can re-pivot). The gate behavior here:

1. CDN-safe-list pre-check (internal allowlist + ASN lookup against a public WHOIS cache).
2. If clean → apply, write ledger entry to bl-case/<case>/actions/applied/<act-id>.json with a retire_after hint, emit a notification.
3. Operator has a 15-minute veto window during which bl defend firewall --remove <ip> will roll back the block and revert the ledger.
4. After 15 minutes, the block is committed.
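
The safe-list check and the veto window can be sketched as two small functions. This is an illustration under stated assumptions: on_safe_list and block_state are hypothetical names, the ASN lookup is left out, and treating the block as committed at exactly 15 minutes is a choice made here, not something the steps above specify.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

VETO_WINDOW = timedelta(minutes=15)


def on_safe_list(ip: str, safe_networks: list[str]) -> bool:
    """True if ip falls inside any allowlisted network (allowlist part only)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in safe_networks)


def block_state(applied_at: datetime, now: datetime) -> str:
    """Return 'vetoable' inside the window and 'committed' once it has elapsed."""
    if now < applied_at:
        raise ValueError("now is earlier than applied_at")
    return "vetoable" if now - applied_at < VETO_WINDOW else "committed"


# Example with documentation-range addresses (RFC 5737).
applied = datetime(2026, 4, 24, 4, 27, 15, tzinfo=timezone.utc)
safe = ["192.0.2.0/24"]
print(on_safe_list("192.0.2.7", safe))                            # True: do not block
print(on_safe_list("203.0.113.9", safe))                          # False: block may apply
print(block_state(applied, applied + timedelta(minutes=14)))      # vetoable
print(block_state(applied, applied + timedelta(minutes=15)))      # committed
```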

defend sig follows the same pattern after the FP-corpus gate passes (zero false positives against /var/lib/bl/fp-corpus/). YARA signatures are auto-tier if and only if the FP gate passes.
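
The zero-false-positive rule reduces to "scan every corpus file; any hit fails the gate". A minimal sketch, assuming the signature is available as a plain callable over file bytes. A real gate would run the compiled YARA rule; fp_gate and the sample matchers are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fp_gate(matches: Callable[[bytes], bool], corpus_dir: Path) -> list[Path]:
    """Return the corpus files the signature matched. Empty list: gate passes."""
    hits = []
    for path in sorted(corpus_dir.rglob("*")):
        if path.is_file() and matches(path.read_bytes()):
            hits.append(path)
    return hits


# Demo against a throwaway corpus of benign files.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp)
    (corpus / "index.php").write_bytes(b"<?php echo 'hello';")
    (corpus / "readme.txt").write_bytes(b"plain text")

    narrow_hits = fp_gate(lambda data: b"eval(base64_decode(" in data, corpus)
    broad_hits = [p.name for p in fp_gate(lambda data: b"<?php" in data, corpus)]

print(narrow_hits)  # no benign file matched: signature may proceed
print(broad_hits)   # a benign file matched: signature is rejected
```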

Reversible, high-impact: diff-and-yes

A new ModSec rule modifies the request-handling pipeline of every site on the host. Even when reversible, it is high-impact enough that the operator must see the diff and explicitly confirm.

bl-defend 2026-04-24T04:27:15Z, CASE-2026-0017 step s-09
Target: /etc/apache2/mods-enabled/bl-CASE-2026-0017-941999.conf

Diff (proposed):
   +SecRule REQUEST_FILENAME "@rx \.php/[^/]+\.(jpg|png|gif)$" \
   +    "id:941999,phase:2,deny,log,msg:'polyshell double-ext staging'"

apachectl -t ... OK (sandbox)
Apply? [y/N/diff-full/explain/abort]

Pre-flight: apachectl -t runs in the curator's sandbox before the rule is offered. The diff shown is the literal file write; diff-full shows the whole before/after; explain requests the curator's reasoning field from the pending-step JSON; abort cancels and marks the step operator-rejected (the curator sees this and may revise).
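
The prompt's reply handling can be sketched as a small parser. Assumptions: an empty reply takes the capitalized default N, only the exact tokens shown in the prompt are accepted, and anything else re-prompts. The returned action names are hypothetical.

```python
ACTIONS = {
    "y": "apply",
    "n": "no",
    "diff-full": "diff-full",
    "explain": "explain",
    "abort": "abort",
}


def parse_reply(raw: str) -> str:
    """Map an operator reply to an action; unrecognized input re-prompts."""
    reply = raw.strip().lower()
    if reply == "":
        return "no"  # capital N in [y/N/...] marks the default
    return ACTIONS.get(reply, "reprompt")
```

Rejecting near-misses such as "yes" keeps an accidental keystroke from applying a rule; whether the real prompt is that strict is not stated.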

Destructive: diff, backup, explicit per-op yes

Every bl clean operation is destructive. Five mechanical disciplines apply.

Diff shown before apply

For file edits (clean htaccess, clean cron):

bl-clean 2026-04-24T04:27:15Z, CASE-2026-0017 step s-10
Target: /home/sitefoo/.../.htaccess

Diff (proposed):
   -  <FilesMatch "\.php$">
   -      Require all denied
   -  </FilesMatch>
   +  # (line removed, injected block, per agent analysis)

Backup will be written to: /var/lib/bl/backups/2026-04-24T04-27-15Z.htaccess
Apply? [y/N/diff-full/explain/abort]

Backup before apply

Every bl clean operation writes a pre-apply backup to /var/lib/bl/backups/<ISO-ts>.<hash>.<basename>. The manifest tracks backups; bl case log lists them; bl clean --undo <backup-id> restores.
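
One way to build a path in the <ISO-ts>.<hash>.<basename> shape. The hash input and length are not specified above, so this sketch assumes a truncated SHA-256 of the pre-apply content, and it assumes the hyphenated UTC timestamp style seen in the prompt output; backup_path is a hypothetical helper.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

BACKUP_ROOT = Path("/var/lib/bl/backups")


def backup_path(target: Path, content: bytes, now: datetime) -> Path:
    """Build <ISO-ts>.<hash>.<basename> under the backup root (no I/O)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    ts = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    digest = hashlib.sha256(content).hexdigest()[:12]  # assumed: truncated sha256
    return BACKUP_ROOT / f"{ts}.{digest}.{target.name}"


example = backup_path(
    Path("/etc/cron.d/site"),
    b"* * * * * root /bin/true\n",
    datetime(2026, 4, 24, 4, 27, 15, tzinfo=timezone.utc),
)
print(example.name)
```

Including a content hash keeps two backups taken in the same second from colliding unless the content is identical.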

--dry-run contract

Every bl clean subcommand supports --dry-run. Dry-run shows the full diff and backup path but takes no action and writes nothing. Dry-run success is required before a non-dry-run is attempted; the Runner enforces this.
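
The ordering rule can be sketched as a guard keyed by step id. How the Runner actually remembers a dry-run result is not described here; this sketch keeps it in memory, and DryRunGuard is a hypothetical name.

```python
class DryRunGuard:
    """Refuse a real apply until the same step has passed a dry-run."""

    def __init__(self) -> None:
        self._passed: set[str] = set()

    def record_dry_run(self, step_id: str, ok: bool) -> None:
        # A later failed dry-run revokes an earlier success.
        if ok:
            self._passed.add(step_id)
        else:
            self._passed.discard(step_id)

    def may_apply(self, step_id: str) -> bool:
        return step_id in self._passed


guard = DryRunGuard()
print(guard.may_apply("s-10"))   # False: no dry-run yet
guard.record_dry_run("s-10", ok=True)
print(guard.may_apply("s-10"))   # True
```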

Quarantine, not delete

bl clean file never unlinks. Files move to /var/lib/bl/quarantine/<case-id>/<sha256>-<basename> with a manifest entry. bl case show --quarantine lists them; bl clean --unquarantine <entry> restores. Operator-rescue is one command away.
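
A minimal sketch of move-instead-of-unlink, using the <case-id>/<sha256>-<basename> layout above. The manifest format (one JSON object per line in manifest.jsonl) and the quarantine function are assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def quarantine(path: Path, root: Path, case_id: str) -> Path:
    """Move path into root/<case-id>/<sha256>-<basename> and log a manifest entry."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    dest_dir = root / case_id
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{digest}-{path.name}"
    shutil.move(str(path), str(dest))  # the file is moved, never unlinked
    entry = {"sha256": digest, "original": str(path), "stored": str(dest)}
    with (dest_dir / "manifest.jsonl").open("a") as fh:  # hypothetical manifest
        fh.write(json.dumps(entry) + "\n")
    return dest


# Demo in a throwaway directory standing in for /var/lib/bl/quarantine.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    suspect = base / "shell.php"
    suspect.write_bytes(b"<?php /* sample */ ?>")
    stored = quarantine(suspect, base / "quarantine", "CASE-2026-0017")
    moved_ok = stored.exists() and not suspect.exists()
    stored_name = stored.name
    manifest_lines = (stored.parent / "manifest.jsonl").read_text().splitlines()
```

Because the original path is recorded in the manifest, a restore only needs the entry.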

Capture before kill

bl clean proc <pid> captures /proc/<pid>/{cmdline,environ,exe,cwd,status,maps} and lsof -p <pid> to the case evidence before sending the signal. --capture=off disables the capture, and the operator must pass it explicitly. The default is capture-on because the forensic value of a running process's /proc snapshot often outweighs whatever latency the capture adds.
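
The /proc part of the capture can be sketched as below, for Linux only. It copies the four regular entries and records the targets of the exe and cwd symlinks; the lsof call and the signal itself are left out. capture_proc and the proc-<pid> output directory are hypothetical.

```python
import os
import tempfile
from pathlib import Path

PROC_FILES = ("cmdline", "environ", "status", "maps")
PROC_LINKS = ("exe", "cwd")


def capture_proc(pid: int, evidence_dir: Path) -> dict[str, bool]:
    """Snapshot /proc/<pid> entries; report which ones were captured."""
    proc = Path("/proc") / str(pid)
    out = evidence_dir / f"proc-{pid}"
    out.mkdir(parents=True, exist_ok=True)
    captured = {}
    for name in PROC_FILES:
        try:
            (out / name).write_bytes((proc / name).read_bytes())
            captured[name] = True
        except OSError:  # process gone, permission denied, or no /proc
            captured[name] = False
    for name in PROC_LINKS:
        try:
            target = os.readlink(proc / name)
            (out / name).write_bytes(os.fsencode(target))
            captured[name] = True
        except OSError:
            captured[name] = False
    return captured


# Demo: snapshot this process into a throwaway evidence directory.
with tempfile.TemporaryDirectory() as tmp:
    self_capture = capture_proc(os.getpid(), Path(tmp))
```

Each entry is captured independently, so a process that exits mid-capture still leaves whatever was read before it went away.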

Unknown: deny by default

If the agent proposes a step whose verb does not match any of the seven known namespaces, the Runner rejects the step in pre-validation. The operator can override with bl run <step-id> --unsafe --yes, but this is discouraged and surfaces a warning at the end of every shell invocation until the case closes.

This is the safety property the whole design rests on: the agent cannot emit arbitrary bash. It emits step records that map to named verbs with typed arguments. Even a fully compromised curator session cannot make the Runner run rm -rf /. The verb does not exist in the dispatch table.
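
The property can be illustrated with a dispatch table keyed by verb, where each entry is a typed argument parser. This is a sketch: the step-record fields (namespace, verb, args) and the two registered verbs are illustrative, not the real schema.

```python
import ipaddress


class UnknownVerb(Exception):
    """Raised in pre-validation when a step names a verb with no table entry."""


def _firewall_args(args: dict) -> dict:
    # ip_address() raises ValueError on anything that is not a literal IP.
    return {"ip": str(ipaddress.ip_address(args["ip"]))}


def _proc_args(args: dict) -> dict:
    pid = int(args["pid"])
    if pid <= 1:
        raise ValueError("refusing pid <= 1")
    return {"pid": pid}


DISPATCH = {
    ("defend", "firewall"): _firewall_args,
    ("clean", "proc"): _proc_args,
}


def prevalidate(step: dict) -> dict:
    """Return typed arguments for a known verb; reject everything else."""
    key = (step.get("namespace"), step.get("verb"))
    parser = DISPATCH.get(key)
    if parser is None:
        raise UnknownVerb(f"no such verb: {key!r}")
    return parser(step.get("args", {}))
```

A step such as {"namespace": "bash", "verb": "rm", "args": {"argv": "-rf /"}} never reaches a shell: there is no table entry to look up, so pre-validation raises before anything runs.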

Why tier authorship belongs to the agent

The agent has the hypothesis, the evidence, the curator's reasoning state. It knows whether a particular ModSec rule is a probe (low confidence; should be suggested) or a confirmed-pattern block (high confidence; can be auto once FP-gated). The Runner does not have that context.

What the Runner has: a verb table, a tier enforcement matrix, and a backup discipline. The Runner's job is not to second-guess the curator's classification but to bound it, refusing anything that would surprise the operator. A clean cron step always requires diff-confirm regardless of the agent-asserted tier. A defend firewall step always passes through the CDN safe-list check. The contract is: the agent classifies; the Runner bounds and enforces.
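
"Bounds" can be read as a per-verb floor: the agent may raise a step's tier but never lower it below what the verb class demands. A minimal sketch over the three action_tier values; the floor table is illustrative and incomplete, and effective_tier is a hypothetical name.

```python
TIER_ORDER = ("auto", "suggested", "destructive")  # least to most restrictive

# Assumed floors, taken from the examples in this section.
VERB_FLOOR = {
    "observe": "auto",
    "consult": "auto",
    "defend firewall": "auto",
    "defend modsec": "suggested",
    "clean": "destructive",
}


def effective_tier(verb: str, agent_tier: str) -> str:
    """Return the stricter of the agent's tier and the verb-class floor."""
    if agent_tier not in TIER_ORDER:
        raise ValueError(f"unknown action_tier: {agent_tier!r}")
    for prefix, floor in VERB_FLOOR.items():
        if verb == prefix or verb.startswith(prefix + " "):
            return max(agent_tier, floor, key=TIER_ORDER.index)
    raise KeyError(f"verb not in table: {verb!r}")  # unknown verbs are denied


print(effective_tier("clean cron", "auto"))            # destructive
print(effective_tier("defend modsec", "auto"))         # suggested
print(effective_tier("defend firewall", "suggested"))  # suggested
```

The agent's low-confidence "suggested" on a firewall block is honored, while an "auto" on clean cron is overridden.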

Failure modes the gate catches

Agent hallucinates a verb that doesn't exist. Pre-validation rejects unknown verbs.
Agent under-classifies a destructive step as auto. The verb class overrides: clean * is always destructive regardless of the agent-asserted tier.
Agent omits required fields. Schema validation rejects the step (destructive steps fail without a diff or patch).
Apache configtest fails on a synthesized ModSec rule. Sandbox-side pre-flight catches the failure before the Runner ever sees the rule.
YARA sig matches benign files in the FP corpus. FP gate trips, signature is rejected, ledger event defend_sig_rejected reason=fp_gate_trip is written.
Operator races a symlink between rename(2) and chown. The Runner applies chown/chmod/touch to the staged inode before the final mv -T rename: no chown-time TOCTOU window.
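
The ordering in the last item (set ownership and mode on the staged inode, then rename) can be sketched for POSIX systems as follows. staged_write and the .bl-stage- prefix are hypothetical; os.replace stands in for mv -T, since it issues rename(2) and replaces a symlink at the destination rather than following it.

```python
import os
import tempfile


def staged_write(dest: str, data: bytes, mode: int, uid: int, gid: int) -> None:
    """Write data to a staged file, fix ownership and mode, then rename over dest."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Stage in the destination directory so the rename stays on one filesystem.
    fd, staged = tempfile.mkstemp(dir=dest_dir, prefix=".bl-stage-")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            # fchown/fchmod act on the open descriptor, not on a path,
            # so there is no name to swap out between check and use.
            os.fchown(fh.fileno(), uid, gid)
            os.fchmod(fh.fileno(), mode)
            os.fsync(fh.fileno())
        os.replace(staged, dest)  # atomic rename of the finished inode
    except BaseException:
        try:
            os.unlink(staged)
        except FileNotFoundError:
            pass
        raise


# Demo: write as the current user into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "site.conf")
    staged_write(target, b"rule\n", 0o640, os.getuid(), os.getgid())
    with open(target, "rb") as fh:
        written = fh.read()
    written_mode = os.stat(target).st_mode & 0o777
    leftovers = [n for n in os.listdir(tmp) if n.startswith(".bl-stage-")]
```

Nothing touches the final path until the inode is fully prepared, so a path swapped in afterwards never receives the chown.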