Development Standard

BitterPillEngineering

Audits any AI instruction set for over-prompting using the core test — would a smarter model make this rule unnecessary?

Effort: medium

The Problem

AI instruction sets accumulate weight over time. Every bad output gets a new rule. Every edge case gets a new reminder. Every misfire gets a new constraint. Six months in, your CLAUDE.md is thousands of words of scaffolding that competes for context window, contradicts itself, and restates things the model already does by default. The instructions meant to sharpen the AI are now dulling it — because every unnecessary rule crowds out the ones that actually matter.

How This Skill Approaches It

Apply one core test to every rule: would a smarter model make this unnecessary? If yes, it's scaffolding you're maintaining for no reason. The Five Questions classify each rule: does Claude already do this? Does it contradict another rule? Is it redundant with another rule? Is it a one-off fix for a past mistake? Is it too vague to act on consistently? Then each rule gets a disposition: CUT if it restates default behavior, RESOLVE if it contradicts another rule, MERGE if it's duplicated elsewhere, EVALUATE if it was a one-off patch, SHARPEN or CUT if it's too vague, MOVE if it's rarely needed and could load on demand, and KEEP if it's specific, actionable, and non-default. Two workflows: Audit runs the full system, reading all force-loaded files from settings.json, evaluating every rule, and returning categorized findings with estimated token savings. QuickCheck targets a single file for a fast keep/cut/sharpen verdict.

Applies Five Questions to every rule (Claude already does this? Contradiction? Redundant? One-off fix? Vague?) then classifies as CUT/RESOLVE/MERGE/EVALUATE/SHARPEN/MOVE/KEEP
Workflows: Audit (full system, token savings), QuickCheck (single file)
Principle: less scaffolding = better output

Not for attacking logical flaws in ideas (use RedTeam)

In Action

What you say to your DA, and what the BitterPillEngineering skill actually does.

You say "run BPE on my setup"

Runs Audit: reads all force-loaded files from settings.json, evaluates every rule against the Five Questions, and returns a categorized report sorted by CUT/RESOLVE/MERGE/EVALUATE/SHARPEN/MOVE/KEEP with estimated token savings.

You say "quick check this claude.md for dead weight"

Runs QuickCheck on the target file: reads it, applies the Five Questions to each rule, returns a concise verdict for each with a keep/cut/sharpen call.

Inside the Skill

The thinking, frameworks, and architecture that distinguish this skill from a generic version of the same task.

What It Does

Audits any AI instruction set for over-prompting. It runs every rule through Five Questions — does Claude already do this, does it contradict another rule, is it redundant, was it a one-off fix, is it vague — then classifies each as CUT, RESOLVE, MERGE, EVALUATE, SHARPEN, MOVE, or KEEP, with an estimate of the tokens you'd save. Two workflows: Audit (full system) and QuickCheck (single file).

The Problem

Instruction sets accumulate. Every time the model does something wrong, someone adds a rule, and over months the file fills with instructions that restate default behavior, contradict each other, or fixed one bad output that never recurred. The cost is hidden: every unnecessary rule competes for attention and degrades the rules that actually matter, so a bloated setup produces worse output than a lean one. The hard part is telling load-bearing rules from dead weight — which is what this audit does, rule by rule.

How It Works

Built on the principle that less scaffolding = better output. The core test for every rule: "Would a smarter model make this unnecessary?" If yes, it's scaffolding, not architecture, and it's a candidate to cut. The Five Questions and the classification table below drive the verdict for each rule.

Examples

Example 1: Full system audit

User: "Run BPE on my setup"
→ Invokes Audit workflow
→ Reads all force-loaded files from settings.json
→ Evaluates each rule against the Five Questions
→ Returns categorized report with estimated token savings

Example 2: Check a single file

User: "Quick check this CLAUDE.md"
→ Invokes QuickCheck workflow
→ Reads the target file
→ Returns concise keep/cut/sharpen verdict

Example 3: Post-cleanup validation

User: "I trimmed my rules, check if anything's still redundant"
→ Invokes Audit workflow
→ Compares remaining rules against Claude defaults
→ Flags any surviving dead weight

Gotchas

Claude's built-in system prompt changes across versions — what was "default behavior" 3 months ago may not be now. When in doubt, test rather than assume.
Rules that seem redundant with defaults may have been added because Claude was inconsistent about following the default. Check failure history before cutting.
"One-off fix" rules sometimes prevent recurring failures. Check if the failure pattern is truly gone before removing.
The loadAtStartup list in settings.json and postCompactRestore.fullFiles must stay in sync — if you remove a file from one, check the other.
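
The last gotcha lends itself to a mechanical check. The following is a minimal sketch, not the skill's own implementation. It assumes `loadAtStartup` is a top-level list of file paths and `postCompactRestore` is an object holding a `fullFiles` list; the real layout of settings.json may differ. The `sync_drift` helper and the file names are hypothetical.

```python
import json


def sync_drift(settings: dict) -> dict:
    """Return files present in one list but missing from the other.

    Assumed layout (not confirmed by the skill's documentation):
      {"loadAtStartup": [...], "postCompactRestore": {"fullFiles": [...]}}
    """
    startup = set(settings.get("loadAtStartup", []))
    restore = set(settings.get("postCompactRestore", {}).get("fullFiles", []))
    return {
        "only_in_loadAtStartup": sorted(startup - restore),
        "only_in_fullFiles": sorted(restore - startup),
    }


# Hypothetical settings content, inlined so the example runs on its own.
example = json.loads("""
{
  "loadAtStartup": ["CLAUDE.md", "RULES.md"],
  "postCompactRestore": {"fullFiles": ["CLAUDE.md", "STYLE.md"]}
}
""")

drift = sync_drift(example)
print(drift)
```

The sketch only reports differences for review. Whether the two lists are meant to be identical or merely consistent is something to confirm against your own setup.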

The Five Questions

For every rule, instruction, or preference found, evaluate:

Default behavior? Does Claude already do this without being told?
Contradiction? Does this conflict with another rule in the same or different file?
Redundancy? Is this already covered by a different rule or file?
One-off fix? Was this added to fix one specific bad output rather than improve outputs generally?
Vague? Would Claude interpret this differently every time? (e.g., "be more natural", numeric personality scales)
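
Of the five, the Redundancy question is the one most open to a mechanical first pass. The sketch below is an illustration rather than how the skill itself decides: it flags rule pairs whose normalized text is nearly identical so a reviewer can look at them. The `merge_candidates` helper, the 0.8 threshold, and the sample rules are all assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(rule: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", rule.lower())
    return " ".join(text.split())


def merge_candidates(rules, threshold=0.8):
    """Return (i, j, ratio) for rule pairs whose normalized text is similar."""
    normalized = [normalize(r) for r in rules]
    pairs = []
    for i, j in combinations(range(len(rules)), 2):
        ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs


# Hypothetical rules gathered from one or more instruction files.
rules = [
    "Use TypeScript for all new code.",
    "Run tests before committing.",
    "use typescript for all new code",
]
print(merge_candidates(rules))
```

String similarity only catches rules that are worded alike. Two rules that say the same thing in different words still need a human or model judgment.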

Classification

Category	Action
Restates default behavior	CUT — the model already does this
Contradicts another rule	RESOLVE — pick one, cut the other
Duplicates another rule	MERGE — one location, one statement
One-off fix for past mistake	EVALUATE — still relevant or already learned?
Vague / unquantifiable	SHARPEN — add specific DO/DON'T examples, or cut
Loaded but rarely actionable	MOVE to on-demand — load via CONTEXT_ROUTING when needed
Specific, actionable, non-default	KEEP — this is what good instructions look like
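
The table above can be read as a lookup from question answers to a verdict. Here is a minimal sketch under stated assumptions: the `Answers` dataclass and `classify` function are hypothetical names, the `rarely_actionable` flag is added to cover the MOVE row (it is not one of the Five Questions), and the precedence order for rules that match several categories is a guess, since the table does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Yes/no answers to the Five Questions for one rule, plus one extra flag."""
    default_behavior: bool = False
    contradiction: bool = False
    redundancy: bool = False
    one_off_fix: bool = False
    vague: bool = False
    rarely_actionable: bool = False  # drives MOVE; not one of the Five Questions


# Order is an assumption: the source table does not say which verdict wins
# when a rule matches several categories.
PRECEDENCE = [
    ("default_behavior", "CUT"),
    ("contradiction", "RESOLVE"),
    ("redundancy", "MERGE"),
    ("one_off_fix", "EVALUATE"),
    ("vague", "SHARPEN or CUT"),
    ("rarely_actionable", "MOVE"),
]


def classify(answers: Answers) -> str:
    """Return the first matching disposition, or KEEP when nothing applies."""
    for field, verdict in PRECEDENCE:
        if getattr(answers, field):
            return verdict
    return "KEEP"


print(classify(Answers(default_behavior=True)))
print(classify(Answers(vague=True)))
print(classify(Answers()))
```

Answering the questions is the hard part and stays with the auditor; this sketch only makes the mapping from answers to verdicts explicit.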

Anti-Fragile vs Fragile

Keep (anti-fragile): Verification harnesses, ISC, data pipelines, specific DO/DON'T examples, tool preferences, routing rules.

Cut (fragile): CoT orchestrators, format parsers, retry cascades, numeric personality scales, abstract value statements, process descriptions that aren't followed.

BitterPillEngineering Audit

Scope: [what was audited]
Files read: [count]
Rules evaluated: [count]

CUT (restating defaults)

[rule] — [reason]

RESOLVE (contradictions)

[rule A] vs [rule B] — [which to keep and why]

MERGE (redundancies)

[locations] — [merge into where]

EVALUATE (one-off fixes)

[rule] — [still needed? verdict]

SHARPEN or CUT (vague)

[rule] — [sharpen how, or cut why]

MOVE to on-demand

[content] — [how often it's actually needed]

KEEP (carrying weight)

[rule] — [why it matters]

Estimated savings: [lines] lines, ~[tokens] tokens
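
To show how findings could be turned into the report layout above, here is a rough sketch. It is an illustration, not the skill's code. The `render_report` and `estimate_tokens` helpers are hypothetical, the figure of roughly four characters per token is a common rule of thumb rather than a measured value, and counting only CUT rules as savings is a simplification. The sample rules are invented for the example.

```python
SECTIONS = [
    ("CUT", "CUT (restating defaults)"),
    ("RESOLVE", "RESOLVE (contradictions)"),
    ("MERGE", "MERGE (redundancies)"),
    ("EVALUATE", "EVALUATE (one-off fixes)"),
    ("SHARPEN", "SHARPEN or CUT (vague)"),
    ("MOVE", "MOVE to on-demand"),
    ("KEEP", "KEEP (carrying weight)"),
]


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def render_report(scope, files_read, findings):
    """Render findings, given as (verdict, rule_text, note) tuples."""
    lines = [
        "BitterPillEngineering Audit",
        "",
        f"Scope: {scope}",
        f"Files read: {files_read}",
        f"Rules evaluated: {len(findings)}",
    ]
    for key, heading in SECTIONS:
        matches = [(rule, note) for verdict, rule, note in findings if verdict == key]
        if not matches:
            continue  # skip empty sections
        lines += ["", heading, ""]
        lines += [f"{rule} - {note}" for rule, note in matches]
    # Only CUT rules are counted as savings here; a simplification.
    cut = [rule for verdict, rule, _ in findings if verdict == "CUT"]
    saved_lines = sum(rule.count("\n") + 1 for rule in cut)
    saved_tokens = sum(estimate_tokens(rule) for rule in cut)
    lines += ["", f"Estimated savings: {saved_lines} lines, ~{saved_tokens} tokens"]
    return "\n".join(lines)


# Hypothetical findings for a single-file audit.
findings = [
    ("CUT", "Always be helpful and accurate.", "restates default behavior"),
    ("KEEP", "Use bun, never npm.", "specific tool preference"),
]
report = render_report("CLAUDE.md", 1, findings)
print(report)
```

A real tokenizer would give a tighter estimate than the character heuristic; the approximation is enough to rank which cuts matter most.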

Workflows

Audit (Workflows/Audit.md)
Triggers: audit setup, full audit, check all rules

How to Invoke

Say any of these to your DA and PAI activates the BitterPillEngineering skill automatically:

"BPE"
"bitter pill"
"audit setup"
"over-prompting"
"trim instructions"
"dead weight"
"simplify setup"
"clean up CLAUDE.md"

Or invoke explicitly:

Skill("BitterPillEngineering")

References & Credits

The thinkers, books, frameworks, and research this skill is built on. The ideas belong to them — the integration belongs to PAI.

Source

The Bitter Lesson, Rich Sutton: general methods that scale with computation beat hand-crafted scaffolding.

Want PAI to do this for you?

Install PAI on your machine — your DA gets the BitterPillEngineering skill plus 44 others, all hooked into one Life OS.
