Development Deep

Science

The scientific method as a universal problem-solving algorithm — goal-first, hypothesis-plural, falsifiable experiments, honest measurement.

Effort: high

The Problem

Ask a generic AI to 'figure out why this is broken' and it guesses. One hypothesis, no experiment design, no measurement criteria — just pattern-matching toward a plausible answer. When the first guess is wrong, it guesses again with slight variation. There's no feedback loop, no falsification, no honest accounting of what the data actually showed. You get confident-sounding reasoning that collapses the moment you run it against reality.

How This Skill Approaches It

The skill embeds the scientific method as a literal execution loop: DefineGoal first (success criteria before any action), then GenerateHypotheses with a hard floor of three candidates (single-hypothesis work is confirmation bias by construction), DesignExperiment to make tests falsifiable, MeasureResults to collect only goal-relevant data, AnalyzeResults against the original success criteria, then Iterate back to the hypothesis phase. QuickDiagnosis applies this cycle in 15-minute bursts for fast debugging. StructuredInvestigation runs the full multi-factor version for complex problems. The skill integrates Council for hypothesis generation, Evals for structured measurement, and RedTeam for stress-testing conclusions before you commit to them.

Diagnostic shortcuts: QuickDiagnosis (15-min rule), StructuredInvestigation (multi-factor)
Scales across micro (TDD), meso (feature validation), macro (MVP launch)
Integrates Council, Evals, Development, RedTeam

Not for multi-angle lens passes (use IterativeDepth)

In Action

What you say to your DA, and what the Science skill actually does.

You say "figure out why the surface time filters are showing stale items"

Runs QuickDiagnosis: generates at least three hypotheses (timestamp format mismatch, cache invalidation gap, query boundary condition), designs a minimal test for each, runs them, compares results to the success criterion, and surfaces the structural cause, not just the first thing that looked suspicious.

You say "experiment with different prompt structures to get better output quality"

Runs FullCycle: defines a measurable quality criterion, generates 3+ prompt variants as competing hypotheses, designs controlled experiments with the Evals skill as the measurement layer, analyzes which variant actually moved the metric, and iterates until the winner is confirmed or a new hypothesis enters the pool.

You say "how do we test whether adding a rate limit will actually reduce abuse"

Runs DesignExperiment: frames the question as a falsifiable claim, identifies what data would confirm or refute it, defines the minimum viable test, and flags second-order effects (legitimate traffic retry load) that the hypothesis needs to account for before the experiment runs.
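One way to picture the falsifiable framing in the rate-limit case is to fix the refutation thresholds before any data comes in. The sketch below is illustrative only: `Window`, `evaluate_rate_limit`, the 30% and 1% thresholds, and the traffic numbers are all hypothetical and are not taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Traffic observed over one fixed period (hypothetical fields)."""
    abusive_requests: int
    legit_requests: int
    legit_failures: int  # legitimate requests rejected or forced to retry


def evaluate_rate_limit(
    before: Window,
    after: Window,
    min_abuse_reduction: float = 0.30,    # assumed threshold, set before the test
    max_legit_failure_rate: float = 0.01,  # assumed threshold, set before the test
) -> str:
    """Compare two observation windows against thresholds fixed in advance."""
    if before.abusive_requests == 0 or after.legit_requests == 0:
        return "inconclusive"
    reduction = 1 - after.abusive_requests / before.abusive_requests
    legit_failure_rate = after.legit_failures / after.legit_requests
    if reduction < min_abuse_reduction:
        return "refuted: abuse did not drop enough"
    if legit_failure_rate > max_legit_failure_rate:
        return "refuted: legitimate traffic harmed"
    return "supported"


# Made-up numbers: abuse fell by 60%, but legitimate failures rose to 1.8%.
before = Window(abusive_requests=1000, legit_requests=50000, legit_failures=100)
after = Window(abusive_requests=400, legit_requests=50000, legit_failures=900)
print(evaluate_rate_limit(before, after))  # refuted: legitimate traffic harmed
```

Checking the second-order effect in the same function means a drop in abuse cannot be reported as a win while legitimate traffic quietly degrades.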

Inside the Skill

The thinking, frameworks, and architecture that distinguish this skill from a generic version of the same task.

What It Does

Applies the scientific method as a general problem-solving algorithm: define the goal first, generate multiple hypotheses, design experiments that can fail, measure honestly, analyze against the goal, iterate. Seven core workflows plus two diagnostic shortcuts (quick 15-minute debugging and structured multi-factor investigation). It scales from micro (TDD) to meso (feature validation) to macro (MVP launch).

The Problem

Most problem-solving is guessing dressed up as work. You pick the first idea that comes to mind, change something, and call it done when it "seems better" — which is confirmation bias, not progress. Without a clear definition of success you can't tell whether a change helped, so you keep tweaking forever or stop too early. Single-hypothesis thinking means you only ever test the idea you already believed. This skill forces the discipline that fixes all of that: a stated goal, at least three competing hypotheses, falsifiable tests, and measurement that compares to the goal rather than to your hopes.

How It Works

The whole thing is one repeating cycle, and the goal anchors it — without clear success criteria you cannot judge results:

GOAL -----> What does success look like?
   |
OBSERVE --> What is the current state?
   |
HYPOTHESIZE -> What might work? (Generate MULTIPLE)
   |
EXPERIMENT -> Design and run the test
   |
MEASURE --> What happened? (Data collection)
   |
ANALYZE --> How does it compare to the goal?
   |
ITERATE --> Adjust hypothesis and repeat
   |
   +------> Back to HYPOTHESIZE

The answer emerges from the cycle, not from guessing.
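The cycle above can be sketched as a small loop. This is a minimal illustration, not the skill's actual implementation: `Hypothesis`, `run_cycle`, `toy_generate`, and the load-time numbers are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    name: str
    experiment: Callable[[], float]  # runs the test and returns the measured metric


def run_cycle(
    goal_met: Callable[[float], bool],
    generate: Callable[[int], List[Hypothesis]],
    max_rounds: int = 3,
) -> Optional[Hypothesis]:
    """Repeat HYPOTHESIZE -> EXPERIMENT -> MEASURE -> ANALYZE until the goal is met."""
    for round_no in range(max_rounds):
        hypotheses = generate(round_no)      # HYPOTHESIZE
        if len(hypotheses) < 3:
            raise ValueError("need at least 3 competing hypotheses")
        for h in hypotheses:
            measured = h.experiment()        # EXPERIMENT + MEASURE
            if goal_met(measured):           # ANALYZE against the goal
                return h
        # ITERATE: nothing met the goal, so go back to HYPOTHESIZE
    return None


# Toy usage: the goal is a load time of at most 1.0 s (made-up numbers).
def toy_generate(round_no: int) -> List[Hypothesis]:
    return [
        Hypothesis("add cache", lambda: 1.8),
        Hypothesis("compress assets", lambda: 1.4),
        Hypothesis("lazy-load images", lambda: 0.9),
    ]


winner = run_cycle(lambda t: t <= 1.0, toy_generate)
print(winner.name if winner else "no hypothesis met the goal")  # lazy-load images
```

The goal check is passed in before any hypothesis runs, which mirrors the point that success criteria come first and results are judged only against them.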

Resource Index

Resource	Description
`METHODOLOGY.md`	Deep dive into each phase
`Protocol.md`	How skills implement Science
`Templates.md`	Goal, Hypothesis, Experiment, Results templates
`Examples.md`	Worked examples across scales

Domain Applications

Domain	Manifestation	Related Skill
Coding	TDD (Red-Green-Refactor)	Development
Products	MVP -> Measure -> Iterate	Development
Research	Question -> Study -> Analyze	Research
Prompts	Prompt -> Eval -> Iterate	Evals
Decisions	Options -> Council -> Choose	Council

Scale of Application

Level	Cycle Time	Example
Micro	Minutes	TDD: test, code, refactor
Meso	Hours-Days	Feature: spec, implement, validate
Macro	Weeks-Months	Product: MVP, launch, measure PMF

Integration Points

Phase	Skills to Invoke
Goal	Council for validation
Observe	Research for context
Hypothesize	Council for ideas, RedTeam for stress-test
Experiment	Development (Worktrees) for parallel tests
Measure	Evals for structured measurement
Analyze	Council for multi-perspective analysis

Key Principles (Quick Reference)

Goal-First - Define success before starting
Hypothesis Plurality - NEVER just one idea (minimum 3)
Minimum Viable Experiments - Smallest test that teaches
Falsifiability - Experiments must be able to fail
Measure What Matters - Only goal-relevant data
Honest Analysis - Compare to goal, not expectations
Rapid Iteration - Cycle speed > perfect experiments
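The Goal-First principle becomes easier to apply when the goal is written down as data rather than as a sentence. The sketch below uses a load-time target of 3s down to 1s as its example; the `Goal` class and its fields are hypothetical, and it assumes a metric where lower is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    metric: str
    baseline: float
    target: float  # assumption in this sketch: lower values are better

    def is_met(self, measured: float) -> bool:
        return measured <= self.target

    def progress(self, measured: float) -> float:
        """Fraction of the baseline-to-target gap closed so far (can exceed 1.0)."""
        return (self.baseline - measured) / (self.baseline - self.target)


goal = Goal(metric="load_time_s", baseline=3.0, target=1.0)
print(goal.is_met(1.4))    # False
print(goal.progress(2.0))  # 0.5
```

A measured 1.4s is a real improvement over 3s, yet `is_met` still returns False: the comparison is to the stated goal, not to how much better things feel.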

Anti-Patterns

Bad	Good
"Make it better"	"Reduce load time from 3s to 1s"
"I think X will work"	"Here are 3 approaches: X, Y, Z"
"Prove I'm right"	"Design test that could disprove"
"Pretend failure didn't happen"	"What did we learn?"
"Keep experimenting forever"	"Ship and learn from production"

Quick Start

Goal - What does success look like?
Observe - What do we know?
Hypothesize - At least 3 ideas
Experiment - Minimum viable tests
Measure - Collect goal-relevant data
Analyze - Compare to success criteria
Iterate - Adjust and repeat

The answer emerges from the cycle, not from guessing.

Gotchas

Hypothesis-test-analyze is the core loop. Don't skip the hypothesis step — going straight to testing is just trial-and-error, not science.
Minimum 3 hypotheses before testing. Single-hypothesis testing is confirmation bias.
Measurements must be specific and reproducible. "It seems better" is not a measurement.
Full cycle is for systematic investigation. For quick debugging, use QuickDiagnosis (the 15-minute rule).
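As a rough illustration of "specific and reproducible", a measurement can be repeated and summarized instead of eyeballed once. The helper `measure` and the recorded values below are hypothetical; the sketch assumes each trial returns a single number.

```python
import statistics
from typing import Callable, Dict, List


def measure(trial: Callable[[], float], runs: int = 5) -> Dict[str, float]:
    """Run the same trial several times and summarize the samples."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate spread")
    samples: List[float] = [trial() for _ in range(runs)]
    return {
        "runs": float(runs),
        "median": statistics.median(samples),
        "spread": max(samples) - min(samples),
    }


# Stand-in trial that replays recorded load times in seconds (made-up numbers).
recorded = iter([1.2, 0.9, 1.1, 1.0, 3.5])
summary = measure(lambda: next(recorded), runs=5)
print(summary["median"])  # 1.1
```

The median is used here so that one slow outlier (3.5) does not dominate the result, while the spread still makes that outlier visible.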

Examples

Example 1: Quick diagnosis

User: "figure out why Surface time filters show stale items"
→ Quick diagnosis mode
→ Hypotheses: timestamp format mismatch in D1, cache invalidation gap, query boundary condition
→ Test (first hypothesis): query D1 for actual stored format
→ Analyze: compare stored vs expected format
→ Result: ISO string vs Unix timestamp mismatch
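The stored-format check in this example could look something like the following once the rows have been fetched. It is a sketch under assumptions: `classify_timestamp` and the sample rows are hypothetical, and nothing here reflects the real schema or how the skill queries D1.

```python
from datetime import datetime
from typing import Union


def classify_timestamp(value: Union[int, float, str]) -> str:
    """Return 'unix', 'iso', or 'unknown' for a stored timestamp value."""
    if isinstance(value, bool):
        return "unknown"  # bool is a subclass of int, so rule it out first
    if isinstance(value, (int, float)):
        return "unix"
    if isinstance(value, str):
        if value.strip().isdigit():
            return "unix"  # numeric string such as "1714564800"
        try:
            datetime.fromisoformat(value)
            return "iso"
        except ValueError:
            return "unknown"
    return "unknown"


# Hypothetical rows as they might come back from the store.
rows = [1714564800, "2024-05-01T12:00:00+00:00", "1714564800", "yesterday"]
print([classify_timestamp(v) for v in rows])
# ['unix', 'iso', 'unix', 'unknown']
```

A column that yields both 'unix' and 'iso' would support the format-mismatch hypothesis; a column that yields only one format would count against it.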

Example 2: Full systematic investigation

User: "experiment with different prompt structures for better output"
→ Full cycle mode
→ 3+ hypotheses generated
→ Controlled experiments with measurements
→ Analysis identifies winning approach
→ Iterates until convergence
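The comparison step in this example might be sketched as follows. Everything named here is hypothetical: `pick_winner`, the toy variants, and `toy_score` stand in for real prompt runs and for whatever scoring the Evals skill provides, which this document does not specify.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional


def pick_winner(
    variants: Dict[str, Callable[[str], str]],
    cases: List[str],
    score: Callable[[str, str], float],
    baseline: str,
    min_gain: float = 0.05,  # assumed margin, fixed before the experiment
) -> Optional[str]:
    """Return the best variant only if it beats the baseline by min_gain."""
    if len(variants) < 3:
        raise ValueError("need at least 3 competing variants")
    averages = {
        name: mean(score(case, run(case)) for case in cases)
        for name, run in variants.items()
    }
    best = max(averages, key=averages.get)
    if best != baseline and averages[best] - averages[baseline] >= min_gain:
        return best
    return None  # no variant clearly moved the metric


# Toy stand-ins: in real use each variant would call a model.
variants = {
    "plain": lambda case: case,
    "with_role": lambda case: "As an expert: " + case,
    "with_steps": lambda case: "Step 1. " + case + " Step 2. verify.",
}


def toy_score(case: str, output: str) -> float:
    return 1.0 if "verify" in output else 0.5


print(pick_winner(variants, ["summarize A", "summarize B"], toy_score, baseline="plain"))
# with_steps
```

Returning `None` when no variant clears the margin keeps the loop honest: the next step is a new hypothesis, not a declared winner.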

Workflows · 9

01 `Workflows/DefineGoal.md`
define the goal, what are we trying to achieve

02 `Workflows/GenerateHypotheses.md`
what might work, ideas, hypotheses

03 `Workflows/DesignExperiment.md`
how do we test, experiment design

04 `Workflows/MeasureResults.md`
what happened, measure, results

05 `Workflows/AnalyzeResults.md`
analyze, compare to goal

06 `Workflows/Iterate.md`
iterate, try again, next cycle

07 `Workflows/FullCycle.md`
Full structured cycle

08 `Workflows/QuickDiagnosis.md`
Quick debugging (15-min rule)

09 `Workflows/StructuredInvestigation.md`
Complex investigation

How to Invoke

Say any of these to your DA and PAI activates the Science skill automatically:

"think about"
"figure out"
"experiment"
"iterate"
"optimize"
"hypothesis"
"science"
"full cycle"
"quick diagnosis"
"structured investigation"
"how do we test"
"analyze results"

Or invoke explicitly:

Skill("Science")

References · 4

Auxiliary files the skill loads at runtime — frameworks, guides, configs.

Examples
METHODOLOGY
Protocol
Templates

References & Credits

The thinkers, books, frameworks, and research this skill is built on. The ideas belong to them — the integration belongs to PAI.
