Science

The scientific method as a universal problem-solving algorithm — goal-first, hypothesis-plural, falsifiable experiments, honest measurement.

9 Workflows · 4 References · 12 Triggers · Effort: high

The Problem

Ask a generic AI to 'figure out why this is broken' and it guesses. One hypothesis, no experiment design, no measurement criteria — just pattern-matching toward a plausible answer. When the first guess is wrong, it guesses again with slight variation. There's no feedback loop, no falsification, no honest accounting of what the data actually showed. You get confident-sounding reasoning that collapses the moment you run it against reality.

How This Skill Approaches It

The skill embeds the scientific method as a literal execution loop: DefineGoal first (success criteria before any action), then GenerateHypotheses with a hard floor of three candidates (single-hypothesis work is confirmation bias by construction), DesignExperiment to make tests falsifiable, MeasureResults to collect only goal-relevant data, AnalyzeResults against the original success criteria, then Iterate back to the hypothesis phase. QuickDiagnosis applies this cycle in 15-minute bursts for fast debugging. StructuredInvestigation runs the full multi-factor version for complex problems. The skill integrates Council for hypothesis generation, Evals for structured measurement, and RedTeam for stress-testing conclusions before you commit to them.

  • Diagnostic shortcuts: QuickDiagnosis (15-min rule), StructuredInvestigation (multi-factor)
  • Scales across micro (TDD), meso (feature validation), macro (MVP launch)
  • Integrates Council, Evals, Development, RedTeam
  • Not for multi-angle lens passes (use IterativeDepth)
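The hard floor of three candidates can be enforced mechanically. The sketch below is illustrative only: `Hypothesis`, `require_plurality`, and `MIN_HYPOTHESES` are hypothetical names, not part of the skill's actual files.

```python
from dataclasses import dataclass
from typing import List

MIN_HYPOTHESES = 3  # the "hard floor" described above


@dataclass(frozen=True)
class Hypothesis:
    """One candidate explanation plus the observation that would refute it."""
    claim: str
    refuted_if: str


def require_plurality(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Reject a pool with fewer than three distinct claims."""
    # Compare normalized claims so the same idea reworded by case or
    # whitespace does not count twice.
    distinct = {h.claim.strip().lower() for h in hypotheses}
    if len(distinct) < MIN_HYPOTHESES:
        raise ValueError(
            f"need at least {MIN_HYPOTHESES} distinct hypotheses, got {len(distinct)}"
        )
    return hypotheses
```

Pairing each claim with a `refuted_if` statement keeps the falsifiability requirement attached to the hypothesis from the moment it is written down.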

In Action

What you say to your DA, and what the Science skill actually does.

  • You say "figure out why the surface time filters are showing stale items"
    Runs QuickDiagnosis: generates at least three hypotheses (timestamp format mismatch, cache invalidation gap, query boundary condition), designs a minimal test for each, runs them, compares results to the success criterion, and surfaces the structural cause — not just the first thing that looked suspicious.
  • You say "experiment with different prompt structures to get better output quality"
    Runs FullCycle: defines a measurable quality criterion, generates 3+ prompt variants as competing hypotheses, designs controlled experiments with the Evals skill as the measurement layer, analyzes which variant actually moved the metric, and iterates until the winner is confirmed or a new hypothesis enters the pool.
  • You say "how do we test whether adding a rate limit will actually reduce abuse"
    Runs DesignExperiment: frames the question as a falsifiable claim, identifies what data would confirm or refute it, defines the minimum viable test, and flags second-order effects (legitimate traffic retry load) that the hypothesis needs to account for before the experiment runs.
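For the rate-limit question, a falsifiable design states in advance which result would refute the claim and which second-order metric must not regress. The following is a minimal sketch under assumed thresholds; `FalsifiableTest`, its fields, and all numbers in the usage lines are hypothetical and not taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalsifiableTest:
    """A claim with a pre-registered refutation threshold and a guard metric."""
    claim: str
    min_relative_drop: float    # e.g. 0.30 means "abuse falls by at least 30%"
    max_guard_increase: float   # tolerated relative rise in the guard metric

    def verdict(self, abuse_before: float, abuse_after: float,
                guard_before: float, guard_after: float) -> str:
        if abuse_before <= 0 or guard_before <= 0:
            raise ValueError("baseline measurements must be positive")
        drop = (abuse_before - abuse_after) / abuse_before
        guard_rise = (guard_after - guard_before) / guard_before
        if drop < self.min_relative_drop:
            return "refuted"
        if guard_rise > self.max_guard_increase:
            return "supported, but guard metric regressed"
        return "supported"


# Hypothetical numbers: abusive requests and legitimate-client retries per hour.
test = FalsifiableTest(
    claim="Adding a rate limit reduces abusive requests by at least 30%",
    min_relative_drop=0.30,
    max_guard_increase=0.10,
)
print(test.verdict(abuse_before=1000, abuse_after=500,
                   guard_before=200, guard_after=210))
```

Fixing the thresholds before the experiment runs is what makes the test able to fail; choosing them after seeing the data would turn it back into confirmation.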

Inside the Skill

The thinking, frameworks, and architecture that distinguish this skill from a generic version of the same task.

What It Does

Applies the scientific method as a general problem-solving algorithm: define the goal first, generate multiple hypotheses, design experiments that can fail, measure honestly, analyze against the goal, iterate. The skill has seven core workflows plus two diagnostic shortcuts (quick 15-minute debugging and structured multi-factor investigation). It scales from micro (TDD) to meso (feature validation) to macro (MVP launch).

The Problem

Most problem-solving is guessing dressed up as work. You pick the first idea that comes to mind, change something, and call it done when it "seems better" — which is confirmation bias, not progress. Without a clear definition of success you can't tell whether a change helped, so you keep tweaking forever or stop too early. Single-hypothesis thinking means you only ever test the idea you already believed. This skill forces the discipline that fixes all of that: a stated goal, at least three competing hypotheses, falsifiable tests, and measurement that compares to the goal rather than to your hopes.

How It Works

The whole thing is one repeating cycle, and the goal anchors it — without clear success criteria you cannot judge results:

GOAL -----> What does success look like?
   |
OBSERVE --> What is the current state?
   |
HYPOTHESIZE -> What might work? (Generate MULTIPLE)
   |
EXPERIMENT -> Design and run the test
   |
MEASURE --> What happened? (Data collection)
   |
ANALYZE --> How does it compare to the goal?
   |
ITERATE --> Adjust hypothesis and repeat
   |
   +------> Back to HYPOTHESIZE

The answer emerges from the cycle, not from guessing.
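The cycle above can be sketched as a small loop. This is an illustration under simplifying assumptions, not the skill's implementation: `run_cycle`, `generate`, and the load-time numbers are hypothetical, and each experiment is reduced to a function that returns one number.

```python
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, float]]


def run_cycle(
    goal_met: Callable[[float], bool],
    generate: Callable[[History], List[str]],
    experiment: Callable[[str], float],
    max_rounds: int = 5,
) -> Tuple[Optional[str], History]:
    """Cycle HYPOTHESIZE -> EXPERIMENT -> MEASURE -> ANALYZE until the goal is met.

    Returns the first hypothesis whose measurement satisfies the goal (or None
    if the round budget runs out), together with everything that was measured.
    """
    history: History = []
    for _ in range(max_rounds):
        candidates = generate(history)            # HYPOTHESIZE, informed by past results
        if len(candidates) < 3:
            raise ValueError("each round needs at least three hypotheses")
        for hypothesis in candidates:
            measurement = experiment(hypothesis)  # EXPERIMENT + MEASURE
            history.append((hypothesis, measurement))
            if goal_met(measurement):             # ANALYZE against the goal
                return hypothesis, history
    return None, history                          # ITERATE budget exhausted


# Hypothetical load times in seconds for each change tried.
measured = {"cache": 2.4, "index": 1.8, "gzip": 2.9,
            "cdn": 1.4, "lazy-load": 1.2, "cache+index": 0.9}


def generate(history: History) -> List[str]:
    # First round: three independent ideas. Later rounds: new ideas
    # chosen after seeing which earlier ones fell short.
    if not history:
        return ["cache", "index", "gzip"]
    return ["cdn", "lazy-load", "cache+index"]


winner, history = run_cycle(lambda s: s <= 1.0, generate, measured.__getitem__)
```

The goal is passed in as a predicate before anything runs, so a result can only be judged against the stated success criterion.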


Resource Index

| Resource | Description |
|---|---|
| METHODOLOGY.md | Deep dive into each phase |
| Protocol.md | How skills implement Science |
| Templates.md | Goal, Hypothesis, Experiment, Results templates |
| Examples.md | Worked examples across scales |

Domain Applications

| Domain | Manifestation | Related Skill |
|---|---|---|
| Coding | TDD (Red-Green-Refactor) | Development |
| Products | MVP -> Measure -> Iterate | Development |
| Research | Question -> Study -> Analyze | Research |
| Prompts | Prompt -> Eval -> Iterate | Evals |
| Decisions | Options -> Council -> Choose | Council |

Scale of Application

| Level | Cycle Time | Example |
|---|---|---|
| Micro | Minutes | TDD: test, code, refactor |
| Meso | Hours-Days | Feature: spec, implement, validate |
| Macro | Weeks-Months | Product: MVP, launch, measure PMF |

Integration Points

| Phase | Skills to Invoke |
|---|---|
| Goal | Council for validation |
| Observe | Research for context |
| Hypothesize | Council for ideas, RedTeam for stress-test |
| Experiment | Development (Worktrees) for parallel tests |
| Measure | Evals for structured measurement |
| Analyze | Council for multi-perspective analysis |

Key Principles (Quick Reference)

  1. Goal-First - Define success before starting
  2. Hypothesis Plurality - NEVER just one idea (minimum 3)
  3. Minimum Viable Experiments - Smallest test that teaches
  4. Falsifiability - Experiments must be able to fail
  5. Measure What Matters - Only goal-relevant data
  6. Honest Analysis - Compare to goal, not expectations
  7. Rapid Iteration - Cycle speed > perfect experiments
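Principles 1 and 6 (Goal-First, Honest Analysis) both depend on a goal stated as a metric with a baseline and a target, such as reducing load time from 3 s to 1 s. A minimal sketch of such a goal follows; the `Goal` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """A success criterion expressed as metric, baseline, and target."""
    metric: str
    baseline: float
    target: float
    unit: str

    def __post_init__(self) -> None:
        if self.baseline == self.target:
            raise ValueError("target must differ from baseline")

    def is_met(self, measured: float) -> bool:
        """True when the measurement reaches the target in the intended direction."""
        if self.target < self.baseline:   # lower is better (e.g. latency)
            return measured <= self.target
        return measured >= self.target    # higher is better (e.g. conversion)

    def progress(self, measured: float) -> float:
        """Fraction of the baseline-to-target gap closed; may be negative or above 1."""
        return (self.baseline - measured) / (self.baseline - self.target)


load_time = Goal(metric="load time", baseline=3.0, target=1.0, unit="s")
print(load_time.is_met(1.4), load_time.progress(2.0))
```

A goal written this way cannot be satisfied by "it seems better": a measurement either reaches the target or it does not.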

Anti-Patterns

| Bad | Good |
|---|---|
| "Make it better" | "Reduce load time from 3s to 1s" |
| "I think X will work" | "Here are 3 approaches: X, Y, Z" |
| "Prove I'm right" | "Design test that could disprove" |
| "Pretend failure didn't happen" | "What did we learn?" |
| "Keep experimenting forever" | "Ship and learn from production" |

Quick Start

  1. Goal - What does success look like?
  2. Observe - What do we know?
  3. Hypothesize - At least 3 ideas
  4. Experiment - Minimum viable tests
  5. Measure - Collect goal-relevant data
  6. Analyze - Compare to success criteria
  7. Iterate - Adjust and repeat

Gotchas

  • Hypothesis-test-analyze is the core loop. Don't skip the hypothesis step — going straight to testing is just trial-and-error, not science.
  • Minimum 3 hypotheses before testing. Single-hypothesis testing is confirmation bias.
  • Measurements must be specific and reproducible. "It seems better" is not a measurement.
  • Full cycle is for systematic investigation. For quick debugging, use quick diagnosis mode.

Examples

Example 1: Quick diagnosis

User: "figure out why Surface time filters show stale items"
→ Quick diagnosis mode
→ Hypotheses: timestamp format mismatch in D1, cache invalidation gap, query boundary condition
→ Test (format mismatch): query D1 for actual stored format
→ Analyze: compare stored vs expected format
→ Result: ISO string vs Unix timestamp mismatch
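The "compare stored vs expected format" step can be made concrete with a small classifier run over the values returned by the query. This sketch assumes stored values are either numeric Unix seconds or ISO 8601 strings; the function names are hypothetical and the actual D1 schema is not shown in the source.

```python
from datetime import datetime, timezone
from typing import Iterable, Set


def classify_timestamp(value: object) -> str:
    """Label a stored value as 'unix', 'iso', or 'unknown'."""
    if isinstance(value, bool):          # bool is a subclass of int; exclude it
        return "unknown"
    if isinstance(value, (int, float)):
        return "unix"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return "unknown"
        return "iso"
    return "unknown"


def formats_in_use(values: Iterable[object]) -> Set[str]:
    """Collect every format seen; more than one entry signals a mismatch."""
    return {classify_timestamp(v) for v in values}


def to_epoch_seconds(value: object) -> float:
    """Normalize either format to Unix seconds so filters compare like with like."""
    kind = classify_timestamp(value)
    if kind == "unix":
        return float(value)
    if kind == "iso":
        parsed = datetime.fromisoformat(value)
        if parsed.tzinfo is None:
            # Assumption: naive ISO strings are UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.timestamp()
    raise ValueError(f"unrecognized timestamp: {value!r}")
```

If `formats_in_use` returns both `'unix'` and `'iso'` for one column, the format-mismatch hypothesis survives the test; a single format refutes it and the next hypothesis is tested.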

Example 2: Full systematic investigation

User: "experiment with different prompt structures for better output"
→ Full cycle mode
→ 3+ hypotheses generated
→ Controlled experiments with measurements
→ Analysis identifies winning approach
→ Iterates until convergence
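The analysis step in this example amounts to asking whether any variant beat the baseline by a margin fixed in advance. A rough sketch follows, assuming each variant has a list of numeric quality scores; `pick_winner`, the variant names, and the scores are hypothetical, and the Evals skill's real output format is not shown in the source.

```python
from statistics import mean
from typing import Dict, List, Optional


def pick_winner(
    scores: Dict[str, List[float]],
    baseline: str,
    min_gain: float,
) -> Optional[str]:
    """Return the best non-baseline variant if it beats the baseline by min_gain.

    Returns None when no variant clears the margin, which means the
    current hypotheses failed and new ones are needed.
    """
    means = {name: mean(values) for name, values in scores.items()}
    challengers = [name for name in means if name != baseline]
    if not challengers:
        return None
    best = max(challengers, key=lambda name: means[name])
    if means[best] - means[baseline] >= min_gain:
        return best
    return None


# Hypothetical quality scores per prompt variant, on a 0-1 scale.
scores = {
    "baseline": [0.6, 0.6],
    "role-first": [0.7, 0.9],
    "examples-first": [0.6, 0.7],
    "constraints-first": [0.5, 0.6],
}
winner = pick_winner(scores, baseline="baseline", min_gain=0.1)
```

This compares means only; with few samples per variant, a small gain may be noise, so the margin should be chosen with the sample size in mind.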

Workflows · 9

  1. `Workflows/DefineGoal.md`
    define the goal, what are we trying to achieve

  2. `Workflows/GenerateHypotheses.md`
    what might work, ideas, hypotheses

  3. `Workflows/DesignExperiment.md`
    how do we test, experiment design

  4. `Workflows/MeasureResults.md`
    what happened, measure, results

  5. `Workflows/AnalyzeResults.md`
    analyze, compare to goal

  6. `Workflows/Iterate.md`
    iterate, try again, next cycle

  7. `Workflows/FullCycle.md`
    Full structured cycle

  8. `Workflows/QuickDiagnosis.md`
    Quick debugging (15-min rule)

  9. `Workflows/StructuredInvestigation.md`
    Complex investigation

How to Invoke

Say any of these to your DA and PAI activates the Science skill automatically:

  • "think about"
  • "figure out"
  • "experiment"
  • "iterate"
  • "optimize"
  • "hypothesis"
  • "science"
  • "full cycle"
  • "quick diagnosis"
  • "structured investigation"
  • "how do we test"
  • "analyze results"

Or invoke explicitly:

Skill("Science")

References · 4

Auxiliary files the skill loads at runtime — frameworks, guides, configs.

  • Examples
  • METHODOLOGY
  • Protocol
  • Templates

References & Credits

The thinkers, books, frameworks, and research this skill is built on. The ideas belong to them — the integration belongs to PAI.
