Example 1: Production outage
User: "the payments service went down for 14 minutes last night"
→ Postmortem workflow
→ Timeline: 23:47 deploy → health check passed → 23:49 traffic shift → 23:51 p99 latency spike → 00:01 auto-rollback
→ 5 Whys: Why did p99 spike? Cold cache. Why was the cache cold? Traffic moved to a new pod group. Why wasn't it warmed? The deploy script has no warm-up step. Why not? Warm-up isn't on the deploy checklist. Why not? The template predates the caching layer.
→ Contributing factors: stale deploy template (latent); missing warm-up step (active); no cold-cache canary (latent)
→ Remediation: update deploy template, add warm-up step, add cold-cache canary gate
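Because the timeline crosses midnight, elapsed times are easy to misread. A minimal Python sketch, assuming every timestamp comes from the same clock and the events are listed in order, converts the timed events to minutes since the deploy. The untimed "health check passed" event is left out, and the function name is illustrative only.

```python
# Timed events from the Example 1 timeline, in the order they happened (HH:MM).
TIMELINE = [
    ("23:47", "deploy"),
    ("23:49", "traffic shift"),
    ("23:51", "p99 latency spike"),
    ("00:01", "auto-rollback"),
]

MINUTES_PER_DAY = 24 * 60


def minutes_since_start(events):
    """Return (minutes since the first event, label) pairs.

    Assumes chronological order, so a clock reading smaller than the
    previous one means the timeline rolled past midnight.
    """
    absolute = []
    day_offset = 0
    for hhmm, label in events:
        hours, minutes = map(int, hhmm.split(":"))
        minute = hours * 60 + minutes + day_offset
        if absolute and minute < absolute[-1][0]:
            # Clock went "backwards": shift this and all later events by a day.
            day_offset += MINUTES_PER_DAY
            minute += MINUTES_PER_DAY
        absolute.append((minute, label))
    start = absolute[0][0]
    return [(minute - start, label) for minute, label in absolute]


offsets = {label: minute for minute, label in minutes_since_start(TIMELINE)}
```

Here `offsets` maps deploy to 0, traffic shift to 2, the p99 spike to 4, and auto-rollback to 14. Deploy to rollback therefore matches the reported 14 minutes, while the latency spike preceded the rollback by 10 minutes.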
Example 2: Recurring defect
User: "users keep reporting the same kind of auth failure, we've fixed it 3 times"
→ Fishbone workflow
→ 6 M's expansion: People (the ops team rotates auth keys without notifying infra), Method (no key-rotation runbook), Machine (secret cache TTL exceeds the rotation window), Material (one shared key instead of per-service keys), Measurement (no key-expiry dashboard), Mother Nature (none)
→ Root causes (multiple): Method, Material, and Measurement all contribute, so a single-point fix won't hold.
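The Machine factor is a timing invariant that can be checked mechanically. A minimal Python sketch, assuming the old key stays valid for a fixed overlap window after each rotation, computes how long a cached key can outlive its revocation. The field names `secret_cache_ttl` and `rotation_overlap` are hypothetical and not taken from any particular secret store.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class KeyRotationConfig:
    # Hypothetical fields: map them to whatever your secret store exposes.
    secret_cache_ttl: timedelta  # how long a service reuses a cached key
    rotation_overlap: timedelta  # how long the old key stays valid after rotation


def stale_key_exposure(cfg):
    """Worst-case time a service can present a revoked key.

    Worst case: the key is cached just before a rotation, so the cached copy
    lives for the full TTL while the old key dies after the overlap window.
    """
    return max(cfg.secret_cache_ttl - cfg.rotation_overlap, timedelta(0))


def check(cfg):
    exposure = stale_key_exposure(cfg)
    if exposure > timedelta(0):
        return f"FAIL: cached keys can outlive revocation by {exposure}"
    return "OK"
```

With a 1-hour cache TTL and a 15-minute overlap, the exposure is 45 minutes; dropping the TTL to 5 minutes makes `check` return "OK". A check like this covers only the Machine branch; Method, Material, and Measurement still need their own fixes.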
Example 3: Subtle defect
User: "this flaky test only fails in CI, not locally"
→ Kepner-Tregoe workflow
→ IS/IS-NOT table: fails in CI / passes locally; fails on Tuesdays / passes other days; fails on shared runners / passes on dedicated runners; fails with parallel test workers / passes when run serially
→ Distinctions point to: timezone, concurrency, shared filesystem
→ Hypothesis: the test assumes the local timezone and races on the shared /tmp; both conditions arise only in the CI environment.
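Each half of the hypothesis can be tested by removing the assumption and checking whether the flake disappears. A minimal Python sketch, assuming the test derives a weekday from the current time and writes a fixture file (both details are hypothetical, since the example doesn't show the test), illustrates the two failure modes and their isolation. Threads stand in for the parallel test workers.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone

# One instant seen from two runner timezones: 01:30 UTC on Tuesday 2024-01-02.
INSTANT = datetime(2024, 1, 2, 1, 30, tzinfo=timezone.utc)
UTC_MINUS_8 = timezone(timedelta(hours=-8))  # fixed offset, no DST handling


def weekday_in(instant, tz):
    """Weekday name of `instant` as seen from timezone `tz` (C-locale names)."""
    return instant.astimezone(tz).strftime("%A")


def write_fixture(worker_id):
    """Write and read back a fixture in a directory private to this worker.

    A fixed path such as /tmp/fixture.txt would be shared by every parallel
    worker on the runner; TemporaryDirectory gives each call its own directory.
    """
    with tempfile.TemporaryDirectory() as private_dir:
        path = os.path.join(private_dir, "fixture.txt")
        with open(path, "w") as f:
            f.write(f"worker-{worker_id}")
        with open(path) as f:
            return f.read()


def run_parallel(workers=8):
    # pool.map preserves input order, so results line up with worker ids.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_fixture, range(workers)))
```

`weekday_in(INSTANT, timezone.utc)` returns "Tuesday" while `weekday_in(INSTANT, UTC_MINUS_8)` returns "Monday", which is one way a weekday-sensitive test could fail on only one day of the week on some hosts. In a pytest suite, the built-in `tmp_path` fixture gives the same per-test directory isolation as `TemporaryDirectory`.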