10 80 10 Testing Protocol — AI Governance Testing Guide

The 10/80/10 testing protocol: a spoke in Edmund Ng's journey series on governed AI, harness testing, and Vibe Coding for solo founders.

Published 2026-05-22 · Updated 2026-05-23 · 15 min read

ai-architecture · testing · governance

[Figure: 10/80/10 testing protocol, Edmund Ng AI architecture harness hero diagram]

The 10/80/10 testing protocol matters when you move from demo velocity to production scrutiny. This article collects Edmund Ng's field notes on 10/80/10 AI testing, harness discipline, and the journey toward auditable AI. It is written for solo founders and system rule designers who cannot afford silent regressions.

Continue with these journey spokes.

Complete Vibe Coding Guide for Non-Programmers · Harness Engineering for Production AI · Building Auditable AI Systems

On this page

What: the 10/80/10 protocol
Why: reports without retest are theater
When: apply 10/80/10
Where: connects Act 2 → Act 3
How: run one cycle
What it is (extended)
Why it matters (extended)

Key takeaways

The 10/80/10 testing protocol needs written rules, not hero prompts alone.
10/80/10 AI testing keeps demo speed from becoming production regret.
Harness discipline connects this spoke to the wider governed production journey.
Cross-link Phase docs, Harness retests, and written tradeoff logs before calling work done.

Takeaways above anchor the rest of this spoke.

What: the 10/80/10 protocol

10/80/10 is Edmund Ng's development/QA harness for AI systems:

Phase	Share	Owner	Action
PRE	10%	Frontier	Execute once; freeze snapshot
PARALLEL	80%	Sub-agents	Analyze frozen artifact per lane
POST	10%	Frontier	Remediate and retest

The protocol is distinct from runtime orchestration (governed sequential execution in production). Both layers are required, and they do not contradict each other.
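As a rough illustration, the phase table can be written down as data so that the shares are checked rather than remembered. This is a minimal sketch, not the author's tooling: the `Phase` class and `split_budget` helper are hypothetical, and treating "Share" as a share of a time budget is an assumption, since the source does not say what the share measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    share_percent: int  # assumption: share of the cycle's time budget
    owner: str
    action: str


# Rows copied from the phase table; the Phase class itself is hypothetical.
PHASES = (
    Phase("PRE", 10, "Frontier", "Execute once; freeze snapshot"),
    Phase("PARALLEL", 80, "Sub-agents", "Analyze frozen artifact per lane"),
    Phase("POST", 10, "Frontier", "Remediate and retest"),
)


def split_budget(total_minutes: int) -> dict[str, int]:
    """Split a time budget across the phases using integer percent shares."""
    return {p.name: total_minutes * p.share_percent // 100 for p in PHASES}


# The three shares must cover the whole cycle.
assert sum(p.share_percent for p in PHASES) == 100

print(split_budget(120))  # {'PRE': 12, 'PARALLEL': 96, 'POST': 12}
```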

Prerequisite: Harness Engineering.

Structured exports and harness retests matter more than demo velocity when reviewers ask for evidence.

Governed exports and harness checkpoints prevent demo velocity from collapsing under review.

In the What layer of this Act 2 architecture and harness spoke, teams work from an operational contract, not a marketing label. Governed exports and harness checkpoints keep demo velocity from collapsing under multi-axis review or compliance questions. A practical test of whether a team is really running the 10/80/10 protocol is to ask three questions: what is frozen before agents sweep, what gets logged at tradeoff time, and which Harness retest proves behavior instead of UI luck. Edmund Ng's field notes emphasize exportable rules and Decision Logs so that auditors six months later can follow the chain. That is the same fast-and-governed bridge that Acts 1–3 teach.

Why: reports without retest are theater

Many teams stop at PDFs of parallel findings. The POST phase owns fix verification, in the same spirit as auditable systems: outcomes must survive challenge, not just generate slides.

When: apply 10/80/10

Release candidates for multi-agent features
After Constitution or Framework changes
Before claiming governance score improvements (abstract rubric)
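The three triggers above can be encoded as an explicit gate so that the decision to run a cycle is written down rather than left to memory. This is a minimal sketch under assumptions: the `Change` enum and `needs_full_cycle` function are hypothetical names, and `COPY_EDIT` is an invented non-trigger added only for contrast.

```python
from enum import Enum


class Change(Enum):
    MULTI_AGENT_RELEASE_CANDIDATE = "release candidate for a multi-agent feature"
    CONSTITUTION_OR_FRAMEWORK_CHANGE = "Constitution or Framework change"
    GOVERNANCE_SCORE_CLAIM = "claim of a governance score improvement"
    COPY_EDIT = "copy-only edit"  # hypothetical non-trigger, for contrast


# The three triggers listed in this section.
TRIGGERS = frozenset(
    {
        Change.MULTI_AGENT_RELEASE_CANDIDATE,
        Change.CONSTITUTION_OR_FRAMEWORK_CHANGE,
        Change.GOVERNANCE_SCORE_CLAIM,
    }
)


def needs_full_cycle(changes: list[Change]) -> bool:
    """Return True when any pending change is one of the listed triggers."""
    return any(change in TRIGGERS for change in changes)


print(needs_full_cycle([Change.COPY_EDIT]))  # False
print(needs_full_cycle([Change.COPY_EDIT, Change.GOVERNANCE_SCORE_CLAIM]))  # True
```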

Where: connects Act 2 → Act 3

Implements parallel lanes detailed in Multi-Axis Review
Feeds evidence discipline in Auditable AI pillar
Complements Playwright smoke tests: a route returning HTTP 200 does not mean the agent answered correctly
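The gap between a passing smoke check and a correct agent can be shown with two separate predicates. This is a minimal sketch that does not use Playwright or any real HTTP client: the `Response` type, the refund scenario, and the required phrases are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int  # HTTP status the route returned
    answer: str  # text the agent produced


def smoke_ok(response: Response) -> bool:
    """Smoke tier: the route responded successfully."""
    return response.status == 200


def behavior_ok(response: Response, required_phrases: list[str]) -> bool:
    """Behavior tier: the answer contains every phrase the written rules require."""
    text = response.answer.lower()
    return all(phrase.lower() in text for phrase in required_phrases)


# Hypothetical scenario: the rules say a refund answer must state the time window.
response = Response(status=200, answer="Your refund was approved.")
required = ["refund", "within 14 days"]

print(smoke_ok(response), behavior_ok(response, required))  # True False
```

The route is healthy and the answer is still wrong against the rules, which is why the two tiers are reported separately.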

How: run one cycle

PRE: pick real scenario; run; save snapshot + metadata
PARALLEL: assign lanes (gap, error, contradiction, boundary, over-promise, quality)
POST: Frontier merges → patch plan → retest failed lanes only
Log: Decision Log entry if rules changed
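The four steps above can be sketched as one function. This is an illustration under stated assumptions, not the author's harness: `run_system`, `checks`, and `remediate` are hypothetical callables supplied by the caller, lane checks run in threads as a stand-in for sub-agents, and the SHA-256 digest stands in for snapshot metadata. Only the lane names come from the list above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

LANES = ("gap", "error", "contradiction", "boundary", "over-promise", "quality")


@dataclass(frozen=True)
class Snapshot:
    scenario: str
    output: str
    sha256: str  # metadata: digest of the frozen output


def freeze(scenario: str, output: str) -> Snapshot:
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
    return Snapshot(scenario, output, digest)


def run_lanes(snapshot, checks, lanes):
    """PARALLEL: every lane reads the same frozen snapshot and returns findings."""
    lanes = list(lanes)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda lane: checks[lane](snapshot), lanes))
    return dict(zip(lanes, results))


def run_cycle(scenario, run_system, checks, remediate):
    snapshot = freeze(scenario, run_system(scenario))  # PRE: run once, freeze
    findings = run_lanes(snapshot, checks, LANES)  # PARALLEL: all lanes
    failed = [lane for lane in LANES if findings[lane]]
    still_failing, decision_log = [], []
    if failed:  # POST: patch, then retest failed lanes only
        rules_changed = remediate(findings)
        retest_snapshot = freeze(scenario, run_system(scenario))
        retest = run_lanes(retest_snapshot, checks, failed)
        still_failing = [lane for lane in failed if retest[lane]]
        if rules_changed:  # Log: entry only if rules changed
            decision_log.append({"lanes": failed, "snapshot": snapshot.sha256})
    return {
        "failed_first_pass": failed,
        "still_failing": still_failing,
        "decision_log": decision_log,
    }


# Toy usage: a hypothetical system whose reply over-promises until it is patched.
state = {"reply": "We guarantee 100% accuracy."}


def run_system(scenario):
    return state["reply"]


def over_promise_check(snapshot):
    return ["absolute guarantee"] if "guarantee" in snapshot.output.lower() else []


def remediate(findings):
    state["reply"] = "Accuracy is measured per release."
    return True  # a written rule changed, so a Decision Log entry is due


checks = {lane: (lambda snapshot: []) for lane in LANES}
checks["over-promise"] = over_promise_check

result = run_cycle("accuracy question", run_system, checks, remediate)
print(result["failed_first_pass"], result["still_failing"])  # ['over-promise'] []
```

Retesting only the failed lanes keeps the POST phase small, which matches its 10% share, at the cost of not catching a regression that the patch introduces in a lane that passed the first time.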

Honest attribution: a smoke-tier result (e.g. 42 passed) is not the same as a long-run harness pack, so cite the scope when sharing results.
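One way to make scope hard to drop is to attach it to the result itself. This is a minimal sketch with a hypothetical `ScopedResult` type, using the 42-passed smoke figure from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedResult:
    tier: str  # e.g. "smoke" or "long-run harness"
    passed: int

    def headline(self) -> str:
        """Render the count together with the tier it was measured on."""
        return f"{self.passed} passed ({self.tier} tier only)"


smoke = ScopedResult(tier="smoke", passed=42)
print(smoke.headline())  # 42 passed (smoke tier only)
```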

What it is (extended)

Governed builders treat written rules, frozen snapshots, and harness retests as production requirements—not optional polish after a green demo. Edmund Ng's journey from non-programmer Vibe Coding to auditable AI systems shows why structure beats model churn when stakeholders ask how you decided, what you rejected, and what evidence you can export tomorrow.

Why it matters (extended)

Governed builders treat written rules, frozen snapshots, and harness retests as production requirements—not optional polish after a green demo. Edmund Ng's journey from non-programmer Vibe Coding to auditable AI systems shows why structure beats model churn when stakeholders ask how you decided, what you rejected, and what evidence you can export tomorrow.

Summary

On Edmund Ng's journey, the 10/80/10 testing protocol means shipping with 10/80/10 AI testing, harness retests, and evidence-friendly decisions rather than one-off prompts. Models change; written rules, exportable snapshots, and governance patterns endure.

Governed builders treat written rules, frozen snapshots, and harness retests as production requirements—not optional polish after a green demo. The journey from non-programmer Vibe Coding to auditable AI shows why structure beats model churn when stakeholders ask how you decided, what you rejected, and what evidence you can export tomorrow.

How does 10/80/10 testing work for AI

Edmund Ng treats each long-tail question as a production gate: freeze the spec, log the tradeoff, and prove behavior with Harness retests—not demo clicks alone.

What is the 10/80/10 AI protocol

Solo founders in Malaysia and APAC often face professional scrutiny early. Externalizing Phase documents, Decision Logs, and smoke tiers before the demo invitation arrives is cheaper than rebuilding trust after a silent regression reaches a customer walkthrough.

When should frontier freeze AI snapshots

Role separation matters: builder models may sweep diffs, but frontier models should audit frozen snapshots. Mixing those hats in one chat thread is how teams lose reproducibility and inherit context debt that no IDE upgrade fixes.
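Role separation can be enforced in code instead of by convention. In this minimal sketch, an audit is refused unless the caller holds the frontier role and the snapshot still matches the digest recorded when it was frozen. The `Role` enum, `FrozenSnapshot` type, and `audit` function are hypothetical names for illustration.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    BUILDER = "builder"  # may sweep diffs
    FRONTIER = "frontier"  # audits frozen snapshots


@dataclass(frozen=True)
class FrozenSnapshot:
    content: str
    sha256: str  # digest recorded at freeze time


def digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def freeze(content: str) -> FrozenSnapshot:
    return FrozenSnapshot(content, digest(content))


def audit(role: Role, snapshot: FrozenSnapshot) -> str:
    """Audit only as frontier, and only if the content still matches its digest."""
    if role is not Role.FRONTIER:
        raise PermissionError("only the frontier role audits frozen snapshots")
    if digest(snapshot.content) != snapshot.sha256:
        raise ValueError("snapshot content does not match its recorded digest")
    return f"audited {snapshot.sha256[:12]}"


snapshot = freeze("agent output v1")
print(audit(Role.FRONTIER, snapshot).startswith("audited "))  # True
```

Because the digest is checked on every audit, two reviewers who audit the same snapshot can confirm they looked at identical content, which is the reproducibility that a shared chat thread cannot give.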

FAQ

What is the 10/80/10 testing protocol?

Edmund Ng answers with structure first: freeze specs, separate builder and frontier roles, and prove behavior with Harness—not demo clicks. Written rules, Phase documents, and Decision Logs let teams explain tradeoffs months later without reconstructing chat history.

How does 10/80/10 testing work for AI?

A cycle has three phases. In PRE (10%), the frontier model runs a real scenario once and freezes a snapshot with its metadata. In PARALLEL (80%), sub-agents analyze that frozen artifact, one lane each. In POST (10%), the frontier model merges the findings, applies a patch plan, and retests only the lanes that failed. If rules changed along the way, the change gets a Decision Log entry.

What is the 10/80/10 AI protocol?

It is Edmund Ng's development/QA harness for AI systems, split into three phases: PRE (10%), PARALLEL (80%), and POST (10%). It is distinct from runtime orchestration, which governs sequential execution in production; both layers are required.

When should frontier freeze AI snapshots?

In the PRE phase. The frontier model runs the chosen real scenario once, then saves the snapshot and its metadata before any sub-agent lane starts. Every PARALLEL lane then analyzes that same frozen artifact.

Why does 10/80/10 AI testing matter for solo founders?

Solo founders in Malaysia and APAC often face professional scrutiny early. Externalizing Phase documents, Decision Logs, and smoke tiers before the demo invitation arrives is cheaper than rebuilding trust after a silent regression reaches a customer walkthrough.

When should teams freeze specs before agent sweeps?

Freeze before every 10/80/10 cycle. The protocol applies to release candidates for multi-agent features, after Constitution or Framework changes, and before claiming governance score improvements.

About the author

Edmund Ng — Malaysia-based solo founder, AI systems architect, and system rule designer. He ships governed AI with Vibe Coding, harness engineering, and auditable evidence chains. About · Projects · LinkedIn.
