The Dashboard the Verification Needed

A research program that wants to be honest about what it has proven needs a way to publish those figures without a human typing them. On 2026-06-14 we shipped that pipeline for Monogate. It caught us over-counting on its first CI run.

This post is the arc of that day: what we built, how it broke us, what the breaking taught us, and what now lives in the open.

The starting point — a Khovanskii framework

The headline going into the day was a constructive zero-count bound for polynomial-in-(x, eˣ) expressions, shipped in MachLib under{” ”} MachLib.SingleExpKhovanskii. Three resolution paths (expPoly_khovanskii_bound, expPoly_auto_bound_with_propagation_aux, expPoly_ode_no_zeros), an AxiomAudit.lean reproducing the axiom dependency listing, and four worked Forge-kernel applications:

butler_volmer_zero_at_zero_overpotential (battery-management systems, fuel cells)
plasma_concentration_nonneg (TCI anaesthesia, IEC 62304 Class C)
discharge_voltage_signpreserving (defibrillators, IEC 62304 Class C)
spring_critical_signpreserving (gaming animation kernel)

Each “strengthens” the Forge-emitted @verify(lean, …) obligation on the corresponding .eml source from a True := by trivial placeholder (or a sorry) to a real proof in Lean. Proven modulo MachLib’s axiomatized analytic base — MachLib.Real.HasDerivAt_*, exp_pos, the Rolle zero-counting corollary, etc. — which is the honest framing. We are not claiming Mathlib-grounded foundations; we are saying the proofs check against a documented set of axioms, and grounding that set in Mathlib is open work.

The propagation gap nobody could see

Four strengthened contracts is a real artifact. But Monogate ships hundreds of @verify obligations across forge/ and eml-stdlib/. Knowing what fraction were strengthened, what fraction were still vacuous placeholders, what fraction were sorry stubs — that was a manual cross-walk every time. “Which obligations still need an Applications/ companion?” was a thirty-minute spelunk.

So the first real piece of infrastructure that day was{” ”} tools/forge_verify_audit/forge_verify_audit.py. It walks three sources and joins them:

.eml files in forge/ and eml-stdlib/, for the @verify(lean, theorem = "X") annotations.
MachLib/Discovered/*.lean — the Forge backend’s emitted theorem stubs.
MachLib/Applications/*.lean — the hand-authored strengthening proofs.

For each obligation it computes a status:

status	meaning
`strengthened`	An `Applications/` proof discharges it (matched by name or by an explicit `-- @strengthens X` marker).
`proven_in_place`	The Discovered/ stub already carries a concrete proof — no `sorry`, no vacuous `True`.
`placeholder`	The Discovered/ stub body is `True := by trivial`.
`open`	The Discovered/ stub contains `sorry`, or no stub exists at all.

First local run: 1102 obligations total. 4 strengthened.{” ”} 210 proven-in-place. 225 placeholders. 663 open. The framing that wrote itself was “19% discharged, 81% open.”

Where the post-mortem starts

The over-count was already baked in. I didn’t know.

I went on to refresh the monogate.net/evidence-status dashboard with what I thought were the live numbers — hand-typed, because that surface hadn’t yet been wired to CI. I quoted “MachLib discovered sorries: 269 (up from 222 due to Forge emission cadence)” in the changelog. The operator caught the framing problem in the audit’s headline — “210 / 1102 = 19% is the favorable fifth, not the story; the honest top-line is the 81% gap” — and that critique was right. But the critique was also based on numbers that came from me. Which were wrong.

The story would not have ended honestly if the next thing built hadn’t been the CI pipeline.

The CI pipeline

The operator’s specification was four-part:

CI-only writes. The data lives on a surface only the GitHub Actions runner touches. Force-push to an orphan branch called status-data whose contents are exactly one file: status.json. Humans don’t write there; the only history is the sequence of CI snapshots.
Reproducibility, not just write-prevention. The JSON carries the exact commands to recompute it at the recorded SHA. A stranger with the same commits checked out can re-derive every figure.
Loud staleness. The page must show the data’s SHA and timestamp, never the page-load time. If the fetch fails or required fields are missing, render “Verification status unavailable” — never silently fall back to a cached or hard-coded value.
The right top-line. Lead with the gap, not the win. Split proven into proven_from_mathlib (structurally 0 for MachLib, by design) and proven_mod_machlib_axioms (everything else), so a renderer cannot mis-report a single “proven” total.

We shipped exactly that. The status workflow lives at .github/workflows/status.yml in machlib. It runs on every push to master, runs lake build and lake env lean AxiomAudit.lean, runs the forge_verify_audit against fresh clones of forge and eml-stdlib, generates status.json via tools/status/generate_status.py, and force-pushes the single file to the orphan status-data branch.

The renderer is at{” ”} monogate.net/evidence-status. It’s plain client-side JavaScript that fetches from{” ”} raw.githubusercontent.com/agent-maestro/machlib/status-data/status.json{” ”} on every page load, validates the contract fields, and refuses to render a hard-coded fallback if anything is missing.

What the first CI run said

sha=71e63b0
sorries          : 198
@verify total    : 1088
strengthened     : 4
proven_mod_axioms: 36
placeholder      : 225
open             : 823
gap_pct          : 96.3%
discharged_pct   : 3.7%

Compare to the local figures I had been quoting all day:

field	local (my reports)	CI (public truth)
sorries	221	198
@verify total	1102	1088
proven_in_place	210	36
open	663	823
gap_pct	80.6%	96.3%
discharged_pct	19.4%	3.7%

The first CI run reported numbers that disagreed sharply with everything I had been quoting. Not by a unit. By a lot.

Sixty-two Discovered/ stubs existed on my local working tree that weren’t tracked in the public repo. foundations/MachLib/Discovered/ carried a .gitignore line of just * — auto-generated stubs from the local auto_prove.py workflow were intentionally held out of public master pending review. 174 of those stubs had concrete proofs that the audit was counting as “proven-in-place” — for me, running it locally. A fresh clone, with the gitignored stubs absent, saw 36.

Plus 32 forge .eml files in a build/ artifact directory I had sitting around that the audit was double-counting.

The CI did exactly what the operator’s specification said it would do. It made the public numbers public-truth on the very first run. The 19% framing was almost entirely the gitignored private state. 3.7% is what a stranger sees.

What we shipped after the finding

The right response was not to massage the framing back. The right response was to surface the finding, fix the audit so a local run matches a CI run, push the 60 reviewable stubs to master so the public/private gap closes honestly, and write a calibration note into machlib’s CHANGELOG so the next reader of that document understands why the numbers moved.

By end of day:

forge_verify_audit.py defaults to git ls-files-aware file enumeration; a local run now sees what a CI run sees.
60 of the 62 ungated Discovered/ stubs were reviewed and pushed. Two were broken (referenced an unbound floor function); we added axiom floor : Real → Real to MachLib.Forge and wired FLOOR through the Forge backend as a builtin NodeKind so future emits use Real.floor directly. The two stubs now compile.
CHANGELOG.md carries a calibration paragraph naming the over-count and the corrected public-verifiable figures.

The public dashboard now reads, post-cleanup: gap_pct 81.1% / discharged_pct 18.9% / 221 sorries / 4 strengthened / 202 proven_mod_axioms. The 4 strengthened contracts are real and publicly tracked. The 202 proven-in-place is now real (was 36 on the first CI run, because the gitignored stubs really did carry concrete proofs; pushing them moved that figure honestly). The remaining ~0.5 percentage-point difference from my old local view is the forge build/ artifact duplicates, which the gitignore filter now correctly excludes.

What we built on top

By late afternoon, the verification dashboard had been live for two hours and the temptation was to keep building on it. We did, on the 1op side, with three audiovisual exhibits that use the verified work:

MachLib Pulse — Every Forge @verify obligation rains into one of four buckets (open, placeholder, strengthened, proven-mod-axioms). The particle’s vertical position is the literal critically-damped spring kernel we proved, (1 + ω·t)·exp(−ω·t). The sign-preservation theorem guarantees no overshoot in the animation window. Live figures fetched from the same status.json on every page load.
Reverb Tail — The gaming reverb kernel exp(−α·t)·sin(ω·t + φ) from eml-stdlib/gaming/audio/reverb.eml. Pluck it, hear it. The envelope-bound proof is named honestly as queued; the underdamped sibling of the spring kernel sits outside the single-exp Khovanskii framework.
Khovanskii Counter — The length + Σ deg(pᵢ) bound, exposed against 5 presets that span the framework’s interesting regimes. The bound is finite but coarse; the exhibit shows by how much. (1 + x)·exp(−x): bound 2, actual 1. exp(x) − x: bound 3, actual 0. The slack is visible.
Verified Basin — The (ω, ζ) parameter plane of the damped harmonic oscillator, coloured by sign-preservation. The white line at ζ = 1 is the theorem. Above it: green (proven). Below it: red (the cos factor of the underdamped solution dips the response negative, and we explicitly left that branch open).

Each exhibit grounds itself in 1op’s in-tree EML parser (src/lib/eml/parser.ts) by surfacing the kernel’s positive / general / naive cost alongside the proven theorem name. The math is the experience.

Why this matters more than the figures

The story isn’t 4 strengthened contracts or 81% gap or any specific number. The story is that a research program with this kind of staffing — one human operator and a small fleet of AI models — needs verification infrastructure as a prerequisite, not as a finishing touch.

Without it: the operator is on the hook for claims they can’t personally verify. The framing drifts toward whatever sounds best. Wrong counts get quoted because no one notices they’re wrong. The critique that comes back (“the headline framing is wrong”) is on solid ground because the underlying data was solid, but neither the operator nor the model has a way to immediately check it.

With it: the published numbers come out of lake build on a fresh clone. A stranger can recompute every figure at the recorded SHA. A drift between local and public is caught the moment CI runs. The operator’s framing critique still holds — it’s a framing critique — but it lands on a substrate that can be re-derived from first principles, and the model can no longer give the operator wrong numbers without the dashboard immediately calling it out.

The dashboard was the audience this work didn’t have until today. Now it does.

What’s not done

MachLib’s analytic base is still axiomatized, not Mathlib-grounded. The MachLib.Real.HasDerivAt_* axioms are theorems in Mathlib; grounding them is open work.
The underdamped (ζ < 1) sign-preservation branch is open. Single- exp Khovanskii doesn’t reach it; either trig-Khovanskii or per-half-cycle splitting is the path.
Two Discovered/ stubs (crystal_lattice, neon_substrate) were fixable today via the floor axiom. There will be more. Each new Forge emit eventually finds a new unbound name; the pattern is to shore them up as they appear.
The Khovanskii Counter’s custom-expression slot plots and counts but doesn’t compute a bound; the AST → (length, deg) recogniser is follow-up work.
monogate-lean’s 470/5 figures are still byte-derived from 2026-06-10. Folding them into the status.json so the dashboard’s “Other Lean surfaces” section can retire its caveat is queued.

The point of writing this

We built a thing today that I am personally satisfied with — the verification pipeline — not because the figures it published are impressive (they aren’t, yet), but because the figures are now real. The next time someone asks “how much of Monogate is actually verified?”, the answer is a URL with a SHA and a timestamp and a reproduction command, not a confidence claim from a human or a model.

That’s the artifact worth shipping. The four Forge contracts are real but they’re a corollary; the infrastructure is the headline.

Monogate Research (2026). “The Dashboard the Verification Needed.” monogate research blog. https://monogate.org/blog/the-dashboard-the-verification-needed

React