The Dashboard the Verification Needed
A research program that wants to be honest about what it has proven needs a way to publish those figures without a human typing them. On 2026-06-14 we shipped that pipeline for Monogate. It caught us over-counting on its first CI run.
This post is the arc of that day: what we built, how it broke us, what the breaking taught us, and what now lives in the open.
The starting point — a Khovanskii framework
The headline going into the day was a constructive zero-count bound
for polynomial-in-(x, eˣ) expressions, shipped in MachLib under{” ”}
MachLib.SingleExpKhovanskii. Three resolution paths
(expPoly_khovanskii_bound, expPoly_auto_bound_with_propagation_aux,
expPoly_ode_no_zeros), an AxiomAudit.lean reproducing the axiom
dependency listing, and four worked Forge-kernel applications:
butler_volmer_zero_at_zero_overpotential(battery-management systems, fuel cells)plasma_concentration_nonneg(TCI anaesthesia, IEC 62304 Class C)discharge_voltage_signpreserving(defibrillators, IEC 62304 Class C)spring_critical_signpreserving(gaming animation kernel)
Each “strengthens” the Forge-emitted @verify(lean, …) obligation
on the corresponding .eml source from a True := by trivial
placeholder (or a sorry) to a real proof in Lean. Proven modulo
MachLib’s axiomatized analytic base — MachLib.Real.HasDerivAt_*,
exp_pos, the Rolle zero-counting corollary, etc. — which is the
honest framing. We are not claiming Mathlib-grounded foundations;
we are saying the proofs check against a documented set of axioms,
and grounding that set in Mathlib is open work.
The propagation gap nobody could see
Four strengthened contracts is a real artifact. But Monogate ships
hundreds of @verify obligations across forge/ and
eml-stdlib/. Knowing what fraction were strengthened, what fraction
were still vacuous placeholders, what fraction were sorry stubs —
that was a manual cross-walk every time. “Which obligations still
need an Applications/ companion?” was a thirty-minute spelunk.
So the first real piece of infrastructure that day was{” ”}
tools/forge_verify_audit/forge_verify_audit.py. It walks three
sources and joins them:
.emlfiles inforge/andeml-stdlib/, for the@verify(lean, theorem = "X")annotations.MachLib/Discovered/*.lean— the Forge backend’s emitted theorem stubs.MachLib/Applications/*.lean— the hand-authored strengthening proofs.
For each obligation it computes a status:
| status | meaning |
|---|---|
strengthened | An Applications/ proof discharges it (matched by name or by an explicit -- @strengthens X marker). |
proven_in_place | The Discovered/ stub already carries a concrete proof — no sorry, no vacuous True. |
placeholder | The Discovered/ stub body is True := by trivial. |
open | The Discovered/ stub contains sorry, or no stub exists at all. |
First local run: 1102 obligations total. 4 strengthened.{” ”} 210 proven-in-place. 225 placeholders. 663 open. The framing that wrote itself was “19% discharged, 81% open.”
Where the post-mortem starts
The over-count was already baked in. I didn’t know.
I went on to refresh the monogate.net/evidence-status dashboard
with what I thought were the live numbers — hand-typed, because
that surface hadn’t yet been wired to CI. I quoted “MachLib
discovered sorries: 269 (up from 222 due to Forge emission cadence)”
in the changelog. The operator caught the framing problem in the
audit’s headline — “210 / 1102 = 19% is the favorable fifth, not
the story; the honest top-line is the 81% gap” — and that critique
was right. But the critique was also based on numbers that came
from me. Which were wrong.
The story would not have ended honestly if the next thing built hadn’t been the CI pipeline.
The CI pipeline
The operator’s specification was four-part:
- CI-only writes. The data lives on a surface only the
GitHub Actions runner touches. Force-push to an orphan branch
called
status-datawhose contents are exactly one file:status.json. Humans don’t write there; the only history is the sequence of CI snapshots. - Reproducibility, not just write-prevention. The JSON carries the exact commands to recompute it at the recorded SHA. A stranger with the same commits checked out can re-derive every figure.
- Loud staleness. The page must show the data’s SHA and timestamp, never the page-load time. If the fetch fails or required fields are missing, render “Verification status unavailable” — never silently fall back to a cached or hard-coded value.
- The right top-line. Lead with the gap, not the win. Split
provenintoproven_from_mathlib(structurally 0 for MachLib, by design) andproven_mod_machlib_axioms(everything else), so a renderer cannot mis-report a single “proven” total.
We shipped exactly that. The status workflow lives at
.github/workflows/status.yml in machlib. It runs on every push to
master, runs lake build and lake env lean AxiomAudit.lean, runs
the forge_verify_audit against fresh clones of forge and
eml-stdlib, generates status.json via
tools/status/generate_status.py, and force-pushes the single file
to the orphan status-data branch.
The renderer is at{” ”}
monogate.net/evidence-status.
It’s plain client-side JavaScript that fetches from{” ”}
raw.githubusercontent.com/agent-maestro/machlib/status-data/status.json{” ”}
on every page load, validates the contract fields, and refuses to
render a hard-coded fallback if anything is missing.
What the first CI run said
sha=71e63b0
sorries : 198
@verify total : 1088
strengthened : 4
proven_mod_axioms: 36
placeholder : 225
open : 823
gap_pct : 96.3%
discharged_pct : 3.7%
Compare to the local figures I had been quoting all day:
| field | local (my reports) | CI (public truth) |
|---|---|---|
| sorries | 221 | 198 |
| @verify total | 1102 | 1088 |
| proven_in_place | 210 | 36 |
| open | 663 | 823 |
| gap_pct | 80.6% | 96.3% |
| discharged_pct | 19.4% | 3.7% |
The first CI run reported numbers that disagreed sharply with everything I had been quoting. Not by a unit. By a lot.
Sixty-two Discovered/ stubs existed on my local working tree that
weren’t tracked in the public repo. foundations/MachLib/Discovered/
carried a .gitignore line of just * — auto-generated stubs from
the local auto_prove.py workflow were intentionally held out of
public master pending review. 174 of those stubs had concrete
proofs that the audit was counting as “proven-in-place” — for me,
running it locally. A fresh clone, with the gitignored stubs
absent, saw 36.
Plus 32 forge .eml files in a build/ artifact directory I had
sitting around that the audit was double-counting.
The CI did exactly what the operator’s specification said it would do. It made the public numbers public-truth on the very first run. The 19% framing was almost entirely the gitignored private state. 3.7% is what a stranger sees.
What we shipped after the finding
The right response was not to massage the framing back. The right response was to surface the finding, fix the audit so a local run matches a CI run, push the 60 reviewable stubs to master so the public/private gap closes honestly, and write a calibration note into machlib’s CHANGELOG so the next reader of that document understands why the numbers moved.
By end of day:
forge_verify_audit.pydefaults togit ls-files-aware file enumeration; a local run now sees what a CI run sees.- 60 of the 62 ungated Discovered/ stubs were reviewed and pushed.
Two were broken (referenced an unbound
floorfunction); we addedaxiom floor : Real → RealtoMachLib.Forgeand wiredFLOORthrough the Forge backend as a builtin NodeKind so future emits useReal.floordirectly. The two stubs now compile. - CHANGELOG.md carries a calibration paragraph naming the over-count and the corrected public-verifiable figures.
The public dashboard now reads, post-cleanup: gap_pct 81.1% / discharged_pct 18.9% / 221 sorries / 4 strengthened / 202 proven_mod_axioms. The 4 strengthened contracts are real and
publicly tracked. The 202 proven-in-place is now real (was 36 on the
first CI run, because the gitignored stubs really did carry concrete
proofs; pushing them moved that figure honestly). The remaining ~0.5
percentage-point difference from my old local view is the forge
build/ artifact duplicates, which the gitignore filter now correctly
excludes.
What we built on top
By late afternoon, the verification dashboard had been live for two hours and the temptation was to keep building on it. We did, on the 1op side, with three audiovisual exhibits that use the verified work:
-
MachLib Pulse — Every Forge
@verifyobligation rains into one of four buckets (open, placeholder, strengthened, proven-mod-axioms). The particle’s vertical position is the literal critically-damped spring kernel we proved,(1 + ω·t)·exp(−ω·t). The sign-preservation theorem guarantees no overshoot in the animation window. Live figures fetched from the same status.json on every page load. -
Reverb Tail — The gaming reverb kernel
exp(−α·t)·sin(ω·t + φ)fromeml-stdlib/gaming/audio/reverb.eml. Pluck it, hear it. The envelope-bound proof is named honestly as queued; the underdamped sibling of the spring kernel sits outside the single-exp Khovanskii framework. -
Khovanskii Counter — The
length + Σ deg(pᵢ)bound, exposed against 5 presets that span the framework’s interesting regimes. The bound is finite but coarse; the exhibit shows by how much.(1 + x)·exp(−x): bound 2, actual 1.exp(x) − x: bound 3, actual 0. The slack is visible. -
Verified Basin — The (ω, ζ) parameter plane of the damped harmonic oscillator, coloured by sign-preservation. The white line at ζ = 1 is the theorem. Above it: green (proven). Below it: red (the cos factor of the underdamped solution dips the response negative, and we explicitly left that branch open).
Each exhibit grounds itself in 1op’s in-tree EML parser
(src/lib/eml/parser.ts) by surfacing the kernel’s positive /
general / naive cost alongside the proven theorem name. The math is
the experience.
Why this matters more than the figures
The story isn’t 4 strengthened contracts or 81% gap or any specific number. The story is that a research program with this kind of staffing — one human operator and a small fleet of AI models — needs verification infrastructure as a prerequisite, not as a finishing touch.
Without it: the operator is on the hook for claims they can’t personally verify. The framing drifts toward whatever sounds best. Wrong counts get quoted because no one notices they’re wrong. The critique that comes back (“the headline framing is wrong”) is on solid ground because the underlying data was solid, but neither the operator nor the model has a way to immediately check it.
With it: the published numbers come out of lake build on a fresh
clone. A stranger can recompute every figure at the recorded SHA.
A drift between local and public is caught the moment CI runs. The
operator’s framing critique still holds — it’s a framing critique —
but it lands on a substrate that can be re-derived from first
principles, and the model can no longer give the operator wrong
numbers without the dashboard immediately calling it out.
The dashboard was the audience this work didn’t have until today. Now it does.
What’s not done
- MachLib’s analytic base is still axiomatized, not Mathlib-grounded.
The
MachLib.Real.HasDerivAt_*axioms are theorems in Mathlib; grounding them is open work. - The underdamped (ζ < 1) sign-preservation branch is open. Single- exp Khovanskii doesn’t reach it; either trig-Khovanskii or per-half-cycle splitting is the path.
- Two Discovered/ stubs (
crystal_lattice,neon_substrate) were fixable today via the floor axiom. There will be more. Each new Forge emit eventually finds a new unbound name; the pattern is to shore them up as they appear. - The Khovanskii Counter’s custom-expression slot plots and counts but doesn’t compute a bound; the AST → (length, deg) recogniser is follow-up work.
monogate-lean’s 470/5 figures are still byte-derived from 2026-06-10. Folding them into the status.json so the dashboard’s “Other Lean surfaces” section can retire its caveat is queued.
The point of writing this
We built a thing today that I am personally satisfied with — the verification pipeline — not because the figures it published are impressive (they aren’t, yet), but because the figures are now real. The next time someone asks “how much of Monogate is actually verified?”, the answer is a URL with a SHA and a timestamp and a reproduction command, not a confidence claim from a human or a model.
That’s the artifact worth shipping. The four Forge contracts are real but they’re a corollary; the infrastructure is the headline.
Monogate Research (2026). “The Dashboard the Verification Needed.” monogate research blog. https://monogate.org/blog/the-dashboard-the-verification-needed