FMA Is the Only Primitive That Matters

Tier: OBSERVATION (empirical, reproducible)

The question: as we grow the SuperBEST primitive basis, where do the savings actually come from? We measured aggregate node cost across seven basis states on the 222 parseable equations of a 265-row catalog spanning chemistry, physics, biology, neuroscience, and economics. All costs use TREE semantics on the sympy-canonical form of each equation.

The staircase

basis	total nodes	savings vs naive	Δ nodes vs previous (pp of naive)
naive (no primitives)	1125	0.0%	—
F16 (16 canonical ops)	1041	7.5%	−7.5 pp
23-op (F16 + Layer-2)	1031	8.4%	−0.9 pp
23-op + FMA	951	15.5%	−7.1 pp
+ exp-affine + log-affine	948	15.7%	−0.2 pp
+ sigmoid (unary)	939	16.5%	−0.8 pp
+ bilinear-FMA (quaternary)	919	18.3%	−1.8 pp

Beyond the F16 baseline, one step dominates. FMA (a·b + c) contributes 7.1 percentage points on its own, while the seven Layer-2 extensions combined add 0.9 pp. Every other tested primitive adds between 0.2 and 1.8 pp. A staircase decay model wins the AIC comparison (ΔAIC = 3.45) against exponential and polynomial alternatives.
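The per-step figures can be recomputed from the node totals alone. The following is a minimal sketch, assuming savings are defined as (naive − nodes) / naive and each step is expressed in percentage points of the naive total; the names STAIRCASE and staircase_rows are illustrative and not part of the SuperBEST tooling.

```python
# Node totals copied from the staircase table; everything else is derived.
STAIRCASE = [
    ("naive (no primitives)", 1125),
    ("F16 (16 canonical ops)", 1041),
    ("23-op (F16 + Layer-2)", 1031),
    ("23-op + FMA", 951),
    ("+ exp-affine + log-affine", 948),
    ("+ sigmoid (unary)", 939),
    ("+ bilinear-FMA (quaternary)", 919),
]


def staircase_rows(stairs):
    """Return (name, nodes, cumulative savings in %, step size in pp) per basis."""
    naive = stairs[0][1]
    rows = []
    prev = naive
    for name, nodes in stairs:
        savings_pct = 100.0 * (naive - nodes) / naive  # cumulative, vs naive
        step_pp = 100.0 * (prev - nodes) / naive       # this step only
        rows.append((name, nodes, savings_pct, step_pp))
        prev = nodes
    return rows


rows = staircase_rows(STAIRCASE)
for name, nodes, savings_pct, step_pp in rows:
    print(f"{name:30s} {nodes:5d} {savings_pct:5.1f}% {step_pp:4.1f} pp")

# Largest single step among the extensions added on top of F16.
biggest = max(rows[2:], key=lambda r: r[3])
print("largest extension step:", biggest[0])
```

Expressing every step against the fixed naive total, rather than against the previous row, keeps the steps additive: they sum to the cumulative savings of the final basis.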

Where FMA lives

FMA matched in 52 of the 222 equations (23.4%), across nine domains. Marginal FMA savings by field:

Geology: +15.3 pp (radioactive decay, geothermal gradients)
Neuroscience: +14.1 pp (Hodgkin-Huxley, FitzHugh-Nagumo, cable equation)
Economics: +9.1 pp (CAPM, Phillips curve, Black-Scholes d₁)
Biology: +6.3 pp
Chemistry: +3.4 pp
Astrophysics: +1.6 pp
Electromagnetism: 0.0 pp — structural absence

Electromagnetism has no affine-shifted structure for FMA to absorb: Maxwell’s equations are built from products and curls, so the a·b + c pattern never appears and FMA catches nothing. In the other fields, FMA is the dominant structural primitive.
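To make the structural contrast concrete, here is a rough sketch of an a·b + c matcher. It is not the catalog pipeline: it parses ASCII formulas with Python's standard ast module instead of sympy-canonical form, and it only recognises a literal addition with a product as a direct operand. The function name fma_matches and the two test formulas (the CAPM expected-return line and Coulomb's law as a stand-in for a pure-product electromagnetic formula) are choices made for illustration.

```python
import ast


def fma_matches(src: str) -> list[str]:
    """List subexpressions of the form a*b + c found in an ASCII formula.

    Assumption: a match is an addition whose left or right operand is
    itself a multiplication. Subtraction is not handled here, although a
    sympy-canonical form would represent it as an addition.
    """
    tree = ast.parse(src, mode="eval")
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            operands = (node.left, node.right)
            if any(
                isinstance(o, ast.BinOp) and isinstance(o.op, ast.Mult)
                for o in operands
            ):
                hits.append(ast.unparse(node))
    return hits


capm = "r_f + beta*(r_m - r_f)"   # affine shift: offset plus a product
coulomb = "k*q1*q2/r**2"          # products and powers only, no addition

print(fma_matches(capm))     # one match: the whole expression
print(fma_matches(coulomb))  # no matches
```

The affine formula matches once at its root, while the pure product offers the matcher nothing to fuse, which mirrors the 0.0 pp row above.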

What this doesn’t say

The 15.5% figure is the aggregate savings of the 23-op + FMA basis over the naive basis on this catalog. Individual equations range from 0% (most equations see nothing from FMA) to over 50% local savings (polynomial-heavy neuroscience). The claim is structural: on a representative sample of elementary-function science, FMA absorbs more cost than the entire bivariate Layer-2 extension.
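The gap between aggregate and per-equation savings can be illustrated with a toy cost model. This sketch assumes one node per binary operator and lets each a·b + c site collapse two operators into one; the catalog's real cost model (TREE semantics on sympy-canonical forms) is not reproduced here, and the function name costs and both formulas are hypothetical examples.

```python
import ast


def costs(src: str) -> tuple[int, int]:
    """Return (naive, fused) operator counts for an ASCII formula.

    Toy model: every binary operator costs one node; an addition with a
    product as a direct operand fuses the pair into a single FMA node.
    """
    tree = ast.parse(src, mode="eval")
    naive = 0
    fma_sites = 0
    for node in ast.walk(tree):
        if not isinstance(node, ast.BinOp):
            continue
        naive += 1
        if isinstance(node.op, ast.Add) and any(
            isinstance(o, ast.BinOp) and isinstance(o.op, ast.Mult)
            for o in (node.left, node.right)
        ):
            fma_sites += 1
    return naive, naive - fma_sites


formulas = {
    "Horner cubic": "((a*x + b)*x + c)*x + d",  # polynomial-heavy
    "pure product": "k*q1*q2/r**2",             # nothing to fuse
}

total_naive = total_fused = 0
for name, src in formulas.items():
    naive, fused = costs(src)
    total_naive += naive
    total_fused += fused
    print(f"{name}: {naive} -> {fused} ({100 * (naive - fused) / naive:.0f}% local)")

aggregate = (total_naive - total_fused) / total_naive
print(f"aggregate: {100 * aggregate:.0f}%")
```

Under this toy model the polynomial halves its cost while the product saves nothing, and the aggregate is weighted by node count rather than being the mean of the two local percentages.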

It also doesn’t say FMA is the unique primitive worth adopting. Bilinear-FMA adds another 1.8 pp — a second, smaller step. Softmax-style patterns would matter for ML workloads not in this catalog. But no single additional primitive we tested approaches FMA’s 7.1 pp jump.

Reproduce

Raw data: 222 per-row parsed costs under each basis state are in the private exploration repo (exploration/deep10/catalog_parsed_v4.json). The parse rate reflects sympy’s reach on ASCII-math formulas; the remaining 43 rows are mostly summations and matrix-notation expressions outside elementary closure.

Source: 265-equation catalog (exploration/deep-sessions/data/expanded_genome.json). Public capcard holds the toy-basket 14n / 80.8% headline; this post is the catalog-aggregate complement.

Monogate Research (2026). “FMA Is the Only Primitive That Matters.” monogate research blog. https://monogate.org/blog/fma-staircase
