The ReLU–Softplus Error is Exactly ln(2)/β
Tier: THEOREM (proved)
Softplus is the smooth stand-in for ReLU everyone’s used for a decade:
$$
\mathrm{softplus}_\beta(x) ;=; \frac{1}{\beta},\ln!\bigl(1 + e^{\beta x}\bigr), \qquad \mathrm{ReLU}(x) = \max(x, 0).
$$
Make β big and softplus hugs ReLU closely. Make it small and you get a soft bend. Almost every neural-network textbook mentions the approximation; almost none of them tell you how good it is.
It turns out the answer is exact.
The bound
Theorem. For every real x and every β > 0,
$$
0 ;\le; \mathrm{softplus}\beta(x) - \mathrm{ReLU}(x) ;\le; \frac{\ln 2}{\beta},
$$
with the upper bound attained uniquely at x = 0. Consequently
$$
|\mathrm{softplus}\beta - \mathrm{ReLU}|_{L^\infty(\mathbb{R})} ;=; \frac{\ln 2}{\beta}.
$$
Two lines:
- x ≤ 0:
softplus_β(x) = (1/β) ln(1 + e^{βx}) ≤ (1/β) ln 2, with equality atx = 0. ReLU is0. Gap bounded byln 2 / β. - x ≥ 0:
softplus_β(x) − x = (1/β) ln(1 + e^{-βx}), which is at most(1/β) ln 2(same bound, attained atx = 0).
The maximum is hit exactly at the hinge, decays smoothly toward zero as |x| → ∞, and never exceeds ln(2)/β.
What this looks like numerically
200 points on [-1, 1], measuring max(softplus_β − ReLU):
| β | observed max error | ln(2)/β |
|---|---|---|
| 1 | 0.6931 | 0.6931 |
| 2 | 0.3466 | 0.3466 |
| 4 | 0.1733 | 0.1733 |
| 8 | 0.0866 | 0.0866 |
| 16 | 0.0433 | 0.0433 |
| 32 | 0.0217 | 0.0217 |
| 64 | 0.0108 | 0.0108 |
| 128 | 0.0054 | 0.0054 |
| 256 | 0.0027 | 0.0027 |
Log-log fit across these points: slope -1.0000, intercept e^a = 0.6931. That’s −1 and ln 2 to four decimal places. The theorem isn’t asymptotic; it’s sharp at every β.
Why ln(2), specifically?
Because the kink of ReLU at x = 0 has value max(0, 0) = 0, whereas softplus at x = 0 evaluates to (1/β) ln(1 + e^0) = (1/β) ln(2). The constant is baked into the structure of how you round off a corner to a smooth curve. It doesn’t depend on your input distribution, on initialization, or on the architecture — you pay ln 2 for each unit of β-precision and no more.
ReLU isn’t in ELC. Softplus is.
The deeper structural point: ReLU is not a finite EML tree — it has a non-analytic kink at the origin, and finite EML trees are real-analytic on (0, ∞) with finitely many zeros on compacts. So ReLU ∉ ELC(ℝ).
Softplus, on the other hand, is in ELC: it’s literally
$$
\ln!\bigl(1 + e^{\beta x}\bigr) / \beta
$$
— one exp, one log, arithmetic. Depth 2 over leaves {x, 1, β}. That means every softplus layer in a neural network is a depth-O(1) EML computation, and every softplus-driven activation stays inside ELC at every layer. Swap in ReLU and you leave ELC at the first non-differentiable point — approximately approximable but never exactly representable.
This is the Tier-0 boundary at the activation-function level. ReLU is in the uniform closure of ELC (for any precision ε you want, some β gets you there), but not in ELC itself. The approximation tax is ln(2)/β per unit of precision — no more, no less.
Reproduce
import math
def softplus(beta, x):
z = beta * x
if z > 500: return x # avoid overflow
if z < -500: return 0.0
return math.log1p(math.exp(z)) / beta
def relu(x): return x if x > 0 else 0.0
for beta in [1, 2, 4, 8, 16, 32, 64, 128, 256]:
xs = [-1 + 2 * k / 200 for k in range(201)]
err = max(abs(softplus(beta, x) - relu(x)) for x in xs)
print(f"beta={beta:4d} max_err={err:.6f} ln(2)/beta={math.log(2)/beta:.6f}")
Output matches the table above to double-precision.
Consequence for practitioners
- If your network uses softplus with β ≈ 1 (the default in many frameworks), your activation is off from ReLU by up to 0.69 at the hinge. That’s often fine; sometimes it isn’t.
- To hit ReLU to within 0.001 you need
β ≥ ln 2 / 0.001 ≈ 693. At β=693 the gradient near the origin is very steep — training may behave essentially like ReLU. softpluswith β small gives you a genuinely smoother function with a quantified accuracy penalty. If that penalty is acceptable for your task, you keep differentiability (and ELC membership) for free.
The ln(2) constant is cheap. It’s also exact. No tables, no ambiguity.
Cite: Monogate Research (2026). “The ReLU–Softplus Error is Exactly ln(2)/β.” monogate research blog. https://monogate.org/blog/relu-softplus-exact-error