The Big Idea

What if, instead of trying to guess the right equation, you let equations compete for survival?

CU-SRT is a tournament-style algorithm for discovering scientific laws from data. Instead of treating overfitting as a vice, we embrace it as a generative force. Overfit on many independent datasets (“universes”), then pit those equations against each other. The ones that can’t generalize get eliminated. The true law survives.

Phase A overfits locally in each universe, creating a diverse candidate pool. Phase B cross-tests every candidate against every universe and eliminates the weak. Phase C crowns a champion. The master equation:

$$\mathcal{L}^\star = \arg\max_{\varphi \in \mathcal{F}} \left\{ \bar{G}(\varphi) - \lambda\, \ell(\varphi) \right\}$$

Maximize cross-universe fitness, pay for complexity. Natural selection as an argmax. The full derivation, convergence proofs, and all theoretical guarantees are in the PDF.
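As a rough illustration of the cross-testing and selection steps, here is a minimal Python sketch. Everything concrete in it is an assumption made for the example, not something taken from the paper: the universes are noisy samples of exp(-0.5x), fitness is 1/(1 + MSE) so that it lies in (0, 1], the complexity ℓ(φ) is a hand-assigned integer, the survival threshold is applied to the mean fitness, and the candidate pool is written by hand rather than produced by local overfitting.

```python
import math
import random

LAMBDA = 0.02  # price per unit of complexity (illustrative value)
TAU = 0.99     # survival threshold on mean fitness (hypothetical value)

def make_universe(seed, n=50, noise=0.05):
    """Hypothetical universe: noisy samples of the 'true law' y = exp(-0.5 x)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [math.exp(-0.5 * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def fitness(f, universe):
    """Assumed fitness in (0, 1]: 1 / (1 + mean squared error)."""
    xs, ys = universe
    mse = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return 1.0 / (1.0 + mse)

# Hand-written candidate pool: (name, function, hand-assigned complexity).
CANDIDATES = [
    ("exp(-0.5*x)", lambda x: math.exp(-0.5 * x), 3),
    ("1 - 0.3*x", lambda x: 1.0 - 0.3 * x, 3),
    ("exp(-0.5*x) + 0.02*sin(7*x)",
     lambda x: math.exp(-0.5 * x) + 0.02 * math.sin(7 * x), 8),
]

def run_tournament(candidates, universes, lam=LAMBDA, tau=TAU):
    """Cross-test every candidate on every universe, filter, then take the argmax."""
    scored = []
    for name, f, complexity in candidates:
        # Mean cross-universe fitness, standing in for G-bar.
        g_bar = sum(fitness(f, u) for u in universes) / len(universes)
        if g_bar >= tau:  # candidates below the threshold are discarded
            scored.append((g_bar - lam * complexity, name))
    if not scored:
        raise ValueError("no candidate survived the threshold")
    champion = max(scored)[1]  # highest penalized score wins
    survivors = sorted(name for _, name in scored)
    return champion, survivors

universes = [make_universe(seed) for seed in range(5)]
champion, survivors = run_tournament(CANDIDATES, universes)
print("survivors:", survivors)
print("champion:", champion)
```

In this toy setup the linear impostor fits poorly everywhere and falls below the threshold, while the candidate decorated with a small sine term fits about as well as the true law but pays a larger complexity charge, so the plain exponential wins.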

Pipeline

Cross-Testing Schematic
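[Figure: cross-testing schematic. Candidates φ from Universe 1, Universe 2, and Universe 3 feed a cross-universe fitness filter. Candidates with T(φ) < τ are discarded; those with T(φ) ≥ τ form the survivor set 𝒮, which yields the champion or ensemble.]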

Live Tournament Simulator

Select an equation to discover, then hit Run Tournament. Watch the terminal as CU-SRT evaluates candidates across universes and eliminates the impostors.


Key Guarantees

Exponential decay. The probability that a spurious formula survives decays exponentially with the number of universes. For any non-true candidate $\tilde{\varphi}$ deviating from $\mathcal{L}$ by at least $\Delta > 0$:

$$\Pr\!\big\{\bar{G}(\tilde{\varphi}) \geq \bar{G}(\mathcal{L}) - \zeta\big\} \leq \exp(-2N\zeta^2)$$

More universes means exponentially less chance of being fooled.
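To get a feel for how fast the right-hand side shrinks, the bound can be evaluated directly. This is a small sketch; the function name and the sample values of N and ζ are chosen for illustration only.

```python
import math

def spurious_survival_bound(n_universes, zeta):
    """Evaluate exp(-2 N zeta^2), the stated upper bound on a spurious
    candidate scoring within zeta of the true law."""
    if n_universes < 1 or zeta <= 0:
        raise ValueError("need n_universes >= 1 and zeta > 0")
    return math.exp(-2.0 * n_universes * zeta ** 2)

# Illustrative values: margin zeta = 0.1, growing number of universes.
for n in (5, 50, 500):
    print(n, spurious_survival_bound(n, 0.1))
```

With ζ = 0.1, the bound is exp(-1) ≈ 0.37 at N = 50 and exp(-10) at N = 500, so a tenfold increase in universes multiplies the exponent by ten.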

Finite sample guarantee. If the number of universes satisfies:

$$N \geq \frac{\log|\mathcal{C}| + \log(1/\beta)}{2\Delta^2}$$

then CU-SRT selects the true law $\mathcal{L}$ with probability at least $1 - \beta$.
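The condition can be turned into a quick calculator for the smallest whole number of universes that satisfies it. This is a sketch under the assumption that log is the natural logarithm; the function name and the example inputs are hypothetical.

```python
import math

def min_universes(pool_size, beta, delta):
    """Smallest integer N with N >= (log|C| + log(1/beta)) / (2 * delta^2).

    pool_size : number of candidates |C|
    beta      : allowed failure probability, 0 < beta < 1
    delta     : fitness gap between the true law and any impostor
    """
    if pool_size < 1 or not (0.0 < beta < 1.0) or delta <= 0.0:
        raise ValueError("need pool_size >= 1, 0 < beta < 1, delta > 0")
    bound = (math.log(pool_size) + math.log(1.0 / beta)) / (2.0 * delta ** 2)
    return math.ceil(bound)

# Illustrative inputs: 1000 candidates, 5% failure probability, gap 0.2.
print(min_universes(1000, 0.05, 0.2))  # 124
```

The pool size enters only through its logarithm, so the gap Δ dominates: halving Δ roughly quadruples the required number of universes.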

Geometric contraction. With adaptive thresholds, the candidate pool after $t$ rounds satisfies $|\mathcal{C}^{(t)}| \leq |\mathcal{C}^{(1)}|(1-q)^t$, decaying geometrically.
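A short sketch of what the contraction bound implies in practice follows. It treats q as a per-round elimination fraction in (0, 1), which is an assumption made here for illustration, and the helper names are hypothetical.

```python
import math

def pool_bound(initial_size, q, t):
    """Upper bound |C^(1)| * (1 - q)^t on the pool size after t rounds."""
    if not (0.0 < q < 1.0) or t < 0:
        raise ValueError("need 0 < q < 1 and t >= 0")
    return initial_size * (1.0 - q) ** t

def rounds_until(initial_size, q, target=1.0):
    """Smallest t for which the bound drops to `target` or below."""
    if initial_size <= target:
        return 0
    t = math.ceil(math.log(initial_size / target) / -math.log(1.0 - q))
    # Guard against floating-point rounding at exact boundaries.
    while pool_bound(initial_size, q, t) > target:
        t += 1
    return t

# Illustrative inputs: 1000 initial candidates, q = 0.5.
print(rounds_until(1000, 0.5))  # 10
```

Because the decay is geometric, the number of rounds grows only logarithmically with the initial pool size.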

Optional Extensions

The paper introduces four plug-in modules, each preserving all theoretical guarantees.

See the PDF for complete formulations and proofs of all extensions.
