The Big Idea
What if, instead of trying to guess the right equation, you let equations compete for survival?
CU-SRT is a tournament-style algorithm for discovering scientific laws from data. Instead of treating overfitting as a vice, we embrace it as a generative force. Overfit on many independent datasets (“universes”), then pit those equations against each other. The ones that can’t generalize get eliminated. The true law survives.
Phase A overfits locally in each universe, creating a diverse candidate pool. Phase B cross-tests every candidate against every universe and eliminates the weak. Phase C crowns a champion.
The master equation picks the candidate that maximizes cross-universe fitness minus a penalty for complexity: natural selection as an argmax. The full derivation, convergence proofs, and all theoretical guarantees are in the PDF.
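The three phases can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the paper's implementation: the candidates are polynomials, the "true law" y = 2x + 1 is hypothetical, fitness is negative mean squared error, complexity is the number of coefficients, and every name (`make_universes`, `run_tournament`, `complexity_weight`, `min_fitness`) is invented for the example.

```python
import numpy as np

def make_universes(k=4, n=20, noise=0.01, seed=0):
    """Hypothetical universes: the same law y = 2x + 1 sampled on disjoint x-ranges."""
    rng = np.random.default_rng(seed)
    universes = []
    for i in range(k):
        lo = 2.0 * i - k                      # universe i covers [lo, lo + 2]
        x = rng.uniform(lo, lo + 2.0, n)
        y = 2.0 * x + 1.0 + rng.normal(0.0, noise, n)
        universes.append((x, y))
    return universes

def fitness(coeffs, universe):
    """Negative mean squared error of a polynomial candidate on one universe."""
    x, y = universe
    return -float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def run_tournament(universes, degrees=(1, 3, 5), complexity_weight=0.01, min_fitness=-0.05):
    # Phase A: fit polynomials of several degrees inside each universe.
    pool = [np.polyfit(x, y, d) for (x, y) in universes for d in degrees]
    # Phase B: cross-test each candidate everywhere; drop those that fail anywhere.
    survivors = [c for c in pool
                 if min(fitness(c, u) for u in universes) >= min_fitness]
    # Phase C: argmax of mean cross-universe fitness minus a complexity penalty
    # (complexity is assumed to be the number of coefficients).
    def score(c):
        return np.mean([fitness(c, u) for u in universes]) - complexity_weight * len(c)
    return max(survivors or pool, key=score)

champion = run_tournament(make_universes())
print(len(champion) - 1, champion)  # degree and coefficients of the winner
```

Because each universe covers a different x-range, the high-degree fits look excellent at home and fail badly elsewhere, while the linear fits hold up everywhere. That is the mechanism the tournament relies on.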
Pipeline
Live Tournament Simulator
Select an equation to discover, then hit Run Tournament. Watch the terminal as CU-SRT evaluates candidates across universes and eliminates the impostors.
Things to try:
- Increase universes to 7 or 8. Notice how adding universes makes it almost impossible for a specialist to hide. This is the exponential decay guarantee in action.
- Crank the complexity weight up to 0.05. Watch how longer expressions get penalized even if they’re accurate. Set it to 0 and see pure accuracy without parsimony.
- Run multiple tournaments with different seeds. The true law (the one with consistent cross-universe fitness) should win nearly every time.
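To see why the complexity weight matters, here is a minimal sketch of the final selection step. The two candidates and their numbers are made up for illustration, and `pick_champion` is a hypothetical helper, not part of the simulator.

```python
# Hypothetical candidates: (mean cross-universe fitness, expression length).
# The numbers are invented for illustration only.
candidates = {
    "a*x + b": (-0.010, 5),
    "a*x + b + c*sin(d*x)": (-0.008, 11),
}

def pick_champion(candidates, complexity_weight):
    """Return the expression with the best fitness-minus-penalty score."""
    def score(name):
        fit, length = candidates[name]
        return fit - complexity_weight * length
    return max(candidates, key=score)

print(pick_champion(candidates, 0.0))   # accuracy only: the longer expression wins
print(pick_champion(candidates, 0.05))  # with a parsimony penalty: the shorter one wins
```

With the weight at 0, the slightly more accurate but longer expression wins. At 0.05 the length penalty outweighs its small accuracy edge.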
Key Guarantees
Exponential decay. The probability that a spurious formula survives decays exponentially with the number of universes. This holds for any non-true candidate that deviates from the true law by at least a fixed margin; the explicit bound is given in the PDF.
More universes means exponentially less chance of being fooled.
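A quick way to build intuition for this is a simulation under a deliberately simplified assumption: a spurious candidate passes each universe independently with the same probability `p_pass`. This is not the paper's bound, only an illustration of why survival shrinks like `p_pass ** n_universes`. The function name and parameters are hypothetical.

```python
import numpy as np

def survival_rate(p_pass, n_universes, trials=200_000, seed=0):
    """Monte Carlo estimate of the chance a candidate passes every universe."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated tournament; each column is one universe.
    passes = rng.random((trials, n_universes)) < p_pass
    return float(passes.all(axis=1).mean())

for k in (1, 2, 4, 8):
    # Simulated survival rate next to the closed form p_pass ** k.
    print(k, survival_rate(0.5, k), 0.5 ** k)
```

Even a candidate that fools half of all universes individually has well under a 1% chance of fooling eight of them in a row under this assumption.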
Finite sample guarantee. Once the number of universes exceeds a threshold specified in the PDF, CU-SRT selects the true law with probability at least a prescribed confidence level.
Geometric contraction. With adaptive thresholds, the size of the candidate pool after each round satisfies a bound that decays geometrically in the number of rounds.
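As a rough feel for what geometric decay buys, suppose (purely as an assumption for this sketch) that each round keeps at most a fixed fraction of the pool. The helper below, a hypothetical name, counts how many rounds it takes for that bound to reach a single candidate.

```python
def rounds_until_single(initial_pool, keep_fraction):
    """Count rounds until the bound initial_pool * keep_fraction**t is at most 1."""
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must lie strictly between 0 and 1")
    bound, rounds = float(initial_pool), 0
    while bound > 1.0:
        bound *= keep_fraction
        rounds += 1
    return rounds

print(rounds_until_single(1024, 0.5))  # 10
```

The number of rounds grows only logarithmically with the size of the starting pool, so even a very large Phase A pool is whittled down quickly.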
Optional Extensions
The paper introduces four plug-in modules, each preserving all theoretical guarantees:
- Universe-Weighted Scores. Noisier universes get down-weighted via inverse-variance weighting, so data-rich, clean universes steer the tournament.
- Stochastic Grammar Annealing. Useful primitives get sampled more often. Useless operators are demoted but never deleted, preserving exploration.
- Causal-Graph Pruning. Equations that violate known causal sign constraints are culled before cross-universe testing even begins.
- Bayesian Tournament Scoring. Replace accuracy with full Bayesian marginal likelihood, injecting an automatic Occam factor.
See the PDF for complete formulations and proofs of all extensions.
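For the first extension, inverse-variance weighting can be sketched as follows. This assumes each universe comes with a known or estimated noise variance; the function names and the example numbers are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def inverse_variance_weights(noise_variances):
    """Weights proportional to 1 / variance, normalized to sum to 1."""
    w = 1.0 / np.asarray(noise_variances, dtype=float)
    return w / w.sum()

def weighted_fitness(per_universe_fitness, noise_variances):
    """Cross-universe fitness in which cleaner universes count for more."""
    weights = inverse_variance_weights(noise_variances)
    return float(np.dot(weights, per_universe_fitness))

variances = [0.01, 0.04, 0.25]           # made-up noise levels for three universes
print(inverse_variance_weights(variances))
print(weighted_fitness([-0.02, -0.05, -0.40], variances))
```

The cleanest universe dominates the score, so a candidate that only looks bad in the noisiest universe is not punished as heavily as one that fails where the data is reliable.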