Backtesting Integrity: Why Most Retail Backtests Are Statistically Wrong

If your backtest shows an 80% win rate, the math is probably lying to you. Most retail backtests carry at least one of four biases that materially distort results: look-ahead, survivorship, data-snooping, and inadequate cost modelling. Each independently overstates results by 5–15 percentage points. In combination, the typical 'great' backtest result is overstated by 25–40 percentage points. This page covers the four biases, walk-forward analysis as the structural cure, and where Stage 4 of Bharath Shiksha sits in the systematic-translation path.

The four biases retail almost always misses

Look-ahead bias: using data that was not available at the time of the simulated decision. Common version: filling at the closing price while the simulated entry supposedly happened intraday, before that close was known.

Survivorship bias: backtesting only on currently listed stocks ignores the ones that were delisted, which tends to drop many of the worst outcomes from the sample.

Data-snooping: trying many parameter combinations and reporting the best. On a noise-only signal, each combination has about a 5% chance of looking 'positive' at the 5% significance level purely by luck. Across 100 combinations that means roughly five false positives, and a better than 99% chance of at least one (assuming the tests are independent).

Costs: forgetting brokerage, STT, GST, slippage, and market impact. Indian round-trip costs are typically 0.1–0.2% of position value at retail scale.
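The data-snooping and cost arithmetic can be checked in a few lines. The sketch below is illustrative only: it assumes the parameter combinations behave like independent tests at a 5% significance level, and the function names and the 200 round trips per year are hypothetical values chosen for the example, not figures from the curriculum.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent noise-only tests clears alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of noise-only tests that look 'significant' by luck."""
    return n_tests * alpha


def annual_cost_drag_pct(round_trips_per_year: int, round_trip_cost_pct: float) -> float:
    """Yearly cost as a % of position value (simple sum, no compounding)."""
    return round_trips_per_year * round_trip_cost_pct


if __name__ == "__main__":
    # 100 parameter combinations tried on pure noise (independence assumed).
    print(prob_at_least_one_false_positive(100))
    print(expected_false_positives(100))
    # Hypothetical: 200 round trips a year at 0.1% and 0.2% per round trip.
    print(annual_cost_drag_pct(200, 0.1), annual_cost_drag_pct(200, 0.2))
```

Real parameter sweeps are usually correlated rather than independent, so the first number is an upper-end illustration, but the direction of the effect is the same: the more combinations tried, the less the best one means.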

Walk-forward analysis — the structural cure

Walk-forward simulates what would have happened if you had been deploying live and re-tuning periodically. Optimise on the first 2 years; test on year 3. Slide forward; optimise on years 2-3, test on year 4. Continue. The aggregate of all out-of-sample test years gives the walk-forward equity curve. This curve is your honest forward-looking estimate, because every point on it comes from parameters that were chosen before that data was seen.
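The sliding scheme described above can be sketched as a small window generator plus a loop. This is a minimal illustration, not the Stage 4 implementation: `optimise` and `evaluate` are hypothetical callables you would supply (one returns tuned parameters for a training window, the other returns out-of-sample trade results for a test window).

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def walk_forward_windows(
    periods: Sequence[Any], train_len: int = 2, test_len: int = 1
) -> Iterator[Tuple[List[Any], List[Any]]]:
    """Yield (train, test) windows that slide forward by test_len each step."""
    start = 0
    while start + train_len + test_len <= len(periods):
        train = list(periods[start:start + train_len])
        test = list(periods[start + train_len:start + train_len + test_len])
        yield train, test
        start += test_len


def walk_forward(
    periods: Sequence[Any],
    optimise: Callable[[List[Any]], Any],
    evaluate: Callable[[Any, List[Any]], List[float]],
    train_len: int = 2,
    test_len: int = 1,
) -> List[float]:
    """Concatenate out-of-sample results; parameters never see their test window."""
    out_of_sample: List[float] = []
    for train, test in walk_forward_windows(periods, train_len, test_len):
        params = optimise(train)                       # tuned on training window only
        out_of_sample.extend(evaluate(params, test))   # scored on unseen window
    return out_of_sample


if __name__ == "__main__":
    # Years 1..5 with a 2-year training window and a 1-year test window.
    for train, test in walk_forward_windows([1, 2, 3, 4, 5]):
        print("optimise on", train, "test on", test)
```

The list returned by `walk_forward` is the raw material for the walk-forward equity curve: only out-of-sample results go into it.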

In-sample vs out-of-sample gap

IS (in-sample) performance is biased upward because the setup has been (consciously or unconsciously) tuned to the data it was developed on. Typical retail tuning produces IS results 20–40% better than OOS (out-of-sample). Particularly overfitted setups produce IS results 100%+ better than OOS, meaning the IS performance is essentially noise. The IS/OOS gap is itself diagnostic. Small gap (under 15%): robust. Large gap (over 30%): overfitting.
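One way to make the gap check mechanical is sketched below. It assumes the gap is measured as how much better the IS metric is than the OOS metric, relative to OOS, on a metric where higher is better (for example expectancy in R). The "inconclusive" label for gaps between 15% and 30%, and the handling of a non-positive OOS metric, are assumptions of this sketch rather than rules from the text.

```python
def is_oos_gap_pct(is_metric: float, oos_metric: float) -> float:
    """Percent by which in-sample beats out-of-sample, relative to OOS.

    Assumption: if OOS is zero or negative the ratio is meaningless, so the
    gap is reported as infinite (the setup did not survive out of sample).
    """
    if oos_metric <= 0:
        return float("inf")
    return (is_metric - oos_metric) / oos_metric * 100.0


def classify_gap(gap_pct: float) -> str:
    """Apply the 15% / 30% thresholds; the middle band is labelled here."""
    if gap_pct < 15.0:
        return "robust"
    if gap_pct > 30.0:
        return "overfitting"
    return "inconclusive"  # assumption: the text does not name this band


if __name__ == "__main__":
    # Hypothetical expectancies: 0.50R in-sample vs 0.35R out-of-sample.
    gap = is_oos_gap_pct(0.50, 0.35)
    print(round(gap, 1), classify_gap(gap))
```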

Sample size — 30, 100, 300 thresholds

Below 30 trades, expectancy estimates are dominated by single-trade variance. Between 30 and 100 trades the confidence interval is still wide: roughly ±0.4R at 30 trades for typical retail-grade setups, and about ±0.2R at 100, because the width shrinks only with the square root of the trade count. At 300 trades, the interval narrows to ±0.12R. Decisions about scaling capital should wait for the 100-trade threshold. Most retail traders make scaling decisions at 10–20 trades, which is statistically meaningless.
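The way the interval tightens with sample size follows from the standard error of the mean. The sketch below uses a normal approximation with z = 1.96 (a 95% interval) and a hypothetical per-trade standard deviation of 1.06R, chosen only because it gives roughly ±0.12R at 300 trades; your own setup's standard deviation will differ and should be measured from its trade log.

```python
import math


def ci_half_width(std_r: float, n_trades: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval on mean R per trade."""
    if n_trades < 2:
        raise ValueError("need at least 2 trades to estimate an interval")
    return z * std_r / math.sqrt(n_trades)


def trades_needed(std_r: float, target_half_width: float, z: float = 1.96) -> int:
    """Smallest trade count whose interval is no wider than the target."""
    return math.ceil((z * std_r / target_half_width) ** 2)


if __name__ == "__main__":
    std_r = 1.06  # hypothetical per-trade standard deviation, in R
    for n in (30, 100, 300):
        print(n, "trades: +/-", round(ci_half_width(std_r, n), 2), "R")
    print("trades for +/-0.12R:", trades_needed(std_r, 0.12))
```

Because the half-width scales with 1/sqrt(n), halving the interval takes four times as many trades, which is why the jump from 30 to 300 trades matters far more than the jump from 10 to 20.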

Where Stage 4 fits

Stage 4 (Mastery I — Quantitative) of the Bharath Shiksha curriculum builds the Python-based systematic translation: walk-forward integrity, factor decomposition, Monte Carlo stress testing. Pre-requisite: Stage 3 capstone passed. The 8-week capstone exercise — pick one Stage 3 setup, translate to Python, backtest with walk-forward, validate with Monte Carlo, paper-trade for 2 weeks, submit code + journal — is the structural Stage 4 deliverable.

FAQs

Can I do walk-forward analysis without Python?

Possible in spreadsheets but tedious. Stage 4 covers Python implementation with 12 Jupyter notebooks. Spreadsheet walk-forward works for very simple setups but rapidly becomes unmaintainable beyond 1-2 parameters.

How long does proper backtesting take?

30-90 minutes per setup including IS/OOS validation, walk-forward, cost modelling, and writing up findings. Less time produces results that are easier to misinterpret. The discipline is the point.

What if my backtest shows promising results?

Audit it against the four biases first. Then run walk-forward. Then paper-trade for 60 days. Most 'promising' backtests fail one of these three filters. The ones that survive all three are worth scaling capital toward.

Are paid backtest platforms (Streak, Tradetron) reliable?

They handle the mechanics correctly. They cannot stop you from data-snooping. Discipline is on the user, not the platform.

How do I know if I have look-ahead bias?

Common test: implement the backtest twice, once with strict-time-walk discipline, once with the original code. If the strict version produces materially worse results, the original had look-ahead. Most retail backtests fail this test.
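The two-version comparison can be illustrated on a toy momentum rule. In this sketch, which is an assumption-laden example rather than a real strategy, `lag=0` decides whether to be long on a bar using that same bar's close (look-ahead), while `lag=1` decides using only the previous bar's close (strict time-walk). The price series and the tolerance are made up for the demonstration.

```python
from typing import Sequence


def momentum_backtest(closes: Sequence[float], lag: int) -> float:
    """Sum of per-bar returns earned while long.

    lag=0: the decision for bar t peeks at close[t] (look-ahead).
    lag=1: the decision for bar t uses only close[t-1] and earlier.
    """
    total = 0.0
    for t in range(2, len(closes)):
        bar_return = closes[t] / closes[t - 1] - 1.0
        s = t - lag  # index of the bar whose close drives the decision
        if closes[s] > closes[s - 1]:
            total += bar_return
    return total


def looks_like_lookahead(original: float, strict: float, tolerance: float = 0.005) -> bool:
    """Flag the original if it beats the strict version by more than tolerance."""
    return original - strict > tolerance


if __name__ == "__main__":
    closes = [100.0, 102.0, 101.0, 103.0, 102.0, 104.0]  # made-up prices
    leaky = momentum_backtest(closes, lag=0)
    strict = momentum_backtest(closes, lag=1)
    print(round(leaky, 4), round(strict, 4), looks_like_lookahead(leaky, strict))
```

With `lag=0` the rule only ever holds bars that turned out positive, so its result cannot be worse than any honest version; a large gap between the two runs is the signature the FAQ answer describes.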

Start with Foundation

73-page printed curriculum book + 28 video lessons + tutor channel. ₹4,999. 7-day refund.

Bharath Shiksha is an educational publisher. We do not provide investment advice. Curriculum uses anonymised historical examples with at least 30-day data lag; no specific securities are named for buy/sell/hold; no performance claims or return projections.