MOEasymmetry
Methodology · 2026-06-12 · 4 min read

Why My Backtest Number Got Smaller When I Made the Test Harder

Track. Study. Wait. Strike.
⚠️ Personal research and trading journal — not investment advice. The author does not provide licensed advisory services.

One of the counterintuitive lessons from quantitative testing is that a smaller number from a harder test is more trustworthy than a bigger number from a softer test.

I learned this specifically from the volume-pop study on Thai stocks.

The Setup

My base method is the contracting-base breakout — stocks that form tight bases with higher lows, then break through the pivot point. The volume-pop hypothesis is that requiring a 1.5× volume increase on the breakout day improves outcomes: if the stock breaks quietly, skip it; if real buying shows up, take it.
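Here is a minimal sketch of the volume-pop gate. The article fixes the 1.5× multiple but does not say what breakout-day volume is compared against. This sketch assumes a trailing 50-session simple average that excludes the breakout day. `volume_pop`, `gated_breakouts`, and `lookback` are hypothetical names.

```python
from statistics import mean


def volume_pop(volumes: list[float], breakout_idx: int,
               lookback: int = 50, multiple: float = 1.5) -> bool:
    """True if breakout-day volume is at least `multiple` x the trailing average.

    Assumption: the baseline is the simple average of the `lookback` sessions
    before the breakout day (the breakout day itself is excluded).
    """
    if breakout_idx < lookback:
        raise ValueError("not enough history before the breakout day")
    baseline = mean(volumes[breakout_idx - lookback:breakout_idx])
    return volumes[breakout_idx] >= multiple * baseline


def gated_breakouts(breakouts: list[int], volumes: list[float], **kwargs) -> list[int]:
    """Keep breakouts where real buying showed up; skip the quiet ones."""
    return [i for i in breakouts if volume_pop(volumes, i, **kwargs)]
```

A breakout on 160 shares against a flat 100-share baseline passes the gate, and one on 140 shares does not.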

I described the core finding in an earlier article. Volume-pop improves performance in Thailand. It degrades performance in the US. The results in both directions are real.

But when a user asked me whether the Thai finding was believable — whether I could make the test harder and still see the result — I went back and ran additional stress tests.

The original pooled result: +3.60% improvement in mean forward return with the volume-pop gate applied.

That's the number that looked strong.

What Harder Tests Show

Test 1: Regime gate

I split the results by market condition. Volume-pop only works in a Confirmed Uptrend (SET index above 50d MA, 50d above 200d). In correction conditions, the improvement disappears. The Confirmed Uptrend confidence interval: [+0.10%, +1.60%]. The correction CI spans zero — no signal.
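The regime split can be sketched as follows, assuming daily SET index closes and simple moving averages. The article does not say how its confidence intervals were computed. The percentile bootstrap used here is a swapped-in technique, and all function names are hypothetical.

```python
import random
from statistics import mean

# A trade is (forward_return_pct, passed_volume_gate).
Trade = tuple[float, bool]


def sma(values: list[float], end: int, n: int) -> float:
    """Simple moving average of the n values ending at index `end` (inclusive)."""
    return mean(values[end - n + 1:end + 1])


def regime(index_close: list[float], t: int) -> str:
    """'uptrend' if close > 50d SMA and 50d SMA > 200d SMA on day t, else 'other'.

    Assumption: everything that is not a Confirmed Uptrend goes into a single
    'other' bucket, a simplification of the article's correction bucket.
    """
    if t < 199:
        raise ValueError("need at least 200 sessions of history")
    ma50 = sma(index_close, t, 50)
    ma200 = sma(index_close, t, 200)
    return "uptrend" if index_close[t] > ma50 and ma50 > ma200 else "other"


def gate_improvement(trades: list[Trade]) -> float:
    """Mean return of gated trades minus mean return of all trades."""
    gated = [r for r, passed in trades if passed]
    return mean(gated) - mean(r for r, _ in trades)


def bootstrap_ci(trades: list[Trade], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for gate_improvement, resampling whole trades."""
    if not any(passed for _, passed in trades):
        raise ValueError("no trades passed the gate")
    rng = random.Random(seed)
    stats: list[float] = []
    while len(stats) < n_boot:
        sample = rng.choices(trades, k=len(trades))
        if any(passed for _, passed in sample):  # skip resamples with no gated trade
            stats.append(gate_improvement(sample))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


def spans_zero(ci: tuple[float, float]) -> bool:
    """A CI that contains zero is read as 'no signal'."""
    return ci[0] <= 0.0 <= ci[1]
```

In use, you would label each trade with `regime(index_close, entry_day)`, group the trades by label, and call `bootstrap_ci` on each group separately.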

The improvement is real, but it's regime-gated. The pooled +3.60% blended a genuine signal in uptrends with no signal in corrections. As a single headline number, it overstated the effect you can actually capture, because that effect only shows up when the market is in a Confirmed Uptrend.

Test 2: Walk-forward

The pooled mean comes from all years combined. But some years are exceptional — 2009 (recovery), 2014 (SET bull run), 2020 (post-COVID surge). These years have strong momentum across the board, and the volume-pop signal is especially strong in them. They inflate the pooled average.

In walk-forward testing, where each out-of-sample window is evaluated independently, the story changes: the median improvement across windows is +1.9%, well below the +3.60% pooled mean.

The walk-forward median is honest. The pooled mean was flattering.
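Here is a simplified walk-forward comparison, assuming one-year out-of-sample windows. The volume-pop rule has no fitted parameters here (the 1.5× multiple is fixed), so there is no in-sample refit step. Each window simply measures the gate's improvement on its own trades. The window length, the `Trade` layout, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Optional

# A trade is (year, forward_return_pct, passed_volume_gate).
Trade = tuple[int, float, bool]


def improvement(trades: list[Trade]) -> Optional[float]:
    """Mean return of gated trades minus mean of all trades; None if none gated."""
    gated = [r for _, r, passed in trades if passed]
    if not gated:
        return None
    return mean(gated) - mean(r for _, r, _ in trades)


def walk_forward(trades: list[Trade]) -> tuple[float, dict[int, float]]:
    """Evaluate each yearly window on its own; return (median, per-window values)."""
    by_year: dict[int, list[Trade]] = defaultdict(list)
    for trade in trades:
        by_year[trade[0]].append(trade)
    per_window: dict[int, float] = {}
    for year in sorted(by_year):
        imp = improvement(by_year[year])
        if imp is not None:  # a window with no gated trades carries no evidence
            per_window[year] = imp
    return median(per_window.values()), per_window
```

The pooled figure is `improvement(all_trades)`. On a toy set where one exceptional year carries a large improvement, that pooled value lands well above the walk-forward median. This is the gap the article describes, because the median is not pulled up by a single outlier window the way a mean is.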

Why the Smaller Number Is Better

Here's the unintuitive part: the +1.9% WF median is more trustworthy than the +3.60% pooled mean. That's not because it's bigger (it isn't). It's because it is what survived the harder testing.

A number that is still positive after all of those tests has earned more trust than one that was never stressed. The drop from 3.60% to 1.9% is the test telling you that 1.7 percentage points of the original number came from favorable conditions that won't always apply.

The 1.9% is what you should expect in a typical year. The 3.60% was what you got when you averaged together the typical years and the exceptional ones.

What This Means for Interpreting Backtests

There are two very different ways to work on a backtest number:

Making the method more refined: Add filters, tune parameters, select lookback windows. This inflates the number by fitting the method to the historical data. The improvement may not repeat.

Harder tests on the fixed method: Run it through walk-forward windows. Split by regime. Remove the top-N outlier years. Re-test on out-of-sample (OOS) data. The number often shrinks, but what remains is more likely to represent real edge.
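The "remove the top-N outlier years" test is short to sketch. This assumes you already have one improvement figure per year. Ranking years by that improvement (rather than, say, by index return) is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def drop_top_n(per_year: dict[int, float], n: int = 3) -> dict[int, float]:
    """Remove the n years with the largest improvement (ties resolved by dict order)."""
    if n >= len(per_year):
        raise ValueError("dropping n years would leave nothing to test")
    best = set(sorted(per_year, key=per_year.get, reverse=True)[:n])
    return {year: v for year, v in per_year.items() if year not in best}


def mean_without_top_n(per_year: dict[int, float], n: int = 3) -> float:
    """Average yearly improvement after the n best years are removed."""
    return mean(drop_top_n(per_year, n).values())
```

If `mean_without_top_n` stays positive, the edge does not depend on a handful of exceptional years.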

The first approach is how most retail traders improve their backtests. The second approach is what I try to do.

When a number shrinks under harder testing, that is not a failure. That is the test working correctly — filtering out the lucky-sample component of the original estimate and leaving the replicable core.

The volume-pop improvement on Thai stocks went from +3.60% to +1.9% under harder tests. I trust the +1.9% more. It's the number I'm willing to rely on for capital decisions.

The Practical Rule

Before citing any backtest result, I now ask: what would this number look like under walk-forward? Regime-split? Drop-top-3?

If the result collapses under those tests, the original number was flattering noise.

If the result shrinks but survives — as volume-pop did — you have something. It's smaller than it looked, but it's real.
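The practical rule reduces to a small decision function. This is a hypothetical sketch. Treating "collapse" as an estimate at or below zero is an assumption, and the test names are placeholders for whichever stress tests you actually ran.

```python
def robustness_verdict(original: float, stressed: dict[str, float],
                       floor: float = 0.0) -> str:
    """Classify a headline backtest edge after stress tests.

    original: headline estimate, e.g. a pooled mean improvement in %.
    stressed: the estimate under each harder test, keyed by test name.
    floor: at or below this value an estimate counts as collapsed
           (0.0 is an assumption; a CI that spans zero is a stricter test).
    """
    if not stressed:
        raise ValueError("run at least one stress test first")
    worst = min(stressed.values())
    if worst <= floor:
        return "collapsed"  # the original number was flattering noise
    if worst < original:
        return "shrunk but survived"  # smaller than it looked, but real
    return "held"
```

With the article's own figures, `robustness_verdict(3.60, {"walk_forward_median": 1.9})` returns `"shrunk but survived"`.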
