Three Filters That Looked Smart and Failed Testing

⚠️ Personal research and trading journal — not investment advice. The author does not provide licensed advisory services.

In my assumption audit, I swept six potential improvements to the contracting-base breakout method. One passed — the volume pop requirement in Thailand. Three others looked promising, showed improved means in the raw results, and failed when tested properly.

These are the three. Each one teaches something specific about how retail traders fool themselves with backtests.

Failure 1: ATR-based position sizing

The idea: Average True Range (ATR) measures a stock's daily volatility. Instead of sizing every trade to the same percentage risk (say, 0.5% of portfolio per trade), you size based on ATR, so that stocks with higher daily volatility get smaller positions and low-volatility stocks get larger ones. The goal is risk-adjusted exposure.

This is widely taught. Many professional systems use it. I tested it as an overlay on the contracting-base setup, measuring whether ATR-based sizing improved the distribution of outcomes.

What happened: The ATR ratio did correlate with trade outcomes, in that higher-ATR stocks had more variable results. But when I tested whether filtering or weighting by ATR improved the overall system, the result was flat. Any gain in risk-adjusted terms was offset by losing trades in the higher-ATR region during strong trending markets, where volatility and momentum go together.

More importantly: the standard 7% structural stop (or the higher-low stop) already does most of the work ATR does. If you're sizing to a fixed percentage of capital risked per trade, the stop placement is your volatility adjustment. ATR-based sizing adds a second layer of the same calculation without adding signal.
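A minimal sketch of that overlap, not the code behind the tests in this article. The function names, the account size, the prices, and the 2x ATR multiple are all hypothetical, chosen only to show that both sizing rules are the same formula with a different number in the denominator.

```python
def shares_from_stop(equity, risk_fraction, entry, stop):
    """Fixed-fractional sizing for a long trade: risk a set share of equity down to the stop."""
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        raise ValueError("stop must be below entry for a long trade")
    return int(equity * risk_fraction / risk_per_share)


def shares_from_atr(equity, risk_fraction, atr, atr_multiple):
    """ATR sizing: the same formula, with (atr_multiple * atr) standing in for the stop distance."""
    if atr <= 0 or atr_multiple <= 0:
        raise ValueError("atr and atr_multiple must be positive")
    return int(equity * risk_fraction / (atr_multiple * atr))


# Hypothetical numbers: 100,000 account, 0.5% risk per trade, both entries at 50.
EQUITY, RISK = 100_000, 0.005

# Tight base: higher-low stop at 48.00, ATR of 1.00.
tight_by_stop = shares_from_stop(EQUITY, RISK, entry=50.0, stop=48.0)
tight_by_atr = shares_from_atr(EQUITY, RISK, atr=1.0, atr_multiple=2.0)

# Wide base: 7% stop at 46.50, ATR of 1.75.
wide_by_stop = shares_from_stop(EQUITY, RISK, entry=50.0, stop=46.5)
wide_by_atr = shares_from_atr(EQUITY, RISK, atr=1.75, atr_multiple=2.0)

print(tight_by_stop, tight_by_atr)
print(wide_by_stop, wide_by_atr)
```

The two methods agree here by construction, because the example stops were placed exactly two ATRs below entry. The point is only that when stop distance already scales with a stock's volatility, ATR sizing returns nearly the same share count. The two diverge only when stop distance and ATR stop moving together.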

The lesson: When a new input correlates with outcomes, it doesn't automatically mean it improves decisions. Sometimes you're just restating information you already have in a different form.

For a deeper treatment of ATR specifically — including the Thai vs US reversal and three separate tests — see [We Tried to Make the Method Smarter. ATR Didn't Help.](/articles/atr-adds-nothing-to-breakout.html)

Failure 2: Volume dry-up during base construction

The idea: Minervini's "VCP" (Volatility Contraction Pattern) treats volume dry-up during the base as a feature. As volume dries up, the reasoning goes, sellers are exhausted and the stock is coiling for a move. I tested whether requiring below-average volume during the 15 days before the breakout improved results.

This one hurt to falsify. It looks obviously right on the good charts. I can find fifty examples where the dry-up preceded a big winner. The narrative is compelling.

What happened: When I tested it systematically across decades of data, the by-year check told the truth. The mean improved slightly. The median year didn't — in fact, the median year slightly worsened because requiring dry-up filtered out a portion of valid breakouts in trending markets where volume stays elevated.

The deeper problem: the volume dry-up filter reduced the sample size significantly, which made results more volatile year-to-year. When a filter concentrates your sample into fewer, higher-ATR situations, it looks like it's selecting for quality — but it may just be selecting for variance.
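One way to run a by-year check like this, as a sketch under assumptions: trades are stored as (year, R-multiple) pairs, and the `by_year_summary` helper and both trade lists are invented for illustration. None of the numbers come from the actual test.

```python
from collections import defaultdict
from statistics import mean, median


def by_year_summary(trades):
    """Summarize (year, r_multiple) pairs: pooled mean, median of yearly means, per-year counts."""
    per_year = defaultdict(list)
    for year, r in trades:
        per_year[year].append(r)
    yearly_mean = {y: mean(rs) for y, rs in sorted(per_year.items())}
    return {
        "n": sum(len(rs) for rs in per_year.values()),
        "pooled_mean": mean(r for rs in per_year.values() for r in rs),
        "median_year": median(yearly_mean.values()),
        "yearly_mean": yearly_mean,
        "yearly_n": {y: len(rs) for y, rs in sorted(per_year.items())},
    }


# Invented data: the "filtered" list is a subset of the baseline trades.
baseline = [
    (2017, 0.4), (2017, -1.0), (2017, 1.2),
    (2018, -1.0), (2018, 0.8), (2018, 0.5),
    (2019, 2.0), (2019, -1.0), (2019, 1.1),
]
filtered = [
    (2017, -1.0), (2017, 1.2),
    (2018, -1.0), (2018, 0.5),
    (2019, 2.0), (2019, 1.1),
]

for name, trades in (("baseline", baseline), ("filtered", filtered)):
    s = by_year_summary(trades)
    print(name, s["n"], round(s["pooled_mean"], 3), round(s["median_year"], 3), s["yearly_n"])
```

In this toy data the filter raises the pooled mean while lowering the median year, and every year has fewer trades behind it. Reading the pooled mean next to the median year and the per-year counts is what exposes that pattern.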

Volume dry-up works as an aesthetic read — it adds to the visual quality score of a chart. It doesn't survive as a mechanical rule.

The lesson: The things that look most right on curated examples are often the hardest to test honestly. If you select your examples from the charts where the rule worked, of course it will look true. The test is whether it works across all charts, including the ones you never remember.

Failure 3: Stacking filters for a "super system"

The idea: Each individual improvement — pivot near 52-week high, volume dry-up, tighter stop, stricter RS cutoff — showed improved means individually. What if I combined them? Stack three or four filters together and get a much higher-quality subset of trades.

What happened: The combined system showed dramatically higher per-trade R. Mean nearly doubled. Win rate improved. It looked like a breakthrough.

The by-year split showed the truth: the extra mean came almost entirely from 2020, a single exceptional year with low sample size (n=6 in that filter combination) where the COVID bounce produced enormous returns on every trade. In every other year, the stacked system performed at or below the baseline.

This is the clearest example of the tail-year trap. When you stack restrictive filters, two things happen simultaneously: (1) the sample shrinks dramatically, and (2) the remaining sample becomes more concentrated in the years where everything worked. You're not finding the best setups — you're finding the years where even random trades would have worked, and calling that your system.

A system with n=6 in a bull year and n=3 in a neutral year is not testable. It's a rumor masquerading as evidence.
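A quick guard against the tail-year trap is to recompute the mean with the single best year removed and to flag years too thin to trust. The sketch below is illustrative only: `drop_best_year`, `thin_years`, the `min_trades` threshold, and the trade list are all hypothetical, not the actual stacked-filter results.

```python
from collections import defaultdict
from statistics import mean


def drop_best_year(trades):
    """Return (best_year, mean R of all trades outside it).

    trades: iterable of (year, r_multiple) pairs. The best year is the one
    contributing the largest total R. The mean is None if nothing is left.
    """
    per_year = defaultdict(list)
    for year, r in trades:
        per_year[year].append(r)
    if not per_year:
        raise ValueError("no trades")
    best_year = max(per_year, key=lambda y: sum(per_year[y]))
    rest = [r for y, rs in per_year.items() if y != best_year for r in rs]
    return best_year, (mean(rest) if rest else None)


def thin_years(trades, min_trades):
    """Years with fewer than min_trades trades. The threshold is a judgment call."""
    counts = defaultdict(int)
    for year, _ in trades:
        counts[year] += 1
    return sorted(y for y, n in counts.items() if n < min_trades)


# Invented data shaped like a stacked-filter result: one outsized year, few trades.
stacked = [
    (2019, -1.0), (2019, 0.5), (2019, 0.2),
    (2020, 4.0), (2020, 6.0), (2020, 3.5),
    (2021, -0.5), (2021, 0.4),
]

best_year, rest_mean = drop_best_year(stacked)
print(round(mean(r for _, r in stacked), 4), best_year, round(rest_mean, 4))
print(thin_years(stacked, min_trades=3))
```

If the mean collapses toward zero or below once the best year is removed, as it does in this toy data, the headline number was describing that year rather than the filter.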

The lesson: More filters mean smaller samples, which mean more noise. The right direction for a system is usually toward fewer, more fundamental rules: ones with theoretical mechanisms and large sample support.

What these three failures have in common

All three looked reasonable. All three had supporting logic. All three showed improvement in naive tests.

They all failed because they:

1. Concentrated samples into lucky sub-periods instead of improving typical performance
2. Restated existing information in a new form (ATR vs fixed-stop sizing)
3. Reduced sample size until noise dominated signal

The surviving filter — the volume pop on breakout day — avoids all three failure modes. It improves the typical year, not just exceptional ones. It adds information not already in the existing stops. It maintains adequate sample size across the full history.

That's the bar. Filters that don't clear it aren't improvements. They're noise with good PR.

Track. Study. Wait. Strike.

MOEasymmetry