MOEasymmetry
Research · 2026-06-12 · 3 min read

How I Extract Trading Signals from Videos Without Getting Fooled

Track. Study. Wait. Strike.
⚠️ Personal research and trading journal — not investment advice. The author does not provide licensed advisory services.

When I built the IBD transcript corpus — over 4,800 videos, a decade of daily commentary — I needed a way to extract structured signals from unstructured text. The naive approach seemed obvious: search for stock symbols, pull the surrounding context, and tag what was said about each name.

That approach was wrong. And the error wasn't subtle.

Symbol-First: The Wrong Starting Point

Symbol-first extraction works like this: find every ticker mention in a transcript, extract the surrounding 200 words, and classify whether the commentary was bullish, bearish, or neutral.
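A minimal sketch of the symbol-first search step, to make the starting point concrete. The `TICKERS` set and the `symbol_first` function are hypothetical names, not the pipeline's actual code, and the bullish/bearish/neutral classifier is left out. It also assumes the 200 words are split evenly before and after the mention.

```python
import re

# Hypothetical watchlist; a real run would need a full symbol universe.
TICKERS = {"ASML", "NVDA"}

def symbol_first(transcript, window=200):
    """Yield (ticker, context) for every explicit ticker mention.

    `window` is the total number of surrounding words to keep,
    split evenly before and after the mention (an assumption).
    """
    words = transcript.split()
    half = window // 2
    for i, word in enumerate(words):
        token = re.sub(r"\W", "", word)  # strip punctuation such as "ASML,"
        if token in TICKERS:
            start = max(0, i - half)
            context = " ".join(words[start : i + half + 1])
            yield token, context
```

Everything downstream only ever sees the text around an explicit mention, which is the limitation the rest of this section describes.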

This produces a lot of data. On the ASML 2026-05-13 transcript, this method yielded n=10 "clean" ACTIONABLE_BUY signals.

The problem: analysts don't always name a stock when they make their most important claim about it. They set up the case using pattern language ("the base is tight, volume dried up, it broke out this morning on 3× average volume"), and THEN say the name — or sometimes they never say it directly because they expect the audience to have been following.

Symbol-first captures the explicit mentions and misses the surrounding context that makes the signal interpretable.

Phrase-First: The Correct Architecture

Phrase-first extraction flips the search order: scan first for the phrases that carry the trading signal, then attribute the symbol from context.

The phrase inventory I built covers IBD commentary patterns:

- PATTERN language: "cup and handle", "flat base", "VCP", "consolidation", "pivot point"
- BREAKOUT language: "broke out", "breaking out", "breakout", "new high on volume"
- FRESHNESS language: "this morning", "today's session", "just broke", "fresh breakout"
- ACTION language: "actionable", "in a buy zone", "buyable", "add to"
- CAUTION language: "extended", "too extended", "climax move", "chasing"
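The inventory above could be held as a plain mapping from category to phrases and compiled into one regex per category. This is a sketch under assumptions: `compile_inventory` and `find_anchors` are illustrative names, and matching is case-insensitive on whole words, with longer phrases tried first so "too extended" is not reported as a bare "extended".

```python
import re

PHRASES = {
    "PATTERN": ["cup and handle", "flat base", "VCP", "consolidation", "pivot point"],
    "BREAKOUT": ["broke out", "breaking out", "breakout", "new high on volume"],
    "FRESHNESS": ["this morning", "today's session", "just broke", "fresh breakout"],
    "ACTION": ["actionable", "in a buy zone", "buyable", "add to"],
    "CAUTION": ["extended", "too extended", "climax move", "chasing"],
}

def compile_inventory(inventory):
    """Build one case-insensitive regex per category, longest phrase first."""
    compiled = {}
    for category, phrases in inventory.items():
        ordered = sorted(phrases, key=len, reverse=True)
        alternation = "|".join(re.escape(p) for p in ordered)
        compiled[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return compiled

def find_anchors(text, compiled):
    """Return (category, phrase, char_offset) tuples sorted by position."""
    hits = [
        (category, m.group(0).lower(), m.start())
        for category, pattern in compiled.items()
        for m in pattern.finditer(text)
    ]
    return sorted(hits, key=lambda h: h[2])
```

Note that a phrase like "fresh breakout" fires in both FRESHNESS and BREAKOUT here. Whether that overlap is desirable depends on how the categories are combined later.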

When these phrases fire, the extraction system notes the anchor phrase, expands the context, and attributes the nearby stock symbol (within a 150-word window, with tie-breaking for multiple mentions).
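The attribution step can be sketched as a nearest-mention search around the anchor. Two things here are assumptions rather than the documented rule: the 150-word window is applied on each side of the anchor, and ties go to the mention that comes after the anchor (since the name often follows the setup language). `TICKERS` and `attribute_symbol` are hypothetical names.

```python
import re

TICKERS = {"ASML", "NVDA"}  # hypothetical symbol universe

def attribute_symbol(words, anchor_idx, window=150):
    """Return the ticker nearest to the anchor word, or None if none is in range.

    `words` is the tokenized transcript and `anchor_idx` is the index of the
    first word of the anchor phrase.
    """
    lo = max(0, anchor_idx - window)
    hi = min(len(words), anchor_idx + window + 1)
    candidates = []
    for i in range(lo, hi):
        token = re.sub(r"\W", "", words[i])
        if token in TICKERS:
            distance = abs(i - anchor_idx)
            # Sort key: nearest first; on a tie, a mention after the anchor
            # (False) sorts ahead of a mention before it (True).
            candidates.append((distance, i < anchor_idx, token))
    return min(candidates)[2] if candidates else None
```

Returning `None` when no symbol falls in the window keeps unattributed signals visible instead of silently dropping or mis-assigning them.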

On the ASML 2026-05-13 transcript, phrase-first produced n=31 clean ACTIONABLE_BUY signals vs n=10 from symbol-first. The roughly 3× increase came from capturing cases where the signal language appeared without an explicit ticker nearby: the method found the intent, then resolved the attribution.

The Falsification It Led To

This method improvement also produced a cleaner falsification.

When I re-ran the buy-zone analysis with phrase-first extraction and separated "fresh breakout" language from "buy zone" language, the two groups diverged:

- PATTERN + BREAKOUT_FRESH: +3.17pp
- Explicit buy-zone: -7.77pp

Symbol-first had obscured this split because it was conflating both types of language in the same "positive IBD mention" bucket. Phrase-first separated them, revealing that IBD's language about breakouts (fresh pattern description) is predictive, while IBD's explicit "buy zone" claims are late.
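To illustrate why the single bucket hid the split, here is a hedged sketch of the two bucketing schemes. The bucket names other than PATTERN+BREAKOUT_FRESH are invented for illustration, and the rule that BREAKOUT_FRESH means BREAKOUT and FRESHNESS language firing together is an assumption about how the conjunction was defined.

```python
def symbol_first_bucket(categories):
    """The conflated view: any positive language lands in one bucket."""
    positive = {"PATTERN", "BREAKOUT", "FRESHNESS", "ACTION"}
    return "POSITIVE_MENTION" if categories & positive else "OTHER"

def phrase_first_bucket(categories):
    """Separate fresh pattern/breakout description from explicit buy-zone claims."""
    # Assumption: the conjunction requires all three categories to fire.
    fresh_breakout = {"PATTERN", "BREAKOUT", "FRESHNESS"} <= categories
    buy_zone = "ACTION" in categories
    if fresh_breakout and buy_zone:
        return "MIXED"  # hypothetical bucket: keep ambiguous cases apart
    if fresh_breakout:
        return "PATTERN+BREAKOUT_FRESH"
    if buy_zone:
        return "EXPLICIT_BUY_ZONE"
    return "OTHER"
```

Under the first function, a fresh breakout description and a late buy-zone call are indistinguishable, so any difference in their outcomes averages out.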

Why This Matters Beyond IBD

The phrase-first principle applies to any text corpus you want to mine for signals:

1. Find the language patterns that carry the signal, not the entities (companies, symbols) the signal is about
2. Build a phrase inventory from domain knowledge: what does the source say when they're describing the most actionable setups vs. the extended ones?
3. Attribute entities from context: let the signal language anchor the extraction, then resolve the stock

Applied to earnings transcripts: the signal language in a strong quarter often appears in management's commentary before the explicit EPS number, so phrase-first would catch it earlier. Applied to news: headline sentiment differs from body-paragraph qualifier language, and phrase-first separates the two.

The common failure mode in text-based signal mining is building a classifier that categorizes entities when you should be classifying language. Fix: phrase-first, entity-second.



Personal research and trading journal — not investment advice. The author does not provide licensed advisory services. — MOEasymmetry

Draft 2026-06-12. Source: IBD transcript corpus 4,800+ videos 2016-2026. Symbol-first vs phrase-first comparison on ASML 2026-05-13: n=10 vs n=31. Finding: feedback_phrase_anchored_extraction.md (2026-05-14). Falsification from phrase-first: PATTERN+BREAKOUT_FRESH = +3.17pp; explicit buy-zone = -7.77pp. See feedback_pattern_breakout_conjunction_2026-05-14.md.
