Tactics Journal Research Pipeline
The autonomous research arm of Tactics Journal. An end-to-end pipeline that ingests football content, detects emerging tactical trends, and generates citation-backed research reports.
Architecture
ingest → backfill → detect → rescore → report
- ingest — Pulls RSS feeds and YouTube transcripts into the database
- backfill — Fills gaps in source content
- detect — Finds emerging tactical trends via novelty scoring (frontier-gap detection)
- rescore — Recalculates novelty scores with latest data
- report — Generates structured research reports from top candidates using multi-agent flow (planner, parallel OODA subagents, synthesis, citation verification, final revision)
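The report stage's multi-agent flow can be sketched without any LLM calls. This is a minimal sketch under stated assumptions, not the pipeline's code:

- The Finding and Report shapes are hypothetical.
- The stub planner and stub subagents are hypothetical.
- So is the rule that citation verification drops any claim whose cited source ID is not among the known sources.

It shows only the control flow. A planner splits the candidate into questions. OODA (observe, orient, decide, act) subagents run in parallel via asyncio.gather. Synthesis, citation verification, and a final revision then run in sequence.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    source_id: str  # ID of the source this claim cites (hypothetical shape)


@dataclass
class Report:
    text: str
    findings: list[Finding] = field(default_factory=list)


def plan(candidate: str) -> list[str]:
    # Planner: break the trend candidate into research questions (stubbed).
    return [f"{candidate}: origins", f"{candidate}: adoption", f"{candidate}: counters"]


async def run_subagent(index: int, question: str) -> list[Finding]:
    # One OODA subagent; a real one would query sources and call an LLM.
    await asyncio.sleep(0)
    return [Finding(claim=f"Answer to '{question}'", source_id=f"src-{index}")]


def synthesize(findings: list[Finding]) -> Report:
    # Synthesis: merge all subagent findings into one draft.
    return Report(text="\n".join(f.claim for f in findings), findings=list(findings))


def verify_citations(draft: Report, known_sources: set[str]) -> Report:
    # Citation verification: keep only claims whose cited source exists.
    kept = [f for f in draft.findings if f.source_id in known_sources]
    return Report(text="\n".join(f.claim for f in kept), findings=kept)


def revise(draft: Report) -> Report:
    # Final revision pass (stubbed as whitespace cleanup).
    return Report(text=draft.text.strip(), findings=draft.findings)


async def generate_report(candidate: str, known_sources: set[str]) -> Report:
    questions = plan(candidate)
    # Subagents run concurrently; gather returns results in question order.
    batches = await asyncio.gather(*(run_subagent(i, q) for i, q in enumerate(questions)))
    findings = [f for batch in batches for f in batch]
    return revise(verify_citations(synthesize(findings), known_sources))


report = asyncio.run(generate_report("inverted full-backs", {"src-0", "src-1"}))
print(len(report.findings))  # 2: the claim citing src-2 is dropped
```

Putting verification after synthesis means a claim that cites a missing source never reaches the revision step.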
Repo & Local
GitHub · ~/research · ~/code/research
Key files: main.py (pipeline), server.py (dashboard), detect_detectors.py, detect_policy_config.json, report_policy_config.json, config.json
Infrastructure
- Railway — production cron jobs (ingest hourly, detect every 6h, report daily). CLI: railway status, railway logs --service <name>. Project: research, Environment: production.
- Cloudflare AI Gateway — LLM routing for all pipeline calls
- Cloudflare Dynamic Workers Gateway — article fetches and YouTube transcript resolution
- Postgres (pgvector) — database for sources, embeddings, candidates, reports
- Paperclip — agent orchestration system managing the research team
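The Railway cron cadence above can be written as crontab expressions. The sketch below shows which jobs fire at a given time. It assumes the following:

- The schedules are evaluated in UTC.
- The daily report runs at 06:00, a hypothetical placeholder.
- The matcher is a deliberately tiny one that understands only "*", "*/n", and literal numbers.

The real schedules live in the Railway service settings.

```python
from datetime import datetime

# Assumed crontab expressions for the documented cadence (UTC).
# The 06:00 report time is a placeholder, not the production value.
SCHEDULES = {
    "ingest": "0 * * * *",    # hourly, on the hour
    "detect": "0 */6 * * *",  # every 6 hours
    "report": "0 6 * * *",    # daily
}


def _field_matches(spec: str, value: int) -> bool:
    # Supports only "*", "*/n", and a literal integer. "*/n" is evaluated from 0,
    # which matches cron for the minute and hour fields used above.
    if spec == "*":
        return True
    if spec.startswith("*/"):
        return value % int(spec[2:]) == 0
    return value == int(spec)


def is_due(expr: str, now: datetime) -> bool:
    minute, hour, dom, month, dow = expr.split()
    # cron day-of-week uses 0 = Sunday; isoweekday() uses 7 = Sunday.
    return (
        _field_matches(minute, now.minute)
        and _field_matches(hour, now.hour)
        and _field_matches(dom, now.day)
        and _field_matches(month, now.month)
        and _field_matches(dow, now.isoweekday() % 7)
    )


def jobs_due(now: datetime) -> list[str]:
    # Jobs whose schedule matches this minute, in SCHEDULES order.
    return [job for job, expr in SCHEDULES.items() if is_due(expr, now)]
```

With these assumptions, 06:00 is the one moment each day when all three jobs coincide. That is worth knowing when reading overlapping entries in railway logs.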
Models
- anthropic/claude-sonnet-4-6 — lead, synthesis, summary, revision
- workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast — default model, signal, citation, eval
- openai/text-embedding-3-small — embeddings
All routed through Cloudflare AI Gateway. No OpenRouter or other providers.
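The role-to-model assignment above can be expressed as a lookup table. This is a sketch under two assumptions:

- Calls go through Cloudflare AI Gateway's OpenAI-compatible "compat" endpoint, which takes provider-prefixed model names like the ones listed.
- Any role not listed falls back to the default model.

The account and gateway IDs are placeholders. The helper builds the request without sending it.

```python
# Role -> model mapping taken from the Models list.
LEAD_MODEL = "anthropic/claude-sonnet-4-6"
DEFAULT_MODEL = "workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast"
EMBEDDING_MODEL = "openai/text-embedding-3-small"

ROLE_MODELS = {
    "lead": LEAD_MODEL,
    "synthesis": LEAD_MODEL,
    "summary": LEAD_MODEL,
    "revision": LEAD_MODEL,
    "signal": DEFAULT_MODEL,
    "citation": DEFAULT_MODEL,
    "eval": DEFAULT_MODEL,
    "embedding": EMBEDDING_MODEL,
}


def model_for(role: str) -> str:
    # Assumption: roles not listed above use the default model.
    return ROLE_MODELS.get(role, DEFAULT_MODEL)


def gateway_chat_url(account_id: str, gateway_id: str) -> str:
    # AI Gateway's OpenAI-compatible chat endpoint (assumed to be the one in use).
    return f"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat/chat/completions"


def chat_request(role: str, messages: list[dict], account_id: str, gateway_id: str) -> tuple[str, dict]:
    # Returns (url, JSON body); a real caller would POST this with its auth headers.
    body = {"model": model_for(role), "messages": messages}
    return gateway_chat_url(account_id, gateway_id), body
```

Keeping the mapping in one table makes the "no other providers" rule easy to audit. Every model string either starts with a gateway provider prefix or it does not.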
Autoresearch
Karpathy-style experiment loop for tuning each pipeline stage:
python autoresearch/<stage>/prepare.py # freeze benchmark
python autoresearch/<stage>/train.py # edit mutable surface, run, keep improvements
Stages: ingest, detect, report. Production eval: make eval-report, make optimize-ingest-policy, make benchmark-report.
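The "edit mutable surface, run, keep improvements" step of train.py amounts to a greedy keep-if-better search against the frozen benchmark. This is a minimal sketch of that idea only. The following are hypothetical stand-ins for whatever each stage actually edits and measures:

- the mutate and evaluate callables
- the "threshold" knob in the toy usage

```python
import random
from typing import Callable, TypeVar

Config = TypeVar("Config")


def autoresearch_loop(
    baseline: Config,
    mutate: Callable[[Config, random.Random], Config],
    evaluate: Callable[[Config], float],
    iterations: int,
    seed: int = 0,
) -> tuple[Config, float]:
    """Greedy keep-if-better search; higher benchmark score is better."""
    rng = random.Random(seed)
    best, best_score = baseline, evaluate(baseline)
    for _ in range(iterations):
        candidate = mutate(best, rng)  # edit the mutable surface
        score = evaluate(candidate)    # run against the frozen benchmark
        if score > best_score:         # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score


# Toy usage: a hypothetical "threshold" knob whose benchmark optimum is 0.72.
def toy_evaluate(cfg: dict) -> float:
    return -abs(cfg["threshold"] - 0.72)


def toy_mutate(cfg: dict, rng: random.Random) -> dict:
    return {**cfg, "threshold": cfg["threshold"] + rng.uniform(-0.1, 0.1)}


best, best_score = autoresearch_loop({"threshold": 0.5}, toy_mutate, toy_evaluate, iterations=200)
```

Freezing the benchmark first (prepare.py) matters here. If the benchmark could change between iterations, "keep only improvements" would compare scores from different tests.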
Publishing Flow
- Pipeline drafts report → report.md + sources.json + metadata.json
- Report artifacts saved to report_runs/<timestamp>-<slug>/
- PR opened against the GitHub repo for Kyle's review
- Never auto-merge
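The artifact layout above can be sketched as a small writer. Only the report_runs/<timestamp>-<slug>/ shape and the three file names come from this page. The following are assumptions:

- the UTC timestamp format
- the slug rule
- the JSON indentation
- refusing to overwrite an existing run directory

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Union


def slugify(title: str) -> str:
    # Lowercase, collapse runs of non-alphanumerics to one hyphen, trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def save_report_run(
    root: Union[str, Path],
    title: str,
    report_md: str,
    sources: list,
    metadata: dict,
    now: Optional[datetime] = None,
) -> Path:
    now = now or datetime.now(timezone.utc)
    # Assumed timestamp format; only the <timestamp>-<slug> shape is documented.
    run_dir = Path(root) / f"{now:%Y%m%dT%H%M%SZ}-{slugify(title)}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier run
    (run_dir / "report.md").write_text(report_md, encoding="utf-8")
    (run_dir / "sources.json").write_text(json.dumps(sources, indent=2), encoding="utf-8")
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return run_dir
```

A sortable timestamp prefix keeps report_runs/ in chronological order. That makes the directory for a given PR easy to find.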
Known Pitfalls
- Trajectory analysis is unreliable — keyword matching returns "Insufficient history"
- Double novelty scoring — rescore calls compute_novelty_score twice (known bug)
- Source title fuzzy matching — LLMs paraphrase source titles, so matching fails and candidates are silently dropped
- wrangler.jsonc is empty — no Cloudflare Worker deploy until a proper wrangler.toml exists
- Dashboard on Railway — DATABASE_URL pointing to *.railway.internal won't work locally
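Two of these pitfalls can be guarded against with standard-library helpers. This is a sketch, not the pipeline's matcher, and the following are hypothetical:

- the function names
- the 0.6 similarity cutoff
- the logger name

The first helper maps an LLM-paraphrased title to the closest known source title with difflib, and logs a miss instead of dropping the candidate silently. The second flags a DATABASE_URL whose host is on Railway's private network, which is unreachable from a local machine.

```python
import difflib
import logging
from typing import Optional
from urllib.parse import urlparse

log = logging.getLogger("research.pitfalls")  # hypothetical logger name


def match_source_title(cited: str, known_titles: list[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known source title to a (possibly paraphrased) cited title."""
    # Compare case-insensitively but return the original spelling.
    normalized = {t.casefold().strip(): t for t in known_titles}
    hits = difflib.get_close_matches(cited.casefold().strip(), list(normalized), n=1, cutoff=cutoff)
    if not hits:
        # Surface the miss rather than silently dropping the candidate.
        log.warning("No source matched cited title %r", cited)
        return None
    return normalized[hits[0]]


def database_url_is_railway_internal(url: str) -> bool:
    """True if DATABASE_URL points at Railway's private network host."""
    host = urlparse(url).hostname or ""
    return host.endswith(".railway.internal")
```

Character-level similarity tolerates small rewordings such as "Fullbacks" vs "full-backs". Heavier paraphrases would still miss, which is why the warning matters.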
Related
- Tactics Journal — the publication this pipeline serves
- Paperclip — agent orchestration managing the research team
- Kyle Boas — reviews and publishes all reports