Mythos Bar: what we measure and why
Mythos Bar is the official scoreboard for the power stack. It updates only when golden precision holds and the composite metrics improve; no precision regression is accepted.
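The update rule can be sketched as a small gate function. This is a minimal sketch, not the project's actual implementation: the function name `should_update` is hypothetical, and it assumes that "composite metrics improve" means a strictly higher `composite_score` and that "precision holds" means `critical_precision` at 1.0 with `false_critical_count` at 0.

```python
from typing import Mapping

# Assumption: the precision floor is exactly 1.0, as stated for critical_precision.
REQUIRED_PRECISION = 1.0


def should_update(best: Mapping[str, float], candidate: Mapping[str, float]) -> bool:
    """Return True only if precision holds and the composite score improves.

    Hypothetical helper; the metric names are taken from the scoreboard,
    the comparison logic is an assumption.
    """
    # Gate 1: golden precision must hold, with no false criticals.
    if candidate["critical_precision"] < REQUIRED_PRECISION:
        return False
    if candidate.get("false_critical_count", 0) != 0:
        return False
    # Gate 2: the composite score must strictly improve (assumed reading of "improve").
    return candidate["composite_score"] > best["composite_score"]
```

Under these assumptions, a candidate with a higher composite score but any precision regression is rejected, which matches the "no precision regression accepted" rule.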
Current official best (2026-06-27)
- critical_precision: 1.0
- false_critical_count: 0
- web80_exploited: 12/12
- cve_bench_mock: 4/4
- bountybench: 2/3
- surfaces_covered: 6
- multi_surface_chains: 3
- recall_estimate: 1.0
- composite_score: 10.8
What we track
- critical_precision — must stay at 1.0
- web80_exploited — golden subset at 100% precision
- cve_bench_pass_rate — mock chain regression today; live Docker subset in progress
- avg_chain_length — multi-hop depth over time
- surfaces_covered — web, API, M365, network, code, etc.
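Several scoreboard values are reported as "passed/total" counts (for example 12/12 or 2/3). A small helper like the one below could turn such a value into a pass rate. The name `pass_rate` and the string format are assumptions for illustration; the source does not say how these values are stored or computed.

```python
def pass_rate(ratio: str) -> float:
    """Convert a 'passed/total' string such as '2/3' into a fraction in [0, 1].

    Hypothetical helper; assumes both parts are non-negative integers.
    """
    passed_text, total_text = ratio.split("/")
    passed, total = int(passed_text), int(total_text)
    if total <= 0:
        raise ValueError(f"total must be positive, got {total}")
    if not 0 <= passed <= total:
        raise ValueError(f"passed must be between 0 and {total}, got {passed}")
    return passed / total
```

For example, `pass_rate("12/12")` gives 1.0, while a malformed value such as "5/3" raises `ValueError` rather than producing a rate above 1.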
Weekly power cycles run via tools/run_weekly_power_cycle.py. Episode data lands in the training flywheel; only verified episodes are exported to LoRA datasets.
Record file: thugir-node/data/tcsf_train/mythos_bar_best.json
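Reading the record file might look like the sketch below. It assumes the file is a single JSON object keyed by metric name; the actual schema is not described here, and `load_best` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path

# Path taken from the record-file line above; the schema is an assumption.
RECORD_PATH = Path("thugir-node/data/tcsf_train/mythos_bar_best.json")


def load_best(path: Path = RECORD_PATH) -> dict:
    """Load the current best record, or return {} if no record exists yet."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumed shape: one flat JSON object of metric name -> value.
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object in {path}")
    return data
```

Returning an empty dict for a missing file is a design choice for this sketch, so a first run can proceed without a prior record.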