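Note: Yield and QYI values are weighted across the problem sets. Because the weighted QYI is not, in general, the harmonic mean of the weighted Yield and a separately weighted Quality, Quality is instead derived from the other two as (QYI * Yield) / (2 * Yield - QYI), so that QYI remains the harmonic mean of the reported Yield and Quality.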
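The Quality formula in the note is the harmonic-mean relation QYI = 2 * Yield * Quality / (Yield + Quality) solved for Quality. The following is a minimal sketch of that inversion, not the leaderboard's actual code: the function names `harmonic_mean` and `quality_from_qyi` are hypothetical, the numeric inputs are made up for illustration, and returning 0.0 when the denominator vanishes is an assumed convention.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def quality_from_qyi(qyi: float, yield_: float) -> float:
    """Solve QYI = 2*Y*Q / (Y + Q) for Q, giving Q = QYI*Y / (2*Y - QYI)."""
    denom = 2 * yield_ - qyi
    if denom == 0:
        # Assumed convention: happens when Yield and QYI are both 0,
        # so no Quality can be recovered; report 0.0.
        return 0.0
    return qyi * yield_ / denom


# Illustrative values only (not taken from the leaderboard).
weighted_yield = 0.8
weighted_qyi = harmonic_mean(weighted_yield, 0.6)

# Recovering Quality from the two weighted values gives back 0.6
# (up to floating-point rounding).
recovered_quality = quality_from_qyi(weighted_qyi, weighted_yield)
print(round(recovered_quality, 6))
```

Deriving Quality this way keeps the three reported columns mutually consistent, which independently weighting all three would not guarantee.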
Rank | Model | Weighted Yield | Weighted Quality | Weighted QYI | Cost ($) | Release Date |
---|---|---|---|---|---|---|
Note: All Yield, Quality, and QYI values are weighted across the problem sets in this table.
* GPT-o4-mini:high was measured at temperature T=1, since o-series models support only T=1.
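Model | Stage 3 (Verification) @10 | Stage 3 (Verification) @5 | Stage 3 (Verification) @1 | Stage 2 (Solution Gen.) @10 | Stage 2 (Solution Gen.) @5 | Stage 2 (Solution Gen.) @1 | Stage 1 (Execution) @10 | Stage 1 (Execution) @5 | Stage 1 (Execution) @1 |
---|---|---|---|---|---|---|---|---|---|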