HeuriGym Leaderboard
An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

📈 Overall Performance

Note: Yield and QYI values are weighted across the problem sets. The weighted QYI is not the harmonic mean of the weighted Yield and the weighted Quality, so Quality is not weighted directly; it is derived from the other two values as Quality = (QYI * Yield) / (2 * Yield - QYI).
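The formula in the note follows from solving QYI = 2 * Quality * Yield / (Quality + Yield) for Quality. The sketch below illustrates that inversion and checks that it round-trips. It assumes QYI is the harmonic mean of Quality and Yield for the reported values, as the note implies. The function names `quality_from_qyi` and `harmonic_mean` are hypothetical and not part of any HeuriGym code.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative values; 0.0 when both are zero."""
    total = a + b
    return 2.0 * a * b / total if total > 0 else 0.0


def quality_from_qyi(qyi: float, yield_: float) -> float:
    """Recover Quality from QYI and Yield.

    Assumes QYI = 2 * Quality * Yield / (Quality + Yield), which rearranges
    to Quality = (QYI * Yield) / (2 * Yield - QYI).
    """
    denominator = 2.0 * yield_ - qyi
    if denominator <= 0:
        # No finite non-negative Quality satisfies the relation in this case.
        raise ValueError("requires 2 * Yield > QYI")
    return (qyi * yield_) / denominator


# Illustrative values only, not taken from the leaderboard.
example_quality = quality_from_qyi(qyi=0.6, yield_=0.75)
round_trip_qyi = harmonic_mean(example_quality, 0.75)
```

With these illustrative inputs, the derived Quality fed back through the harmonic mean reproduces the input QYI, so the three displayed columns stay mutually consistent.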

📊 Leaderboard (T=0)

Rank | Model | Weighted Yield | Weighted Quality | Weighted QYI | Cost ($) | Release Date

Note: All Yield, Quality, and QYI values are weighted across the problem sets in this table.

* GPT-o4-mini:high was measured under T=1 since o-series models only support T=1.

📊 solves@i Metrics (T=0)

Model | Stage 3 (Verification): @10, @5, @1 | Stage 2 (Solution Gen.): @10, @5, @1 | Stage 1 (Execution): @10, @5, @1

📚 Citation

@article{chen2025heurigym,
  title={HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization},
  author={Hongzheng Chen and Yingheng Wang and Yaohui Cai and Hins Hu and Jiajie Li and Shirley Huang and Chenhui Deng and Rongjian Liang and Shufeng Kong and Haoxing Ren and Samitha Samaranayake and Carla P. Gomes and Zhiru Zhang},
  journal={arXiv preprint arXiv:2506.07972},
  year={2025}
}