ICLR 2026 Benchmark

An agentic benchmark for evaluating how well LLMs generate and refine heuristics for real-world combinatorial optimization tasks.

Hongzheng Chen*, Yingheng Wang*, Yaohui Cai*, Hins Hu*, Jiajie Li*, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P. Gomes, Zhiru Zhang
* Core contributor
30 Models Evaluated
9 Problems
4 Domains
218 Test Cases

Overall Performance

Yield and QYI values are weighted across the 9 problem sets by test case count. Quality is back-calculated from weighted QYI and weighted Yield via QYI = 2QY/(Q+Y). Select a single company to overlay a chronological trend line of its model releases.
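The aggregation described above can be sketched as follows. Solving QYI = 2QY/(Q+Y) for Q gives Q = QYI·Y/(2Y − QYI). The per-problem numbers and the names `weighted_mean` and `back_calculate_quality` are hypothetical and chosen only for illustration; they are not taken from the benchmark's code or results.

```python
def weighted_mean(values, weights):
    """Mean of `values`, weighted by `weights` (here: test case counts)."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def back_calculate_quality(qyi, yield_):
    """Invert QYI = 2QY/(Q+Y) for Q, i.e. Q = QYI*Y / (2Y - QYI)."""
    denom = 2 * yield_ - qyi
    if denom <= 0:
        # Happens when Yield is 0: Quality is undefined in that case.
        raise ValueError("Quality is undefined for this (QYI, Yield) pair")
    return qyi * yield_ / denom


# Hypothetical per-problem results (illustrative values only).
problems = [
    {"cases": 10, "yield": 1.0, "qyi": 0.8},
    {"cases": 30, "yield": 0.6, "qyi": 0.4},
]

counts = [p["cases"] for p in problems]
w_yield = weighted_mean([p["yield"] for p in problems], counts)
w_qyi = weighted_mean([p["qyi"] for p in problems], counts)
quality = back_calculate_quality(w_qyi, w_yield)

print(f"Yield={w_yield:.3f}  QYI={w_qyi:.3f}  Quality={quality:.3f}")
```

Because Quality is derived from the two weighted aggregates, it is not necessarily equal to a test-case-weighted mean of per-problem Quality values.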

QYI vs. API Cost

Total API cost estimated from official list pricing × token counts across all 9 problems (5 iterations, T=0). Cost axis is on a logarithmic scale. Hover over a point for details. Select a single company to overlay a chronological trend line of its model releases.
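A minimal sketch of such a cost estimate is shown below. It assumes that input and output tokens are priced separately per million tokens, which is common for LLM APIs but is not stated on this page. The `Pricing` class, the prices, and the token counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical list prices in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def api_cost(input_tokens, output_tokens, pricing):
    """Cost in USD of one run: list price times token counts."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


# Illustrative values only; not real prices or measured token counts.
pricing = Pricing(input_per_mtok=2.0, output_per_mtok=8.0)
usage_per_problem = [
    (1_000_000, 500_000),  # (input tokens, output tokens) for one problem
    (250_000, 125_000),
]

total_cost = sum(api_cost(i, o, pricing) for i, o in usage_per_problem)
print(f"Estimated total cost: ${total_cost:.2f}")
```

In the benchmark setting, the sum would run over all 9 problems and all iterations of a model.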

Leaderboard (T = 0)

Columns: #, Model, QYI, Yield, Quality, Cost ($), Released

* GPT-o4-mini:high is measured at T=1 (o-series only supports T=1); its values appear in all temperature views unchanged.

Per-Problem Results

Domains: EDA, Compilers, Computational Biology, Logistics
QYI by Model — Operator Scheduling

Per-problem QYI for all 30 evaluated models, sorted by QYI. The original 9 models ran 10 iterations (T=0); the 21 more recent models ran 5 iterations (T=0).

Solve Metrics (T = 0)

Columns: Model, then solves@10, solves@5, and solves@1 for each of Stage III (Validity), Stage II (Output Generation), and Stage I (Execution)

solves@i: fraction of test cases solved at Stage s within i attempts. Stage I = no crash, Stage II = parseable output, Stage III = valid solution (≡ Yield). A marker next to a model's name indicates that the model was evaluated with 5 iterations only (no @10 data); the first 9 models ran 10 iterations.
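The solves@i definition can be illustrated with a short sketch. It assumes that each attempt is recorded as the highest stage it passed (0 = crashed, 1 = executed, 2 = parseable output, 3 = valid solution) and that passing a later stage implies passing the earlier ones. The data layout and the function name `solves_at` are hypothetical.

```python
def solves_at(attempt_stages, stage, i):
    """Fraction of test cases that reach `stage` within the first `i` attempts.

    attempt_stages: one list per test case, holding the highest stage passed
    on each attempt, in attempt order.
    """
    solved = sum(
        1
        for stages in attempt_stages
        if any(s >= stage for s in stages[:i])
    )
    return solved / len(attempt_stages)


# Illustrative records for four test cases with three attempts each.
cases = [
    [0, 1, 3],  # becomes valid on the third attempt
    [3, 3, 3],  # valid from the first attempt
    [1, 2, 2],  # produces parseable output but never a valid solution
    [0, 0, 0],  # crashes on every attempt
]

print("Stage III solves@1:", solves_at(cases, stage=3, i=1))
print("Stage III solves@3:", solves_at(cases, stage=3, i=3))
print("Stage II  solves@3:", solves_at(cases, stage=2, i=3))
print("Stage I   solves@1:", solves_at(cases, stage=1, i=1))
```

Under this reading, the Stage III value at the largest available i corresponds to Yield.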

Citation

@article{chen-heurigym-iclr2026,
  title={HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization},
  author={Hongzheng Chen and Yingheng Wang and Yaohui Cai and Hins Hu and Jiajie Li
          and Shirley Huang and Chenhui Deng and Rongjian Liang and Shufeng Kong
          and Haoxing Ren and Samitha Samaranayake and Carla P. Gomes and Zhiru Zhang},
  journal={International Conference on Learning Representations (ICLR)},
  year={2026}
}