Standardized AI model benchmarks and evaluation methodologies
AIME 2024
Benchmark Sample
GPQA
HumanEval
Humanity's Last Exam
LiveCodeBench
MATH-500
MMLU-Pro
SciCode