MATH-500 measures a model's ability to solve difficult mathematics problems in a controlled exam format. [1]

References

[1] vals.ai