Evaluation of LLMs on math

MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs. It comprises 19 evaluation datasets spanning a range of mathematical fields and nearly 30,000 math problems, and aims to comprehensively assess the mathematical problem-solving abilities of LLMs across grade levels and difficulties, from arithmetic and primary- and secondary-school competition problems to several advanced branches of mathematics. MathEval not only serves as a one-stop reference for comparing the mathematical abilities of LLMs, but also provides hints and insights on how to further improve those abilities in the future.
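To make concrete what scoring a math problem typically involves, the sketch below shows the answer-extraction and exact-match step common to benchmarks of this kind. It is a minimal sketch under assumed conventions, not MathEval's actual scoring code; the names `extract_final_answer` and `is_correct` are invented for illustration.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Return the last number mentioned in a model completion, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None

def is_correct(completion: str, gold: str) -> bool:
    """Exact-match scoring on the extracted final answer."""
    pred = extract_final_answer(completion)
    return pred is not None and float(pred) == float(gold)

print(is_correct("3 + 4 = 7, so the answer is 7.", "7"))  # True
```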


Five fundamental characteristics

  • MathEval is the first one-stop benchmark for comprehensively assessing the mathematical abilities of LLMs.
  • Flexible design: new mathematical evaluation datasets can easily be added to MathEval.
  • Supports multidimensional evaluation of LLM mathematical abilities.
  • Broad model support: Hugging Face models, API models, and custom open-source models can all be integrated into the evaluation.
  • Diverse evaluation methods are supported, including zero-shot and few-shot evaluation (see the sketch after this list).
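
To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how such prompts are usually assembled. The names `FEW_SHOT_EXEMPLARS` and `build_prompt` are hypothetical and not taken from MathEval's codebase.

```python
# Hypothetical sketch: names below are illustrative, not MathEval's API.
FEW_SHOT_EXEMPLARS = [
    ("What is 12 * 3?", "12 * 3 = 36. The answer is 36."),
    ("What is 100 - 58?", "100 - 58 = 42. The answer is 42."),
]

def build_prompt(question: str, shots: int = 0) -> str:
    """shots=0 yields a zero-shot prompt; shots>0 prepends worked exemplars."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in FEW_SHOT_EXEMPLARS[:shots]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_prompt("What is 7 + 5?", shots=2))
```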

Contact Us

The MathEval benchmark receives free computing support from the Nationwide smart education platform for Open Innovation of New-Generation AI. If you would like to participate in the evaluation or discuss cooperation, please send your requirements and questions to matheval.ai@gmail.com.