Evaluation of LLMs on math

MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs. It comprises 19 evaluation datasets spanning a range of mathematical fields and nearly 30,000 math problems, and aims to comprehensively assess the mathematical problem-solving abilities of LLMs across grade levels and difficulties, from arithmetic and primary- and secondary-school competition problems to several advanced branches of mathematics. MathEval not only serves as a one-stop reference for comparing the mathematical abilities of LLMs, but also provides hints and insights on how to further improve those abilities in the future.
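To make concrete what scoring a math problem typically involves, the sketch below shows the answer-extraction and exact-match step common to benchmarks of this kind. It is a minimal sketch under assumed conventions, not MathEval's actual scoring code; the names `extract_final_answer` and `is_correct` are invented for illustration.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Return the last number mentioned in a model completion, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None

def is_correct(completion: str, gold: str) -> bool:
    """Exact-match scoring on the extracted final answer."""
    pred = extract_final_answer(completion)
    return pred is not None and float(pred) == float(gold)

print(is_correct("3 + 4 = 7, so the answer is 7.", "7"))  # True
```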


Five fundamental characteristics

  • MathEval is the first one-stop benchmark for comprehensively assessing the mathematical abilities of LLMs.
  • Flexible design: new mathematical evaluation datasets can easily be added to MathEval.
  • Supports multidimensional evaluation of LLM mathematical abilities.
  • Broad model support: Hugging Face models, API models, and custom open-source models can all be integrated into the evaluation.
  • Diverse evaluation methods are supported, including zero-shot and few-shot evaluation (see the sketch after this list).
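
To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how such prompts are usually assembled. The names `FEW_SHOT_EXEMPLARS` and `build_prompt` are hypothetical and not taken from MathEval's codebase.

```python
# Hypothetical sketch: names below are illustrative, not MathEval's API.
FEW_SHOT_EXEMPLARS = [
    ("What is 12 * 3?", "12 * 3 = 36. The answer is 36."),
    ("What is 100 - 58?", "100 - 58 = 42. The answer is 42."),
]

def build_prompt(question: str, shots: int = 0) -> str:
    """shots=0 yields a zero-shot prompt; shots>0 prepends worked exemplars."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in FEW_SHOT_EXEMPLARS[:shots]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_prompt("What is 7 + 5?", shots=2))
```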

Contact Us

The MathEval benchmark receives free computing support from the Nationwide smart education platform for Open Innovation of New-Generation AI. If you would like to participate in the evaluation or discuss cooperation, please send your requirements and questions to matheval.ai@gmail.com.