Evaluation of LLMs on math
MathEval is a benchmark dedicated to the holistic evaluation of the mathematical capabilities of LLMs. It comprises 20 evaluation datasets spanning various mathematical fields and nearly 30,000 math problems. The aim is to comprehensively evaluate the mathematical problem-solving abilities of LLMs across grade levels and difficulties, including arithmetic, primary and secondary school competitions, and some advanced branches of mathematics. MathEval serves as a one-stop reference for side-by-side comparison of mathematical abilities among LLMs, and it also offers hints and insights on how to further improve those abilities.
Five fundamental characteristics
- MathEval is the first one-stop LLM benchmark for mathematical evaluation, providing a comprehensive assessment of LLMs' mathematical abilities.
- It is highly flexible: new mathematical evaluation datasets can be added to MathEval.
- It supports multidimensional evaluation of LLM mathematical abilities.
- It supports a wide range of models, so different types can be integrated into the evaluation (HF models, API models, and custom open-source models).
- It supports diverse evaluation methods, including zero-shot and few-shot evaluation.
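The difference between zero-shot and few-shot evaluation mentioned above can be illustrated with a minimal sketch. This is not MathEval's actual code: the `Example` class, the `build_prompt` and `accuracy` helpers, and the "Question/Answer" prompt template are all hypothetical names and formats chosen for illustration, and exact-match scoring is assumed only for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """A solved problem used as an in-context demonstration (hypothetical type)."""
    question: str
    answer: str


def build_prompt(question: str, shots: Optional[List[Example]] = None) -> str:
    """Build a prompt for one problem.

    With no demonstrations this is a zero-shot prompt; with one or more
    demonstrations prepended it becomes a few-shot prompt.
    """
    parts = [f"Question: {ex.question}\nAnswer: {ex.answer}" for ex in (shots or [])]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Exact-match accuracy after trimming whitespace (an assumed scoring rule)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)
```

Calling `build_prompt("What is 2 + 3?")` yields a zero-shot prompt, while passing a list such as `[Example("What is 1 + 1?", "2")]` as `shots` yields a one-shot prompt for the same problem. The same scoring function can then be applied to both settings so that their results are comparable.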
20 specialized math evaluation datasets
Math world EN
Math world CN
Arithmetics
Models included in the evaluation
GPT-4
GPT-3.5
LLaMA2-7B
LLaMA2-7B-chat
LLaMA2-13B
LLaMA2-13B-chat
LLaMA2-70B
LLaMA2-70B-chat
ChatGLM2-6B
Baichuan2-13B-Base
InternLM-20B
InternLM-chat-20B
InternLM2-base-20B
InternLM2-chat-20B
InternLM2-math-20B
MathGPT
Qwen
WizardMath-13B-V1.0
WizardMath-70B-V1.0
MOSS-003-base-16B
ERNIE Bot (文心一言)
iFLYTEK Spark (讯飞星火)
MAmmoTH-70B
GAIRMath-Abel-70B
Mistral-7B-Instruct-v0.1
Mistral-7B-v0.1
Llemma-7B
Llemma-34B
MetaMath-70B
GLM4
Evaluation
Contact us
Free computing power for the MathEval benchmark is provided by the Nationwide smart education platform for Open Innovation of New-Generation AI. If you wish to participate in the evaluation or discuss cooperation, please send your requirements and questions to matheval.ai@gmail.com.