Evaluation Leaderboard
Welcome to GitHub Discussions.
Benchmarks
Ability
Grade
Shots
In cases where we tested the models in both zero- and few-shot settings, we report the setting with the higher overall accuracy as 'Overall highest'.
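The 'Overall highest' rule can be read as picking, per model, whichever setting scored the higher overall accuracy. A minimal sketch of that reading follows; the function name `overall_highest`, the setting labels, and the numbers are hypothetical and chosen only for illustration, not taken from the leaderboard.

```python
from typing import Dict


def overall_highest(accuracy_by_setting: Dict[str, float]) -> str:
    """Return the name of the setting with the higher overall accuracy.

    Assumption: each model has one overall accuracy per tested setting.
    """
    # max() with key= selects the setting whose accuracy is largest;
    # on a tie it returns the first setting in insertion order.
    return max(accuracy_by_setting, key=accuracy_by_setting.get)


# Hypothetical accuracies for a single model, not real results.
example = {"zero-shot": 0.61, "few-shot": 0.66}
best = overall_highest(example)
print(best, example[best])
```

How ties are broken on the actual leaderboard is not stated; the sketch simply keeps the first setting listed.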
New questions
Settings