Evaluation Leaderboard

Welcome to GitHub Discussions.
Benchmarks
Ability
Grade
Shots
    Where a model was tested in both zero-shot and few-shot settings, we report the setting with the higher overall accuracy as 'Overall highest'.
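The 'Overall highest' rule can be sketched in a few lines of Python. This is only an illustration of the selection described above, not the leaderboard's actual code: the function name `overall_highest`, the setting labels, and the accuracy values are all hypothetical, and the tie-breaking behavior (the first setting listed wins) is an assumption.

```python
from typing import Dict, Tuple


def overall_highest(accuracy_by_setting: Dict[str, float]) -> Tuple[str, float]:
    """Return the (setting, accuracy) pair with the highest overall accuracy.

    `accuracy_by_setting` maps a shot setting (e.g. "zero-shot") to the
    model's overall accuracy under that setting. On a tie, the setting
    listed first is returned (an assumption; the source does not say).
    """
    if not accuracy_by_setting:
        raise ValueError("no settings were evaluated for this model")
    best = max(accuracy_by_setting, key=accuracy_by_setting.get)
    return best, accuracy_by_setting[best]


# Hypothetical scores for one model; the numbers are made up for illustration.
scores = {"zero-shot": 0.61, "few-shot": 0.67}
setting, accuracy = overall_highest(scores)
print(f"Overall highest: {setting} ({accuracy:.2f})")
```

A model tested in only one setting simply reports that setting, since it is the only candidate.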
New questions