Welcome to the VisualOverload Leaderboard!
Below you will find the public leaderboard for the VisualOverload benchmark, which evaluates models on their ability to understand and reason about complex visual scenes. We separate results by model and by 'special' inference techniques (e.g., special prompts, in-context learning (ICL), chain-of-thought (CoT), etc.) to better understand the source of their performance.
The leaderboard ranks models by their overall accuracy across six tasks (activity recognition, attribute recognition, counting, OCR, reasoning, and global scene recognition). We report an aggregate score (Total), individual scores on three difficulty splits (Easy, Medium, Hard), and a score for each task.
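As a rough illustration of how such scores are typically derived, the sketch below computes per-task accuracy and an aggregate score. Note this is an assumption for illustration only: the exact matching rule and aggregation used by VisualOverload (e.g., weighting by question counts) may differ.

```python
# Hypothetical scoring sketch; the actual VisualOverload aggregation may differ.
TASKS = ["activity", "attribute", "counting", "ocr", "reasoning", "scene"]

def accuracy(predictions, answers):
    """Fraction of exact-match predictions (assumed metric)."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

def total_score(per_task_accuracy):
    """Unweighted mean over the six tasks (an assumption, not the
    benchmark's documented formula)."""
    return sum(per_task_accuracy[t] for t in TASKS) / len(TASKS)
```

For example, a model that answers half the questions in every task correctly would receive a Total of 0.5 under this (assumed) unweighted mean.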
| Model | Special Inference | Total | Easy | Medium | Hard | Activity | Attribute | Counting | OCR | Reasoning | Scene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| InternLM-XComposer2-4KHD | No | 76.7 | 69.8 | 36.7 | 62.7 | 75.1 | 94.7 | 99.9 | 80.2 | 19.8 | 69.5 |
Please see the Evaluation tab for details on how to evaluate your model and how to list your results.