2026-03-09

SWE-Vision: A Minimal Agent for Advancing Visual Intelligence

While coding capabilities have surpassed human-level performance on many benchmarks, visual reasoning continues to lag behind. In this work, we introduce SWE-Vision, a minimal agentic workflow that leverages a simple coding environment to enhance visual understanding and that also offers a more achievable direction for test-time scaling.

2026-01-12

BabyVision: Visual Reasoning Beyond Language

State-of-the-art MLLMs achieve PhD-level language reasoning but struggle with visual tasks that 3-year-olds solve effortlessly. We introduce BabyVision, a benchmark revealing the infancy of AI vision.