
This section collects short papers and devlogs produced from Babel benchmark batches.

March 2026

One Hundred Trees, One Hundred Public Syntactic Theories

Research Note v1: the first benchmark of public syntax in frontier language models.

[Figure: English long-distance wh growth. Panels: Gemini 3.1 Pro, Gemini 3.1 Flash Lite.]

Explicit Syntax Under Forced Commitment

Mini Paper v1: a paired 20-case Babel benchmark of Gemini 3.1 Pro and Gemini 3.1 Flash Lite.

[Figure: English long-distance wh growth. Panels: Gemini 3.1 Pro, Gemini 3.1 Flash Lite.]