This section collects short papers and devlogs produced from Babel benchmark batches.
March 2026
One Hundred Trees, One Hundred Public Syntactic Theories
Research Note v1: the first benchmark of public syntax in frontier language models.
- Date: March 13, 2026
- Data: gauntlet100-v1-report.json
- Capture script: gauntlet100_dual.cjs
- Full atlas: all 100 trees with sentence-by-sentence analysis
[Figure: paired panels for Gemini 3.1 Pro and Gemini 3.1 Flash Lite]
Explicit Syntax Under Forced Commitment
Mini Paper v1: a paired 20-case Babel benchmark of Gemini 3.1 Pro and Gemini 3.1 Flash Lite.
- Date: March 11, 2026
- Data: random20-v1-report.json
- Capture script: random20_dual_showcase.cjs
[Figure: paired panels for Gemini 3.1 Pro and Gemini 3.1 Flash Lite]



