A Benchmark of Expert-level Academic Questions to Assess AI Capabilities
Nature 649, pp. 1139–1146 · 2026 Nature Benchmark
Also known as Humanity's Last Exam. A 3,000-question benchmark of expert-level academic questions at the limit of human knowledge, spanning many domains and designed to be adversarial to frontier LLMs.



