The history of AI shows how setting evaluation standards fueled progress. But today's LLMs are asked to do tasks without ...