OpenAI had experienced professionals blindly grade outputs from OpenAI's GPT-4o, o4-mini, o3, and GPT-5 models, as well as Anthropic's Claude Opus 4.1, Google's Gemini 2.5 Pro, and xAI's Grok 4.
Discover how OpenAI Codex, powered by ChatGPT 5, is changing coding by automating tasks and simplifying software development.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results