OpenAI upgrades Codex with a new version of GPT-5

OpenAI announced on Monday that it is releasing a new version of GPT-5, called GPT-5-Codex, for its AI coding agent, Codex. The company states this new model spends its thinking time more dynamically than previous versions, allowing it to work on a coding task for anywhere from a few seconds up to seven hours. According to OpenAI, this leads to better performance on agentic coding benchmarks.

The new model is now rolling out within Codex products, which are accessible via a terminal, an IDE, GitHub, or ChatGPT. It is available to all ChatGPT Plus, Pro, Business, Edu, and Enterprise users. OpenAI has indicated it plans to make the model available to API customers at a later date.

This update is a key part of OpenAI’s strategy to make Codex more competitive with other AI coding products like Claude Code, Anysphere’s Cursor, and Microsoft’s GitHub Copilot. The market for AI coding tools has grown significantly more crowded over the past year due to intense user demand. Cursor surpassed $500 million in annual recurring revenue earlier in 2025, and Windsurf, a similar code editor, was recently the subject of a chaotic acquisition attempt that resulted in its team splitting between Google and Cognition.

According to OpenAI, GPT-5-Codex outperforms the standard GPT-5 on the SWE-bench Verified benchmark, which measures agentic coding abilities. It also shows improved performance on a benchmark that evaluates code refactoring tasks from large, established repositories.

The company also says it specifically trained GPT-5-Codex to conduct code reviews. When experienced software engineers evaluated the model’s review comments, they reportedly found that it submitted fewer incorrect comments while providing more high-impact feedback.

In a briefing, OpenAI’s Codex product lead Alexander Embiricos explained that much of the improved performance is due to the model’s dynamic thinking abilities. Unlike the router in ChatGPT that directs queries to different models based on task complexity, GPT-5-Codex operates without a router and can adjust how long it works on a task in real time.

Embiricos noted this is a significant advantage over a router, which must decide on computational power and time at the very beginning of a task. In contrast, GPT-5-Codex can decide five minutes into a problem that it needs to spend another hour on it. Embiricos confirmed he has observed the model take upwards of seven hours to complete some tasks.
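
To make the distinction concrete, here is a toy Python sketch of the two allocation strategies Embiricos describes. It is an illustration only and does not reflect how OpenAI implements GPT-5-Codex or the ChatGPT router; the `Task` class, both function names, the five-minute checkpoint interval, and the use of seven hours as a hard cap are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical coding task, measured in minutes of work."""
    name: str
    estimated_minutes: int  # how big the task looks at the outset
    actual_minutes: int     # how much work it turns out to need


def run_with_upfront_budget(task: Task) -> tuple[int, bool]:
    """Router-style: the time budget is fixed from the initial estimate."""
    budget = task.estimated_minutes
    spent = min(budget, task.actual_minutes)
    return spent, spent >= task.actual_minutes


def run_with_dynamic_budget(
    task: Task, check_every: int = 5, max_minutes: int = 7 * 60
) -> tuple[int, bool]:
    """Dynamic: at each checkpoint, keep going if work remains (up to a cap)."""
    spent = 0
    while spent < task.actual_minutes and spent < max_minutes:
        remaining = task.actual_minutes - spent
        spent += min(check_every, remaining, max_minutes - spent)
    return spent, spent >= task.actual_minutes


# A task that looks like five minutes of work but needs another hour.
refactor = Task("refactor", estimated_minutes=5, actual_minutes=65)
print(run_with_upfront_budget(refactor))  # (5, False)
print(run_with_dynamic_budget(refactor))  # (65, True)
```

In this simplified model, the upfront strategy stops at its initial five-minute budget and leaves the task unfinished, while the dynamic strategy extends its effort once it finds more work remains.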