AI models are starting to crack high-level math problems

Over the weekend, software engineer, former quant researcher, and startup founder Neel Somani was testing the math skills of OpenAI’s new model when he made an unexpected discovery. After pasting a problem into ChatGPT and letting it think for fifteen minutes, he returned to find a complete solution. He evaluated the proof and formalized it with a tool called Harmonic, and everything checked out.

Somani said he was curious to establish a baseline for when large language models can effectively solve open math problems compared to where they struggle. The surprise was that, using the latest model, the frontier began to push forward. ChatGPT’s chain of thought is even more impressive, rattling off mathematical axioms like Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. Eventually, the model located a Math Overflow post from 2013 where Harvard mathematician Noam Elkies had given an elegant solution to a similar problem. However, ChatGPT’s final proof differed from Elkies’ work in important ways and provided a more complete solution to a version of a problem posed by legendary mathematician Paul Erdős, whose vast collection of unsolved problems has become a proving ground for AI.

For anyone skeptical of machine intelligence, this is a surprising result, and it is not the only one. AI tools have become ubiquitous in mathematics, from formalization-oriented models like Harmonic’s Aristotle to literature review tools like OpenAI’s deep research. But since the release of GPT 5.2, which Somani describes as anecdotally more skilled at mathematical reasoning than previous iterations, the sheer volume of solved problems has become difficult to ignore. This raises new questions about large language models’ ability to push the frontiers of human knowledge.

Somani was examining the Erdős problems, a set of over one thousand conjectures by the Hungarian mathematician that are maintained online. The problems have become a tempting target for AI-driven mathematics, varying significantly in both subject matter and difficulty. The first batch of autonomous solutions came in November from a Gemini-powered model called AlphaEvolve, but more recently, Somani and others have found GPT 5.2 to be remarkably adept with high-level math.

Since Christmas, fifteen problems have been moved from “open” to “solved” on the Erdős website, and eleven of the solutions have specifically credited AI models as involved in the process. The revered mathematician Terence Tao maintains a more nuanced look at the progress on his GitHub page, counting eight different problems where AI models made meaningful autonomous progress on an Erdős problem, with six other cases where progress was made by locating and building on previous research. It is a long way from AI systems being able to do math without human intervention, but it is clear that there is an important role for large models to play.

On Mastodon, Tao conjectured that the scalable nature of AI systems makes them better suited for being systematically applied to the “long tail” of obscure Erdős problems, many of which actually have straightforward solutions. He continued that as such, many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means.

Another driving force is a recent shift towards formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization does not require the use of AI or even computers, but a new crop of automated tools have made the process far easier. The open-source “proof assistant” Lean, developed at Microsoft Research in 2013, has become widely used within the field as a way of formalizing proof, and AI tools like Harmonic’s Aristotle promise to automate much of the work of formalization.

For Harmonic founder Tudor Achim, the sudden jump in solved Erdős problems is less important than the fact that the world’s greatest mathematicians are starting to take those tools seriously. Achim said he cares more about the fact that math and computer science professors are using these AI tools. These people have reputations to protect, so when they say they use Aristotle or they use ChatGPT, that is real evidence.