Last month, I wrote about Mercor’s new benchmark measuring AI agents’ capabilities on professional tasks in fields like law and corporate analysis. At the time, the scores were quite dismal, with every major lab scoring under 25 percent. I concluded that lawyers were safe from AI displacement, at least for the immediate future.
However, AI capabilities can change significantly in just a couple of weeks. This week’s release of Anthropic’s Opus 4.6 shook up the leaderboards. Anthropic’s new model scored just shy of 30 percent in one-shot trials, and achieved an average of 45 percent when given a few more attempts at the problem. Notably, the release included a number of new agentic features, such as “agent swarms,” which may have assisted with this kind of multistep problem-solving.
Regardless, this score represents a huge jump from the previous state of the art. It is a clear sign that progress on foundation models is not slowing down. Mercor CEO Brendan Foody was particularly impressed, stating that jumping from 18.4 percent to 29.8 percent in a few months is insane.
Thirty percent is still a long way from 100 percent, so lawyers do not need to worry about being replaced by machines next week. But they should be a lot less confident than they were just last month.

