Anthropic has to keep revising its technical interview test as Claude improves

Since 2024, Anthropic’s performance optimization team has used a take-home test to assess job applicants’ technical skills. However, as AI coding tools have advanced, the test has required repeated redesigns so that it still distinguishes strong candidates rather than simply measuring what Claude can do.

Team lead Tristan Hume detailed the evolution of the challenge, explaining that each new Claude model has forced a redesign. Given the same time limit as applicants, Claude Opus 4 outperformed most of them, which still let the team identify the strongest candidates: the ones who could beat the model. Then Claude Opus 4.5 matched even those top performers.

Candidates are permitted to use AI tools on the test, which creates a serious assessment problem: if humans can no longer improve on the model’s output, the test measures only the capabilities of competing AI models and becomes useless for finding top human talent. Hume noted that, under the take-home test’s constraints, the team could no longer distinguish the output of its best candidates from that of its most capable model.

AI use on assessments is already causing significant disruption in schools and universities worldwide, so there is some irony in AI labs now grappling with it themselves. Anthropic, however, is uniquely positioned to address the problem: Hume ultimately designed a new test that focuses less on hardware optimization, making it novel enough to stump current AI tools.

As part of his announcement, Hume also shared the original test publicly, inviting readers to attempt a better solution. The post concludes by stating that if anyone can best Opus 4.5, Anthropic would love to hear from them.

Correction: An earlier version of this article stated that candidates were not allowed to use AI tools on the take-home test. Anthropic’s policy expressly permits AI use.