Anthropic has to keep revising its technical interview test so you can’t cheat on it with Claude

Since 2024, Anthropic’s performance optimization team has used a take-home test to evaluate job applicants’ technical knowledge. However, as AI coding tools have advanced, the test has required repeated major revisions to stay ahead of AI-assisted cheating.

Team lead Tristan Hume detailed the evolution of this challenge. He explained that each new Claude model has forced a redesign of the test. Given the same time limit as human applicants, Claude Opus 4 outperformed most of them. That still allowed the team to identify the strongest candidates, but then Claude Opus 4.5 matched even those top performers.

This created a serious candidate-assessment problem. Without in-person proctoring, there is no reliable way to ensure an applicant is not using AI to complete the test. If they do use AI, their results will quickly rise to the top. Under the constraints of the take-home format, the team found they no longer had a way to distinguish between the output of their best candidates and their most capable AI model.

The issue of AI cheating is already causing significant disruption in schools and universities worldwide, making it ironic that AI labs themselves now have to confront it. However, Anthropic is also uniquely positioned to address this problem.

Ultimately, Hume designed a new test that focuses less on hardware optimization, making it novel enough to stump current AI tools. As part of his announcement, he shared the original test publicly to see whether anyone could devise a better solution. The post ends with an open invitation: if you can outperform Opus 4.5, the team would love to hear from you.