A new AI coding challenge just published its first results – and they aren’t pretty

A new AI coding challenge has crowned its first winner, setting a higher standard for AI-powered software engineers. On Wednesday at 5 PM PST, the nonprofit Laude Institute announced Eduardo Rocha de Andrade, a Brazilian prompt engineer, as the inaugural winner of the K Prize. This multi-round AI coding challenge, launched by Databricks and Perplexity co-founder Andy Konwinski, awarded Andrade $50,000 for his achievement.

What made the win surprising was his final score—he answered just 7.5% of the test questions correctly. Konwinski remarked, “We’re glad we built a benchmark that is actually hard. Benchmarks should be hard if they’re going to matter.” He has committed $1 million to the first open-source model that scores above 90% on the test.

The K Prize operates similarly to the well-known SWE-Bench, evaluating AI models on real-world programming problems using flagged GitHub issues. However, unlike SWE-Bench, which relies on a fixed dataset, the K Prize is designed to prevent benchmark-specific training by using a timed entry system. For the first round, submissions were due by March 12, and the test was built exclusively from GitHub issues flagged after that date.

The 7.5% top score contrasts sharply with SWE-Bench’s current results, where top models score 75% on its easier “Verified” test and 34% on the harder “Full” test. Konwinski is unsure whether the gap stems from contamination in SWE-Bench or the difficulty of sourcing fresh GitHub issues, but he believes the K Prize will provide clarity. “As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch. “We expect people to adapt to competing every few months.”

Despite the abundance of AI coding tools, many critics argue that benchmarks have become too easy, making projects like the K Prize essential for addressing AI’s evaluation challenges. Princeton researcher Sayash Kapoor supports this approach, stating, “Without such experiments, we can’t actually tell if the issue is contamination or just targeting the SWE-Bench leaderboard with human intervention.”

For Konwinski, the K Prize is more than a benchmark—it’s a challenge to the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
