Clarifai’s new reasoning engine makes AI models faster and less expensive

On Thursday, the AI platform Clarifai announced a new reasoning engine that it claims will make running AI models twice as fast and forty percent less expensive. Designed to be adaptable to a variety of models and cloud hosts, the system employs a range of optimizations to get more inference power out of the same hardware.

The company’s CEO, Matthew Zeiler, explained that the engine combines a range of optimization techniques, from low-level CUDA kernels to advanced speculative decoding. The goal is to let users get more performance out of the same GPUs.
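Speculative decoding is a widely used inference optimization: a small, cheap "draft" model proposes several tokens ahead, and the large "target" model checks them in a single batched pass, so the expensive model runs fewer sequential steps. Clarifai has not published the details of its implementation; the sketch below is a generic, greedy-verification toy in Python, with `target_next` and `draft_next` as hypothetical stand-ins for real model calls.

```python
# Toy sketch of greedy speculative decoding (a generic illustration,
# not Clarifai's implementation). The output is identical to decoding
# with the target model alone, but the target is consulted in bursts.

from typing import Callable, List

Token = int

def speculative_decode(
    target_next: Callable[[List[Token]], Token],  # expensive model's greedy next token
    draft_next: Callable[[List[Token]], Token],   # cheap model's greedy next token
    prompt: List[Token],
    max_new_tokens: int = 16,
    k: int = 4,  # tokens drafted per round
) -> List[Token]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        # 1. The cheap draft model proposes k tokens autoregressively.
        ctx = list(out)
        draft = []
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Verification. A real engine scores all k positions in ONE
        #    batched forward pass of the big model; this loop simulates it.
        accepted = 0
        correction = None
        for i in range(k):
            expected = target_next(out + draft[:i])
            if draft[i] == expected:
                accepted += 1
            else:
                correction = expected  # target's token replaces the first miss
                break

        out.extend(draft[:accepted])
        if correction is not None:
            out.append(correction)
    return out[: len(prompt) + max_new_tokens]

if __name__ == "__main__":
    # Stand-in "models": deterministic functions over the token context.
    def target(ctx: List[Token]) -> Token:
        return (sum(ctx) * 31 + len(ctx)) % 50

    def draft(ctx: List[Token]) -> Token:
        t = target(ctx)
        return t if len(ctx) % 5 else (t + 1) % 50  # disagrees occasionally

    print(speculative_decode(target, draft, prompt=[1, 2, 3], max_new_tokens=10))
```

The speedup comes from the verification step: each round, the big model's sequential work is amortized over however many draft tokens it accepts, so if the draft model agrees most of the time, the expensive model effectively advances several tokens per pass instead of one.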

These performance claims were verified by a series of benchmark tests conducted by the third-party firm Artificial Analysis. The tests recorded industry-best results for both throughput and latency.

The reasoning engine focuses specifically on inference, which is the computing demand of operating an AI model that has already been trained. This computing load has grown particularly intense with the rise of agentic and reasoning models, as these require multiple steps in response to a single command.

Clarifai was first launched as a computer vision service but has grown increasingly focused on compute orchestration as the AI boom has drastically increased demand for GPUs and data centers. The company first announced its compute platform at AWS re:Invent in December, and the new reasoning engine is its first product tailored specifically to multi-step agentic models.

This product arrives during a period of intense pressure on AI infrastructure, which has spurred a series of billion-dollar deals. OpenAI, for example, has laid out plans for as much as one trillion dollars in new data center spending, projecting nearly limitless future demand for compute.

While the hardware buildout has been significant, Zeiler believes there is still headroom in optimizing existing infrastructure. Software tricks like the Clarifai reasoning engine, he says, can take a good model further. He also points to algorithmic improvements as another way to stave off the need for gigawatt-scale data centers, noting that he does not believe we are at the end of algorithm innovations.