With significant funding flowing into AI startups, it is an excellent time to be an AI researcher with an idea to develop. If the concept is sufficiently novel, securing the necessary resources may be easier as an independent company than within one of the large, established labs.
This is the story of Inception, a startup creating diffusion-based AI models that recently secured fifty million dollars in seed funding. The round was led by Menlo Ventures, with additional angel funding provided by Andrew Ng and Andrej Karpathy.
The project is led by Stanford professor Stefano Ermon, whose research specializes in diffusion models. These models generate outputs through a process of iterative refinement instead of a sequential, word-by-word approach. They are the technology behind image and video generators like Stable Diffusion, Midjourney, and Sora. Having worked on these systems before the current AI boom, Ermon is now using Inception to apply the same diffusion techniques to a wider variety of tasks.
Alongside the funding announcement, the company released a new version of its Mercury model, which is designed for software development. Mercury has already been integrated into several development tools, including ProxyAI, Buildglare, and KiloCode. Ermon states that the diffusion approach will help Inception’s models excel in two critical areas: latency, which is response time, and compute cost.
According to Ermon, these diffusion-based large language models are significantly faster and more efficient than what others are building today. He describes it as a completely different approach where substantial innovation can still be introduced.
Understanding the technical distinction requires some background. Diffusion models are structurally different from auto-regressive models, which currently dominate text-based AI services. Auto-regressive models like GPT-5 and Gemini work sequentially, predicting each next word or token based on the material that came before. In contrast, diffusion models, which were originally developed for image generation, take a more holistic approach: they produce a rough draft of the entire output at once and then incrementally refine it until it aligns with the desired outcome.
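The contrast can be sketched in a few lines of toy code. This is purely illustrative, not Inception's actual method: random vocabulary picks stand in for the real neural prediction and denoising steps, and the names are hypothetical.

```python
import random

random.seed(0)

VOCAB = ["the", "model", "refines", "text", "fast"]

def autoregressive_generate(length):
    """Toy autoregressive generation: tokens are produced one at a
    time, each conditioned on everything generated so far. Later
    tokens cannot be computed until earlier ones exist, so the loop
    is inherently sequential."""
    tokens = []
    for _ in range(length):
        # stands in for a full forward pass predicting the next token
        tokens.append(random.choice(VOCAB))
    return tokens

def diffusion_generate(length, steps=4):
    """Toy diffusion-style generation: start from a fully 'noisy'
    draft of the whole sequence, then repeatedly refine every
    position until the draft settles."""
    draft = [random.choice(VOCAB) for _ in range(length)]  # noisy init
    for _ in range(steps):
        # each refinement pass updates all positions at once,
        # standing in for one denoising step over the full draft
        draft = [random.choice(VOCAB) for _ in draft]
    return draft

print(autoregressive_generate(5))
print(diffusion_generate(5))
```

The structural point is in the loops: the autoregressive version must run one iteration per output token, while the diffusion version runs a fixed number of passes regardless of output length.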
The conventional approach is to use auto-regressive models for text applications, a strategy that has been highly successful for recent generations of AI models. However, a growing body of research indicates that diffusion models may perform better when a model is processing large volumes of text or operating under data constraints. As Ermon explains, these qualities become a real advantage when performing operations across large codebases.
Diffusion models also offer greater flexibility in hardware utilization, a particularly important advantage as the infrastructure demands of AI become more apparent. While auto-regressive models must execute operations one after another, diffusion models can process many operations simultaneously. This allows for significantly lower latency in complex tasks.
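The latency argument above reduces to simple arithmetic. The sketch below uses hypothetical numbers for illustration only; they are not Inception's benchmarks.

```python
def sequential_latency_ms(num_tokens, step_time_ms):
    """Autoregressive decoding: one forward pass per output token,
    so latency grows linearly with output length."""
    return num_tokens * step_time_ms

def parallel_latency_ms(num_steps, step_time_ms):
    """Diffusion-style decoding: a fixed number of refinement passes,
    each updating all tokens at once, so latency is roughly
    independent of output length (hardware permitting)."""
    return num_steps * step_time_ms

# Hypothetical figures: a 1,000-token response at 10 ms per forward pass.
print(sequential_latency_ms(1000, 10))  # 10,000 ms done token by token
print(parallel_latency_ms(20, 10))      # 200 ms with 20 refinement passes
```

In practice each parallel refinement pass is more expensive than a single-token step, but as long as the number of passes stays small relative to the output length, the parallel approach wins on latency.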
Ermon reports that their models have been benchmarked at over one thousand tokens per second, a rate far exceeding what is possible with existing auto-regressive technologies. He attributes this speed to the parallel nature of their system, which is built to be exceptionally fast.