Lemon Slice nabs $10.5M from YC and Matrix to build out its digital avatar tech

Developers and companies are increasingly deploying AI agents and chatbots within their apps, but so far these interactions have been mostly restricted to text. Digital avatar generation company Lemon Slice is working to add a video layer to those chats with a new diffusion model that can create digital avatars from a single image.

Called Lemon Slice-2, the model can create a digital avatar that works on top of a knowledge base to play any role required of the AI agent. This includes addressing customer queries, helping with homework questions, or even working as a mental health support agent.

Co-founder Lina Colucci explained the inspiration, noting that in the early days of generative AI, it became obvious that video was going to be interactive. The compelling part of tools like ChatGPT was their interactivity, and the company wants video to have that same layer.

Lemon Slice says Lemon Slice-2 is a 20-billion-parameter model that runs on a single GPU and live-streams video at 20 frames per second. The company is making the model available through an API and an embeddable widget that companies can integrate into their sites with a single line of code. Once an avatar is created, its background, styling, and appearance can be changed at any point.

Besides human-like avatars, the company is also focused on generating non-human characters to suit different needs. The startup uses ElevenLabs’ technology to generate the voices for these avatars.

Founded by Lina Colucci, Sidney Primas, and Andrew Weitz in 2024, Lemon Slice is betting that using its own general-purpose diffusion model to make avatars will set it apart. A diffusion model is a type of generative model trained to reverse a gradual noising process: it learns to strip noise from corrupted training data and can then generate new samples by starting from pure noise and denoising it step by step.
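For readers unfamiliar with the technique, here is a minimal, generic sketch of that iterative denoising idea in Python. It is purely illustrative and has nothing to do with Lemon Slice’s actual architecture: the `predict_noise` function and the `TARGET` point are hypothetical stand-ins for the trained neural network and the learned data distribution a real diffusion model would use.

```python
import numpy as np

# Hypothetical "data" the toy sampler should drift toward. In a real model
# there is no such point; the network has learned an entire distribution.
TARGET = np.array([2.0, -1.0])

def predict_noise(x, t, num_steps):
    # Stand-in for a trained noise-prediction network: it "predicts" the
    # noise as the sample's offset from the target, scaled by the timestep.
    return (x - TARGET) * (t / num_steps)

def generate(num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Start from pure Gaussian noise, as diffusion samplers do.
    x = rng.standard_normal(2)
    for t in range(num_steps, 0, -1):
        eps = predict_noise(x, t, num_steps)
        # Remove a fraction of the predicted noise at each step
        # (a simplified stand-in for a proper DDPM/DDIM update rule).
        x = x - eps / t
        if t > 1:
            # A little fresh noise keeps the sampling process stochastic.
            x = x + 0.05 * rng.standard_normal(2)
    return x

if __name__ == "__main__":
    # The sample drifts from random noise toward the target point.
    print(generate())
```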

Colucci commented on the current state of avatar solutions, saying that existing options often add negative value to a product. She described them as creepy and stiff, looking good for only a few seconds before the interaction feels uncanny and unsettling. The thing that has prevented avatars from really taking off, she said, is that they simply haven’t been good enough.

To fund its efforts, the company recently raised $10.5 million in seed funding from Matrix Partners, Y Combinator, Dropbox co-founder Arash Ferdowsi, Twitch co-founder Emmett Shear, and The Chainsmokers.

The company says it has guardrails in place to prevent unauthorized face or voice cloning, and that it uses large language models for content moderation. Lemon Slice would not name the organizations using its technology, but said the model is being put to work for use cases like education, language learning, e-commerce, and corporate training.

The startup faces stiff competition from video generation startups like D-ID, HeyGen, and Synthesia, as well as other digital avatar makers.

Ilya Sukhar, a partner at Matrix, thinks avatars will be useful in areas where video is already prominent; people often prefer learning from YouTube, for instance, rather than reading long blocks of text. He noted that Lemon Slice’s technical prowess and in-house model will give it an edge over other startups, describing the team as deeply technical with a track record of shipping machine learning products, not just demos and research. He also pointed out that, unlike players whose systems are built for particular scenarios, Lemon Slice is taking the generalized data-and-compute scaling approach that has worked in other AI modalities.

Y Combinator’s Jared Friedman believes that using a diffusion-style model lets Lemon Slice generate any kind of avatar, unlike some other startups that focus on either human-like or game-character-like avatars. He said Lemon Slice is taking a fundamental machine learning approach that can eventually overcome the uncanny valley and pass the avatar Turing test. Because it is a general-purpose model that works end to end, he argued, it has no ceiling on how good it can get, and it works for both human and non-human faces while requiring only a single image to add a new face.

The startup currently has eight employees and plans to use the new funds to hire engineering and go-to-market staff and to cover the compute bills for training its models.