Large language models trained on vast datasets have the potential to speed up genomics research, streamline clinical documentation, improve real-time diagnostics, support clinical decision-making, accelerate drug discovery, and generate synthetic data to advance experiments. However, their promise to transform biomedical research often runs into a bottleneck: data. Outside the structured data that healthcare relies on, these models struggle with edge cases such as rare diseases and unusual conditions, where reliable, representative data is scarce.
A New York-based company called Mantis Biotech says it is developing a way to fill this data gap. The company's platform integrates disparate sources of information to create synthetic datasets. These datasets are then used to build "digital twins" of the human body: physics-based, predictive models of anatomy, physiology, and behavior.
The company pitches these digital twins for data aggregation and analysis. They could be used to study and test new medical procedures, train surgical robots, and simulate and predict medical issues or patterns of behavior. For example, the company's founder explained, a sports team might predict how likely a specific NFL player is to develop an Achilles injury based on recent performance, training load, diet, and activity history.
To construct these twins, the platform first gathers data from sources such as textbooks, motion capture cameras, biometric sensors, training logs, and medical imaging. It then uses a system based on large language models to route, validate, and synthesize the various data streams. The result is processed through a physics engine to create high-fidelity renders of the dataset, which can then be used to train predictive models.
The physics engine layer is crucial. It extends the available information by grounding the generated synthetic data in a realistic model of the physics of anatomy. For instance, building a hand-pose estimation dataset for someone missing a finger would be difficult with publicly available data, but the platform can generate one by modifying its physics model.
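To make the missing-finger example concrete, the following is a minimal sketch of the general idea, not Mantis's physics engine, which the company has not described in detail. It assumes a deliberately simplified, planar hand built from jointed finger segments; the `Finger` structure, the `HAND` dimensions, the `MAX_FLEX` joint limit, and the functions `remove_finger`, `finger_keypoints`, `sample_pose`, and `generate_dataset` are all hypothetical. Deleting a finger from the model and sampling random joint angles yields labeled keypoints for a hand shape that public datasets rarely cover.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Finger:
    name: str
    base: tuple[float, float]           # knuckle position on the palm, in cm
    base_angle: float                   # resting direction of the finger, in radians
    segment_lengths: tuple[float, ...]  # phalanx lengths, in cm


# Hypothetical, simplified 2D hand; dimensions are illustrative, not anatomical data.
HAND = (
    Finger("thumb", (-3.0, 1.0), math.radians(150), (3.0, 2.5)),
    Finger("index", (-1.5, 4.0), math.radians(95), (4.0, 2.5, 2.0)),
    Finger("middle", (0.0, 4.2), math.radians(90), (4.5, 3.0, 2.2)),
    Finger("ring", (1.5, 4.0), math.radians(85), (4.2, 2.8, 2.0)),
    Finger("little", (3.0, 3.5), math.radians(80), (3.5, 2.2, 1.8)),
)

MAX_FLEX = math.radians(90)  # assumed upper limit for each joint's bend


def remove_finger(hand, name):
    """Return a modified hand model with the named finger removed."""
    return tuple(f for f in hand if f.name != name)


def finger_keypoints(finger, bends):
    """Planar forward kinematics: each joint rotates the chain by its bend angle."""
    x, y = finger.base
    angle = finger.base_angle
    points = [(x, y)]
    for length, bend in zip(finger.segment_lengths, bends):
        angle -= bend
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        points.append((x, y))
    return points


def sample_pose(hand, rng):
    """Sample random joint angles and return keypoints keyed by finger name."""
    return {
        f.name: finger_keypoints(f, [rng.uniform(0.0, MAX_FLEX) for _ in f.segment_lengths])
        for f in hand
    }


def generate_dataset(hand, n, seed=0):
    """Generate n synthetic poses; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [sample_pose(hand, rng) for _ in range(n)]


# Example: a four-finger hand with the ring finger missing.
four_finger_hand = remove_finger(HAND, "ring")
dataset = generate_dataset(four_finger_hand, n=1000)
```

The key design point is that the "edge case" is produced by editing the model rather than by collecting new recordings. A production system would presumably use 3D joints, anatomical joint limits, and rendered images rather than bare 2D keypoints.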
Because the platform is designed to fill gaps in available data, it could see wide use across the biomedical industry, where information on procedures or patients is often hard to access, unstructured, or siloed. This is especially relevant for edge cases and rare diseases, where ethical and regulatory constraints around patient data make information hard to obtain.
The founder described a vision in which these digital twins can be tested on freely, likening it to a child playing with a doll. The goal, the founder said, is to help people embrace the idea that virtual humans can be tested without exploiting real people's private data.
So far, Mantis has found traction in professional sports, where teams need to model high-performing athletes. One of its main clients is an NBA team. The platform creates digital representations of athletes, tracking metrics such as how their jumping changes over time and correlating those metrics with factors like sleep or specific movements.
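The kind of analysis described above can be illustrated with a simple correlation between a daily factor and a performance metric. This is a minimal sketch, not Mantis's method, which the company has not disclosed: the functions `pearson` and `lagged_correlation` and the sleep and jump-height numbers are hypothetical. The `lag` parameter lets the sketch compare, say, one night's sleep with the next day's jump.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlation(factor, metric, lag=0):
    """Correlate factor on day t with metric on day t + lag (e.g. sleep vs. next-day jump)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return pearson(factor[: len(factor) - lag], metric[lag:])


# Hypothetical one-week log for a single athlete (illustrative values only).
sleep_hours = [7.5, 6.0, 8.0, 5.5, 7.0, 8.5, 6.5]
jump_height_cm = [71.0, 66.5, 72.5, 64.0, 69.5, 73.0, 67.0]

same_day = lagged_correlation(sleep_hours, jump_height_cm, lag=0)
next_day = lagged_correlation(sleep_hours, jump_height_cm, lag=1)
```

A correlation over a handful of days says little on its own; any real system would need far more data and would have to account for other factors, such as training load, before drawing conclusions.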
The startup recently raised $7.4 million in seed funding. The capital will be used for hiring, advertising, marketing, and go-to-market functions.
Next, the company plans to keep building out the technology and eventually release the platform to the general public, with a focus on preventative healthcare. It is also working to serve pharmaceutical labs and researchers running FDA trials, aiming to deliver insights into how patients respond to treatments.

