OpenAI announced the launch of two open-weight AI reasoning models on Tuesday, offering capabilities similar to its o-series. Both models, which OpenAI describes as “state-of-the-art” on several benchmarks for open models, are freely available for download on Hugging Face.
The models come in two sizes: a larger GPT-OSS-120B, which can run on a single Nvidia GPU, and a lighter GPT-OSS-20B, designed for consumer laptops with 16GB of memory. This marks OpenAI’s first open language model release since GPT-2, which debuted over five years ago.
OpenAI notes that its open models can hand off complex queries to AI models in the cloud: if a task exceeds the open model’s capabilities, such as image processing, developers can connect it to OpenAI’s more capable closed models. While OpenAI initially embraced open-source AI, it later shifted to a proprietary approach, building a lucrative business by selling API access to enterprises and developers.
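The handoff pattern described above amounts to capability-based routing: answer locally when possible, escalate to a cloud model otherwise. The sketch below is purely illustrative, since OpenAI has not published the actual routing interface; every name in it (`run_local`, `run_cloud`, `route`) is a hypothetical stand-in.

```python
# Illustrative only: hypothetical local-first routing with cloud fallback.
# None of these functions correspond to a published OpenAI API.

LOCAL_CAPABILITIES = {"text"}  # the open models are text-only

def run_local(prompt: str) -> str:
    # Placeholder for running an open model (e.g., GPT-OSS) on local hardware.
    return f"local answer to: {prompt}"

def run_cloud(prompt: str, modality: str) -> str:
    # Placeholder for an API call to a more capable closed cloud model.
    return f"cloud ({modality}) answer to: {prompt}"

def route(prompt: str, modality: str = "text") -> str:
    """Serve the query locally if the open model handles the modality;
    otherwise hand off to the cloud model."""
    if modality in LOCAL_CAPABILITIES:
        return run_local(prompt)
    return run_cloud(prompt, modality)
```

The design choice here is that the router inspects the request’s modality rather than its difficulty; a real system might also escalate on low confidence or context length.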
CEO Sam Altman acknowledged in January that OpenAI may have been “on the wrong side of history” regarding open-sourcing its technology. The company now faces competition from Chinese AI labs like DeepSeek, Alibaba’s Qwen, and Moonshot AI, which have produced some of the world’s most capable open models. Meta, once a leader in open AI, has seen its Llama models fall behind in recent months.
The Trump Administration has also encouraged U.S. AI developers to open-source more technology to promote global adoption of AI aligned with American values. With GPT-OSS, OpenAI aims to win over developers and policymakers concerned about China’s growing influence in the open-source AI space.
Altman reiterated OpenAI’s mission to ensure AGI benefits all of humanity, emphasizing the importance of an open AI stack based on democratic values.
**Performance Benchmarks**
OpenAI claims its open models lead among open-weight AI models. On Codeforces, a competitive coding benchmark, GPT-OSS-120B and GPT-OSS-20B scored 2622 and 2516, respectively, outperforming DeepSeek’s R1 but trailing OpenAI’s o3 and o4-mini.
On Humanity’s Last Exam, a challenging multi-subject test, GPT-OSS-120B and GPT-OSS-20B scored 19% and 17.3%, surpassing DeepSeek and Qwen but falling short of o3.
However, OpenAI’s open models hallucinate significantly more than its latest reasoning models. On PersonQA, GPT-OSS-120B and GPT-OSS-20B hallucinated in response to 49% and 53% of questions, respectively, far higher than o1 (16%) and o4-mini (36%). OpenAI attributes this to smaller models having less world knowledge.
**Training Process**
The open models were trained similarly to OpenAI’s proprietary models, using a mixture-of-experts (MoE) architecture to improve efficiency. GPT-OSS-120B has 117 billion total parameters but activates only 5.1 billion per token.
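The gap between total and active parameters comes from routing each token to only a few “experts.” The toy top-k router below illustrates the mechanism; the expert count, hidden size, and routing details are arbitrary stand-ins, not the actual GPT-OSS configuration.

```python
# Toy sketch of MoE top-k routing; all sizes are arbitrary, not GPT-OSS's.
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 8   # total experts in the layer
TOP_K = 2       # experts activated per token
D = 16          # hidden size

# Each "expert" here is just a small weight matrix standing in for an FFN.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a token vector x to its top-k experts and mix their outputs.
    Only k of the n expert matrices are touched per token, which is how
    MoE keeps active parameters far below total parameters."""
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]             # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                          # softmax over chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_layer(token)
```

With 8 experts and k = 2, only a quarter of the expert weights participate in any single token’s forward pass, mirroring (at toy scale) how 117B total parameters can shrink to 5.1B active.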
High-compute reinforcement learning (RL) was used for post-training, refining the models’ reasoning through simulated environments. This process enables the models to power AI agents, call tools like web search, and execute Python code, though they remain text-only, unable to process images or audio.
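An agent of the kind described above boils down to a loop that alternates model steps with tool execution. The sketch below uses stub functions (`fake_model`, `web_search`) in place of a real model and a real search API, so the message format is hypothetical and not the models’ actual tool protocol.

```python
# Hypothetical tool-calling loop; the real chat/tool format is defined by
# whatever framework serves the model, not shown here.

def web_search(query: str) -> str:
    # Stand-in tool: a real agent would call an actual search API.
    return f"results for '{query}'"

TOOLS = {"web_search": web_search}

def fake_model(messages):
    """Stand-in for the model: request a tool call first, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": "GPT-OSS benchmarks"}}
    return {"final": "Answer grounded in the tool result."}

def agent_loop(user_prompt: str) -> str:
    """Alternate model steps with tool execution until a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        step = fake_model(messages)
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])
        messages.append({"role": "tool", "content": result})
```

The loop terminates only when the model emits a final answer instead of a tool call; production agents typically add a step budget to guard against runaway loops.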
**Licensing and Safety**
OpenAI released the models under the Apache 2.0 license, allowing commercial use without restrictions. However, it will not disclose the training data, likely due to ongoing lawsuits over copyrighted material.
The launch was delayed multiple times to address safety concerns. OpenAI tested whether bad actors could fine-tune the models for cyberattacks or bioweapon development. While minor risks were identified, the models did not meet OpenAI’s threshold for high-capability threats.
Despite OpenAI’s strong performance among open models, developers await DeepSeek’s R2 and Meta’s next open model from its superintelligence lab.