Cohere launches an open source voice model specifically for transcription

Enterprise AI company Cohere launched its first voice model on Thursday. The model, called Transcribe, is an open source automatic speech recognition tool designed for tasks like note-taking and speech analysis.

Relatively lightweight at just 2 billion parameters, the model is designed to run on consumer-grade GPUs for those who prefer to self-host it. It currently supports 14 languages: English, French, German, Italian, Spanish, Portuguese, Greek, Dutch, Polish, Chinese, Japanese, Korean, Vietnamese, and Arabic.

Cohere states that Transcribe outperforms models such as Zoom Scribe v1, IBM Granite 4.0 1B, ElevenLabs Scribe v2, and Qwen3-ASR-1.7B Speech on the Hugging Face Open ASR Leaderboard. It achieves an average word error rate (WER) of 5.42%, the lowest of any model on that benchmark.
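Word error rate counts the substitutions, deletions, and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The minimal Python sketch below computes it with a word-level edit distance. It is illustrative only: the function name `word_error_rate` is hypothetical, and the leaderboard's exact scoring pipeline (averaging across datasets and any text normalization it applies) is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER as a percentage.

    WER = (substitutions + deletions + insertions) / words in reference * 100.
    Uses plain whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words gives a WER of about 16.67%.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a WER of 5.42% corresponds to roughly five to six word errors per 100 words of reference text.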

The company claims Transcribe had an average win rate of 61% over other models when human evaluators assessed its transcriptions for accuracy, coherence, and usability. However, the model fell behind its rivals when transcribing Portuguese, German, and Spanish.

Cohere says Transcribe can process 525 minutes of audio in a single minute, a high throughput for a model of its class.

The company plans to integrate Transcribe into its enterprise agent orchestration platform, North, and is making the model available through its API for free. The model will also be available on Model Vault, Cohere’s managed inference platform.

Speech recognition models are becoming increasingly popular as demand rises for note-taking and dictation apps.

Earlier this year, Cohere reportedly told investors that it had reached annual recurring revenue of $240 million in 2025. Its CEO, Aidan Gomez, was quoted as saying that the startup may go public soon.