Imagine being able to understand and communicate with people across the globe, regardless of their language. That’s the vision behind NVIDIA’s latest release: a massive open-source multilingual speech dataset and two new models for multilingual speech-to-text.
The Granary Dataset
The dataset, called Granary, contains around 1 million hours of audio covering 25 European languages, including low-resource ones like Croatian, Estonian, and Maltese. That scale matters most for the low-resource languages, where annotated speech data has been scarce.
The Canary and Parakeet Models
Alongside Granary, NVIDIA released two high-performance speech-to-text (STT) models: Canary-1b-v2 and Parakeet-tdt-0.6b-v3. Both were trained on the Granary dataset and target fast, accurate multilingual speech recognition.
Canary-1b-v2
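The Canary-1b-v2 model has 1 billion parameters and, according to NVIDIA, ranks at the top of Hugging Face's leaderboard of open models for multilingual speech recognition accuracy. Besides transcription, it can translate between English and the 24 other supported languages. NVIDIA says its quality is comparable to models three times its size while running inference up to 10 times faster.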
Parakeet-tdt-0.6b-v3
The Parakeet-tdt-0.6b-v3 model has 600 million parameters and is designed for real-time and large-scale transcription. It has the highest throughput in its class, which suits applications that need low latency or have to process large volumes of audio.
What This Means for the Future
NVIDIA’s release lowers the barrier to building speech technology for languages that have been underserved so far. With Granary and these two models openly available, we’re one step closer to a world where language barriers no longer exist.
Imagine being able to communicate with people from different cultures and backgrounds without any obstacles. Imagine the possibilities for education, business, and global understanding.
Get Started
If you’re interested in exploring the new models and the dataset, you can find them on Hugging Face:
- Granary: https://huggingface.co/datasets/nvidia/Granary
- Canary-1b-v2: https://huggingface.co/nvidia/canary-1b-v2
- Parakeet-tdt-0.6b-v3: https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
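As a starting point, here is a minimal sketch of transcribing audio files with either model. It assumes the NVIDIA NeMo toolkit is installed (for example via `pip install -U "nemo_toolkit[asr]"`) and that the checkpoints load through NeMo's `ASRModel.from_pretrained`, as the model cards describe. The names `MODEL_IDS`, `extract_text`, and `transcribe` are made up for this example, and the exact return type can differ between NeMo versions, so treat this as a sketch rather than a reference.

```python
# Hugging Face repo IDs, taken from the links above.
MODEL_IDS = {
    "canary": "nvidia/canary-1b-v2",
    "parakeet": "nvidia/parakeet-tdt-0.6b-v3",
}


def extract_text(result):
    """Return the transcript from one result.

    Assumption: newer NeMo versions return objects with a `.text`
    attribute, while older ones return plain strings. Handle both.
    """
    return getattr(result, "text", result)


def transcribe(audio_paths, model_key="parakeet", **transcribe_kwargs):
    """Transcribe audio files with one of the released checkpoints.

    Downloads the checkpoint on first use, so it needs network access.
    Extra keyword arguments are passed straight to `model.transcribe`.
    """
    # Imported lazily so this module loads even without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name=MODEL_IDS[model_key]
    )
    results = model.transcribe(
        [str(path) for path in audio_paths], **transcribe_kwargs
    )
    return [extract_text(result) for result in results]
```

Calling `transcribe(["speech.wav"])` would use Parakeet, and `transcribe(["speech.wav"], model_key="canary")` would switch to Canary; `speech.wav` is a placeholder for your own file. For translation with Canary, check the model card for the language arguments it expects before passing them through `transcribe_kwargs`.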
Read more about NVIDIA’s release on their blog: https://blogs.nvidia.com/blog/speech-ai-dataset-models/