Amazon Introduces New Voice Model Nova Sonic
Amazon has taken a significant leap in AI by unveiling Nova Sonic, a generative voice model that processes speech natively and generates natural-sounding conversation. The new model positions Amazon to compete with offerings from OpenAI and Google.
Compared with earlier digital assistants, Nova Sonic offers a more flexible, human-like speaking experience. As voice AI has advanced, legacy assistants such as Alexa and Siri have come to sound mechanical by comparison; Nova Sonic is set to change that perception.
The Most Cost-Effective Voice Model
Nova Sonic is available through Amazon's Bedrock development platform, where developers can access it via a new bidirectional streaming API. According to Amazon, it is the most affordable voice AI model on the market, costing approximately 80% less than OpenAI's GPT-4o.
Rohit Prasad, head of Amazon’s AGI division, noted that components of Nova Sonic are already being used in the new Alexa Plus assistant. He emphasized the model’s superior orchestration capabilities, allowing it to accurately route requests to appropriate APIs or apps.
Smarter and Faster
According to Amazon’s data, Nova Sonic significantly outperforms competitors in voice recognition accuracy. Even when users mumble, mispronounce words, or speak in noisy environments, the model maintains high comprehension accuracy. In tests across English, French, German, Italian, and Spanish, the average word error rate was just 4.2%, and in noisy settings, it outperformed GPT-4o by 46.7%.
The average response time is 1.09 seconds, making it faster than OpenAI's Realtime API.
"Excited about the launch of Amazon Nova Sonic, our new speech-to-speech model that helps make AI voice applications feel remarkably natural. It's designed to understand not just what people say, but how they say it – working with tone, style, and conversation flow including…"
— Andy Jassy (@ajassy) April 8, 2025
The Future of Multimodal AI
Amazon sees Nova Sonic not merely as a voice model but as part of a broader AGI vision. The company describes it as one of the first models capable of performing tasks similar to what a human can do at a computer. Prasad also shared that future multimodal models are on the way, capable of understanding visuals, video, and other sensory inputs.
Nova Act, a recently introduced model capable of browsing the web, is also part of this strategy. The impact of these technologies is already visible in features like Alexa Plus and "Buy for Me."