Amazon Introduces New Voice Model Nova Sonic
Amazon has taken a significant leap in AI by unveiling Nova Sonic, a generative voice model that processes speech natively and generates natural-sounding conversation. The new model positions Amazon to compete with offerings from OpenAI and Google.
Compared with earlier digital assistants, Nova Sonic offers a more flexible, human-like speaking experience. As voice AI has advanced, legacy assistants such as Alexa and Siri have come to sound mechanical by comparison; Nova Sonic is set to change that perception.
The Most Cost-Effective Voice Model
Nova Sonic is available through Amazon's Bedrock development platform, where developers can access it via a new bidirectional streaming API. According to Amazon, it is the most affordable voice AI model on the market, costing approximately 80% less than OpenAI's GPT-4o.
Rohit Prasad, head of Amazon’s AGI division, noted that components of Nova Sonic are already being used in the new Alexa Plus assistant. He emphasized the model’s superior orchestration capabilities, allowing it to accurately route requests to appropriate APIs or apps.
Smarter and Faster
According to Amazon’s data, Nova Sonic significantly outperforms competitors in voice recognition accuracy. Even when users mumble, mispronounce words, or speak in noisy environments, the model maintains high comprehension accuracy. In tests across English, French, German, Italian, and Spanish, the average word error rate was just 4.2%, and in noisy settings, it outperformed GPT-4o by 46.7%.
The average response time is 1.09 seconds, making it faster than OpenAI's Realtime API.
"Excited about the launch of Amazon Nova Sonic, our new speech-to-speech model that helps make AI voice applications feel remarkably natural. It's designed to understand not just what people say, but how they say it – working with tone, style, and conversation flow including…"
— Andy Jassy (@ajassy) April 8, 2025
The Future of Multimodal AI
Amazon sees Nova Sonic not merely as a voice model but as part of a broader AGI vision. The company describes it as one of the first models capable of performing tasks similar to what a human can do at a computer. Prasad also shared that future multimodal models are on the way, capable of understanding visuals, video, and other sensory inputs.
Nova Act, a recently introduced model capable of browsing the web, is also part of this strategy. The impact of these technologies is already visible in features like Alexa Plus and "Buy for Me."