Amazon Introduces New Voice Model Nova Sonic

Amazon has unveiled Nova Sonic, a generative AI voice model that processes speech natively and generates natural-sounding conversational responses. The new model is meant to compete with voice models from OpenAI and Google.

Compared to earlier digital assistants, Nova Sonic offers a more flexible and human-like speaking experience. As the technology has advanced, legacy assistants such as Alexa and Siri have come to sound mechanical, and Amazon is positioning Nova Sonic to change that perception.

The Most Cost-Effective Voice Model

Nova Sonic is available to developers through Bedrock, Amazon's AI development platform, via a new bidirectional streaming API. According to Amazon, it is the most affordable voice AI model on the market, costing approximately 80% less than OpenAI's GPT-4o.

Rohit Prasad, head of Amazon’s AGI division, noted that components of Nova Sonic are already being used in the new Alexa Plus assistant. He emphasized the model’s orchestration capabilities, which let it route requests accurately to the appropriate APIs or apps.

Smarter and Faster

According to Amazon’s data, Nova Sonic significantly outperforms competitors in speech recognition accuracy. Even when users mumble, mispronounce words, or speak in noisy environments, the model maintains high comprehension accuracy. In tests across English, French, German, Italian, and Spanish, the average word error rate was just 4.2%, and in noisy settings Amazon says it outperformed GPT-4o by 46.7%.
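
For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not Amazon's evaluation code. The function names and sample sentences are made up for the example, and the relative-reduction helper shows only one possible reading of a figure like "outperformed by 46.7%".

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Fraction by which model_wer is lower than baseline_wer (an assumed interpretation)."""
    return (baseline_wer - model_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcript with one misheard word out of five.
    print(word_error_rate("turn on the kitchen lights", "turn on the chicken lights"))
```

Under this definition, a 4.2% WER means roughly four word-level errors per hundred reference words.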

The model's average response time is 1.09 seconds, which makes it faster than OpenAI’s real-time API.

The Future of Multimodal AI

Amazon doesn’t see Nova Sonic as just a voice model; it is part of a broader AGI vision. The company frames the model as an early step toward AI that can perform the kinds of tasks a human can do at a computer. Prasad also shared that future multimodal models are on the way, capable of understanding images, video, and other sensory inputs.

Nova Act, a recently introduced model capable of web browsing, is also part of this strategy. The impact of these technologies is already visible in features like Alexa Plus and “Buy for Me.”