AI voice model sparks online debate

Researchers at Sesame AI have announced a Conversational Speech Model (CSM) that is said to surpass comparable technology from Google (Duplex) and OpenAI (Omni) in mimicking human speech. The demonstration featured two AI voices, a male voice dubbed ‘Miles’ and a female voice dubbed ‘Maya’, which some users have praised for their lifelike quality.

However, getting access to the technology may not be easy. Attempts to try it returned a message saying that Sesame is working to scale up its capacity so that more people can move from being mere spectators to being customers. For now, a 30-minute demo is available on the YouTube channel Creator Magic.

Sesame’s new technology is multimodal, meaning it combines text and audio in a unified structure to deliver a more natural speech synthesis experience. The approach is similar to what OpenAI has done for its voice models.

The system achieves near-human quality but still has trouble with conversational context, pacing, and flow, challenges that Sesame acknowledges. Co-founder Brendan Iribe puts it frankly: “It is very much in the valley,” although he hopes that future progress will shrink that divide.

Even so, the technology is seen as a revolutionary first step, and it has sparked much controversy about its societal effects. Reactions have ranged from admiration and enthusiasm to apprehension and even fear.

The CSM makes conversations sound natural through deliberate ‘flaws’ in the speech, such as breath sounds, chuckles, and occasional self-corrections. These features add a realistic touch and may help the technology cross the uncanny valley in future versions.

The software’s expressiveness has earned admiration from users, who often feel as though they are conversing with a real person when they listen to it. Some even claim to have formed emotional attachments.
