A new startup called Retell AI has launched from the latest Y Combinator batch with ambitions to transform voice-based AI. Retell AI offers a conversational speech API that allows developers to easily create natural-sounding voice agents using large language models.
Today, while there are state-of-the-art synthetic voice providers like ElevenLabs, building actual voice AI solutions that mirror human conversation is still very challenging. Traditional approaches often involve cobbling together speech-to-text, LLMs, and text-to-speech technologies, resulting in experiences plagued by unnatural pauses, awkward interruptions, and robotic intonations.
This disjointed approach can lead to frustrating user experiences, characterized by long latencies and misunderstandings. Things we take for granted in human conversations, such as quick response times, handling interruptions, and natural turn-taking, do not come naturally to AI systems.
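A rough latency budget shows why a cascaded pipeline feels slow. The sketch below is illustrative only: the stage names follow the speech-to-text, LLM, and text-to-speech chain described above, but the `Stage` class, the function names, and every millisecond figure are assumptions made up for this example, not measurements from Retell AI or any provider. It contrasts waiting for each stage to finish with streaming partial output from one stage to the next.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    first_output_ms: int  # time until the stage emits its first partial result
    full_output_ms: int   # time until the stage has finished completely


# Hypothetical figures, chosen only to illustrate the arithmetic.
PIPELINE = [
    Stage("speech-to-text", first_output_ms=200, full_output_ms=800),
    Stage("LLM", first_output_ms=300, full_output_ms=2000),
    Stage("text-to-speech", first_output_ms=250, full_output_ms=1500),
]


def sequential_latency_ms(stages: List[Stage]) -> int:
    """Delay before audio plays if every stage waits for the previous one to finish."""
    return sum(stage.full_output_ms for stage in stages)


def streaming_latency_ms(stages: List[Stage]) -> int:
    """Delay before audio plays if each stage starts on the first partial output.

    This is a simplification: it assumes a stage can begin useful work as soon
    as the previous stage has emitted anything at all.
    """
    return sum(stage.first_output_ms for stage in stages)


if __name__ == "__main__":
    print("sequential:", sequential_latency_ms(PIPELINE), "ms")
    print("streaming: ", streaming_latency_ms(PIPELINE), "ms")
```

Under these assumed numbers the gap between the two modes is several seconds, which is why orchestration between the stages matters as much as the quality of any single component.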
As Retell AI co-founder and CMO Evie Wang explained, "Developers spend hundreds of hours on the AI conversation experience but end up with poor experiences like 4-5s long latencies, inappropriate cutoffs, speaking over each other."
Retell AI's solution is an API that handles these conversation orchestration complexities on behalf of developers. Its specialized models build on top of core speech and language components to emulate the dynamics of human discussion. What sets the product apart is its emphasis on creating a "magical" AI conversation experience. The startup has fine-tuned its system to achieve response times averaging 800 ms, close to the pace of human conversation.
The platform offers features such as voice stability control, backchanneling (brief acknowledgments like "uh-huh" while the user is speaking), live ASR transcripts, and the ability to add custom voices. Planned enhancements include adding ambient noise, making text responses sound more conversational, and sentiment analysis, further bridging the gap between human and machine communication.
Developers can bring their own LLM and frontend while Retell AI handles all the conversational heavy lifting behind the scenes. Integration involves plugging the LLM into Retell's pipeline and connecting via WebSocket to a website, mobile app, or telephony provider.
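One way to picture this integration is as a message handler sitting between the orchestration layer and the developer's LLM. The following is a minimal sketch, not Retell AI's actual protocol: the message fields (`type`, `response_id`, `transcript`, `content`), the `response_required` event name, the `handle_message` function, and the `echo_llm` stand-in are all hypothetical names chosen for illustration. In a real integration, a handler like this would be called for each message arriving on the WebSocket.

```python
import json
from typing import Callable, Optional


def echo_llm(transcript: str) -> str:
    """Stand-in for a real LLM call; it simply repeats the user's words."""
    return f"You said: {transcript}"


def handle_message(raw: str, llm: Callable[[str], str] = echo_llm) -> Optional[str]:
    """Turn one incoming JSON message into an outgoing JSON reply, or None."""
    message = json.loads(raw)

    # Hypothetical event type: the user finished a turn and a reply is needed.
    if message.get("type") != "response_required":
        return None  # other messages, such as interim transcripts, get no reply

    reply = {
        "type": "response",
        # Echo the id back so the caller can match the reply to the request.
        "response_id": message.get("response_id"),
        "content": llm(message.get("transcript", "")),
    }
    return json.dumps(reply)


if __name__ == "__main__":
    incoming = json.dumps(
        {"type": "response_required", "response_id": 1, "transcript": "hello"}
    )
    print(handle_message(incoming))
```

Keeping the handler a pure function of the incoming message makes it easy to test without a live connection, and the `llm` parameter is where a developer's own model would be swapped in.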
Retell AI also offers a no-code sandbox to let anyone prototype a voice agent through their dashboard. Users can design conversational flows, connect phone numbers, and try out sample voices without writing a single line of code.
Use cases span AI call centers, voice-enabled coaching apps, virtual companions, and much more. With the tedious conversation engineering work automated by their API, developers can focus entirely on building the unique capabilities of their voice application.
Beyond the technical innovations, Retell AI's mission is rooted in a vision of voice AI as the primary interface for interacting with digital services. As conversational AI moves mainstream, the startup's deceptively simple value proposition of "plug in your LLM, a voice agent is born" could prove to be brilliant.