Native speech-to-speech voice agents
One model. Audio in, audio out. No cascading STT → LLM → TTS hops. ThinnestAI ships both major S2S providers — Gemini Live and OpenAI Realtime — with Indian phone numbers, flow editor, BYOK options and INR billing on top.
Gemini Live
Google's native-audio speech-to-speech model — one network round-trip from caller audio to agent reply, 30 voices, code-switching across Indian languages.
OpenAI Realtime
OpenAI's GPT-4o-class speech-to-speech model — premium reasoning, expressive voices, and the strongest English voice agent in production today.
What changes with speech-to-speech
Cascaded pipeline (STT → LLM → TTS):
- Three separate models, three network hops
- Typical end-to-end latency: 900–1,500ms
- Best Indic language quality (Sarvam + Aero + Deepgram)
- Maximum customization per layer
- Predictable, deterministic for tool-heavy flows

Native speech-to-speech:
- Audio in → audio out, one round-trip
- Typical end-to-end latency: 600–900ms
- Better cross-language code-switching (Hinglish)
- More natural prosody, emotional consistency
- Lower per-call infra overhead
How S2S on ThinnestAI compares to Indian competitors
Sarvam and Gnani are India's strongest voice AI stacks, but both ship cascaded pipelines — they don't offer native speech-to-speech. ThinnestAI is the platform that gives you the S2S option (Gemini Live, OpenAI Realtime) and the cascaded Indic-first option (Sarvam, Aero, Deepgram, ElevenLabs) on the same flow editor.
| Capability | ThinnestAI (S2S) | Sarvam (cascaded) | Gnani (cascaded) |
|---|---|---|---|
| Native speech-to-speech | Yes — Gemini Live + OpenAI Realtime | No — STT + LLM + TTS pipeline | No — proprietary cascaded stack |
| End-to-end latency | 600–900ms | 900–1200ms | 800–1100ms |
| Hindi / Hinglish code-switching | Native via Gemini Live | Native via Sarvam-M | Native via Gnani LLM |
| Best-in-market Indic language quality | Available via cascaded option | Yes (Sarvam Bulbul + Saaras) | Yes (Gnani in-house) |
| Indian phone numbers | Yes — Vobiz, Twilio, Plivo, Exotel | BYO carrier | Yes — Gnani-managed |
| INR billing + GST invoice | Yes | Yes | Yes |
| BYO model API key | Yes — Gemini, OpenAI, Azure OpenAI | No — Sarvam-hosted only | No — Gnani-hosted only |
| No-code flow editor | Yes — drag-and-drop, branching, tools | No native flow editor | Yes — Gnani Studio |
| Pricing transparency | Public ₹/min, public model pricing | Public ₹/min | Enterprise quote |
Try speech-to-speech today
Sign up free, pick Gemini Live or OpenAI Realtime from the model dropdown, and dial out from an Indian number in under 5 minutes. Welcome credits — no card required.
Frequently asked questions
What is speech-to-speech and how is it different from a cascaded voice pipeline?
A speech-to-speech (S2S) model takes audio in and produces audio out in a single forward pass — no separate STT, LLM and TTS hops. Cascaded pipelines stitch three models together. S2S typically lands at 600–900ms end-to-end latency vs 900–1,500ms for cascaded, with better cross-language code-switching and more natural prosody, at the cost of less fine-grained per-layer control.
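The latency difference falls out of the call shape. A minimal sketch of the two turn budgets — the per-stage numbers below are illustrative values chosen to fall inside the typical ranges quoted above, not measured ThinnestAI figures:

```python
# Hypothetical per-stage latencies in ms (illustrative, within the
# typical ranges quoted above -- not measured production numbers).
STT_MS, LLM_MS, TTS_MS = 250, 400, 350   # cascaded stages
S2S_MS = 650                             # single forward pass

def cascaded_turn_latency(hop_ms: int = 100) -> int:
    # Three models, three extra network hops: STT -> LLM -> TTS
    return STT_MS + LLM_MS + TTS_MS + 3 * hop_ms

def s2s_turn_latency(hop_ms: int = 100) -> int:
    # One model, one round-trip: audio in -> audio out
    return S2S_MS + hop_ms

print(cascaded_turn_latency())  # 1300
print(s2s_turn_latency())       # 750
```

The cascaded total lands in the 900–1,500ms band and the S2S total in the 600–900ms band; the three extra network hops alone account for a few hundred milliseconds before any model inference starts.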
Which speech-to-speech models does ThinnestAI support?
Two production-grade S2S providers: Google Gemini Live (Gemini 2.5 Flash native audio + Gemini 3.1 Flash Live preview) and OpenAI Realtime (gpt-4o-realtime and gpt-realtime). You can pick either from the model dropdown in the flow editor and swap between them without re-platforming.
How much does speech-to-speech cost on ThinnestAI?
All-in pricing including the ThinnestAI platform fee (₹1.5/min), telephony and model usage: Gemini Live runs ~₹4/min, OpenAI Realtime runs ~₹22/min. Model usage is billed at the official provider rates ($3/M audio input, $12/M audio output for Gemini Live; $32/M audio input, $64/M audio output for OpenAI Realtime). No minimums, no commitments — pay only for what you use.
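The per-minute figure can be sanity-checked from the per-token rates. The audio-token rate (~32 tokens/sec), the USD/INR rate, and the 50% agent talk-time below are all assumptions for illustration, not official provider figures:

```python
def model_cost_per_min_inr(in_usd_per_m: float, out_usd_per_m: float,
                           in_tokens_per_s: float, out_tokens_per_s: float,
                           usd_to_inr: float = 88.0) -> float:
    """Estimate model cost in INR per call-minute from per-token pricing.

    Assumes caller audio streams for the full minute and the agent
    speaks roughly half of it -- both assumptions, not measured values.
    """
    in_tokens = in_tokens_per_s * 60
    out_tokens = out_tokens_per_s * 30  # agent talks ~half the minute
    usd = (in_tokens * in_usd_per_m + out_tokens * out_usd_per_m) / 1_000_000
    return usd * usd_to_inr

# Gemini Live at the rates quoted above ($3/M audio in, $12/M audio out),
# assuming ~32 audio tokens/sec (an assumption about the tokenizer):
print(round(model_cost_per_min_inr(3, 12, 32, 32), 2))  # 1.52
```

Roughly ₹1.5/min of model usage; the ₹1.5/min platform fee plus telephony brings the all-in figure to approximately the quoted ~₹4/min. Plug in current provider rates before relying on any estimate.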
Can I run speech-to-speech over real Indian phone numbers?
Yes. ThinnestAI ships Indian phone numbers via Vobiz, Twilio, Plivo and Exotel, and S2S models route through the same SIP/LiveKit telephony stack as our cascaded pipeline. Inbound, outbound and bidirectional flows all work the same way regardless of model type.
Can I bring my own Gemini or OpenAI API key (BYOK)?
Yes — BYOK is supported for Gemini (Google AI Studio + Vertex AI), OpenAI (direct API + Azure OpenAI) and most other LLM providers. When you bring your own key, the platform fee still applies (₹1.5/min) but model usage is billed to your own provider account, which lets you use enterprise credits, prepaid commitments or compliance-restricted hosting like Azure OpenAI Mumbai.
Is speech-to-speech good for Hindi and Hinglish?
Gemini Live handles Hindi and Hinglish code-switching natively and is now genuinely competitive with cascaded Indic-first stacks for everyday conversational quality. For lower-resource Indic languages (Bhojpuri, Awadhi, Maithili, Konkani) or maximum control over Indian-language quality, a cascaded path with Sarvam STT/LLM/TTS or Aero TTS still wins — and you can run both side by side on the same flow.
Do I get a free trial without a credit card?
Yes. Sign up and you get welcome credits — no card required. That's enough to test both Gemini Live and OpenAI Realtime end-to-end, including outbound calls from an Indian number. After the trial, you can stay on PAYG (top-up wallet, billed per actual usage) or move to Enterprise for committed-use discounts and GST invoicing.
How does S2S latency compare to a cascaded stack in production?
On ThinnestAI's production traffic: Gemini Live averages ~600–700ms first-audio-out, OpenAI Realtime ~800–900ms. A well-tuned cascaded stack (Deepgram nova-3 + Groq Llama + Aero TTS) lands around 900–1,100ms. S2S wins on latency by skipping the STT and TTS network hops, but cascaded wins when you need deterministic tool-call gating or per-layer model swaps.
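The "deterministic tool-call gating" point is worth unpacking: in a cascaded pipeline the transcript exists as text between the STT and LLM stages, so sensitive intents can be routed by rule before any model is consulted. A minimal sketch with hypothetical route names (not the ThinnestAI API):

```python
import re

# Hypothetical deterministic gate: because a cascaded pipeline holds the
# transcript as text between STT and LLM, a rule can intercept certain
# intents with 100% reliability instead of trusting the model to call
# the right tool. An S2S model never exposes this text checkpoint.
TRANSFER_PATTERN = re.compile(r"\b(agent|human|representative)\b", re.I)

def route_turn(transcript: str) -> str:
    if TRANSFER_PATTERN.search(transcript):
        return "tool:transfer_to_human"   # fires every time, by rule
    return "llm:generate_reply"           # otherwise let the model answer

print(route_turn("I want to talk to a human"))  # tool:transfer_to_human
print(route_turn("What's my order status?"))    # llm:generate_reply
```

With an S2S model the equivalent behaviour has to go through the model's own tool-calling, which is probabilistic — hence the trade-off between latency and determinism described above.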
