OpenAI · Speech-to-Speech

OpenAI Realtime on ThinnestAI

OpenAI Realtime runs GPT-4o (and now gpt-realtime) end-to-end on audio: speech in, model reasoning, and speech out over a single streaming connection. ThinnestAI ships it as a managed model and as a BYOK option — same flow editor, same Indian telephony, same INR billing, with a transparent cost picture before you commit.

~800ms latency · 50+ languages · 8 voices · ~₹22/min all-in

Why pick OpenAI Realtime on ThinnestAI

The honest list — what this model is genuinely good at, in production, on Indian workloads.

GPT-4o-class reasoning over audio

The strongest general-purpose voice model in production. Handles complex multi-turn dialogue, tool use, and reasoning that cascaded stacks struggle with.

Expressive output voices

Eight voices with natural prosody, emotion control, and the ability to follow tone instructions ("speak slowly", "sound concerned", "be enthusiastic") mid-call.

Strong tool-use over voice

Function calling works reliably during audio turns — important for booking, lookups, escalations and structured output flows mid-conversation.
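To make the claim concrete, here is a minimal sketch of how a tool might be declared on a Realtime session via a `session.update` event. The event shape follows OpenAI's published Realtime API; the `book_slot` tool and its parameters are hypothetical illustrations, not part of any real booking API.

```python
import json

# Hedged sketch: a session.update client event declaring one callable tool.
# "book_slot" and its schema are placeholders for illustration only.
session_update = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "book_slot",
                "description": "Book an appointment slot for the caller.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "date": {"type": "string", "description": "ISO date"},
                        "time": {"type": "string", "description": "HH:MM, 24h"},
                    },
                    "required": ["date", "time"],
                },
            }
        ],
        "tool_choice": "auto",
    },
}

# Sent as a JSON text frame over the Realtime WebSocket; mid-call the model
# emits function-call output items your agent resolves and returns.
payload = json.dumps(session_update)
```

Mid-conversation, the model pauses its audio turn, emits the function call, and resumes speaking once your agent returns the result — which is why this matters for booking and lookup flows.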

BYOK on ThinnestAI = best of both

Pay OpenAI directly for the model (your own API key, your own quotas, your own contract), pay ThinnestAI ₹1.5/min for the platform — Indian phone numbers, flow editor, dashboards, INR billing.

Multi-language support

Handles 50+ languages including Hindi, Tamil, Telugu, Bengali — quality is best on English, then European, with usable but not state-of-the-art Indic output (a Sarvam half-cascade often improves Indic TTS).

Mature streaming primitives

Interruption handling, partial response cancellation, audio-input commits — all production-grade. The realtime SDK is well-documented.
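The three primitives named above map to specific client events in the Realtime protocol. A hedged sketch of their payloads (event type strings follow OpenAI's published event names; the base64 audio chunk is a placeholder):

```python
import json

# Hedged sketch of the Realtime client events behind the primitives above.
append_audio = {"type": "input_audio_buffer.append", "audio": "<base64-pcm16-chunk>"}
commit_audio = {"type": "input_audio_buffer.commit"}   # finalize the caller's turn
cancel_reply = {"type": "response.cancel"}             # barge-in: stop the active response

# Each event is sent as its own JSON text frame over the Realtime WebSocket.
frames = [json.dumps(e) for e in (append_audio, commit_audio, cancel_reply)]
```

In practice a telephony bridge like ThinnestAI's streams `append` frames continuously, `commit`s on end-of-speech, and fires `response.cancel` the moment the caller interrupts.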

Three hosting paths

  1. Platform-managed — ThinnestAI fronts the OpenAI Realtime API and bills you in INR with GST.
  2. BYO OpenAI API key — bring your own OpenAI key, pay OpenAI directly for model usage. Often the right call for spend management and contract optionality.
  3. Azure OpenAI BYOK — for enterprise compliance, use the Azure-hosted Realtime endpoint with your own Azure subscription. Same model, different procurement story.
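At the connection level, the three paths differ mainly in who bills for the model and which endpoint you dial. A sketch of that difference as data — the endpoint shapes follow the OpenAI and Azure OpenAI Realtime docs, but the Azure `<resource>`/`<deployment>` names and the `api-version` shown are illustrative placeholders, not values to copy verbatim:

```python
# Hedged sketch: the procurement difference between the three hosting paths.
# Azure resource/deployment names and api-version are placeholders.
HOSTING_PATHS = {
    "platform": {
        "model_billed_by": "ThinnestAI (INR + GST)",
        "endpoint": "wss://api.openai.com/v1/realtime?model=gpt-realtime",
    },
    "byok-openai": {
        "model_billed_by": "OpenAI (your API key, your quotas)",
        "endpoint": "wss://api.openai.com/v1/realtime?model=gpt-realtime",
    },
    "byok-azure": {
        "model_billed_by": "Microsoft Azure (your subscription)",
        "endpoint": "wss://<resource>.openai.azure.com/openai/realtime"
                    "?api-version=<preview-version>&deployment=<deployment>",
    },
}
```

Note the first two paths hit the same OpenAI endpoint — the difference is purely whose key and whose invoice — while Azure swaps the endpoint and the procurement story.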

Honest limitations

Where OpenAI Realtime isn't the right answer — and what we recommend instead.

  • Most expensive S2S option in production — ~$0.27/min puts all-in cost ~₹22/min in India, vs ~₹4/min for Gemini Live.
  • English is its strongest language by a wide margin; Hindi/Hinglish quality, while usable, is meaningfully behind native-Indic options like Sarvam.
  • Voice timbre is fixed to OpenAI's 8 voices — no voice cloning (use the half-cascade option for that).
  • Tokens-per-minute caps on standard tiers can throttle high-concurrency production — enterprise tier or Azure OpenAI is the workaround.

OpenAI Realtime vs Indian competitors

Sarvam and Gnani are the strongest Indian voice AI stacks. Both ship cascaded pipelines, not native S2S. Here's how OpenAI Realtime on ThinnestAI compares.

| Feature | ThinnestAI · OpenAI Realtime | Sarvam AI | Gnani.ai |
| --- | --- | --- | --- |
| Architecture | Native speech-to-speech (GPT-4o family) | Cascaded — Saaras STT + Sarvam-M LLM + Bulbul TTS | Cascaded — proprietary STT/LLM/TTS stack |
| Reasoning quality | GPT-4o-class — industry-leading | Sarvam-M — strong Indic, English competitive | Proprietary LLM — solid for CCaaS workflows |
| End-to-end latency | ~800ms (single round-trip) | ~900-1200ms (three hops) | ~800-1100ms (three hops) |
| Indian-language quality | Usable Hindi/Tamil; English is the strength | Best-in-market for Indic languages | Production-grade across 12+ Indian languages |
| Cost (all-in INR/min) | ~₹22/min | ~₹3-4/min | Custom (enterprise quoted) |
| Hosting flexibility | Platform / BYOK OpenAI / BYOK Azure OpenAI | Sarvam API only | Gnani-hosted only |
| Tool use over voice | Best-in-class function calling | Supported | Supported |
| Indian phone number out of the box | Yes — Vobiz, Twilio, Plivo via ThinnestAI | BYO carrier | Yes — Gnani-managed |
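The latency gap in the comparison comes from architecture, not raw model speed: a cascade pays three sequential hops while native S2S pays one round-trip. A sketch with illustrative per-hop numbers (assumed values for the arithmetic, not measured benchmarks):

```python
# Hedged sketch: why cascaded pipelines trail native S2S on latency.
# Per-hop figures below are illustrative assumptions, not benchmarks.
cascade_ms = {"stt": 300, "llm": 450, "tts": 350}

cascade_total = sum(cascade_ms.values())  # hops run sequentially
s2s_total = 800                           # single round-trip, per the table

print(cascade_total)  # 1100 — inside the ~900-1200ms range quoted for cascades
```

The point is structural: even if each cascade stage is individually fast, their latencies add, whereas a single-connection S2S model amortizes everything into one hop.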

Where OpenAI Realtime fits

Premium English Customer Support

Global SaaS, US/UK/ANZ B2B support where caller experience and reasoning quality are non-negotiable and English is the only language.

Complex Sales / Solution Consulting

Outbound or inbound calls where the agent needs to reason about product configurations, pricing scenarios, or multi-system lookups mid-call.

High-stakes Healthcare

Clinical triage, post-discharge follow-up, and patient education flows where reasoning accuracy is more valuable than per-minute cost.

Enterprise IT Helpdesk

Multi-turn troubleshooting with structured tool calls (ticket creation, system lookup, knowledge base search). Reasoning depth matters.

Pick OpenAI Realtime when
  • Your callers are English-speaking and you want the strongest voice agent reasoning available.
  • Your flow has heavy tool use, mid-call lookups, and structured outputs where GPT-4o's reasoning matters.
  • You're fine with ~$0.27/min model cost for premium quality.
  • You're an existing OpenAI customer and want to use the same API key, contract and observability you already have.
  • You need Azure OpenAI for compliance, procurement, or data-residency reasons.
Pick something else when
  • You're cost-sensitive at production volume — Gemini Live (~₹4/min all-in) or a Sarvam cascade (~₹3-4/min) deliver comparable Indian-language outcomes at a fraction of the cost.
  • Your callers speak Hindi, Marathi, Tamil, Telugu or Bengali — Sarvam's Indic-pretrained cascade still beats GPT-4o's Indic output for naturalness.
  • You need specific cloned voices — use the half-cascade option or stick with Sarvam Bulbul + your own cloned voice.

Ship OpenAI Realtime on ThinnestAI

Free welcome credits, no card required. Pick OpenAI Realtime from the model dropdown, dial out from an Indian number in minutes.

Frequently asked questions

How much does OpenAI Realtime cost on ThinnestAI?

OpenAI charges roughly $0.27/min for the Realtime API (blended audio input + output for GPT-4o-realtime-preview). ThinnestAI adds a ₹1.5/min platform fee. All-in lands around ₹22/min in India. For high-volume English production work this is the premium tier; for Indian-language production work Gemini Live or a Sarvam cascade are typically the better cost-per-outcome.
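The "~₹22/min" figure above decomposes into simple arithmetic. A sketch under stated assumptions — the ~$0.27/min blended model rate and ₹1.5/min platform fee come from this page, while the USD/INR rate of 75 is an illustrative assumption you should recompute at the current exchange rate:

```python
# Hedged sketch of the all-in cost figure quoted above.
MODEL_USD_PER_MIN = 0.27    # OpenAI Realtime, blended audio in+out (per this page)
USD_TO_INR = 75.0           # illustrative FX assumption — recompute at current rate
PLATFORM_INR_PER_MIN = 1.5  # ThinnestAI platform fee (per this page)

all_in = MODEL_USD_PER_MIN * USD_TO_INR + PLATFORM_INR_PER_MIN
print(round(all_in, 2))  # 21.75 — i.e. the "~₹22/min" figure
```

Since the model cost dominates, the all-in rate moves almost one-for-one with both OpenAI's pricing and the exchange rate.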

Should I use BYOK or platform-managed OpenAI Realtime?

BYOK is usually the right call once you're past prototyping. You get direct OpenAI billing (better spend visibility), your own enterprise quotas, your own contract terms, and the option to use Azure OpenAI for compliance. Platform-managed is simpler for early development.

Can OpenAI Realtime do Hindi?

Yes, but Hindi is not its strongest language. Output Hindi is intelligible but doesn't match Sarvam Bulbul's naturalness or what native Hindi speakers expect. If Hindi quality is the goal, run Sarvam in cascaded mode or use the half-cascade option (OpenAI Realtime for reasoning, Sarvam Bulbul for output).

How does it compare to Gemini Live on ThinnestAI?

Gemini Live wins on cost (~₹4/min vs ~₹22/min) and Hindi/Hinglish naturalness. OpenAI Realtime wins on raw reasoning, tool-use reliability and English quality. Both are speech-to-speech; pick by language, budget and reasoning needs. ThinnestAI lets you swap models per-agent without re-architecting.

Does OpenAI Realtime work over Indian phone numbers?

Yes. ThinnestAI bridges OpenAI Realtime into LiveKit-hosted Indian SIP — your agent can take inbound calls on a Vobiz number or dial outbound campaigns through Twilio/Plivo, with DLT-compliant routing automatically applied.

Is gpt-realtime the same as GPT-4o Realtime?

Yes — OpenAI rebranded the Realtime model line as 'gpt-realtime' (and gpt-4o-realtime-preview before it). ThinnestAI tracks both API names; the latest stable version is the default unless you pin a snapshot in BYOK config.