Infrastructure

How OpenAI delivers low-latency voice AI at scale


Monday, May 4, 2026, 12:00 PM UTC · 2 min read · Source: Hacker News

OpenAI describes how it rearchitected its WebRTC stack to deliver low-latency voice interactions at scale for ChatGPT voice and the Realtime API. The team addressed three infrastructure constraints: one-port-per-session media termination, stateful ICE/DTLS session ownership, and global routing latency. Their solution is a split relay-and-transceiver architecture that preserves standard WebRTC behavior on the client side while optimizing how packets are routed internally.
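To make the first constraint concrete: a naive media server binds one UDP port per session, which caps sessions at the number of available ports per host. A shared-port design instead terminates many sessions on a single port and demultiplexes incoming packets by their remote address. The sketch below is purely illustrative (not OpenAI's code); the class and method names are invented for this example, and the routing table is a plain dict keyed by the remote (ip, port) tuple, which a relay could populate once ICE validates a candidate pair.

```python
from dataclasses import dataclass, field

@dataclass
class SharedPortDemux:
    """Illustrative sketch: route packets arriving on ONE shared UDP port
    to their owning session, keyed by the sender's (ip, port) tuple."""
    # remote (ip, port) -> session id; filled in when ICE binds a candidate pair
    routes: dict = field(default_factory=dict)

    def register(self, remote: tuple, session_id: str) -> None:
        """Record that packets from `remote` belong to `session_id`."""
        self.routes[remote] = session_id

    def route(self, remote: tuple, payload: bytes):
        """Return (session_id, payload) for a known sender, else None."""
        session_id = self.routes.get(remote)
        if session_id is None:
            return None  # unknown sender; a real relay would drop or re-probe
        return (session_id, payload)

# Two sessions multiplexed on the same listening port:
demux = SharedPortDemux()
demux.register(("203.0.113.7", 51000), "session-a")
demux.register(("198.51.100.9", 62000), "session-b")
```

The point of the sketch is only the addressing scheme: once flows are keyed by remote tuple rather than by local port, session count is no longer bounded by the host's port range.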

Tags
infrastructure