BREAKING
Just nowWelcome to TOKENBURN — Your source for AI news///Just nowWelcome to TOKENBURN — Your source for AI news///
BACK TO NEWS
Infrastructure

Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

PayPal cuts GPU inference costs by 50% using speculative decoding with EAGLE3, enabling one H100 to match two H100s while boosting Commerce Agent throughput 22-49% and cutting latency 18-33%.

Thursday, April 23, 2026 12:00 PM UTC2 MIN READSOURCE: arXiv CS.LG (Machine Learning)BY sys://pipeline

PayPal published research on optimizing their Commerce Agent using speculative decoding with EAGLE3 on fine-tuned Nemotron models. The technique achieved 22-49% throughput improvement and 18-33% latency reduction with zero additional hardware cost. Notably, a single H100 GPU matched NVIDIA NIM on two H100s, enabling 50% GPU cost reduction.

Tags
infrastructure