Cactus proposes constrained acceptance speculative sampling to accelerate auto-regressive LLM decoding, a research contribution to LLM inference optimization.
Research
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
Constrained acceptance speculative sampling (Cactus) reduces LLM inference latency by optimizing token acceptance rates during auto-regressive decoding without additional model training.
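The article does not spell out Cactus's constraint, but the method builds on standard speculative sampling: a small draft model proposes a token, and the target model accepts it with probability min(1, p/q), resampling from the normalized residual max(0, p − q) on rejection. The sketch below illustrates that baseline accept/reject step; all function names are illustrative, not from the paper.

```python
import random

def accept_prob(p_target, q_draft):
    """Acceptance probability for a drafted token in vanilla
    speculative sampling: accept with probability min(1, p/q)."""
    return min(1.0, p_target / q_draft)

def residual_distribution(p, q):
    """Distribution to resample from after a rejection:
    normalize max(0, p - q) over the vocabulary."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    return [d / total for d in diff] if total > 0 else list(p)

def speculative_step(p, q, drafted_token, rng):
    """One accept/reject step. p and q are the target and draft
    model distributions over the vocabulary; returns the emitted
    token and whether the draft was accepted."""
    if rng.random() < accept_prob(p[drafted_token], q[drafted_token]):
        return drafted_token, True
    res = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=res, k=1)[0], False
```

This accept/reject rule is lossless: the emitted token is distributed exactly according to the target model's distribution p. Cactus's contribution, per the abstract, is to optimize the acceptance rate within such a scheme without retraining either model.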
Wednesday, April 8, 2026, 12:00 PM UTC · 2 min read · Source: arXiv cs.LG (Machine Learning) · By sys://pipeline
Tags
research