FairyFuse is a novel optimization technique for running large language models on CPUs that replaces multiplication operations with fused ternary kernels. The approach addresses the challenge of efficient LLM inference on resource-constrained hardware without specialized accelerators.
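The core idea behind multiplication-free inference with ternary weights can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual fused kernel: with weights constrained to {-1, 0, +1}, each dot product reduces to additions and subtractions, so no multiply instruction is required.

```python
import numpy as np

def ternary_matvec(w_ternary, x):
    """Multiplication-free matrix-vector product for ternary weights.

    Hypothetical illustration of the general ternary-weight trick,
    not FairyFuse's actual fused CPU kernel: each output element is
    computed purely by summing and subtracting input entries.
    """
    out = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        pos = x[row == 1].sum()    # accumulate where the weight is +1
        neg = x[row == -1].sum()   # accumulate where the weight is -1
        out[i] = pos - neg         # zero weights contribute nothing
    return out

w = np.array([[1, 0, -1],
              [-1, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(w, x))  # agrees with w @ x
```

A real kernel would fuse this accumulation across rows and use packed bit representations of the ternary weights rather than a Python loop, but the arithmetic identity is the same.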
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
FairyFuse enables practical LLM inference on commodity CPUs by replacing expensive multiplication operations with fused ternary kernels, eliminating dependency on specialized accelerators.
Friday, April 24, 2026, 12:00 PM UTC · 2 min read · Source: arXiv CS.LG (Machine Learning)
Tags
infrastructure