FairyFuse is a novel optimization technique for running large language models on CPUs that replaces multiplication operations with fused ternary kernels. The approach addresses the challenge of efficient LLM inference on resource-constrained hardware without specialized accelerators.
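The core idea behind multiplication-free inference with ternary weights can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual fused kernel: with weights constrained to {-1, 0, +1}, each dot product reduces to additions and subtractions, so no multiply instruction is required.

```python
import numpy as np

def ternary_matvec(w_ternary, x):
    """Multiplication-free matrix-vector product for ternary weights.

    Hypothetical illustration of the general ternary-weight trick,
    not FairyFuse's actual fused CPU kernel: each output element is
    computed purely by summing and subtracting input entries.
    """
    out = np.zeros(w_ternary.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_ternary):
        pos = x[row == 1].sum()    # accumulate where the weight is +1
        neg = x[row == -1].sum()   # accumulate where the weight is -1
        out[i] = pos - neg         # zero weights contribute nothing
    return out

w = np.array([[1, 0, -1],
              [-1, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(w, x))  # agrees with w @ x
```

A real kernel would fuse this accumulation across rows and use packed bit representations of the ternary weights rather than a Python loop, but the arithmetic identity is the same.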
FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels
FairyFuse enables practical LLM inference on commodity CPUs by replacing expensive multiplication operations with fused ternary kernels, eliminating dependency on specialized accelerators.
Friday, April 24, 2026, 12:00 PM UTC · 2 min read · Source: arXiv CS.LG (Machine Learning)
Tags
infrastructure