This arXiv paper proposes a KV-cache optimization technique for transformer inference that uses top-K retrieval with fixed-size linear-attention completion. The method preserves the model backbone and the KV-cache format while reducing memory-access overhead, targeting the memory-bandwidth bottleneck that KV-cache reads impose on autoregressive decoding in large language model deployment.
Research
Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction
A top-K retrieval technique with fixed-size linear-attention completion reduces KV-cache read overhead in transformer inference while maintaining full compatibility with existing model architectures and KV-cache formats.
Wednesday, April 8, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning) · BY sys://pipeline
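The summary above does not include implementation details, but the named mechanism can be sketched. Below is a minimal, hedged PyTorch sketch of one plausible reading of the technique: exact softmax attention is computed only over the top-K highest-scoring cached keys, and the remaining "tail" keys are summarized by a constant-size linear-attention state using a positive kernel feature map (here phi(x) = elu(x) + 1, a standard linear-attention choice). The function name, the choice of phi, and the way the exact and tail terms are combined are all illustrative assumptions, not the paper's reference implementation.

```python
# Hypothetical sketch of top-K retrieval with fixed-size linear-attention
# completion (illustrative assumption; not the paper's implementation).
import math
import torch
import torch.nn.functional as F

def phi(x: torch.Tensor) -> torch.Tensor:
    """Positive kernel feature map used by linear attention (assumed choice)."""
    return F.elu(x) + 1.0

def topk_linear_completion_attention(q, K, V, k_top=32):
    """
    q: (d,) query for the current decode step.
    K: (n, d) cached keys; V: (n, d_v) cached values.
    Exact softmax attention is taken over the top-K keys; the remaining keys
    are approximated via exp(q.k) ~= phi(q).phi(k), whose sufficient
    statistics (S_tail, z_tail) are fixed-size regardless of context length.
    Numerical stabilization (max-subtraction) is omitted for brevity.
    """
    d = K.shape[-1]
    qs = q / math.sqrt(d)                            # fold temperature into q
    scores = K @ qs                                  # (n,)

    k_top = min(k_top, K.shape[0])
    top_scores, top_idx = torch.topk(scores, k_top)  # retrieve top-K entries

    # Exact contribution from the retrieved top-K keys/values.
    w = torch.exp(top_scores)                        # unnormalized weights
    num_exact = w @ V[top_idx]                       # (d_v,)
    den_exact = w.sum()

    # Fixed-size completion for the tail. In a real system these statistics
    # would be maintained incrementally so tail K/V rows are never re-read;
    # here they are formed explicitly for clarity.
    mask = torch.ones(K.shape[0], dtype=torch.bool)
    mask[top_idx] = False
    K_tail, V_tail = K[mask], V[mask]
    fq = phi(qs)                                     # (d,)
    S_tail = phi(K_tail).T @ V_tail                  # (d, d_v), fixed size
    z_tail = phi(K_tail).sum(dim=0)                  # (d,),     fixed size

    num_tail = fq @ S_tail                           # (d_v,)
    den_tail = fq @ z_tail

    return (num_exact + num_tail) / (den_exact + den_tail)

# Usage: 4096 cached tokens, head dim 64; only 32 KV rows are read exactly.
torch.manual_seed(0)
q = torch.randn(64)
K, V = torch.randn(4096, 64), torch.randn(4096, 64)
out = topk_linear_completion_attention(q, K, V, k_top=32)
print(out.shape)  # torch.Size([64])
```

The property this sketch illustrates is the one the title claims: S_tail and z_tail have shapes (d, d_v) and (d,) no matter how long the context grows, so per-step memory traffic is bounded by the K retrieved KV rows plus a constant-size completion state, with the cached K/V tensors left in their original format.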
Tags
research