BOSCH is a black-box binary optimization technique for selecting a reduced set of attention heads in LLMs to improve inference efficiency. The method identifies which attention heads are essential, enabling compression for short-context scenarios.
BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs
BOSCH uses black-box optimization to automatically identify and prune redundant attention heads in LLMs, enabling faster inference for short-context scenarios without retraining.
Wednesday, April 8, 2026, 12:00 PM UTC · 2 min read · Source: arXiv CS.CL (Computation & Language) · By sys://pipeline
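The summary does not say which black-box optimizer BOSCH uses, so the following is only an illustration of the general problem shape, not the paper's method. It is a minimal Python sketch that assumes a fixed budget of `k_keep` heads and a scorer that can only be queried, never differentiated. The optimizer is a plain swap-based local search with random restarts, chosen here as a stand-in technique. `swap_local_search`, `toy_objective`, and the `IMPORTANCE` weights are all hypothetical. In a real setting the objective would presumably run the LLM with the masked heads on short-context inputs and return a task score.

```python
import random
from typing import Callable, List, Tuple

# 1 = keep the attention head, 0 = prune it.
Mask = Tuple[int, ...]


def swap_local_search(
    objective: Callable[[Mask], float],
    n_heads: int,
    k_keep: int,
    n_restarts: int = 5,
    max_evals: int = 2000,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximize a black-box objective over masks with exactly k_keep ones.

    Hypothetical stand-in optimizer (not BOSCH's). The objective is only
    queried. Each move swaps one kept head for one pruned head, so every
    candidate meets the head budget.
    """
    rng = random.Random(seed)
    best_mask: Mask = ()
    best_score = float("-inf")
    evals = 0
    for _ in range(n_restarts):
        if evals >= max_evals:
            break
        # Random feasible starting mask with exactly k_keep heads kept.
        kept = set(rng.sample(range(n_heads), k_keep))
        mask: Mask = tuple(1 if h in kept else 0 for h in range(n_heads))
        score = objective(mask)
        evals += 1
        improved = True
        while improved and evals < max_evals:
            improved = False
            ones: List[int] = [h for h in range(n_heads) if mask[h]]
            zeros: List[int] = [h for h in range(n_heads) if not mask[h]]
            rng.shuffle(ones)
            rng.shuffle(zeros)
            # First-improvement scan over all (drop i, add j) swaps.
            for i in ones:
                for j in zeros:
                    cand = list(mask)
                    cand[i], cand[j] = 0, 1
                    cand_mask = tuple(cand)
                    cand_score = objective(cand_mask)
                    evals += 1
                    if cand_score > score:
                        mask, score, improved = cand_mask, cand_score, True
                        break
                    if evals >= max_evals:
                        break
                if improved or evals >= max_evals:
                    break
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score


# Hypothetical per-head importances for an 8-head toy "model".
IMPORTANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.15]


def toy_objective(mask: Mask) -> float:
    """Made-up stand-in for 'task score of the model with this head mask'."""
    gain = sum(w for w, m in zip(IMPORTANCE, mask) if m)
    # Penalize keeping adjacent heads: an invented proxy for redundancy.
    overlap = sum(mask[h] * mask[h + 1] for h in range(len(mask) - 1))
    return gain - 0.3 * overlap


if __name__ == "__main__":
    best, value = swap_local_search(toy_objective, n_heads=8, k_keep=4)
    print(best, round(value, 3))  # expected: (1, 0, 1, 0, 1, 0, 1, 0) 3.0
```

The swap move keeps every candidate at exactly `k_keep` heads, so the budget never has to be enforced through a penalty term, and the search only ever needs objective values, which is what makes it black-box.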