Mixture-of-Experts
7 mentions across all digests
Mixture-of-Experts (MoE) is a neural network architecture that routes each input to a small subset of specialized subnetworks ("experts"), so only a fraction of the total parameters is active per token. It is used in models like Gemma 4 (26B-A4B MoE) and studied for how expert load balancing evolves through three distinct training phases.
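To make the routing idea concrete, here is a minimal NumPy sketch of a top-k MoE layer; the dimensions, expert count, and top-k value are illustrative, not drawn from any model listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

# Router: a learned linear layer that scores each expert per token.
W_router = rng.normal(size=(d_model, n_experts))
# Experts: small feed-forward nets (single matrices here for brevity).
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_router                            # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of top-k experts
    sel = np.take_along_axis(logits, top, axis=-1)   # logits of selected experts
    gates = np.exp(sel) / np.exp(sel).sum(-1, keepdims=True)  # softmax over top-k
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # per-token dispatch
        for slot in range(top_k):
            e = top[t, slot]
            out[t] += gates[t, slot] * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(8, d_model))
print(moe_layer(tokens).shape)  # (8, 16): same shape, only 2 of 4 experts ran per token
```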
Which one is more important: more parameters or more computation? (2021)
This ParlAI work decouples model parameters from computation: hash-based MoE routing scales capacity without added compute, while staircase attention increases compute without adding parameters, and the two yield orthogonal gains when combined.
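A minimal sketch of the hash-routing idea, assuming the common formulation where a fixed function of the token id picks the expert (the modulo below is a stand-in for the paper's fixed random hash):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_experts, vocab = 16, 4, 1000
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def hash_route(token_ids):
    """A fixed hash of the token id picks the expert: no router weights,
    no routing FLOPs beyond a modulo."""
    return token_ids % n_experts  # stand-in for a fixed random mapping

def hash_moe(token_ids, x):
    assignments = hash_route(token_ids)
    out = np.empty_like(x)
    for e in range(n_experts):        # batch all tokens assigned to expert e
        mask = assignments == e
        if mask.any():
            out[mask] = x[mask] @ experts[e]
    return out

ids = rng.integers(0, vocab, size=8)
hidden = rng.normal(size=(8, d_model))
print(hash_moe(ids, hidden).shape)  # (8, 16); more experts adds parameters, not per-token compute
```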
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
The Expert Upcycling technique initializes MoE experts from existing pretrained weights, shifting the compute-efficient frontier and enabling cheaper inference through smarter expert reuse and routing.
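Upcycling recipes differ between papers; the sketch below assumes the common variant that initializes every expert as a lightly perturbed copy of a pretrained dense FFN, so the MoE starts out reproducing the dense model:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, d_ff, n_experts = 16, 64, 4

# Pretrained dense FFN weights (stand-ins for a real checkpoint).
W_in_dense = rng.normal(size=(d_model, d_ff))
W_out_dense = rng.normal(size=(d_ff, d_model))

def upcycle(W_in, W_out, n_experts, noise=0.0):
    """Initialize each expert as a copy of the dense FFN, optionally
    perturbed so experts can specialize during continued training."""
    return [
        (W_in + noise * rng.normal(size=W_in.shape),
         W_out + noise * rng.normal(size=W_out.shape))
        for _ in range(n_experts)
    ]

experts = upcycle(W_in_dense, W_out_dense, n_experts, noise=0.01)

# At initialization each expert approximately matches the dense net, so
# routing a token anywhere roughly reproduces the dense model's output.
x = rng.normal(size=(1, d_model))
dense_out = np.maximum(x @ W_in_dense, 0) @ W_out_dense
expert_out = np.maximum(x @ experts[0][0], 0) @ experts[0][1]
print(np.abs(dense_out - expert_out).max())  # small: experts start near the dense FFN
```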
Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training
Research reveals how Mixture-of-Experts models optimize expert routing and load balancing across three predictable training phases, demystifying scaling dynamics in modern LLMs.
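The quantity typically tracked in such load-balance studies, and the Switch-Transformer-style auxiliary loss often used to regularize it, can be sketched as follows (a minimal example, not the study's exact setup):

```python
import numpy as np

def load_balance_loss(router_logits, top1_assignment, n_experts):
    """Switch-Transformer-style auxiliary loss: n * sum_i f_i * P_i, where
    f_i is the fraction of tokens routed to expert i and P_i is the mean
    router probability mass on expert i. It reaches its minimum of 1.0
    when both distributions are uniform at 1/n."""
    probs = np.exp(router_logits)
    probs /= probs.sum(-1, keepdims=True)   # softmax over experts, (tokens, n)
    f = np.bincount(top1_assignment, minlength=n_experts) / len(top1_assignment)
    P = probs.mean(axis=0)
    return n_experts * float(f @ P)

rng = np.random.default_rng(3)
logits = rng.normal(size=(512, 8))
assign = logits.argmax(-1)                  # top-1 routing
print(load_balance_loss(logits, assign, 8)) # ~1.0 indicates well-balanced experts
```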
Gemma 4: Byte for byte, the most capable open models
Google DeepMind released Gemma 4, a family of four Apache 2.0-licensed multimodal models (up to 31B parameters) that support image, video, and audio input and improve parameter efficiency through Per-Layer Embeddings.
A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026
Raschka surveys 10 open-weight LLM architectures from Jan-Feb 2026 (Arcee, Moonshot, Qwen, Cohere) spanning 3B to 1T parameters, revealing divergent design choices in MoE configs and efficiency strategies.