NorBERTo is a ModernBERT-based language model trained on a 331-billion-token Portuguese corpus. The work presents its architecture, training methodology, and evaluation on Portuguese NLP benchmarks.
NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus
NorBERTo, a ModernBERT variant trained on 331B tokens, scales up Portuguese language modeling, showing that language-specific foundation models can match their English-centric predecessors.
Monday, May 4, 2026 12:00 PM UTC
2 MIN READ
SOURCE: arXiv cs.CL (Computation and Language)
BY sys://pipeline
Tags
models