from Machine Learning

ResBM: a new transformer-based architecture for low-bandwidth pipeline-parallel training, achieving 128× activation compression [R]

Macrocosmos has released a paper on ResBM (Residual Bottleneck Models), a new transformer-based architecture designed for low-bandwidth pipeline-parallel training.

https://arxiv.org/abs/2604.11947

ResBM introduces a residual encoder-decoder bottleneck across pipeline-stage boundaries, with the goal of reducing inter-stage communication while preserving an explicit low-rank identity path. The paper reports state-of-the-art 128× activation compression with no significant loss in convergence relative to uncompressed baselines.
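To make the idea concrete, here is a minimal NumPy sketch of what a low-rank bottleneck across a pipeline boundary could look like: a d-dimensional activation is projected down to r dimensions before being sent to the next stage, then decoded back up on the other side. All names, shapes, and the random initialization below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical sketch (not the paper's exact method): compress activations
# at a pipeline boundary by projecting d -> r, transmit only the r-dim
# code, and decode r -> d on the next stage.
d, r = 1024, 8                    # hidden size and bottleneck rank: d/r = 128x compression
rng = np.random.default_rng(0)

W_enc = rng.standard_normal((d, r)) / np.sqrt(d)   # encoder, lives on the sending stage
W_dec = rng.standard_normal((r, d)) / np.sqrt(r)   # decoder, lives on the receiving stage

def send(h):
    """Compress activations before crossing the pipeline boundary."""
    return h @ W_enc              # shape (batch, r): 128x fewer values on the wire

def receive(z):
    """Reconstruct activations on the next pipeline stage."""
    return z @ W_dec              # shape (batch, d)

h = rng.standard_normal((4, d))   # a batch of activations at the stage boundary
z = send(h)                       # only z is communicated between stages
h_rec = receive(z)

print(z.shape, h_rec.shape, d // r)
```

Note that the composed map `W_enc @ W_dec` is itself a rank-r linear operator, which is one plausible reading of the "low-rank identity path" if the bottleneck is initialized or regularized toward the identity; the paper should be consulted for the actual construction.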

In their experiments, the strongest compressed results use the Muon optimizer, and the paper positions ResBM as a step toward decentralized, internet-grade pipeline-parallel training.

submitted by /u/network-kai


Tagged with

#ResBM
#Residual Bottleneck Models
#transformer-based architecture
#pipeline-parallel training
#activation compression
#low-bandwidth
#SOTA
#encoder-decoder bottleneck
#convergence
#inter-stage communication
#low-rank identity path
#decentralized
#internet-grade
#uncompressed baselines