LLMs Are Not a Higher Level of Abstraction 2026-05-09
I scaled a pure Spiking Neural Network (SNN) to 1.088B param 2026-05-09
Introspective Diffusion Language Models 2026-05-09
Is Attention sink without Positional Encoding unavoidable? [ 2026-05-09
Can Geometric Deep Learning lead eliminate the need of "Brut 2026-05-09
Scientific Theory of Deep Learning 2026-05-09
Decoupled DiLoCo: Resilient Distributed Pre-training 2026-05-09
Hyperloop Transformers: Efficient Language Modeling 2026-05-09
Generalization at the Edge of Stability 2026-05-09