Deep dives into how models actually work — from attention mechanisms to mixture of experts.

First post coming soon.