Rule of thumb: batch ≈ 300×sparsity
19:17–19:53 · 37s
Reiner derives a strikingly simple formula for optimal batch size from hardware ratios, then notes it matches practice within a small factor.
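As a rough illustration of the rule of thumb in the title, the arithmetic can be sketched as below. The constant 300 is taken from the title; the summary attributes it to hardware ratios but does not give the derivation, so it is left as an adjustable parameter here. The function name and the reading of "sparsity" as a multiplicative factor (for example, total parameters divided by active parameters in a sparse model) are assumptions for illustration, not something the clip summary states.

```python
def rule_of_thumb_batch(sparsity: float, hardware_ratio: float = 300.0) -> float:
    """Estimate batch size as hardware_ratio * sparsity.

    hardware_ratio defaults to the 300 quoted in the title.
    sparsity is assumed to be a positive multiplicative factor
    (hypothetical interpretation: total / active parameters).
    """
    if sparsity <= 0:
        raise ValueError("sparsity must be positive")
    return hardware_ratio * sparsity


if __name__ == "__main__":
    # A dense model under this assumed definition has sparsity 1.
    for s in (1, 8, 32):
        print(f"sparsity={s}: batch ~ {rule_of_thumb_batch(s):.0f}")
```

Since the summary says the formula matches practice only within a small factor, the result is best read as an order-of-magnitude estimate rather than a precise setting.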