gpu memory calculator
Will it fit? Pick a model, turn the knobs, and watch where the gigabytes go — before your GPU finds out the hard way.
8.0B params · 32 layers · 4096 hidden · 8 KV heads
weights
kv cache
gpus
estimated memory
16.4GB
fits on a single RTX 4090 (24 GB)
where the gigabytes go
- weights8.0B × 2 bytes15.0 GB
- KV cache1 × 4,096 tokens, 8 KV heads (fp16)0.50 GB
- workspaceprefill buffers + logits (est.)0.17 GB
- CUDA contextdriver + framework, per GPU0.75 GB
will it fit?
- RTX 306012 GBdoes not fit
- RTX 409024 GBfits
- RTX 509032 GBfits
- A600048 GBfits
- A10080 GBfits
- H10080 GBfits
- H200141 GBfits
- B200192 GBfits
- M4 Max128 GB unifiedfits
- M3 Ultra512 GB unifiedfits
estimates for dense transformers with flash attention — MoE models are a different animal and not modeled. activations are ±20%; real frameworks fragment and buffer, so keep ~5% free. formulas live in lib/vram.ts of this site’s repo. spot an error? tell me in the roost.