Engineering: Dynamic WebGPU VRAM Allocations in Safari vs Chrome

A comparative profile of how Chrome and Safari manage GPU memory when loading LLMs on the client.

The WebGPU Portability Challenge

Loading a 3-billion-parameter model in a browser means allocating its weights as GPU buffers through WebGPU, which the browser translates into calls to the platform’s native graphics API. However, Chrome’s Dawn implementation and Safari’s WebKit WebGPU implementation draw the boundaries of that memory differently.
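One place the difference shows up is in the limits an adapter reports. A WebGPU device gets the specification's default limits (256 MiB for maxBufferSize, 128 MiB for maxStorageBufferBindingSize) unless higher values are passed as requiredLimits, and requestDevice() rejects if a requested limit exceeds what the adapter supports. The sketch below is a minimal illustration of clamping a wish list to the adapter's reported limits. The names BufferLimits and clampRequiredLimits are ours for illustration, and the local interface is an assumption that stands in for the real GPUSupportedLimits type so the snippet compiles without WebGPU type definitions.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
// The field names mirror real GPUSupportedLimits entries.
interface BufferLimits {
  maxBufferSize: number;
  maxStorageBufferBindingSize: number;
}

// Clamp the limits we would like to the limits the adapter reports.
// Asking for more than the adapter supports makes requestDevice() reject.
function clampRequiredLimits(
  wanted: BufferLimits,
  supported: BufferLimits,
): BufferLimits {
  return {
    maxBufferSize: Math.min(wanted.maxBufferSize, supported.maxBufferSize),
    maxStorageBufferBindingSize: Math.min(
      wanted.maxStorageBufferBindingSize,
      supported.maxStorageBufferBindingSize,
    ),
  };
}

// In a browser, the result would be passed along roughly like this:
//   const adapter = await navigator.gpu.requestAdapter();
//   const device = await adapter.requestDevice({
//     requiredLimits: clampRequiredLimits(wanted, adapter.limits),
//   });
```

Clamping up front keeps device creation from failing outright on a more restrictive browser, at the cost of having to split large weight tensors across several smaller buffers afterwards.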

Chrome is comparatively generous with command queue allocations, so buffer flushes complete quickly. Safari caps WebGPU usage at 80% of system memory and forcibly terminates the page’s threads once that cap is exceeded. This post documents the pipeline patterns we use to scale back the text context window dynamically and safely when running on iOS or within Safari’s limits.
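To make the idea of scaling back the context window concrete, here is a minimal sketch, not our production pipeline. It assumes the dominant context-dependent allocation is a transformer key/value cache, whose size grows linearly with the number of tokens. The names KvCacheShape, kvBytesPerToken, and fitContextWindow are hypothetical, and so is the model shape in the example, which is not taken from any specific model.

```typescript
// Hypothetical description of a model's key/value cache layout.
interface KvCacheShape {
  layers: number;          // transformer layers
  kvHeads: number;         // key/value heads per layer
  headDim: number;         // dimension of each head
  bytesPerElement: number; // 2 for fp16
}

// Bytes of KV cache needed per token: one key and one value vector
// per KV head, per layer.
function kvBytesPerToken(shape: KvCacheShape): number {
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerElement;
}

// Return the largest context length (in tokens) that fits the byte budget,
// capped at the requested length. Returns null when even minTokens
// does not fit, so the caller can refuse to load instead of crashing.
function fitContextWindow(
  requestedTokens: number,
  budgetBytes: number,
  shape: KvCacheShape,
  minTokens: number = 256,
): number | null {
  const affordable = Math.floor(budgetBytes / kvBytesPerToken(shape));
  const tokens = Math.min(requestedTokens, affordable);
  return tokens >= minTokens ? tokens : null;
}

// Illustrative numbers only: 114,688 bytes per token for this shape.
const exampleShape: KvCacheShape = { layers: 28, kvHeads: 8, headDim: 128, bytesPerElement: 2 };
const exampleBudget = 512 * 1024 * 1024; // 512 MiB left over for the KV cache

console.log(fitContextWindow(8192, exampleBudget, exampleShape)); // 4681
console.log(fitContextWindow(2048, exampleBudget, exampleShape)); // 2048
```

How the byte budget is chosen is left open here. In a browser it would have to be estimated conservatively, and a failed allocation can be detected by wrapping createBuffer() in device.pushErrorScope("out-of-memory") and device.popErrorScope() and then retrying with a smaller window.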