Applying sub-millisecond particle mechanics inside canvas contexts to map Transformer attention weights in real time.
To peek into the "brain" of an LLM, you can map the attention values between tokens as they are processed. Doing this in the DOM means creating thousands of nodes, which overwhelms the browser's layout and rendering.
By delegating rendering to an offscreen HTML5 canvas, we draw dynamic lines connecting related tokens. Each token is a structural node, and the thickness of each connecting filament indicates the attention weight derived from the corresponding query-key dot product. This allows drawing 12 blocks of 32 attention heads simultaneously on ordinary devices at 60 FPS with fluid physics.
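As a rough illustration of the filament idea, here is a minimal sketch of drawing one attention head. It assumes the weights arrive as a matrix already normalized to [0, 1] (for example, after softmax). The names `LineContext`, `weightToWidth`, and `drawAttentionHead`, the 6-pixel maximum width, and the 0.02 cutoff are all hypothetical choices for this example, not part of the project described above.

```typescript
// Minimal subset of the 2D canvas context used by this sketch. A real
// CanvasRenderingContext2D or OffscreenCanvasRenderingContext2D provides
// these members, and a stub can stand in outside the browser.
interface LineContext {
  lineWidth: number;
  globalAlpha: number;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  stroke(): void;
}

interface Point {
  x: number;
  y: number;
}

// Hypothetical mapping: attention weight in [0, 1] -> stroke width in pixels.
// Values outside [0, 1] are clamped.
function weightToWidth(weight: number, maxWidth = 6): number {
  const clamped = Math.min(1, Math.max(0, weight));
  return clamped * maxWidth;
}

// Draws one filament per (query, key) pair whose weight reaches `threshold`.
// weights[q][k] is the attention weight from query token q to key token k.
// The threshold is an assumption of this sketch: it skips near-zero weights
// so fewer lines are stroked. Returns the number of filaments drawn.
function drawAttentionHead(
  ctx: LineContext,
  queryPositions: Point[],
  keyPositions: Point[],
  weights: number[][],
  threshold = 0.02,
): number {
  let drawn = 0;
  for (let q = 0; q < queryPositions.length; q++) {
    for (let k = 0; k < keyPositions.length; k++) {
      const w = weights[q][k];
      // Also skips NaN or missing entries, since the comparison is false.
      if (!(w >= threshold)) continue;
      ctx.lineWidth = weightToWidth(w);
      ctx.globalAlpha = Math.min(1, w); // stronger weights are more opaque
      ctx.beginPath();
      ctx.moveTo(queryPositions[q].x, queryPositions[q].y);
      ctx.lineTo(keyPositions[k].x, keyPositions[k].y);
      ctx.stroke();
      drawn++;
    }
  }
  return drawn;
}
```

In a browser, `ctx` could be the 2D context of an offscreen canvas, with `drawAttentionHead` called once per head inside an animation-frame loop. How the 12 blocks and 32 heads are laid out on screen is left open here.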