Reframing expectations around edge-silicon inference speeds vs latency.
A local Llama-3 model running on a Macbook might only output 30 Tokens Per Second. However, its TTFT (latency before the first word) can be under 200 milliseconds, compared to the 800+ milliseconds of a remote API. Local feels instantly responsive.