Concepts

Why Local AI Doesn't Mean Poor Performance

Reframing expectations around edge-silicon inference speeds vs latency.

Network Latency vs Direct Memory

A local Llama-3 model running on a Macbook might only output 30 Tokens Per Second. However, its TTFT (latency before the first word) can be under 200 milliseconds, compared to the 800+ milliseconds of a remote API. Local feels instantly responsive.