Decoding the primary metric for perceivable latency in modern LLM chats.
When you hit "Enter", the AI takes a moment to process the entire history of your conversation before it spits out the first word. That delay is TTFT. A high TTFT feels laggy and unresponsive.
By tracking TTFT in real-time, you can empirically determine which models (Local GPU vs Anthropic Cloud) are providing the crispest user experience based on your specific prompts.