Concepts

What is Time-To-First-Token (TTFT)?

Decoding the primary metric for perceived latency in modern LLM chats.

The Silence Before the Stream

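When you hit "Enter", the model has to process your prompt and the entire history of your conversation before it can emit the first token of its reply. That delay, measured from sending the request to receiving the first token, is TTFT. A high TTFT makes the chat feel laggy and unresponsive, even if the rest of the response streams quickly.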
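As a rough illustration of that definition, the sketch below times the gap between sending a request and receiving the first non-empty chunk of a streamed reply. It assumes the client exposes the response as a plain Python iterable of text chunks. The names `measure_ttft`, `send_request`, and `fake_stream` are hypothetical and stand in for whatever streaming client you actually use.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(send_request: Callable[[], Iterable[str]]) -> Tuple[Optional[float], str]:
    """Return (ttft_seconds, full_text) for one streamed reply.

    send_request is a hypothetical callable that starts the request and
    returns an iterable of text chunks. ttft_seconds is None if the
    stream never produced a non-empty chunk.
    """
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    ttft: Optional[float] = None
    chunks = []
    for chunk in send_request():
        if ttft is None and chunk:
            # First visible output: this is the moment the "silence" ends.
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    return ttft, "".join(chunks)


def fake_stream(prefill_delay: float = 0.05) -> Iterator[str]:
    """Simulated backend (an assumption, for demonstration only)."""
    time.sleep(prefill_delay)  # stands in for processing the conversation history
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)  # stands in for per-token generation time


ttft, text = measure_ttft(lambda: fake_stream(0.05))
print(f"TTFT: {ttft:.3f}s, reply: {text!r}")
```

With a real client, `send_request` would wrap the streaming call itself, so the timer also covers network and queueing time, which is what the user actually perceives.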

By tracking TTFT in real time, you can determine empirically which models (Local GPU vs Anthropic Cloud) deliver the most responsive experience for your specific prompts.
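One way such a comparison could work is to collect several TTFT samples per backend and compare their medians, since a single request can be skewed by a cold start or a network hiccup. The sketch below assumes you already have the samples in seconds. The function names, backend labels, and numbers are placeholders invented for illustration, not measurements.

```python
import statistics
from typing import Dict, List


def summarize_ttft(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Summarize TTFT samples (in seconds) per backend, skipping empty lists."""
    summary: Dict[str, Dict[str, float]] = {}
    for backend, values in samples.items():
        if not values:
            continue  # nothing recorded for this backend yet
        summary[backend] = {
            "median": statistics.median(values),  # robust to one-off spikes
            "worst": max(values),
            "count": len(values),
        }
    return summary


def fastest_backend(summary: Dict[str, Dict[str, float]]) -> str:
    """Return the backend name with the lowest median TTFT."""
    return min(summary, key=lambda name: summary[name]["median"])


# Placeholder values only: replace with TTFTs recorded from your own prompts.
recorded = {
    "backend_a": [0.2, 0.3, 0.4],
    "backend_b": [0.5, 0.1, 0.9],
}
report = summarize_ttft(recorded)
print(report)
print("Lowest median TTFT:", fastest_backend(report))
```

Comparing medians over the same set of prompts keeps the result tied to your own workload, because TTFT grows with the amount of conversation history each request carries.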