Documentation
A glossary of TTFT, TPS, and load-scaling calculations.
Decoding Stream Statistics
Duplex displays the following metrics in real time. Understanding them helps you choose a model based on the current computational load:
- Time To First Token (TTFT): The time in milliseconds from clicking Send to receiving the first token. Higher values indicate network congestion or prefill delays.
- Tokens Per Second (TPS): The number of tokens generated per second, a measure of output speed. Higher values reflect a faster local GPU or a more responsive API queue.
- Memory Footprint (VRAM): The GPU memory occupied by the weights of the selected local model.
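
To make the TTFT and TPS definitions concrete, here is a minimal Python sketch of how the two values could be derived from timestamps. It is an illustration only: the names `StreamStats` and `compute_stream_stats` are hypothetical, and Duplex's exact formula is not documented here. In particular, this sketch assumes TPS is measured over the decode phase only (after the first token arrives), so that prefill time does not lower the reported speed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamStats:
    ttft_ms: float  # time to first token, in milliseconds
    tps: float      # tokens per second during the decode phase


def compute_stream_stats(send_time: float, token_times: Sequence[float]) -> StreamStats:
    """Derive TTFT and TPS from timestamps given in seconds.

    send_time   -- when the request was sent (e.g. from time.monotonic())
    token_times -- arrival time of each token, in order of arrival
    """
    if not token_times:
        raise ValueError("no tokens were received")

    # TTFT: delay between sending and the first token, converted to ms.
    ttft_ms = (token_times[0] - send_time) * 1000.0

    # TPS (assumption): tokens produced after the first one, divided by
    # the time elapsed between the first and the last token.
    decode_seconds = token_times[-1] - token_times[0]
    if len(token_times) > 1 and decode_seconds > 0:
        tps = (len(token_times) - 1) / decode_seconds
    else:
        tps = 0.0  # not enough data to estimate a rate

    return StreamStats(ttft_ms=ttft_ms, tps=tps)
```

For example, a request sent at 10.0 s whose tokens arrive at 10.25, 10.5, 10.75, and 11.25 s has a TTFT of 250 ms, and three further tokens arrive over the following second. A different convention, such as dividing all tokens by the total time since Send, would report a lower TPS for the same stream.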