Documentation

Interpreting Core Performance Metrics

A glossary covering TTFT, TPS, and load scaling calculations.

Decoding Stream Statistics

Duplex displays these metrics in real time. Understanding them helps you adjust your model selection as computational load changes:

  • Time To First Token (TTFT): The time in milliseconds from clicking Send to receiving the first token. Higher values indicate network congestion or prefill delays.
  • Tokens Per Second (TPS): The number of tokens generated per second, a measure of output speed. Higher values reflect a faster local GPU or a more responsive API queue.
  • Memory Footprint (VRAM): The GPU memory used by the selected local model.
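
To make the TTFT and TPS definitions above concrete, here is a minimal Python sketch of how the two values could be derived from timestamps. It is not Duplex's implementation: the names `StreamStats` and `stream_stats` are hypothetical, and it assumes TPS is measured over the decode phase only (tokens after the first, divided by the time since the first token). Duplex may use a different window.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamStats:
    ttft_ms: float  # time to first token, in milliseconds
    tps: float      # tokens per second during decoding


def stream_stats(send_time: float, token_times: Sequence[float]) -> StreamStats:
    """Compute TTFT and TPS from timestamps given in seconds.

    send_time   -- when the request was sent
    token_times -- arrival time of each token, in order

    Assumption: TPS excludes the wait for the first token, so a slow
    prefill raises TTFT without lowering TPS.
    """
    if not token_times:
        raise ValueError("no tokens received")

    ttft_ms = (token_times[0] - send_time) * 1000.0

    # Decode phase: from the first token to the last one.
    decode_span = token_times[-1] - token_times[0]
    decoded = len(token_times) - 1
    tps = decoded / decode_span if decode_span > 0 else 0.0

    return StreamStats(ttft_ms=ttft_ms, tps=tps)


if __name__ == "__main__":
    # Request sent at t=10.0 s; five tokens arrive 0.25 s apart from t=10.5 s.
    stats = stream_stats(10.0, [10.5, 10.75, 11.0, 11.25, 11.5])
    print(f"TTFT: {stats.ttft_ms:.0f} ms, TPS: {stats.tps:.1f}")
```

In this example the first token arrives half a second after sending (TTFT of 500 ms), and the remaining four tokens arrive over one second (TPS of 4.0). Separating the two windows keeps network or prefill latency from distorting the generation-speed figure.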