
Managing Local Telemetry and Metric Engines

How to toggle and configure telemetry, and how to export JSON traces that capture the performance metrics of local versus cloud models.

Telemetry Engine Overview

Unlike standard chat wrappers, Duplex builds a granular hardware execution monitoring toolset into the interface. To open it, use the "Tuning & Engineering" panel at the far right of the interface footer.

  • Time To First Token (TTFT): The delay between sending a request and receiving the first streamed token, reported in milliseconds. This figure includes network handshake latency.
  • Tokens Per Second (TPS): Stream throughput, computed as a moving average over a window of recent chunks so that short bursts and stalls do not distort the reading.
  • Total Duration: The elapsed time between request instantiation and connection closure.
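
As a rough illustration of how the three metrics relate to one another, the sketch below derives them from chunk arrival timestamps. It is not Duplex's implementation: the function name "stream_metrics", its parameters, the returned field names, and the windowing scheme (tokens received over the last few inter-chunk intervals) are all assumptions made for this example.

```python
from collections import deque
from typing import Iterable, Tuple


def stream_metrics(request_start: float,
                   chunks: Iterable[Tuple[float, int]],
                   closed_at: float,
                   window: int = 2) -> dict:
    """Derive TTFT, windowed TPS, and total duration (hypothetical helper).

    request_start: time the request was sent, in seconds.
    chunks: (arrival_time_seconds, token_count) pairs in arrival order.
    closed_at: time the connection closed, in seconds.
    window: number of inter-chunk intervals in each moving average.
    """
    # Keep window + 1 chunks so that they span `window` intervals.
    recent = deque(maxlen=window + 1)
    first_arrival = None
    tps_series = []

    for arrival, tokens in chunks:
        if first_arrival is None:
            first_arrival = arrival
        recent.append((arrival, tokens))
        if len(recent) >= 2:
            span = recent[-1][0] - recent[0][0]
            # Tokens in the oldest chunk arrived before the span began.
            received = sum(n for _, n in list(recent)[1:])
            if span > 0:
                tps_series.append(received / span)

    ttft_ms = None
    if first_arrival is not None:
        ttft_ms = (first_arrival - request_start) * 1000.0

    return {
        "ttft_ms": ttft_ms,
        "tps_series": tps_series,
        "total_duration_ms": (closed_at - request_start) * 1000.0,
    }


# Four chunks arriving 250 ms apart, with made-up token counts.
metrics = stream_metrics(
    0.0, [(0.25, 1), (0.5, 5), (0.75, 5), (1.0, 10)], closed_at=1.0
)
print(metrics)
```

The first chunk is excluded from the throughput calculation so that the wait for the first token is counted only in TTFT and does not drag down the TPS reading.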

Exporting Traces

Once an inference multiplex finishes, click the "JSON Trace" button to export the raw telemetry log. The exported file is plain JSON, so it can be loaded directly into D3 dashboards, Grafana integrations, or pandas DataFrames for A/B comparisons between models.
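
A minimal sketch of such an A/B comparison, using only the Python standard library, is shown below. The trace schema is not specified here, so the sample is hypothetical: the "target", "ttft_ms", "tps", and "total_ms" fields and the "summarize" helper are assumptions for illustration, and a real exported trace may be structured differently.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical trace contents; real field names may differ.
SAMPLE_TRACE = """
[
  {"target": "local", "ttft_ms": 120.0, "tps": 40.0,  "total_ms": 2000.0},
  {"target": "local", "ttft_ms": 80.0,  "tps": 60.0,  "total_ms": 3000.0},
  {"target": "cloud", "ttft_ms": 300.0, "tps": 90.0,  "total_ms": 1000.0},
  {"target": "cloud", "ttft_ms": 500.0, "tps": 110.0, "total_ms": 2000.0}
]
"""

METRIC_FIELDS = ("ttft_ms", "tps", "total_ms")


def summarize(trace_json: str) -> dict:
    """Average each metric per target (assumed schema, see above)."""
    records = json.loads(trace_json)

    # Group the trace records by where the model ran.
    grouped = defaultdict(list)
    for record in records:
        grouped[record["target"]].append(record)

    return {
        target: {field: mean(row[field] for row in rows)
                 for field in METRIC_FIELDS}
        for target, rows in grouped.items()
    }


summary = summarize(SAMPLE_TRACE)
for target, averages in summary.items():
    print(target, averages)
```

To run this against an exported file, read the file's text and pass it to "summarize" in place of the sample string, after adjusting the field names to match the actual trace.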