Understanding the pricing structures of the major labs from inputs versus outputs to batch queuing.
Frontier AI labs do not charge symmetric margins. Supplying 10,000 words (the input context) is frequently an order of magnitude cheaper than asking the AI to write 1,000 words (the output context).
Anthropic's Claude 3.5 Sonnet operates heavily under this asymmetry, allowing users to upload gigantic codebases natively for mere pennies, while keeping output reasoning highly robust.
By utilizing Duplex's model selector, you can strategically decide when to ping an expensive output model versus when to use a free local instance.