Tokens as the economic unit of AI is the right framing. Once you internalise that, the whole landscape makes more sense.
What's interesting from a practitioner angle is that the billing model for tokens is also fragmenting. You've got per-token (OpenAI, Anthropic), per-GPU-hour (CoreWeave), and now flat-rate subscription proxies that bundle an entire model library under one monthly fee. I've been running my coding agents through one of these for a few weeks, and it genuinely changes how you think about inference spend: you stop rationing tool calls. I wrote up the economics and the setup here: https://reading.sh/how-to-get-3x-claude-rate-limits-for-30-a-month-1d3fdb8658df
Curious whether you think flat-rate subscriptions can scale, or if they'll get squeezed once inference runtimes like Together and Fireworks start offering similar bundles to developers directly.
In consumer applications, it has almost always come down to flat-rate pricing (internet, cable, mobile, music, etc.).