2 Comments
JP:

Tokens as the economic unit of AI is the right framing. Once you internalise that, the whole landscape makes more sense.

What's interesting from a practitioner angle is that the billing model for tokens is also fragmenting. You've got per-token (OpenAI, Anthropic), per-GPU-hour (CoreWeave), and now flat subscription proxies that bundle an entire model library under one monthly fee. I've been running my coding agents through one of these for a few weeks and it genuinely changes how you think about inference spend. You stop rationing tool calls. I wrote up the economics and the setup here https://reading.sh/how-to-get-3x-claude-rate-limits-for-30-a-month-1d3fdb8658df
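To make the comparison concrete, here's a back-of-the-envelope break-even between per-token and flat-rate billing. The prices are purely illustrative placeholders, not any provider's actual rates:

```python
# Illustrative break-even between per-token and flat-rate billing.
# Both prices below are hypothetical assumptions, not real provider rates.
PER_MTOK_USD = 10.0      # assumed blended price per million tokens
FLAT_MONTHLY_USD = 30.0  # assumed flat subscription price per month

# Tokens per month at which the flat subscription becomes cheaper
break_even_mtok = FLAT_MONTHLY_USD / PER_MTOK_USD
print(f"Flat rate wins past {break_even_mtok:.1f}M tokens/month")
```

With these made-up numbers the subscription pays for itself past a few million tokens a month, which a heavy agent workload clears easily; that's exactly why you stop rationing tool calls.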

Curious whether you think flat-rate subscriptions can scale, or if they'll get squeezed once inference runtimes like Together and Fireworks start offering similar bundles to developers directly.

David Levy:

In consumer applications it almost always comes down to flat-rate pricing (internet, cable, mobile, music, etc.).