GitHub has recently announced that Copilot will be moving to usage-based billing.
I see this as an inflection point. One of the big players in the AI market is confident that its product is useful enough for customers to pay for it on a metered basis, much as they already expect to pay for Internet access, a mobile phone contract, or water.
At the risk of showing my ignorance, I’ll admit that the part of the announcement that caught my attention was: “Usage will be calculated based on token consumption, including input, output, and cached tokens”.
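To make the quoted rule concrete, here is a minimal sketch of how a bill could be computed from the three token counts. The rates are made-up placeholders (not GitHub’s actual prices), and the `Rates`, `Usage` and `cost` names are my own. The only assumptions are that input, output, and cached tokens are each counted, and that cached tokens are charged at a lower rate than fresh input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical prices in USD per million tokens; real rates are set by the provider.
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


@dataclass(frozen=True)
class Usage:
    input_tokens: int   # prompt tokens processed from scratch
    cached_tokens: int  # prompt tokens served from the cache
    output_tokens: int  # tokens generated in the response


def cost(usage: Usage, rates: Rates) -> float:
    """Charge for one request under simple per-token pricing."""
    return (
        usage.input_tokens * rates.input_per_m
        + usage.cached_tokens * rates.cached_input_per_m
        + usage.output_tokens * rates.output_per_m
    ) / 1_000_000


if __name__ == "__main__":
    rates = Rates(input_per_m=2.00, cached_input_per_m=0.50, output_per_m=8.00)
    # The same 10,000-token prompt: first with no cache hits, then with 8,000 tokens cached.
    cold = Usage(input_tokens=10_000, cached_tokens=0, output_tokens=500)
    warm = Usage(input_tokens=2_000, cached_tokens=8_000, output_tokens=500)
    print(f"cold: ${cost(cold, rates):.4f}")  # cold: $0.0240
    print(f"warm: ${cost(warm, rates):.4f}")  # warm: $0.0120
```

With these placeholder rates, serving 8,000 of the 10,000 prompt tokens from the cache halves the cost of the request, which is the kind of saving that could nudge consumers towards caching.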
Now I’m curious about how caching works, whether it is configurable, and what the cost-versus-value proposition is. Given that cached tokens are priced lower, will that incentivise consumers to make more use of caching?
This leads me to think about some opportunities to fill gaps in the emerging AI cost-management market:
- Smart proxying of prompts to make efficient use of cached tokens
- Smart metering of token usage to avoid bill shock and allow leaks to be plugged early
- Detection of duplicate queries within an organisation, intercepting them to avoid duplicate processing and billing
  - Off the top of my head, this looks like a new definition of insanity: “Asking the same question, and paying for it more than once”
- Scheduling of non-urgent processes to benefit from any off-peak weighting of token prices
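
The metering and duplicate-detection ideas could share one piece of infrastructure: a proxy sitting between an organisation’s users and the model. Below is a minimal sketch under several assumptions: the backend is a hypothetical callable that returns an answer plus the number of tokens it consumed; “duplicate” means identical text after lower-casing and collapsing whitespace (catching rephrased questions would need semantic matching, for example with embeddings); and the budget and 80% alert threshold are illustrative. `DedupMeteringProxy` and `fake_backend` are hypothetical names, not part of any real product.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical backend: takes a prompt, returns (answer, tokens consumed).
Backend = Callable[[str], Tuple[str, int]]


class DedupMeteringProxy:
    """Answers repeated questions from a local store and meters token usage."""

    def __init__(self, backend: Backend, token_budget: int, alert_percent: int = 80):
        self.backend = backend
        self.token_budget = token_budget
        self.alert_percent = alert_percent
        self.tokens_used = 0
        self.alerts: List[str] = []
        self._alerted = False
        self._answers: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Lower-case and collapse whitespace so trivially different copies of
        # the same text map to the same key.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._answers:
            return self._answers[key]  # duplicate: no backend call, no tokens billed
        # Pre-call check only: a single call may still push usage past the budget.
        if self.tokens_used >= self.token_budget:
            raise RuntimeError("token budget exhausted")
        answer, tokens = self.backend(prompt)
        self.tokens_used += tokens
        self._answers[key] = answer
        # Integer comparison avoids floating-point edge cases at the threshold.
        if not self._alerted and self.tokens_used * 100 >= self.alert_percent * self.token_budget:
            self._alerted = True
            self.alerts.append(f"{self.tokens_used}/{self.token_budget} tokens used")
        return answer


if __name__ == "__main__":
    def fake_backend(prompt: str) -> Tuple[str, int]:
        # Stand-in for a real model call: pretend every call costs 100 tokens.
        return f"answer to: {prompt}", 100

    proxy = DedupMeteringProxy(fake_backend, token_budget=250)
    proxy.ask("What is our refund policy?")
    proxy.ask("what is   our refund policy?")  # duplicate: served from the store
    print(proxy.tokens_used)  # 100
    proxy.ask("How do I reset my password?")
    print(proxy.alerts)  # ['200/250 tokens used']
```

Exact-match hashing is deliberately conservative: it never returns an answer to a different question, but it also misses paraphrases. A real proxy would also have to decide how long a stored answer stays valid, since the same question can deserve a different answer tomorrow.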
