Google Finds a Way to Slash AI Model Costs by 75%
Implicit caching is automatic and enabled by default

Google is rolling out a new feature called "implicit caching" in its Gemini API, aimed at reducing the cost of using its AI models. The feature, available for Gemini 2.5 Pro and 2.5 Flash, promises up to 75% savings on repetitive context shared across API requests—offering significant relief to developers burdened by high model usage costs.
Unlike the earlier explicit caching feature, which required developers to manually define reusable prompts, implicit caching is automatic and enabled by default. If a new request shares a common prefix with a past one, the system applies the discount automatically, simplifying workflows and reducing unexpected billing.
"Implicit caching directly passes cache cost savings to developers without the need to create an explicit cache. Now, when you send a request to one of the Gemini 2.5 models, if the request shares a common prefix as one of previous requests, then it’s eligible for a cache hit. We will dynamically pass cost savings back to you, providing the same 75% token discount," Google said in a blog post.
The move follows developer backlash over costly and inconsistent behavior from Gemini 2.5’s explicit caching. In response, Google’s Gemini team issued an apology and committed to fixes.
While the feature looks promising, developers are advised to structure prompts carefully—placing static content first—to maximize the chance of cache hits and cost savings.
Google recommends that developers place consistent content at the beginning of prompts and move variable elements—such as user queries or changing context—to the end. This increases the likelihood of triggering a cache hit. According to the company, this best practice helps optimize the effectiveness of its new implicit caching feature.
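To make that best practice concrete, here is a minimal sketch of prompt construction with the static content up front. The helper name and example strings are hypothetical illustrations, not part of Google's API:

```python
import os

# Static instructions and reference material go first and never change
# between requests (hypothetical context for illustration).
STATIC_PREFIX = (
    "You are a support assistant for ExampleCo.\n"
    "Answer questions using only the product manual below.\n"
    "Product manual: ...\n"  # a large, unchanging document would sit here
)

def build_prompt(user_query: str) -> str:
    # Variable content goes last, so consecutive requests share a prefix.
    return STATIC_PREFIX + "\nUser question: " + user_query

p1 = build_prompt("How do I reset my password?")
p2 = build_prompt("What is the refund policy?")

# Both prompts begin with the identical static block; that shared prefix
# is what makes the second request eligible for an implicit cache hit.
shared_prefix = os.path.commonprefix([p1, p2])
```

If the user query were placed first instead, the prompts would diverge at the very first characters and no usable prefix would be shared.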
"If you want to guarantee cost savings, you can continue to use the explicit caching API we shipped last May. Also, make sure to keep the initial content of the requests the same if you want them to hit the cache. More details on the launch here: https://t.co/5fubUe8CB6"
— Logan Kilpatrick (@OfficialLoganK), May 8, 2025
To further improve cache eligibility, Google has also lowered the minimum request size requirements: 1,024 tokens for Gemini 2.5 Flash and 2,048 tokens for Gemini 2.5 Pro. Additional guidance is available in the Gemini API documentation.
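As a rough illustration of those thresholds, an eligibility check might be sketched as below. The function and lookup table are hypothetical; in practice the token count would come from the API's token-counting endpoint rather than being passed in directly:

```python
# Minimum prompt sizes for implicit caching, per Google's announcement.
MIN_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}

def cache_eligible(model: str, prompt_tokens: int) -> bool:
    # Unknown models default to an unreachable minimum (never eligible).
    return prompt_tokens >= MIN_TOKENS.get(model, float("inf"))
```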
"In cases where you want to guarantee cost savings, you can still use our explicit caching API, which supports our Gemini 2.5 and 2.0 models. If you are using Gemini 2.5 models right now, you will start to see cached_content_token_count in the usage metadata which indicates how many tokens in the request were cached and therefore will be charged at the lower price," Google added.
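Based on the announced 75% discount for cached tokens, a back-of-the-envelope cost estimate could look like the following sketch. The per-token price here is a placeholder, not an actual Gemini rate:

```python
def billed_input_cost(prompt_tokens: int, cached_tokens: int,
                      price_per_token: float) -> float:
    # Uncached tokens are billed at the full rate; cached tokens at 25%
    # of that rate (i.e., the 75% discount Google describes).
    uncached = prompt_tokens - cached_tokens
    return uncached * price_per_token + cached_tokens * price_per_token * 0.25

# Example: a 4,000-token request where 3,000 tokens hit the cache,
# at a hypothetical price of $1 per million input tokens.
cost = billed_input_cost(4000, 3000, 1e-6)
```

The cached_content_token_count field from the usage metadata would supply the cached_tokens value in a real integration.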