DeepSeek V4 is not just another model launch. It is one of the most aggressive pricing stories in AI right now.
As of May 1, 2026, DeepSeek’s official API pricing makes DeepSeek-V4-Flash look extremely cheap and DeepSeek-V4-Pro look far cheaper than many teams would expect for a frontier-adjacent model. But the reason matters: some of today’s headline numbers are being pulled down by a temporary promotional discount and by very favorable cache-hit pricing.
If you are evaluating DeepSeek V4 seriously, you should understand the pricing mechanics before you assume your long-term production bill will match the screenshot making the rounds online.
## The official DeepSeek V4 prices right now
DeepSeek’s current API docs list the following pricing for the V4 family, measured per 1 million tokens:
| Model | Input tokens (cache hit) | Input tokens (cache miss) | Output tokens |
|---|---|---|---|
| DeepSeek-V4-Flash | $0.0028 | $0.14 | $0.28 |
| DeepSeek-V4-Pro | $0.003625 | $0.435 | $0.87 |
Those Pro numbers are not the normal list price. DeepSeek’s pricing page says V4-Pro is currently offered at a 75% discount until May 31, 2026 at 15:59 UTC. The same page also notes that the cache-hit input price for all models was reduced to one-tenth of the launch price starting April 26, 2026 at 12:15 UTC.
That means two things are happening at once:
- Pro is temporarily discounted.
- Cached input is dramatically cheaper than uncached input.
So yes, DeepSeek V4 is cheap. But the cheapest screenshots usually reflect the best-case current conditions, not a timeless pricing law.
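The table above can be captured as a small lookup. A minimal sketch, hard-coding the current figures (the Pro numbers are the temporary discounted rates, not list price):

```python
# DeepSeek V4 prices from the table above, in USD per 1M tokens.
# The Pro figures reflect the promotional discount in effect until
# 2026-05-31 15:59 UTC; treat them as temporary, not a list price.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.0028,   "cache_miss": 0.14,  "output": 0.28},
    "deepseek-v4-pro":   {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}

def request_cost(model, hit_tokens=0, miss_tokens=0, output_tokens=0):
    """USD cost of one request, given token counts per pricing bucket."""
    p = PRICES[model]
    return (hit_tokens * p["cache_hit"]
            + miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000
```

Keeping the three buckets separate from the start matters, because the cache-hit column is where most of the real savings live.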
## Why DeepSeek V4 looks so inexpensive
The DeepSeek V4 preview announcement helps explain the strategy. DeepSeek positions V4-Pro as the flagship model with 1.6 trillion total parameters and 49 billion active parameters, while V4-Flash is the smaller 284 billion total / 13 billion active version designed to be faster and more economical.
The same announcement says Flash closely approaches V4-Pro on reasoning, performs on par with V4-Pro on simple agent tasks, and offers smaller parameter size, faster response times, and highly cost-effective API pricing.
That is the key pricing story. DeepSeek is not only discounting a flagship. It is also giving buyers a deliberately attractive lower-cost model that still covers a lot of practical agent work.
## Flash vs Pro: what are you really paying for?
Choose DeepSeek-V4-Flash if:
- You care most about cost efficiency
- Your workload is high-volume and repetitive
- You expect simple or moderately complex agent tasks
- You want 1M context and tool-call support without paying flagship rates
For many production workflows, Flash is likely to be the value play. If the task is routing-heavy, extraction-heavy, or mostly composed of repeatable reasoning patterns, Flash will often be the first version worth testing.
Choose DeepSeek-V4-Pro if:
- You want the stronger reasoning and coding model
- You are running harder agentic coding tasks
- You need more headroom for complex multi-step work
- You want to exploit the current discount window before May 31, 2026
Right now, Pro is unusually attractive because the discounted pricing narrows the gap between “economical” and “premium” far more than many teams would expect.
## The hidden lever is cache economics
If you only compare uncached input and output pricing, you are missing one of the biggest reasons DeepSeek V4 can become so cost-effective in real use.
DeepSeek’s cache-hit input pricing is tiny relative to cache-miss pricing:
- Flash: $0.0028 cache hit vs $0.14 cache miss
- Pro: $0.003625 cache hit vs $0.435 cache miss
That gap is enormous. It means teams that reuse long system prompts, repeated context blocks, shared instructions, or recurring reference material can drive costs down much further than the raw “per million tokens” headline suggests.
In other words, DeepSeek V4 is especially attractive when your workflow is architected well. If you constantly resend giant fresh prompts with no reuse, the economics get worse. If you design around caching, they get much better.
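The effect of reuse is easiest to see as a blended input price that depends on your cache-hit rate. A quick sketch using the Flash figures above:

```python
def blended_input_price(hit_price, miss_price, hit_rate):
    """Effective USD per 1M input tokens at a given cache-hit rate (0.0–1.0)."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price

# Flash input cost collapses as the hit rate climbs:
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${blended_input_price(0.0028, 0.14, rate):.4f} per 1M input tokens")
```

At a 90% hit rate, Flash’s effective input price falls from $0.14 to roughly $0.017 per million tokens, which is why the architecture of your prompts can move the bill more than the choice of tier.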
## What this looks like in simple budgeting terms
Consider a simple request with 1 million uncached input tokens and 250,000 output tokens:
- Flash: $0.14 input + $0.07 output = $0.21
- Pro: $0.435 input + $0.2175 output = $0.6525 at the current discounted rate
Now consider a cache-heavy workload with 10 million cached input tokens and 2 million output tokens:
- Flash: $0.028 cached input + $0.56 output = $0.588
- Pro: $0.03625 cached input + $1.74 output = $1.77625 at the current discounted rate
That second example shows why DeepSeek V4 pricing is getting so much attention. Once caching is doing real work, the economics become hard to ignore.
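Both scenarios are straightforward arithmetic on the published per-million rates. A sketch that reproduces them, using the current (partly discounted) figures:

```python
def cost(miss_in, hit_in, out, miss_p, hit_p, out_p):
    """USD cost given token counts and USD-per-1M prices for each bucket."""
    return (miss_in * miss_p + hit_in * hit_p + out * out_p) / 1_000_000

# Scenario 1: 1M uncached input + 250k output
flash1 = cost(1_000_000, 0, 250_000, 0.14, 0.0028, 0.28)      # ~= $0.21
pro1   = cost(1_000_000, 0, 250_000, 0.435, 0.003625, 0.87)   # ~= $0.6525

# Scenario 2: 10M cached input + 2M output
flash2 = cost(0, 10_000_000, 2_000_000, 0.14, 0.0028, 0.28)    # ~= $0.588
pro2   = cost(0, 10_000_000, 2_000_000, 0.435, 0.003625, 0.87) # ~= $1.77625
```

Note that in the cache-heavy scenario, output tokens dominate the bill; cached input is almost a rounding error.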
## There is more here than token price
The pricing page also makes clear that both DeepSeek-V4-Flash and DeepSeek-V4-Pro support:
- 1M context
- Thinking and non-thinking modes
- JSON output
- Tool calls
- OpenAI-format and Anthropic-format base URLs
That matters because the price is not buying a stripped-down API. Teams are getting long context and agent-friendly features in both tiers. The choice is mainly about performance headroom and economics, not whether one version is “real” and the other is a toy.
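Since both tiers expose OpenAI-format endpoints with JSON output and tool calls, a request body can be sketched in that schema. The field names follow the standard OpenAI chat-completions format; whether DeepSeek accepts every field, and the `lookup_invoice` tool name, are assumptions to verify against the official docs:

```python
import json

# Hedged sketch: an OpenAI-format chat request targeting V4-Flash with
# JSON output and a tool definition. The tool name and schema here are
# hypothetical, purely for illustration.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "user", "content": "Extract the invoice total as JSON."},
    ],
    "response_format": {"type": "json_object"},  # structured JSON output
    "tools": [{
        "type": "function",
        "function": {
            "name": "lookup_invoice",  # hypothetical tool
            "parameters": {
                "type": "object",
                "properties": {"invoice_id": {"type": "string"}},
            },
        },
    }],
}
body = json.dumps(payload)  # POST this to the OpenAI-format base URL
```

The same payload shape should work against either tier by swapping the model name, which is part of why the Flash-first evaluation path is cheap to try.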
## A migration detail teams should not miss
DeepSeek’s docs also note that deepseek-chat and deepseek-reasoner will be deprecated in the future and currently map to the non-thinking and thinking modes of deepseek-v4-flash. The V4 preview announcement goes further: those older names will be fully retired and inaccessible after July 24, 2026 at 15:59 UTC.
If your tooling still depends on the older model names, this is the time to update. Otherwise, you risk budgeting correctly but breaking compatibility later.
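One way to stage that update is a small compatibility shim that maps the deprecated names onto V4-Flash before requests go out. The boolean mirrors the documented non-thinking/thinking split; the exact API parameter for selecting the mode is an assumption to confirm in the docs:

```python
# Map deprecated model names to the V4 naming described above.
# deepseek-chat     -> non-thinking mode of deepseek-v4-flash
# deepseek-reasoner -> thinking mode of deepseek-v4-flash
LEGACY_MODELS = {
    "deepseek-chat":     ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}

def migrate(model_name):
    """Return (new_model, thinking) for a possibly-deprecated model name.

    Non-legacy names pass through unchanged with thinking=None.
    """
    return LEGACY_MODELS.get(model_name, (model_name, None))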
## How teams should think about DeepSeek V4 budgets
### For prototyping
Use Flash first. It is the cleanest way to learn whether DeepSeek’s quality is good enough for your workload without overpaying for benchmark headroom you may not need.
### For production pipelines
Design around caching aggressively. DeepSeek V4 pricing is strongest when your workflow reuses context well instead of spraying fresh tokens at every call.
### For premium coding and harder agent runs
Test Pro while the discount is live. The temporary pricing window makes May 2026 a very unusual moment to benchmark stronger model quality without paying normal flagship rates.
### For long-term procurement
Do not assume today’s Pro pricing is permanent. The discount ends on May 31, 2026, and DeepSeek explicitly says product prices may vary. Model your budgets with the possibility of higher future Pro pricing.
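If the published 75% discount is taken at face value, the implied full list price is simple arithmetic on today’s figures, not a confirmed future price:

```python
# Back out the implied full Pro list price from the discounted figures.
# This is arithmetic on published numbers, not a confirmed future price.
DISCOUNT = 0.75
pro_in_now, pro_out_now = 0.435, 0.87          # USD per 1M tokens, discounted

pro_in_full  = pro_in_now / (1 - DISCOUNT)     # implied ~$1.74 / 1M input
pro_out_full = pro_out_now / (1 - DISCOUNT)    # implied ~$3.48 / 1M output
```

Budgeting against roughly $1.74 input / $3.48 output per million tokens for Pro after May 31 is the conservative assumption; anything better is upside.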
## The real takeaway
DeepSeek V4 feels cheap right now because DeepSeek is combining a strong low-cost model, an unusually generous cache-hit policy, and a temporary promotional discount on the flagship.
That does not make the pricing fake. It makes it strategic.
For builders, the practical conclusion is simple:
- Flash is the economic default.
- Pro is the premium option with a time-limited bargain attached.
- Caching architecture may matter almost as much as model choice.
If your team is evaluating DeepSeek V4, do not just compare benchmark scores. Compare the full operating cost of the workflow you want to run. That is where DeepSeek V4 becomes genuinely interesting.