MiniMax M2.7 has become one of the more interesting pricing stories in AI coding because it gives builders two completely different ways to buy the same core model. You can pay by token through the API, or you can subscribe to a Token Plan that meters usage by requests over rolling windows.
That split is exactly why many teams misread the cost. The headline API rate looks unusually low for a frontier-class coding model. But if your team wants higher-speed throughput, predictable access inside supported tooling, or fixed-cost daily development, the subscription price may be the more relevant number.
MiniMax M2.7 pay-as-you-go API pricing
For standard pay-as-you-go API usage, MiniMax lists MiniMax-M2.7 at $0.30 per 1 million input tokens and $1.20 per 1 million output tokens. Prompt cache reads are $0.06 per 1 million tokens, and prompt cache writes are $0.375 per 1 million tokens.
There is also a faster variant, MiniMax-M2.7-highspeed, with the same model quality target but higher throughput. That version is priced at $0.60 per 1 million input tokens and $2.40 per 1 million output tokens, while cache read and write pricing stays the same.
| Model | Input | Output | Cache read | Cache write |
|---|---|---|---|---|
| MiniMax-M2.7 | $0.30 / 1M | $1.20 / 1M | $0.06 / 1M | $0.375 / 1M |
| MiniMax-M2.7-highspeed | $0.60 / 1M | $2.40 / 1M | $0.06 / 1M | $0.375 / 1M |
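To make the rate table concrete, here is a minimal sketch that prices a single request at the listed rates. The token counts in the example are illustrative assumptions about a typical cached coding-agent step, not measured values.

```python
# Rates from the pricing table above, in dollars per 1M tokens.
RATES = {
    "MiniMax-M2.7":           {"input": 0.30, "output": 1.20},
    "MiniMax-M2.7-highspeed": {"input": 0.60, "output": 2.40},
}
CACHE = {"read": 0.06, "write": 0.375}  # identical for both variants

def request_cost(model, input_tokens, output_tokens,
                 cache_read_tokens=0, cache_write_tokens=0):
    """Dollar cost of one request; every rate is per 1M tokens."""
    r = RATES[model]
    return (input_tokens * r["input"]
            + output_tokens * r["output"]
            + cache_read_tokens * CACHE["read"]
            + cache_write_tokens * CACHE["write"]) / 1_000_000

# Assumed example: 10k fresh input tokens, 2k output, 30k cached prompt.
base = request_cost("MiniMax-M2.7", 10_000, 2_000, cache_read_tokens=30_000)
fast = request_cost("MiniMax-M2.7-highspeed", 10_000, 2_000, cache_read_tokens=30_000)
# → $0.0072 per request on base, $0.0126 on highspeed
```

Note that under these assumptions the highspeed premium is less than 2x per request, because cache reads are priced the same on both variants.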
For teams used to frontier-model pricing, those numbers are aggressive. They make M2.7 especially attractive for long-context coding workflows where a lot of value comes from large prompt state and not just short, flashy responses.
What the MiniMax Token Plan actually gives you
MiniMax also offers a Token Plan, but despite the name, the key quota for M2.7 is request-based rather than token-metered. Standard plans currently look like this:
| Standard Token Plan | Monthly price | M2.7 quota |
|---|---|---|
| Starter | $10/month | 1,500 requests / 5 hours |
| Plus | $20/month | 4,500 requests / 5 hours |
| Max | $50/month | 15,000 requests / 5 hours |
There are also high-speed plans for teams that specifically want the faster variant:
| High-Speed Token Plan | Monthly price | M2.7-highspeed quota |
|---|---|---|
| Plus-Highspeed | $40/month | 4,500 requests / 5 hours |
| Max-Highspeed | $80/month | 15,000 requests / 5 hours |
| Ultra-Highspeed | $150/month | 30,000 requests / 5 hours |
Annual versions are also available, with standard plans at $100, $200, and $500 per year, and high-speed plans at $400, $800, and $1,500 per year. In every case the annual price works out to ten months at the monthly rate, effectively two months free.
The other important detail is that these plans are multimodal bundles, not just text access. Depending on tier, they also include daily quotas for speech, image, video, and music products. That makes the plans more attractive for teams experimenting with broader agent workflows than the M2.7 name alone might suggest.
Why the Token Plan can be a better deal than API billing
If you are using M2.7 inside coding tools for sustained daily work, the Token Plan can be dramatically simpler than pure token billing. Instead of thinking about prompt length, output length, and every cache event, you are mostly thinking in terms of how many requests you can run over a rolling five-hour window.
That matters for terminal agents and editor workflows, where a single “task” may involve many short or medium interactions. Under straight API billing, those micro-steps can be cheap individually but hard to predict in aggregate. Under a Token Plan, the cost ceiling is easier to understand.
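One way to reason about that ceiling is to estimate total requests per window rather than per-task cost. A small sketch, using the standard-plan quotas from the table above; the requests-per-task figures are assumptions, since real agent workflows vary widely.

```python
# Standard Token Plan quotas, in requests per rolling 5-hour window.
PLAN_QUOTAS = {"Starter": 1_500, "Plus": 4_500, "Max": 15_000}

def fits_quota(plan, tasks_per_window, requests_per_task):
    """Return (requests used, whether they fit under the plan's quota)."""
    used = tasks_per_window * requests_per_task
    return used, used <= PLAN_QUOTAS[plan]

# Assumed example: an editor agent averaging ~40 micro-requests per task.
fits_quota("Starter", 30, 40)  # 1,200 requests → fits under Starter
fits_quota("Starter", 30, 60)  # 1,800 requests → exceeds Starter's 1,500
```

The point is that quota planning reduces to two rough numbers per team, which is much easier to estimate than aggregate token spend across many micro-steps.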
The tradeoff is flexibility. Token Plan keys are specific to the plan, separate from pay-as-you-go API keys, and tied to the subscription flow rather than general API access. If you are building your own backend or shipping a product, pay-as-you-go is usually the cleaner fit.
Where MiniMax M2.7 bills can rise faster than expected
The first surprise is the highspeed multiplier. Teams often see “same performance, faster speed” and assume the premium is modest. It is not. Input and output pricing both double on M2.7-highspeed.
The second surprise is cache write cost. Cache reads are cheap, but writing large prompt state repeatedly can add up if your workflow constantly rebuilds context instead of reusing it efficiently.
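The rebuild-versus-reuse gap is easy to quantify. A sketch using the cache rates from the table above; the step count and context size are illustrative assumptions.

```python
CACHE_READ_RATE = 0.06    # $ per 1M tokens
CACHE_WRITE_RATE = 0.375  # $ per 1M tokens

def cache_cost(steps, context_tokens, rebuild_every_step):
    """Cache-side cost of an agent run over a fixed context."""
    if rebuild_every_step:
        # Every step re-writes the full context into the cache.
        return steps * context_tokens * CACHE_WRITE_RATE / 1_000_000
    # Write the context once, then read it back on each later step.
    write = context_tokens * CACHE_WRITE_RATE / 1_000_000
    reads = (steps - 1) * context_tokens * CACHE_READ_RATE / 1_000_000
    return write + reads

# Assumed example: 100 agent steps over an 80k-token context.
cache_cost(100, 80_000, rebuild_every_step=True)   # → $3.00
cache_cost(100, 80_000, rebuild_every_step=False)  # → ~$0.51
```

Under these assumptions, constantly rebuilding context costs roughly six times as much on the cache side as writing once and reusing, which is the kind of gap that quietly dominates a long-running agent's bill.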
The third surprise is that the cheapest plan is not automatically the cheapest real workflow. A Starter subscription may look better than API billing on paper, but if your team routinely hits the five-hour ceiling, you may end up switching back to pay-as-you-go anyway.
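A rough break-even sketch makes that comparison concrete. The per-request cost below is an assumption (a typical cached coding-agent step at the base API rates), not a measured average, so treat the threshold as an order-of-magnitude guide.

```python
STARTER_MONTHLY = 10.00          # Starter Token Plan price per month
ASSUMED_COST_PER_REQUEST = 0.0072  # e.g. 10k input + 2k output + 30k cache read

# Monthly request volume at which the flat plan and API billing cost the same.
breakeven_requests = STARTER_MONTHLY / ASSUMED_COST_PER_REQUEST
# ≈ 1,389 requests per month
```

Below that volume, pay-as-you-go is cheaper; above it, Starter wins on price, but only if your usage pattern stays under the 1,500-request rolling five-hour ceiling.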
Which pricing path makes sense for different teams?
Choose pay-as-you-go if:
- you are building an app or product backend
- you want token-level cost control
- you need flexible orchestration outside the subscription flow
- your workload is bursty rather than daily and continuous
Choose a Token Plan if:
- you use M2.7 mainly through coding tools
- you want a fixed monthly cost
- you value simple quota tracking over token accounting
- you want bundled multimodal quotas alongside text access
For many individual developers, the strongest value is actually the middle of the range: enough fixed-cost usage to work daily, but not so much that you pay for throughput you never touch.
What teams should actually budget for MiniMax M2.7
If your priority is the absolute lowest model cost for productized inference, MiniMax M2.7’s base API pricing is the main story. At those rates, it is one of the more budget-friendly ways to run a capable long-horizon coding model.
If your priority is all-day development inside supported tooling, the Token Plan is often the more practical story. In that case, the question is not “What is the token price?” It is “How much rolling request capacity do we need before the plan becomes frustrating?”
That is the real budgeting lens for MiniMax M2.7 in 2026. The cheap API rates are real. The subscription economics are real too. The right answer depends on whether you are buying model inference or buying a daily workflow.