hckrnws
DeepSeek makes the V4 Pro price discount permanent
by Tiberium
by Tiberium
I tried it and it's impressive.
[1]: https://api-docs.deepseek.com/quick_start/agent_integrations...
# After installed (or when run portably with ./ccode)
ccode init-config
ccode edit-config
# Run with default profile
ccode
# Run with named profile
ccode --deepseek
# Set default profile
ccode set-default-profile deepseek
Also turns out that with a local proxy you can get Remote Control working and see the DeepSeek sessions in the desktop app, screenshots on the page. Other than that, I'm happy that it works pretty well and the discount is enough to make me consider going from Anthropic's Max subscription to Pro and using it only where DeepSeek is insufficient. With that proxy I eventually hope to be able to transparently switch models mid-task, if I need Opus for like 5 turns or something.Overall though I'm not sure exactly how well Claude Code would stack up against OpenCode, since the latter overall feels a bit less hacky with 3rd party models and is even getting niche but nice features like a locally runnable web version: https://opencode.ai/docs/web/
It's not good enough to fully replace any of the frontier models yet but it's definitely great to have as a backup!
I used DeepSeek, Kimi, GLM, Qwen, and MiMO against GPT-5.5 high as reference, all running in Pi harness without anything installed.
So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, in practice, all those models may be less behind on typical daily tasks than people think.
They are a bit “work hard, not smart". Getting to same-ish results more slowly and using more tokens, but at a fraction of the price
I am looking forward to things slowing down and stabilizing. I'm not saying that should happen today, just I am looking forward to it.
[deleted]
- how do/would you add the WebSearch tool to your harness? pay for a separate service or does deepseek offer something with their subscriptions?
- do pi/opencode support pasting images in prompts?
- how do you handle reading images? deepseek is not multi modal IIRC? do you pay for another model and route to it?
Any of these missing would really annoy me in day to day use...
The chains of thought for Deepseek are very very interesting reads. Open code won't show them but do read them and you'll be surprised at how underrated the model is.
My model usage is very low but I still do pay directly to Deepseek regularly as my tribute and contribution to them open sourcing their models as my gratitude and showing support for what I deem positive for overall social good.
The same model hosted by other providers is much more expensive [0]. So either DeepSeek can host it much cheaper than anyone else, or their business model is different. I suspect the latter, especially since their privacy policy [1] says personal data, including “User Input,” can be used "To improve and develop the Services and to train and improve our technology".
[0]: https://openrouter.ai/deepseek/deepseek-v4-pro/providers
[1]: https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...
> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.
There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.
In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.
DeepSeek V4 Pro: $0.87
Qwen 3.7 Max: $7.50
Grok 4.3: $2.50
GLM 1.5: $3.08
Opus 4.7: $25.00
GPT-5.5: $30.00
I hesitated to even post this comment as it sounds biased and xenophobic. I would love for someone to convince me I am wrong. Does anyone have any insight into the company behind deepseek hosting, and what their history of respecting data privacy is?
Data at https://gertlabs.com/rankings
China sell lithium at a loss to make it unprofitable for Australian/US miners, for example (https://www.miningweekly.com/article/china-is-oversupplying-...).
I'll keep running Flash locally for the stuff I care about data privacy, but the value of Pro through their API is unreal for anything else (and I want to give them my training data as long as they keep putting out open models).
We've been working on a project which can be thought of as an agent, just not for coding. So we've been building everything: agents, sub-agents, RAG, dynamic intent detection, changing models based on what's being done, etc. In our tests, DeepSeek V4-flash is the cheapest model with acceptable replies (few hallucinations, while finding the right information). It's not the cheapest one we run overall (we're actually surviving with 3B models for some tasks), but it's definitely the one powering the system and driving the main "agent".
US companies dont sell AI services in China (as far as I know) but deepseek markets to US companies and customers.
First accessible model with useable 1 million context window for me.
RIP.
Claude literally refuses to finish tasks in auto mode and just keeps saying, now is a good stopping point, when it's 1% done (and doing the EXACT OPPOSITE of what I tell it).
Codex is barely better...
May as well pay 1/20th the price for DeepSeek.
Claude seems to have something that looks at how long you've been a customer and then just massively degrades quality.
When I started my subscription, Claude had none of these problems.
2 months into subscriptions Claude is completely unusable garbage, and Codex is not much better.
You don't get the discount that Deepseek is providing, but it's still a cheap model (v4-pro is cheaper than sonnet)
https://api-docs.deepseek.com/quick_start/agent_integrations...
max is really chatty for minimal gain.
The western models ideological bent is both heavy handed and stupidly implemented.
Remember Jevons paradox? [0] It isn't at Anthropic or Microsoft [0], but it is at DeepSeek.
[0] https://www.thelowdownblog.com/2026/05/microsoft-cancels-int...
[deleted]
[deleted]
FWIW, I this is what I have in my settings.json
"env": {
"ANTHROPIC_AUTH_TOKEN":"sk-nope_not_real",
"ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
"ANTHROPIC_MODEL": "deepseek-v4-flash",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-v4-flash",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "deepseek-v4-flash",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "deepseek-v4-flash",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
"CLAUDE_CODE_EFFORT_LEVEL": "low",
"CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING": "1",
"CLAUDE_CODE_DISABLE_THINKING": "0",
"CLAUDE_CODE_ENABLE_AWAY_SUMMARY": "0",
"CLAUDE_CODE_SUBAGENT_MODEL": "deepseek-v4-flash",
"CLAUDE_CODE_MAX_OUTPUT_TOKENS": "8000",
"CLAUDE_CODE_FILE_READ_MAX_OUTPUT_TOKENS": "4000",
"BASH_MAX_OUTPUT_LENGTH": "20000",
"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "60",
"CLAUDE_CODE_AUTO_COMPACT_WINDOW": "200000",
"CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS": "1"
}I think out tokens would be a better metric.
I run a proxy that allows me switching back to Opus when necessary.
Deepseek isn't like Z.ai which is bit cheaper only on the surface. Or like Qwen 3.7 Max which is Opus-level but very expensive.
Deepseek is my favorite since V3 but V4 is definitely catch-up to newer Anthropic models
I did some back of the envelope calculations and it seems like you would pay $5/month using DeepSeek directly or $15-20 with OpenRouter or similar. But would be interested to hear real world usage.
But as usual, there are far cheaper subscriptions with higher limits than Anthropic and OpenAI, that also provide DeepSeek v4 Pro. So you should use those subscriptions first until you max them out, then look at a different subscription.
the only real family models that work were claude and openai, surprisingly, for tasks that needs faster speed, gpt 5.4 is very impressive. Deep seek was very average , doing things somewhere in gemini flash 3.0 domain.
It's basically not possible with claude code, the api endpoint is a single environment variable and whatever models are on that endpoint are what's available.
HOWEVER, if you run a proxy like LiteLLM, you can configure it to send requests to different api endpoints on the back end and expose them as different "models" on the front end, then configure claude code to switch between those virtual models.
It allows for switching models in Claude Code.
I've been using Deepseek v4 with Cline in VS Code as a replacement for Github Copilot, and it's not been too bad.
Which begs the question, regardless of the model, which Claude Code alternative is better? (I keep saying "Claude Code alternative" because I don't know the term... LLM CLI?)
https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to... (the pi-coding-agent section)
[deleted]
Later, they can always lock it down more or add Claude LLM only features to it.
Edit: here is a really good twitter thread about this exact topic: https://xcancel.com/kunchenguid/status/2057700714626105412
I can't claim it's "the best"...
But the Pi.dev and OpenRouter combo is what I'm doing at home, and I love it. Setup was easy, I can use /model to switch between any of the openrouter models and whatever I'm hosting locally via VLLM.
Based on these benchmarks, here's a rough mapping:
- Qwen 3.7 ~= GPT 5.3
- Kimi K2.6 ~= GPT 5.15
- DS V4 ~= GPT 5.1
So yes, we have GPT 5 at home now. No need to pay the Legacy Labs anymore.
Here's the benchmark I used since I can't post images here: https://x.com/trydotworks/status/2058004995195490706?s=20
[deleted]
They support image locations like a file or url, but not regular images (opencode desktop might though?)
Both pi and opencode make it very easy to change models so you can easily call to 5.4-mini or whichever multi-modal LLM for reading images. I'm sure you could even create a skill to automate the process too, having the model use the cli to send the photo to the multi-modal and give it back a description.
No, of course not, why do you ask?
I'm not sure if it's when you run out of crypto, or when your bank gets hit by ransomeware.
Either way, something interesting about that accidental misspelling. It will probably become someone's band name one day.
[deleted]
Inference stack efficiency: Many of these providers take off the shelf sglang / vllm / trtllm and hope for the best. Meanwhile DeepSeek team is known for pushing the boundary of optimizations.
Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek 3.2, GLM 5, DeepSeek V4. Only now is it slowly strating to get optimized in the major inference engines: (https://github.com/sgl-project/sglang/issues/19380 https://github.com/sgl-project/sglang/pull/22851 etc.). Of course, DS V4 adds extra optimizations into the model architecture on top of DSA, and those will take more time to be taken full advantage of by the open source inference engines.
Privacy: Betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.
And few other things (scale (matters a lot for MoEs), reliability, soft enterprise lock in, etc.)
---
There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is much better model, and because Z.AI raised their price as well.
But I agree that the main driver is that they are really good at optimizing. They will have chosen their architecture in such a way that it will be as efficient as possible on their own infrastructure, so they have a massive head start. Inference framework developers still have to catch up.
I'd love to give these models a try, but I'd rather not use a provider that trains on or stores my data (beyond standard legal requirements of course).
Though to be honest, I'm not sure I want to trust business workflows to a website where the only contact is a Gmail address and no physical contact address. That site looks incredibly dodgy.
But why not? Gaining market share at a loss isn't the US's patent.
Loss leading only works when
- it leads to a situation that allows you to prevent competitors from selling to your customers (gilded age railroad and pipeline industries are great examples). Then you can eventually raise prices and not lose back any market share.
- or when it allows you to remarket to customers and make back the difference (selling a single console at a loss to sell a whole library of high margin videos games, or selling jet engines at a loss to lock in 30-year maintenance contracts).
Also, in case of LLM, market share = more people uploading their whole codebase/legal documents/unfinished books/literally everything to your servers for you to use in future training. So the incentive to sell at a loss is much stronger than other kinds of service.
DeepSeek V3.2 which uses DSA only (sparse attention, but without compression from HCA and CSA) is a smaller model but uses 10x more memory at 1M context window compared to DS V4 Pro.
Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open weight models.
Probably the most direct competitor of Flash model :
GPT 5.4 mini
Cache Read $0.075 /M tokens
Gemini 3 flash :
Cache Read $0.05 /M tokens
e.g nothing very magical or ground breaking.
Have not actually compared it to other models, but I would not consider it in the same price range.
Gemini 3.5 flash : Cache Read $0.15
For Gemini 3.5 Flash, it's also 10% of input cost.
Which is why 2%/0.8% change the economics in a meaningful way, given the input/cache-heavy way agents operate.
Stats from pi:
↑400k ↓438k R432M 71.9%/1.0M
Half a billion tokens, $2.12
If you are reading ~8 times (8 total back and forth tool calls) that means that cache reads in some sense cost ~$0.4 / M toks (Amortizing the write surcharge over all reads).
It's really quite ridiculously expensive considering what you are paying for is some residence on a VRAM that sometimes gets offloaded to NVMe.
And it's multi modal, and available at whatever you might imagine rates limits.
The speed is absolutely bonkers too. I once misconfigured a mcp I was developing locally, and told it to use the tools provided by this mcp to get certain task done. It figured out that the mcp is misconfigured, and then automatically went ahead and started to fix the mcp, fixed it, and then started using it by passing raw jsonrpc messages using stdin/out, bypassing the harness integration (since it would have needed a restart).
It did all of this in under 30 seconds and made over 15 tool calls in all of this (yes, I use yolo mode in a container, so my agents have full access to everything in the container).
Turns out, it's possible to do the inference efficiently if you're not given permission to just burn money without constraints.
It doesn't matter how good Opus is if 2 months into your subscription they make it worse than GPT 3 to save money.
I imagine when onlyrealcuzzo said "they don't make the model worse once you have a subscription", he didn't mean OpenCode Go, otherwise they would have probably said so.
[deleted]
If you're interested in trying DeepSeek V4 privately, you can try Tinfoil (tinfoil.sh) where all models are hosted in an attested secure hardware enclave, making the inference end-to-end private. Full disclosure: I'm one of the cofounders.
[1] https://cdn.openai.com/trust-and-transparency/openai-law-enf...
We use it that way and it works great.
There are widespread reports about how foreign actors (not limited to China) have infiltrated critical networks across many industries in the US en masse and are simply waiting for the right time to exploit them. Frontier models are simply another attack vector (and much more easily exploitable when you think about it).
The fact is that there is potential for this with any cloud-hosted model, whether it is intentional by the actual company building the models or a malicious actor is able to exploit a vulnerability.
If I was working on something that the Chinese government considered of strategic importance, then I would certainly be worried about it. But I don't do that.
I'm much more worried about techbros in this country using their LLMs to extensively profile me and produce something vastly more dystopian in this country than the real or imagined social credit scores in China. The people trying to convince you that the Chinese government are the people you should be worried about (as an individual in the United States) are probably the people you really need to be worried about.
The tech bro threat model has always been pure jingoism and xenophobia. Ironically, the worst thing a Chinese company has done with my data is sell Tiktok to an American technofascist.
[dead]
Waaay too many people think China is structurally identical to the US with the only difference being the language.
Deepseek servers are CCP servers, there is no functional difference or any form of friction to keep the government "in check". In fact there isn't even a concept of "keeping the government in check".
And for the apologists who love to flood comments like this with whataboutism...look at all the shit Trump has tried to do that has been shot down or derailed. That shit doesn't happen in China. Xi Jinping has never been over ruled because that isn't even a thing that can happen there.
If he wants a team to do a daily read of chosen Americans deepseek conversations, he will have it tomorrow, and all he needs to do is say it.
Nearly all requests are cached now. It's amazing.
[dead]
DeepSeek V4 Pro price on OpenRouter:
deepseek: $0.435 / $0.87
baidu/fp8: $1.521 / $3.042
novita/fp8: $1.64 / $3.38
Yup. DeepSeek either has next-generation hardware that somehow no one else has access to, or they're selling at a loss.
DeepSeek likely operates at a loss. How big the loss is anyone's guess.
Meanwhile I am happy using their model. It is really good, to a point I forget I am not using Codex or Claude.
[deleted]
Deepseek has made some incredible advancements in model efficiency, and more importantly actually publishes those advancements so everyone can benefit from them.
I suspect American inference providers implement the efficiency gains, and pad their margins rather than pass the savings along to the consumer.
It’s going to be hard to enforce it for most consumers though. It’s only going to apply to large corporations in effect.
That being said for coding and most actual “frontier” purposes the American models leave Deepseek in the dust.
For a while, US automakers thought the same of Japanese, then Korean car manufacturers, and Musk laughed at Chinese EV makers in an interview >12 years ago. People learn and get better at making things until they catch up with the frontier.
When VC pulls out, some of them may go bankrupt.
China is gonna win long term there’s no doubt. The fact that the American firms haven’t created immense escape velocity despite the disparity in spending is quite telling.
That's more than good enough if you're actually getting what CC Opus is capable of.
I've never been so excited for the future.
If the Chinese model of open weights wins, AI will benefit everyone.
If the American model of closed weights wins, AI will benefit a few rich guys and everyone else will be thrown into precarity.
I am completely convinced they just screw over their customers after so much usage or so long of a subscription thinking they have them for life.
I have NEVER been so happy to cancel a subscription.
DS$ Pro on Tensorix. That is not exactly cheap. Input:$1.75 / 1M tokens Output:$3.50 / 1M tokens
I recall reading about that in an issue or in their Discord server.
But I would contact them formally to verify that.
What's frustrating is that they give no information on who the provider(s) are!
Pi's developer is obviously not anti-AI, and he definitely doesn't hate OpenClaw, since it's based on Pi. But there's a growing number of people who take those things too far, and a lot of them are on HN. You can easily find them in the comments of any AI-related post here. I assume that's the type of people the image is portraying.
[deleted]
Personally I'm not going to choose one harness or another based on +/- a few percentage points in a benchmark. I'm going to use one the one that I find the most ergonomic, that isn't too bloated, etc. The models are the primary lever, not the harness.
[deleted]
Once they cross a certain threshold, nVidia can say goodbye to it's monopolisitic profit margins of over 70%.
GPU infra capex is the biggest spend for the inference providers as of now, power, second biggest.
China has already cracked the power part, they are now close to cracking the GPU part.
Before DeepSeek, no one sold cheap tokens anyways and then DS showed the profit margins.
[dead]
So their strategy now is to try get as much raw content for their inference. You're being "paid", via discount, for your use
From what I've read online, people have reported that DS4Flash-xHigh works even better than DS4Pro-xHigh .. so, you can try. No harm in trying :)