
> overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.

It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools. The total cost of production should always be considered, not just throughput.






> It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools.

Claude Max is $200/month, or ~2% of the salary of an average software engineer.
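
For a rough sanity check of that percentage (a back-of-the-envelope sketch; the ~$120k/year salary is an assumption on my part, not a figure from this thread):

    # Back-of-the-envelope check of the "~2% of salary" claim.
    # ASSUMPTION: an average software engineer salary of ~$120k/year.
    annual_salary = 120_000
    monthly_salary = annual_salary / 12          # $10,000/month
    claude_max_monthly = 200                     # Claude Max, $/month
    print(claude_max_monthly / monthly_salary)   # 0.02 -> ~2% of salary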


Does anyone actually know what the real cost to customers will be once the free AI money no longer floods these companies?

I'm no LLM evangelist, far from it, but I expect models of similar quality to the current bleeding edge will be freely runnable on consumer hardware within 3 years. Future bleeding-edge models may well be more expensive than current ones, who knows.

For the purpose of keeping the costs of LLM-dependent services down, you don't need to run bleeding-edge models on single consumer GPUs. Even if it takes a hundred GPUs, it still means people can start businesses around hosting those models, and compete with the large vendors.

How do the best models that can run on, say, a single 4090 today compare to GPT-3.5?

Qwen 2.5 32B, which is an older model at this point, clearly outperforms it:

https://llm-stats.com/models/compare/gpt-3.5-turbo-0125-vs-q...


Even when quantized down to 4 bits to fit on a 4090?

Not in my experience. Running qwen3:32b is good, but at a 4-bit quant it's not as coherent or useful as 3.5. The gap is a lot narrower than it was with Llama 70B, though.
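
For anyone curious what this setup looks like in practice, here's a minimal sketch using llama-cpp-python to load a 4-bit GGUF quant on a single 24 GB card. The file name, quant level (Q4_K_M) and context size are illustrative assumptions, not something taken from this thread:

    # Sketch: a ~32B model at a 4-bit quant on a single 24 GB GPU (e.g. a 4090).
    # ASSUMPTION: llama-cpp-python is installed with GPU support and the GGUF
    # file below has already been downloaded; the path is illustrative.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # ~4-bit quant, roughly 20 GB
        n_gpu_layers=-1,   # offload all layers to the GPU
        n_ctx=8192,        # keep the context modest to stay within 24 GB
    )

    out = llm("What does 4-bit quantization trade away?", max_tokens=256)
    print(out["choices"][0]["text"])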

Yeah, there was an analysis that came out on Hacker News the other day. Between weak demand-side economics, virtually no impact on GDP, and corporate/VC subsidies going away soon, we're close to finding out. Sam Altman did convince SoftBank to do a $40B round, though, so it might be another year or two. Current estimates are that it's cheaper to run than search, so it's likely that more search features will be swapped out for it. OpenAI hasn't dropped their ad platform yet, though, so I'm interested to see how that goes.

There's potential for a 100x+ reduction in chip and energy costs for inference with compute-in-memory technology.

So they'll probably find a reasonable cost/value ratio.


Too cheap to meter? Inference is cheap and there's no long-term or even mid-term moat here.

As long as the courts don't shut down Meta over IP issues with Llama training data, that is.

I can't stress that enough: "open source" models are what can stop the "real costs" for customers from growing. Despite popular belief, inference isn't that expensive. This isn't Uber - the subsidies stopping isn't going to make LLMs infeasible; at worst, it's just going to make people pay API prices instead of subscription prices. As long as there are "open source" models that are legally available and track SOTA, anyone with access to some cloud GPUs can provide "SOTA of 6-12 months ago" for the price of inference, which puts a hard limit on how high OpenAI et al. can hike their prices.

But that's only as long as there are open models. If Meta loses and Llama goes away, the chilling effect will just let OpenAI, Microsoft, Anthropic and Google set whatever prices they want.

EDIT:

I mean Llama legally going away. Of course the cat is out of the bag and Pandora's box has been opened; the weights are out there and you can't untrain or uninvent them. But keeping the prices of commercial LLM offerings down requires a steady supply of improved open models, and the ability for smaller companies to make a legal business out of hosting them.


You can't just take the cost of training out of the equation...

If these companies plan to stay afloat, they have to actually recoup the tens of billions they've spent at some point. That's what the parent comment meant by "free AI money".


Yes, you can - because of Llama.

Training is expensive, but it's not that expensive either. It takes just one of those super-rich players paying the training costs and then releasing the weights to deny the other players a moat.


If your economic analysis depends on "one of those super-rich players to pay" for it to work, it isn't so much analysis as wishful thinking.

All the hundreds of billions of dollars put into the models so far were not donations. They either make it back for the investors or the show stops at some point.

And with a major chunk of the proponents' argument being "it will keep getting better", if you lose that, what have you got? "This thing can spit out boilerplate code, re-arrange documents, and sometimes corrupts data silently and in hard-to-detect ways, but hey, you can run it locally and cheaply"?


The economic analysis is not mine, and I thought it was pretty well known by now: Meta is not in the compute business and doesn't want to be in it, so by releasing Llamas, it denies Google, Microsoft and Amazon the ability to build a moat around LLM inference. Commoditize your complement and all that. Meta wants to use LLMs, not sell access to them, so occasionally burning a billion dollars to train and give away an open-weight SOTA model is a good investment, because it directly and indirectly keeps inference cheap for everyone.

You understand that, according to what you just said, the current SOTA is economically untenable?

Which, again, leads to a future where we're stuck with local models corrupting data about half the time.


No, it just means that the big players have to keep advancing SOTA to make money; Llama lagging ~6 months behind just means there's only so much they can charge for access to the bleeding edge.

Short-term, these are normal dynamics for a growing/evolving market. Long-term, the Sun will burn out and consume the Earth.


The cost to improve training increases exponentially with every milestone. No vendor is even coming close to recouping the costs now. Not to mention the quality data needed to feed the training.

The R&D is running on the hope that increasing the scale of their models (yes, by actual orders of magnitude) will eventually hit a miracle that makes their company explode in value and power. They can't explain what that could even look like... but they NEED ever more exorbitant amounts of funding flowing in.

This truly isn't a normal ratio of research-to-return.

Luckily, what we do have already is kinda useful, and condensing models does show promise. In 5 years I doubt we'll have the post-labor dys/utopia we're being hyped up for, but we may have some truly badass models that can run directly on our phones.

Like you said, Llama and local inference are cheap. So that's the most logical direction all of this is taking us.


Nah, the vendors have generally been open about the limits of scaling. The bet isn't that one last order-of-magnitude increase will hit a miracle - the bet is on R&D figuring out a new way to get better model performance before the last one hits diminishing returns. Which, for now, is what's been consistently happening.

There's risk to that assumption, but it's also a reasonable one - let's not forget the whole field is both new and has seen stupid amounts of money pumped into it over the last few years; this is an inflationary period, there are tons of people researching every possible angle, but that research takes time. It's a safe bet that there are still major breakthroughs ahead of us, to be achieved within the next couple of years.

The risky part for the vendors is whether those breakthroughs will happen soon enough that they can capitalize on them and keep their lead (and profits) for another year or so until the next breakthrough hits, and so on.


If Llama goes away, we would still get models from China that don't respect the laws that shut down Llama - at least until China is on top, they will continue to undercut using open source/open models. Either way, open models will continue to exist.

Rapid progress in open source says otherwise.

In the US, maybe. Several times that by percentage in other places around the world.

The average software engineer makes $10,000 a month after taxes?!

> if you are 10-15% more expensive to employ due to the cost of the LLM tools

How is one spending anywhere close to 10% of total compensation on LLMs?


That's a good insight, because with perfect competition it means you need to share your old salary with an LLM!


