
Random observation 1: I was running DeepSeek yesterday on my Linux box with an RTX 4090, and I noticed that the models need to fit into VRAM (24GB here) or they are simply slow. So the Apple shared-memory architecture has an advantage here: a 192GB Mx Ultra can load and process large models efficiently.

Random observation 2: It's time to cancel the OpenAI subscription.



I canceled my OpenAI subscription last night, as did many, many others. There were threads on Reddit with everyone chiming in that they had just canceled too. IMO OpenAI is done, will go through massive cuts, and will probably be acquired by the end of the year for a tiny fraction of its current value.


You want to bet? The panic around deepseek is getting completely disconnected from reality.

Don’t get me wrong, what DS did is great, but anyone thinking this reshapes the fundamental trend of scaling laws and makes compute irrelevant is dead wrong. I’m sure OpenAI doesn’t really enjoy the PR right now, but guess what OpenAI/Google/Meta/Anthropic can do if you give them a recipe for 11x more efficient training? They can scale it to their 100k-GPU clusters and still blow everything else away. This will be textbook Jevons paradox.

Compute is still king and OpenAI has worked on their training platform longer than anyone.

Of course as soon as the next best model is released, we can train on its output and catch up at a fraction of the cost, and thus the infinite bunny hopping will continue.

But OpenAI is very much alive.


> The panic around deepseek is getting completely disconnected from reality.

This entire hype cycle has long been completely disconnected from reality. I've watched a lot of hype waves, and I've never seen one that oscillates so wildly.

I think you're right that OpenAI isn't as hurt by DeepSeek as the mass panic would lead one to believe, but it's also true that DeepSeek exposes how blown out of proportion the initial hype waves were and how inflated the valuations are for this tech.

Meta has been demonstrating for a while that models are a commodity, not a product you can build a business on. DeepSeek proves that conclusively. OpenAI isn't finished, but they need to continue down the path they've already started and give up the idea that "getting to AGI" is a business model that doesn't require them to think about product.


In a sense it doesn't, in that if DeepSeek can do this, making OpenAI-type capabilities available at Llama-type infrastructure costs, then applying OpenAI-scale infrastructure to a much more efficient training/evaluation system multiplies everything back up. I think that's where they'll have to head: using their infrastructure moat (such as it is) to apply these efficiency learnings to build much more capable models at the top end. Yes, they can't sleepwalk into it, but I don't think that was ever the game.


> The panic around deepseek is getting completely disconnected from reality.

Couldn’t agree more! Nobody here read the manual. The last paragraph of DeepSeek’s R1 paper:

> Software Engineering Tasks: Due to the long evaluation times, which impact the efficiency of the RL process, large-scale RL has not been applied extensively in software engineering tasks. As a result, DeepSeek-R1 has not demonstrated a huge improvement over DeepSeek-V3 on software engineering benchmarks. Future versions will address this by implementing rejection sampling on software engineering data or incorporating asynchronous evaluations during the RL process to improve efficiency.

Just based on my evaluations so far, R1 is not even an improvement on V3 in terms of real world coding problems because it gets stuck in stupid reasoning loops like whether “write C++ code to …” means it can use a C library or has to find a C++ wrapper which doesn’t exist.


OpenAI's issue might be that it is extremely inefficient with money (high salaries, high compute costs, high expenses, etc.). This is fine when you have an absolute monopoly, as investors will throw money your way (OpenAI is burning cash), but once a clear alternative exists, you can no longer do that.

OpenAI doesn't have any more of an advantage in compute than Google, Microsoft, or anyone else with a few billion dollars.


> You want to bet?

Why would anyone bet? They can just short the OpenAI / MS stocks, and see in a few months if they were right or not.


OpenAI isn't publicly traded and MSFT's stake is so minor compared to their other business that it will have a negligible impact on their stock price.


1) OpenAI isn't public, so not possible. 2) MS is one of the most well diversified tech companies, so, if anything, this will be a positive.


How is that any different from a bet?


Deepseek is not the only reason. I cancelled my OpenAI subscription because I've replaced it wholesale with Anthropic.


I replaced that with Kagi: unlimited access to multiple models including Claude, o1, and V3/R1, plus you also get Kagi search, which was already a good deal.


Oh wow. I have been using Kagi premium for months and never noticed that their AI assistant now has all the good AIs too. I was using Kagi exclusively for search, and Perplexity for AI stuff. I guess I can cut down on my subscriptions too. Thanks for the hint. (Also, I noticed that Kagi has a PWA for their AI assistant, which is also cool.)


Compute is not king; DeepSeek just demonstrated otherwise. And yes, OpenAI will have to reinvent itself to copy DS, but this means they'll have to throw away a lot of their investment in existing tech. They might recover, but it is not a minor hiccup as you suggest.


I just don't see how this is true. OpenAI has a massive cash & hardware pile -- they'll adapt and learn from what DeepSeek has done and be in a position to build and train 10x-50x-100x (or however) faster and better. They are getting a wake-up call for sure but I don't think much is going to be thrown away.


...until they distill that "11x efficient training" again ...


In my experience with DeepSeek and o1, OpenAI's big talk about (and investment in) hallucination avoidance might save their hides here. DeepSeek may be smarter and understand complex problems better, but it also seems to make mistakes more often. (It's as if its comprehension is better, but it's worse at memorization/recall.)

Need an LLM to one-shot some complex network scripting? As of last night, o1 is still where it's at.


My experience gels with yours. Given the same code sample, DeepSeek has better, more creative suggestions about how to improve it, but it can't implement them without breaking the code. o1, generally, can implement DeepSeek's suggestions successfully. I think chaining them together might have quite interesting results.


Is there a tool that can automate chaining like that?


Aider has an architect mode where it asks one model to plan out the changes and another to actually write the code.
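
If you want to wire up the same pattern yourself, it's roughly two chained API calls. A minimal sketch (the OpenRouter endpoint and model slugs are my assumptions for illustration, not aider's actual internals):

    # Minimal sketch of the architect/editor pattern (hypothetical model slugs/endpoint).
    from openai import OpenAI

    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
    code = open("script.py").read()

    # 1) The "architect" (a reasoning model) plans the change in prose only.
    plan = client.chat.completions.create(
        model="deepseek/deepseek-r1",
        messages=[{"role": "user", "content": f"Describe, step by step, how to improve this code. No code yet.\n\n{code}"}],
    ).choices[0].message.content

    # 2) The "editor" (a cheaper, non-reasoning model) turns the plan into the actual edit.
    new_code = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",
        messages=[{"role": "user", "content": f"Apply this plan and return the full revised file.\n\nPlan:\n{plan}\n\nCode:\n{code}"}],
    ).choices[0].message.content
    print(new_code)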


I've used it today, with R1 as architect and Sonnet as editor model. So far, this works great. There's no need to use a reasoning model as editor IMO.

Alex (https://alexcodes.app) also does this now btw.


That's OK if all you want to know is which model you should use today, but a test like that is totally dependent on training data, and there is no reason to expect that either DeepSeek-V3 (the base model for R1) or the additional training data for R1 is the same as what OpenAI used for O1 and whatever base model it was built on.

The benchmark comparisons are perhaps, for now, the best way to compare reasoning prowess of R1 vs O1, since it seems pretty certain they both trained for those cases.

I think the real significance of R1 isn't the released model/weights itself, but more the paper detailing (sans training data) how to replicate it, and how effective "distillation" (i.e. generate synthetic reasoning data for SFT) can be to enhance reasoning even without using RL.
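
For anyone curious, the distillation recipe is conceptually just "generate reasoning traces with a strong teacher, then do ordinary SFT on a small student." A rough sketch under assumed names (model slugs, prompts, and trl arguments are placeholders, and trl's exact API varies by version):

    # Sketch: distill reasoning into a small model via SFT on teacher outputs.
    from openai import OpenAI
    from datasets import Dataset
    from trl import SFTConfig, SFTTrainer

    teacher = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")
    prompts = [
        "Prove that the sum of two odd integers is even.",
        "A train travels 120 km in 90 minutes. What is its average speed in km/h?",
    ]

    # 1) Generate synthetic reasoning data with the strong "teacher" model.
    rows = []
    for p in prompts:
        answer = teacher.chat.completions.create(
            model="deepseek/deepseek-r1",   # assumed slug for the teacher
            messages=[{"role": "user", "content": p}],
        ).choices[0].message.content
        rows.append({"text": f"Question: {p}\n\nAnswer: {answer}"})

    # 2) Plain supervised fine-tuning of a small "student" on those traces (no RL).
    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",   # example student model
        train_dataset=Dataset.from_list(rows),
        args=SFTConfig(output_dir="distilled-student", max_seq_length=4096),
    )
    trainer.train()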


IMHO o1 is still comparable to, or a lot better than, DeepSeek for accomplishing actual stuff. At least for my use cases.

Of course the cost is incomparably higher, since Plus has a very low limit. Which of course is a huge deal.


Why is it that every time there is a new model, all the other competitors are immediately declared dead?


The big deal here isn't that R1 makes any other models obsolete in terms of performance, but how cheap it is: $2 vs $60 per million output tokens compared to O1 (which it matches in benchmark performance).
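
Back-of-envelope on what that gap means at any real volume (using the per-million-output-token prices quoted above and an arbitrary example workload; input-token pricing ignored):

    # Rough monthly cost at the quoted output-token prices.
    r1_price, o1_price = 2.0, 60.0          # $ per 1M output tokens
    monthly_output_tokens = 500_000_000     # example: 500M output tokens/month

    print(f"R1: ${r1_price * monthly_output_tokens / 1e6:,.0f}/month")   # ~$1,000
    print(f"o1: ${o1_price * monthly_output_tokens / 1e6:,.0f}/month")   # ~$30,000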

O1 vs R1 performance on specific non-benchmark problems is also not that relevant until people have replicated R1 and/or tried fine-tuning it with additional data. What would be interesting to see is whether (given the different usage of RL) there is any difference in how well R1 vs O1 generalize to reasoning capability over domains they were not specifically trained for. I'd expect that neither do that well, but not knowing details of what they were trained on makes it hard to test.


Because we like drama.


1. You can get all the models by buying a Kagi subscription (excluding o1). Includes DeepSeek models. You can also feed the assistant with search data that you can filter.

2. If you have GitHub Copilot, you also get o1 chat there.

I haven't seen much value in an OpenAI subscription for ages.


I have Kagi Ultimate and it is nice for this. But a cheaper suggestion would be to use OpenRouter and then use these models via Fireworks or TogetherAI. It also integrates with many more applications. AFAIK Kagi doesn't document a user-facing API for the assistant feature.


Unfortunately those are both 10-15x the cost of DeepSeek direct.

Deepinfra is pretty cheap as a DeepSeek provider, though.


Sure. I meant more so that this would be cheaper than Kagi while providing the same selection of models.

As for DeepSeek, I couldn't even sign up because my email domain is not on their whitelist. To just try it out for now, I don't mind the increased cost.


That's because DeepSeek is subsidizing their API massively to get more training data.


Doesn't Microsoft own 49% of OpenAI? They'll end up with it all as a division of Microsoft.


I think they “own” 49% of OpenAI’s net income until a certain very high amount. Not a share of the actual company.


They "own" even 75% of profits until Microsoft has recouped its $13 billion investment. 49% comes after that.


I disagree. I don't really need "conversational chat responses", I need multimodal.

ChatGPT is still the king of the multimodal experience. Anthropic is a distant second, only because it lets you upload images from the clipboard and responds to them, but it can't do anything else like generate images - sometimes it will do a flowchart, which is kind of cool, GPT won't do that - but will it speak to you, have tones, listen to you? No.

And on the open source side, this area has been stagnant for like 18 months. There is no cohesive multimodal experience yet. Just a couple of vision models with chat capabilities and pretty pathetic GUIs to support them. You still have to do everything yourself there.

There would be huge utility for me, and many others who don't know it yet, if we could just load a couple of models at once that work together seamlessly in a single GUI, like how ChatGPT works.


The real insult here is graphics card vendors refusing to make ones with more than 24GB for several years now. They do this so you'll have to buy several cards for your AI workstation. Hopefully Apple eating their lunch fixes this.


The 5090 is 32GB out of the box. Not that that's anywhere near the top of what you can do on an Apple, but at least it's movement.


> They do this so you'll have to buy several cards for your AI workstation.

AFAIK you can't do that with newer consumer cards, which is why this became an annoyance. Even an RTX 4070 Ti with its 12 GB would be fine, if you could easily stack a bunch of them like you used to be able to with older cards.


It's "easy" if you have a place to build an open frame rig with riser cables and whatnot. I can't do that, so I'm going the single slot waterblock route, which unfortunately rules out 3090s due to the memory on the back side of the PCB. It's very frustrating.


I think the parent's point is that NVLink no longer ships with consumer cards. Before, you could buy two cards plus a cable between them, and software could treat them as one card. Today you need software support for splitting between the cards, unless you go for "professional" cards or whatever they call them.


Maybe that's what they meant, and it'd be cool if nvidia still offered that on consumer cards, but thankfully you don't need it for LLM inference. The traffic between cards is very small.
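
For what it's worth, the common stacks already handle the splitting for you; a minimal sketch with HuggingFace transformers (the model name is just an example), where layers are sharded across whatever GPUs are visible and only the small activations at layer boundaries cross the PCIe bus:

    # Split a large model across several GPUs by layer; no NVLink required.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-70B-Instruct"   # example model
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",   # accelerate spreads the layers over all visible GPUs
    )

    inputs = tok("Explain NVLink in one sentence.", return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0]))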


Isn't the issue that the software now needs to explicitly add support for it, compared to yesteryear when you could just treat them as one in software?


There was a rumor that the 5090, or the 5090D for China, may or may not come with multi-GPU use locked out in software. I think GP's referring to that. It's not clear if that is the case with retail cards.


I honestly don’t know why people aren’t more upset by this and still get on their knees for Nvidia. They made the decision specifically to cripple consumer card memory because they didn’t like that data centers were using them instead of buying their overpriced enterprise cards that were less performant. They removed NVLink because people were getting better performance out of two $400 cards than the $1,500 cards Nvidia was trying to peddle. They willfully screw consumers and people love them for it.


Because sensible people just use the cloud at this point; you can probably get several years of training for $6000.


It buys you approximately two days (with reservation discount) of a single p5.48xlarge instance, which has 2TB of RAM, and 640GB of VRAM in 8x H100 cards. In fact that is the pricing example they use: https://aws.amazon.com/ec2/capacityblocks/pricing/


MI300X (RunPod), 192GB RAM, hourly rate: $2.49/hr. Break-even point: you can rent for 2,410 hours (~100 days of non-stop continuous use) before reaching the cost of the $6,000 Mac. Macs top out at 192GB, not 2TB ;) Consideration: if your AI training only requires sporadic use (e.g., a few hours daily or weekly), renting is significantly cheaper. The MI300X will also get you results many times faster, so you could probably multiply that 100 days!


Or buy 2 Nvidia Digits for $6,000 to get 256GB of unified memory.


I disagree with cancelling the OpenAI subscription. I've been getting some help from o1 for both Python and PHP recently, and o1 was doing massively better on the Python stuff (its code ran, DeepSeek's didn't and wouldn't even with prompt refinement).


Also, for some philosophical stuff DeepSeek just won't do it. I'm working on an essay about spirituality, and sometimes it just responds that it doesn't know how to work on those types of problems and we should do something fun like math or games. Claude tends to reply with something more like "I have to be honest with you, reincarnation is not real", and ChatGPT doesn't seem to care about that kind of thing at all.


Just don’t ask it about anything related to Tiananmen Square or President Pooh..

I’d guess they did quite a bit of fine-tuning to censor some more sensitive topics, which probably impacts the output quality for other non-technical subjects.


Would fine-tuning with a LoRA paper over the censorship to a large degree?


Why even bother decensoring it (except academic curiosity ig)? There are a million other ways you can learn about those subjects.

The people making the model probably don't really give a shit about politics and just did the minimum to avoid being embarrassed, but if people start jailbreaking it they will be forced to care.


Because the best way to learn is through interrogation.

I don't give a damn about ideology I just want everything ever thought or written searchable and interactive


IIRC thezvi's summary post on R1 mentioned that R1 is amazing for general reasoning and is very clearly a successful proof of concept/capability but a lot of effort seems to have been put into making o1 Good At Code as a practical matter, whereas R1 seems to have been more a research project which proved out the approaches and then was released without sanding the rough edges off because that wasn't the point.


Were you running a local model?


While 192GB of RAM is appealing, it's also quite expensive at $6,000. For that price I'd rather buy a system with 5 used 3090s; while that's "only" 120GB of VRAM, you benefit from much faster tokens/s and prompt-processing speed (the Macs are notoriously slow at consuming large contexts).


I think just getting Nvidia Project Digits might be the best option. A lot of people were underwhelmed when it was announced, but I think now it could be just the thing for people building their own AI home servers.

https://www.nvidia.com/en-us/project-digits/


Yep, I think the same. With 128GB fast memory one could run this.


Can I use that on the train though? I can with a 128GB MacBook, without it sounding like a helicopter taking off as well.


you don't need to take ai training quite so literally (:


Honestly, if you have a residence of some kind and an Internet connection, you don't need to bring your beefy computer with you everywhere. It is cool to be able to have ridiculously powerful mobile computers, but I don't think I would ever be willing to take a $6,000 laptop anywhere it has a decent chance of being stolen.


Do you live in a third world country? If so I might agree, but otherwise trains are perfectly safe.


I am very happy for you, but laptops get stolen in public in most countries.


Laptops get stolen on a train? An enclosed, single-direction space that only occasionally allows you to exit between infrequent, long-distance stops? A thing that contains ticket inspectors and a literal guard?

How many laptops have you personally seen be stolen on a train?


You mean a tight, enclosed, single-direction space, crowded with people who are tired, and/or trying to relax, and/or thinking about the destination, and/or otherwise not particularly focused after hours of travel; a thing that contains ticket inspectors who show up every now and then to check tickets, and from which passengers embark and disembark at dozens of points along the length of the thing, simultaneously, with no supervision or security checks.

Depending on the train type and configuration, many actually seem like pickpocket paradise.


Pickpocketing is a very different proposition. It relies on a lack of awareness: taking your wallet and being long gone before you’ve even noticed. If someone steals your laptop from in front of you without you even noticing, I’d suggest that one is on you.

FWIW I’ve used my laptop on the train plenty, I’ve never had anything stolen nor felt in any danger of it.


But would you consider leaving it unattended on your seat and going for lunch to the restaurant car, or for an extended toilet break?


You might have seen some laptops have screens that fold down, I know MacBooks do. This "clam shell" effect protects the keyboard, trackpad, and even the screen from bumps and jostles. Many laptops when so closed can even fit in a backpack.

So a little trick I figured out is to close my laptop lid and then slide it into a pocket of my backpack. I can then carry it with me when I get up and move around.

So then I can take it with me to eat lunch or an extended toilet break. Maybe some day all laptops will have that feature.


...why would I ever do that? You leave something worth several thousand dollars anywhere in public you're risking losing it. What are we even debating here?


Is your laptop in your pocket?


Yes, all the time. It's happened to two people I know, in France and in the US.

People get up to use the bathroom or the cafe car, the laptop is left behind for ten minutes, one of the train stops is while they're away from their seat, and someone sees an opportunity, snags it, and gets off at the stop.

This is an actual thing. And if it's worth a thousand bucks then it's very much worth getting off at an earlier stop than you'd planned, and continuing your journey on the next train.

Ticket inspectors or guards are irrelevant. There isn't one in your car 99% of the time.

I don't know why you're trying to argue that laptop theft on trains in first-world countries isn't a thing. It absolutely is.


Different regions of the world see different degrees of responsibility regarding theft. I would consider it absurd to leave something valuable unattended in a public space, considering the small effort required to avoid that (that is: taking it with you).

So, yes, thefts on trains happen to people who think they are 100% safe, but applying the same idea (assuming something is 100% safe and not being cautious), I wonder how such people use the internet...


My coworker was having coffee and using his work laptop at an outdoor coffee shop in Mountain View, CA. Someone on a bike rode by and attempted to grab his phone and bike off with it.

The attempted thief didn't succeed in taking the phone, but did knock the laptop onto the ground, damaging it.


The discussion was about leaving unattended valuable objects in public places. Sure, a theft can happen even if attended, or using violence, but I personally avoid increasing the chance of having something stolen by leaving it unattended.

If I were to compile statistics on the primary cause of people I know ending up without a laptop, the biggest danger would be liquids in glasses (that end up on the laptops)...


A random person able to dart in and then make a getaway is not what "working on a train" is like and that was the original comment's point.


You're going to take your laptop with you into the toilet on the train...?

I don't think I've ever seen a human being do that before on a train. Not to go to the toilet, nor to grab a coffee in another car.

You can't be paranoid about everything. My friend in France had put his laptop back into his bag where it wasn't visible and assumed that was good enough, but someone must have seen him do it and just took the whole bag.

You are applying a totally unreasonable standard, to suppose that the thefts were due to unreasonable carelessness. What, do you think someone should take their large luggage into the bathroom too, every time they need to pee?

Talk about victim-blaming.


Yes, if I go to the toilet I take my backpack/small bag with me, because I usually have valuable stuff in them and they are easy to carry. This does not apply to a large bag (in which I don't put valuable stuff).

The standard is mine and I follow it. The same way I find absurd not to do it, you find it unreasonable to do it.

I find the expectation that things will not be stolen (if left unsupervised in public places) strange, considering the huge wealth inequalities around even in civilized countries. I do not agree with the idea of stealing, and thieves should be punished, but expecting everybody "to behave" given the situation seems unrealistic to me.

That does not mean that I think things get stolen 100% of the time. I have a friend who forgot a laptop on a bus (Netherlands), and the driver found it at the end of the line and handed it in to lost and found, so my friend got it back.


I mean, that's great for you, but it's just not what 99% of people do. You don't usually see people take their backpack into a train bathroom. I've taken a lot of trains and sat near the bathroom often enough (unfortunately). But like I said, it applies to the cafe car too.

If you find it absurd how 99% of people act on long-distance trains, I don't know what to tell you.


Ok - that's really poor opsec. If I'm going to the bathroom in a train with my laptop (whether it's expensive or not - it has access to all my stuff - which is arguably more valuable), I'll sleep it, put it in my backpack and take the backpack to the bathroom with me.

My work policies state you simply cannot leave your laptop out of sight for any period unless it's in a secure location (work|home). I feel the same way for my personal laptop as well.


You don't hear much about laptop thefts these days because phones are more valuable, more numerous, and much easier to steal.

Obviously, nobody steals things while the train is in motion. They wait until the train is about to leave the station, snatch a phone or handbag, and jump out just as the door is closing. The train leaves, the thief blends in with the other passengers leaving the station, and by the time news of the theft has made it from the passengers to the driver to the station staff, the thief is long gone.

Of course people drive around $6,000+ cars all the time, so....


> Obviously, nobody steals things while the train is in motion.

Something interesting: I live near a train line where the doors are not automatic (they have to be opened manually on each stop), and there have been incidents where people get pickpocketed while the train is still in motion, and the thief jumps out right before the station, when the train has slowed down significantly but is still in motion. Many people have been hurt doing this.


People getting stabbed, maybe, but that tends to be more sports-related than laptop-related. (Yes, on a national line!)


Yeah, a long, enclosed space with no exits is more amenable to drunken violence than petty theft.


Only on Hacker News would I have someone arguing with me that laptop theft is not a concern. You know what, you win. It's your $6,000 laptop, not mine.


A $6000 laptop doesn’t look much different than a $1000 laptop. I don’t think it’s a bigger theft risk than any other laptop.

Make sure the laptop is insured and that full disk encryption is enabled. If it’s a Mac, make sure you have it in Find My so you can wipe it remotely if that’s something you worry about.


Honestly, I didn't bother making a better case for why I wouldn't want a $6,000 laptop in large part because the nerve people have to argue that theft isn't a concern at all made me stubborn. Theft is one reason, but a laptop is also a hell of a lot easier to simply break or lose than a desktop that is permanently installed somewhere, and a desktop is more upgradable and repairable, with typically much more I/O.

Today's baseline laptops are really good as it is. 32-64 GiB of RAM is plenty, and at least on PC laptops you can do it fairly cheaply. Apple has been a consistent year or two ahead in mobile CPU performance but it fell out of my consideration ever since I realized the M1 and 7040 were both very sufficient for any local computation I cared about. (I'm not going to say I'd specifically go for less efficiency or performance, but it has become significantly lower priority over other things like repairability.)

Not really specifically hating on Apple, here. If I was going to get another Mac it'd be a Mac Mini or Mac Studio probably, ideally with a third-party SSD upgrade to both save on costs and get a slight bit of extra drive performance too. I've definitely considered it, even though I am very far from an Apple fan, just due to the superior value and efficiency they have in many categories.


Yes! This goes in my forthcoming blog post "Only on Hacker News..."

Yesterday's entry: "... kind of a mind flex that you noted you used Meta Stories glasses to take that photo."


So, zero times then. Ok!


For what it's worth, I never once insinuated that a laptop would get stolen on a train, only that I wouldn't want to bring such a laptop into the public in the first place. (Presumably, the laptop doesn't come into and exit existence upon entering and exiting the train, so this remains somewhat of a concern even if trains are involved.)

But yes, you're right. I've never personally seen a laptop get stolen. In fact, most people who have their laptop get stolen never see their laptop get stolen either.

I have, however, had coworkers who've had their laptops stolen. Multiple times.


For real? Grab it before the door closes.


it must be amazing to have so much faith in people like you seem to have


> Can I use that on the train though? I can with a 128GB MacBook, without it sounding like a helicopter taking off as well.

What kind of timescale do you expect to be able to train a useful LLM with that?


Well it’s about an hour to commute on the train so I guess that long :3


If you have an internet connection then sure you can?


You can use a desktop computer on a train if it's one with power outlets. Might get some funny looks, but I've seen it happen (or at least pictures). :)


Only time I've seen that done was with assistive tech and I do sympathise that those setups are difficult enough with desktops


>> While 192GB of ram is appealing, it's also quite expensive at $6000.

That's because it's Apple. It's time to start moving to AMD systems with shared memory. My Zen 3 APU system has 64GB these days, and it's a mini-ITX board.


What is the performance in ML workloads like on AMD APUs compared to Apple Silicon?


The power requirement for 5x 5090s is 10x higher, so you'll spend far more than $6,000 in electricity over time.


5x 3090 is also much more power hungry?


For personal usage, does it matter though? In most places residential electricity is cheap compared to everything else. In a DC context I feel it matters a lot more compared to the capex.


1x 3090 (350W power limit) already makes it feel like I'm running a fan heater under my desk, 5x would be nuts.


Place and time your use right, and you'll save a bit on heating in winter and/or at night.


When running inference workloads via something like llama.cpp, only 1 GPU is ever used at a time, so you would have 1 active GPU and 4 idle GPUs. That should make the power usage less insane in practice than you expect.


I think the last time any of my computers had a case was back when I realized the pair of 900gx2 cards I was running was turning my computer into an easy bake.


The good thing is that since MoEs are mainly memory bound, we just need (VRAM + RAM) to be in the range of 80GB or so, in my tests, for at least 5 or so tokens/s.

It's better to get (VRAM + RAM) >= 140GB for at least 30 to 40 tokens/s, and if VRAM >= 140GB, then it can approach 140 tokens/s!

Another trick is to accept more than 8 experts per pass - it'll be slower, but might be more accurate. You could even try reducing the # of experts to say 6 or 7 for low FLOP machines!
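
Rough back-of-envelope for why memory bandwidth, not compute, sets the ceiling here (assuming R1's reported ~37B active parameters per token out of 671B total, and the 1.58-bit quant above; numbers are approximate):

    # MoE decode speed is bounded by how fast you can read the *active* experts per token.
    active_params = 37e9           # R1 parameters activated per token (as reported)
    bits_per_weight = 1.58         # the dynamic quant mentioned above (average)

    bytes_per_token = active_params * bits_per_weight / 8    # ~7.3 GB read per token
    for name, bw in [("dual-channel DDR5", 80e9), ("M2 Ultra", 800e9), ("RTX 4090 VRAM", 1000e9)]:
        print(f"{name}: ~{bw / bytes_per_token:.0f} tokens/s upper bound")

Which lines up roughly with the ~140 tokens/s figure above when everything fits in VRAM.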


Oh yes, 192GB machines should be able to run these quants (131GB for 1.58bit, 158GB for 1.73bit, 183GB for 2.22bit) well :)


Great release Daniel. Applaud the consistency you have shown.

Can you release slightly bigger quant versions? Would enjoy something that runs well on 8x 32GB V100 and 8x 80GB A100.


Thanks! Oh I did release 4bit, 5bit, 6bit etc quants, all at https://huggingface.co/unsloth/DeepSeek-R1-GGUF if that helps - they're not dynamic, but they should function fine :)


Yes, shared memory is a pretty big leg up, since it lets the GPU process the whole model even if the bandwidth is slower, which still has some benefits.

Apple's M chips, AMD's Strix Point/Halo chips, Intel's Arc iGPUs, Nvidia's Jetsons. The main issue with all of these though is the lack of raw compute to complement the ability to load insanely large models.


So I'm thinking: inference seems mostly memory bound. With a fast CPU (for example a 7950X with 16 cores) and 256GB of RAM (which seems to be the max), shouldn't that give you plenty of ability to run the largest models (albeit a bit slowly)?

It seems that AMD EPYC CPUs support terabytes of RAM, and some are as cheap as 1000 EUR. Why not just run the full R1 model on that? It seems it would be much cheaper than multiple of those insane Nvidia cards.


The bottleneck is mainly memory bandwidth. AMD EPYC hw is appealing for local inference because it has a higher memory bandwidth than desktop gear (because 8-12 memory channels vs 2 on almost everything else), but not as fast as the Apple architectures and nowhere near VRAM speeds. If you want to drastically exceed ~3-5 tokens/s on 70b-q4 models, you usually still need GPUs.
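
The rule of thumb behind those numbers: for a dense model at batch size 1, decode speed is capped at roughly memory bandwidth divided by the bytes read per token (the whole model). A quick sanity check with assumed round-number bandwidths:

    # tokens/s ceiling ≈ memory bandwidth / model size (dense model, batch size 1).
    model_gb = 40   # a 70B model at ~4.5 bits/weight (q4) is roughly 40 GB
    for name, bw_gbs in [("dual-channel DDR5", 90), ("12-ch EPYC DDR5-4800", 460),
                         ("M2 Ultra", 800), ("RTX 4090 VRAM", 1008)]:
        print(f"{name}: ~{bw_gbs / model_gb:.1f} tok/s ceiling")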


This was beautifully illustrated in the recent Phoronix 5090 LLM benchmark[1], which I noted here[2]. The tested GPUs had an almost perfect linear relationship between generated token/s and GB/s memory bandwidth, except the 5090 where it dipped slightly.

I guess the 5090 either started ever so slightly to become compute limited as well, or hit some overhead limitation.

[1]: https://www.phoronix.com/review/nvidia-rtx5090-llama-cpp

[2]: https://news.ycombinator.com/item?id=42847284


On Zen5 you also get AVX512 which llamafile takes advantage of for drastically improved speeds during prompt processing, at least. And the 12 channel Epycs actually seem to have more memory bandwidth available than the Apple M series. Especially considering it's all available to the CPU as opposed to just some portion of it.


Maybe EPYC can make better use of the available bandwidth, but for comparison I have a water cooled Xeon W5-3435X running at 4.7GHz all-core with 8 channels of DDR5-6400, and CPU inference is still dog slow. With a 70B Q8 model I get 1 tok/s, which is a lot less than I thought I would get with 410GB/s max RAM bandwidth. If I run on 5x A4000s I get 6.1 tok/s, which makes sense... 448GB/s / 70GB = 6.4 tok/s max.


Very strange, as I get 2 tok/sec with a 14B Q8 model on an old i5-12400 with DDR4.


It’s more expensive, but Zen4 Threadripper Pro is probably the way to go on that front. 8 memory channels, with DIMMs available up to DDR5-7200 for 8x32GB (256GB), or DDR5-6800 for 8x48GB (384GB). It’ll set you back ~$3k for the RAM and ~$6k for a CPU with 8 CCDs (the 7985WX, at least), and then ~$1k for motherboard and however much you want to spend on NVME. Basically ~$10k for a 384GB DDR5 system with ~435GB/s actual bandwidth. Not quite as fast as the 192GB Apple machines, but twice as much memory and more compute for “only” a few thousand more.
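
For reference, that bandwidth figure falls straight out of the channel math (theoretical peak; sustained numbers will be somewhat lower):

    # Peak DDR5 bandwidth = channels * transfer rate (MT/s) * 8 bytes per transfer.
    channels, mt_per_s = 8, 6800        # the 8-channel DDR5-6800 config above
    peak_gb_s = channels * mt_per_s * 8 / 1000
    print(f"~{peak_gb_s:.0f} GB/s theoretical peak")   # ~435 GB/s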


At these prices, I would just get 2x Digits for $6k and have 256GB.


I have a feeling that Digits will probably get sold out and pricing will get hiked WAY up.


Is it confirmed that you can get 256GB of VRAM for that amount? Because my understanding is that Digits pricing will start at $3k for some basic config.


What they meant is buying two whole separate computers.


I understand. It is still unclear if you can get 128GB vram for $3k.


Well, I mean, the press release is pretty unambiguous.

>Each Project DIGITS features 128GB of unified, coherent memory and up to 4TB of NVMe storage.

Even if $3k is only the starting price, it doesn't sound like spending more buys you more memory.


Ok, but it is not clear what kind of RAM that is, how many memory channels, etc. If the goal is just to have 128GB of some RAM, that could be achieved for a few hundred dollars.


Fine, but at that point you're arguing about the concept of the product. It's billed as a computer for AI and you're saying that it might not be more suitable for AI than a regular PC.


It is possible that one could build a better PC for AI than Digits. We will see once they release Digits.


FWIW Threadrippers go up to 1TB and Threadripper Pro up to 2TB. That's even in the lowest model of each series. (I know this because it happens to be the chip I have. Not saying you shouldn't go for Epyc if it works out better.)


Have you tried running the full R1 model with that? People in sibling comments mention high-end EPYCs for a 10K machine, but I’m curious whether it’s possible to make a 1-2K machine that could still run those big models simply because they fit in RAM.


I spent about $3000 on my machine, have the cheapest Threadripper CPU and 256GB of RAM, so no, 600GB won't fit in RAM on a $2K machine.

But everyone is using the distilled models which are much smaller.


Idk, in my daily work I still see o1 being more useful. Did you observe both having the same reasoning power?




