> And worse, different binary slices are used between different dies of the same product line. As an example, for RDNA2, ROCm math libraries are compiled only for Navi21. This means that on a (smaller) Navi22 die (notably present in the 6700 XT), those components aren’t functional. The workaround is manually recompiling ROCm with support for more targets. Such a roadblock is very discouraging for adopters – and does complicate application distribution too.
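To make the quoted failure concrete, here's a toy Python sketch of per-die code-object lookup (the names and dict are hypothetical, not the actual ROCm loader): the library ships machine code only for gfx1030 (Navi 21), so a gfx1031 (Navi 22, e.g. the 6700 XT) device finds nothing, even though the two ISAs are nearly identical.

```python
# Toy model of per-ISA code-object dispatch (hypothetical, not real ROCm APIs).
# A library binary ships machine code only for the targets it was built with.
shipped_code_objects = {"gfx1030": b"<navi21 machine code>"}  # Navi 21 only

def load_kernel(device_isa: str) -> bytes:
    # Lookup is by exact ISA string: no family-level fallback, no JIT step.
    try:
        return shipped_code_objects[device_isa]
    except KeyError:
        raise RuntimeError(
            f"no code object for {device_isa}; "
            "rebuild the library with this target enabled"
        )

load_kernel("gfx1030")   # Navi 21: works
# load_kernel("gfx1031") # Navi 22: RuntimeError, despite near-identical hardware
```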
holy shit I didn't realize the portability story was that bad, it can't even target a whole uarch/family at once (eg target all RDNA2) and it has to know about every specific die it will ever run on?
yeah, I mean, that's the deep problem with ROCm: there's no equivalent to PTX. AMD just wants you to distribute source and recompile everything on the target machine; everything about ROCm pushes you towards source distribution rather than any kind of bytecode/IL, or gosh, even an executable file you could actually just run.
On NVIDIA the support story is simple: the driver does the final translation from PTX to machine code at runtime, so as long as there's at least one overlap between the PTX versions packaged in the app and the PTX versions supported by the driver, it runs.
In practice this means you can take a program that was compiled for a CUDA 1.0 GPU (Tesla uarch) and it'll run today on an Ada card, no questions asked. You might be leaving performance on the table by not fully exploiting the newer uarchs, but it'll run, just like x86. And you can take a program today and compile it against compute capability 1.0 targets (as long as it doesn't use any features that didn't exist back then) and the 2023-era software stack will magically work on your 8800 GTX even though driver support for that hardware has been dead for 10+ years. Because the driver knows how to run PTX 1.0 and the toolchain knows how to build PTX 1.0, and that's all that matters.
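The compatibility rule being described can be sketched in a few lines of Python (a simplification, and not real driver code; real fat binaries also carry prebuilt machine code per SM version, and the version numbers here are just illustrative): the app embeds one or more PTX versions, and the driver can JIT-compile any embedded version at or below the level it understands, so a single overlap is enough.

```python
# Simplified model of NVIDIA's PTX forward compatibility (not real driver code).
def can_run(embedded_ptx_versions, max_ptx_supported_by_driver):
    # The driver can JIT any PTX at or below the level it supports,
    # so the app runs if at least one embedded version is understood.
    usable = [v for v in embedded_ptx_versions if v <= max_ptx_supported_by_driver]
    return max(usable) if usable else None  # pick the newest usable PTX

# App built in the CUDA 1.0 era, modern driver: still runs.
assert can_run([1.0], max_ptx_supported_by_driver=8.2) == 1.0
# App that only embeds very new PTX on an ancient driver: fails.
assert can_run([8.2], max_ptx_supported_by_driver=1.0) is None
```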
ROCm has always really struck me as being aimed at the HPC market - only supporting pro-tier GPUs, focusing on users who are already running custom software, openly disdaining amateur support, etc. And frankly the distribute-as-source model makes ROCm just a complete nonstarter for any sort of commercial or other model where the vendor would absolutely be distributing libraries or a compiled application, let alone trying to wrangle dumb customers through getting a working ROCm environment. Which right now is the authentic "linux in 1995" level experience/ordeal.
Getting serious would mean restarting the GPU Ocelot project and going after PTX compatibility. But a corporation is never going to accept being a client on someone else's platform, just like they won't support the open-source Streamline framework because something something 'pluggable frameworks and library code are anti-user-freedom'. Same thing there: AMD will only support FSR statically compiled into the game, and they expect everyone to recompile, revalidate, and push updates for every single title every time AMD releases an update, because anything else is inconvenient for AMD's corporate strategy.
(oh and now that they finally have the ML hardware, the rumor is they're working on their own ML-based upscaler, quelle surprise that the "gosh we would never do anything that legacy users couldn't run" was just a bit, too.)
AMD doesn't want user freedom, they want to be the one with the leash. That's why they do the source-distribution-only model... you'll have to work with your code in the ROCm ecosystem and not NVIDIA's. You'll be compiling against HIP and not NVIDIA's stuff. Etc etc. It's not a zero-effort thing to leap the gap, and they want to keep you once you do it, you'll have the same leap to get back out.
>ROCm has always really struck me as being aimed at the HPC market - only supporting pro-tier GPUs, focusing on users who are already running custom software, openly disdaining amateur support, etc. And frankly the distribute-as-source model makes ROCm just a complete nonstarter for any sort of commercial or other model where the vendor would absolutely be distributing libraries or a compiled application, let alone trying to wrangle dumb customers through getting a working ROCm environment. Which right now is the authentic "linux in 1995" level experience/ordeal.
While I somewhat agree, I'd say they also fail at properly targeting the HPC market. The specific slice they're targeting is people who are already forced to work with them, such as developers whose software is destined, ahead of time, for AMD-based supercomputers. This means they still miss the portion of the HPC market that isn't bound to a specific supercomputer.
For example, with the software we work on, we currently benefit from our choice of CUDA due to the supercomputer we have access to being A100 based. But if working with ROCm were better and we could test it on more consumer hardware first, it could make a convincing case for gaining access to supercomputers with AMD GPUs, which would then influence the selection of their hardware in other machines. But since we can't, once our CUDA support is more mature it's essentially a given that our lab will be buying many more NVIDIA cards to allow other researchers to take advantage of it.
It's more subtle than that. AMDGPU doesn't have an equivalent to PTX to abstract over differences between hardware generations. So while CUDA had problems when Volta changed the warp intrinsics, it was mostly able to paper over them in the toolchain. For AMDGPU, changing from gfx1030 to gfx1031 probably means recompiling all the machine code.
Because there are _lots_ of different cards and they all have their own machine code, distributing libraries is a pain. You either distribute N copies or do some packaging effort to hide that you've distributed N copies, or you ship raw LLVM IR and cross your fingers that patching it up on the fly works out, in defiance of LLVM not really supporting that.
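The "N copies" cost described above, sketched in Python (the sizes and names are made up for illustration, not real tooling): every supported gfx target adds another full copy of the compiled code to the shipped bundle, and anything not on the list at build time simply isn't covered.

```python
# Toy model of AMDGPU library distribution (illustrative sizes, not real tooling).
KERNEL_SIZE = 4 << 20  # pretend each build of the kernels is ~4 MiB of machine code

def bundle_size(targets):
    # No shared IL layer means one full copy of the compiled code per gfx target.
    return KERNEL_SIZE * len(targets)

targets = ["gfx906", "gfx908", "gfx90a", "gfx1030", "gfx1100"]
print(bundle_size(targets) // (1 << 20), "MiB")  # 5 targets -> 20 MiB

# And any die not on the list at build time simply isn't covered:
assert "gfx1031" not in targets
```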
AMD as a company is in favour of open source. How they feel about user freedom is less obvious, but as long as I can rebuild their stack from source when it falls over, I'm going to stick with it over nvidia. I especially like that the driver ships in the linux kernel.
I don't know man, did you watch the video I linked from AMD's FSR lead?
"we won't support [an open source API/framework] because it might be used to plug things we don't like" is pretty explicitly hostile to both user freedom and open source as a whole. The freedom to do things that you might not want me to or that you won't do is really the only user freedom in this sense, right?
What happens when some game won't update to FSR 2.2 and you're stuck on FSR 2.1 forever? That's the thing open-source user freedom is supposed to fix, right? AMD doesn't get to determine that "our software is the best and alternatives offer no benefits" either, that's the end-user's choice, and when that's followed by "therefore we will work against adoption of pluggability and interoperability standards" that crosses into openly hostile.
The lead is being very diplomatic and careful but that's a very coached way to say that "this interoperability standard isn't good for us and we will work to kill it and prevent adoption, regardless of its open-source nature. Our product is better than theirs and we are going to work to deny you the freedom to choose otherwise."
That's some microsoft level embrace-extend-extinguish shit right there: this is them embracing upscaling tech, extending it with their own proprietary implementation and deliberately kneecapping interoperability, with the goal of eventually extinguishing DLSS. They're just saying the quiet bit out loud and generally being delusional to think they could ever make this happen.
--
Look, big-picture here: AMD did the open-source driver on linux thing because it was a way to get a force-multiplier on their dev time - the community does the work instead of employees that AMD has to pay for. It's a niche community with a DIY mentality and AMD (and Intel before them) leaned on that to get more work done.
AMD's not open-sourcing drivers on Windows. They slap down open APIs like Streamline when they don't fit their business strategy.
The same thing is happening with ROCm - everyone else does software this way, but AMD wants you to do it that way, even though it's more work and more cumbersome for end-users, because it results in more lock-in for their software ecosystem. Is that not fundamentally the same thing people accuse NVIDIA of doing? And that's how they've handled FSR too - user freedom doesn't really matter that much as a principle; actually they've stated they're explicitly against users having these freedoms. They want people statically compiling it, because if they embraced user freedom it would help users move more freely between these ecosystems, and they don't want that; they are fine with lock-in once you're locked into their ecosystem. What they mean by "user freedom" is that they want to prevent users from plugging in a library that interfaces with their competitor's hardware accelerators. That's actually the opposite of user freedom.
And where does PSP fit into user freedoms - a whole closed processor running underneath your user experience doing god knows what? Wasn't that something people flipped an absolute fucking shit over with Intel ME? Yeah it's got closed-source elements that make it a problem to open, but isn't that also true of the NVIDIA drivers people constantly whinge about? What's the reason for giving AMD a pass on external IP but not NVIDIA? Blobs don't matter anymore if it's AMD?
Or the platform lock - literally preventing secondhand resale of server cpus (and now desktop CPUs too) if they're ever used in a branded system. Note that it's not locked to a motherboard - it's not about preventing parts swapouts. It's locked to a brand, so you can swap any other HP-locked cpu into a HP-locked system. Clearly 100% targeted at killing the secondhand market, and that's pretty damn anti-user-freedom as well, why shouldn't I have the freedom to buy a used CPU if I want? Because it would impact AMD's bottom line I guess?
Like, at the end of the day AMD is rolling in the anti-user-freedoms shit same as everyone else. It's a bit, when you're the underdog you need an angle to get people to buy you. When the incentives align and you and AMD are both seeing a benefit from the open-source strategy it's great, but it's not something they "are in favor of as a company", they're happy to say no to open-source when it gives them a strategic advantage.
I didn't watch the youtube link. I have now and looked up who the speaker is. He's not aligned with my philosophy. In a past life I made proprietary tools for games studios releasing proprietary products. Most games dev is decidedly hostile to open source, and most windows development is likewise, so I'm not hugely surprised to see that position stated. There may be similar factors at play with the drivers on windows, not my sandbox.
I remember the epyc processor lock story breaking. I don't know how that played out in practice. I do know I'm going to be _really_ angry if it turns out my chip only works in asrock motherboards since that was the first one I put it in. I'd forgotten about that when buying it :(
It seems plausible that whoever is presently the underdog makes nice with open source and whoever is presently on top is not. See e.g. Microsoft over time.
I don't think ROCm will go proprietary because the commercial pressure is in the direction of supercomputers. Specifically, customers of these computers have their own engineers working directly on the open source upstream of the ROCm stack. Both writing optimisations targeting the applications they care about and fixing bugs that trouble them. I mostly get my code reviewed by people outside of AMD. That sort of dynamic doesn't work if you tell customers they need to connect through your VPN and deal with the internal bug tracking systems in order to request changes, as opposed to patching the toolchain themselves.
I am absolutely sure that the overall ROCm architecture was not designed to achieve user lock-in. To the extent I can see an overarching design, it looks like doing the simplest thing we can think of that works OK on clusters. I do see what looks like scar tissue from an initial bring-up during the period where AMD was flirting with bankruptcy. OpenCL was implemented first and then apparently ignored by industry. The current compute model (HSA) was designed in collaboration with various other companies, of which I think Qualcomm is still using it but no one else is. I believe there are commercial reasons why we can't implement CUDA (and thus created HIP, which looks pretty similar and I'm told runs on amdgpu or on nvptx). Clang's OpenMP is converging on identical implementations for amdgpu and nvptx and will do the same for intel if they ever show up. That'll already compile a program that runs on either GPU arch if you ask it to.
Obviously I can't totally rule out a change in policy - some corporate mandate may come down that GPU compute is done with this open source model and customers will just have to accept what they're given. There are evidently factions within the company who would claim it is the right thing to do and there's a lag on consequences to changes like that. Hopefully maximising HPC sales will imply open source software for a long time.
> Most games dev is decidedly hostile to open source, and most windows development is likewise,
Yes but there's no problem with incorporating BSD/MIT license tools into proprietary games - that's the point of BSD/MIT. I know in general open-source isn't how games roll but games can use MIT tools without issue.
Windows drivers being open isn't something I realistically expect AMD to do; there's not really a demand for it (and maybe not even a delivery pipeline for it in this age of anticheat), but it still sucks. Y'all also still have plenty of proprietary lock-ins, like Infinity Fabric Coherent Link vs CXL - they're just in areas where "everyone does that". And yup, so do you. Building a hardware+software stack is a large expense that nobody wants to give away. Gasp, proprietary. Is G-Sync different from Infinity Fabric Coherent Link? Someone paid a lot to build them both.
The lack of third-party chipsets nowadays is a bummer. Nobody's gonna allow Nforce chipsets anymore. Although to be fair on AMD you can boot it as X300 (and epyc is SOC) and then do whatever you want as a chipset, I guess. It's just an I/O expander, not tied to the bringup like Intel.
> I remember the epyc processor lock story breaking. I don't know how that played out in practice. I do know I'm going to be _really_ angry if it turns out my chip only works in asrock motherboards since that was the first one I put it in. I'd forgotten about that when buying it :(
Asrock isn't the problem, it's putting a Dell/HPE/etc-locked chip into your Asrock. Essentially AMD destroyed the threat of any competition from secondhand corporate server sales, or at least significantly complicated the issue. There will never be the kind of flood (like is currently happening with LGA 2011-3) where you can drop a server chip into your whitebox build at 1/100th the price it was originally sold for.
Or at least you will have to be very very careful about what you buy - and ebay sellers won't deal in that level of detail most of the time. Is that CPU that doesn't say it's brand-locked really unlocked, or just the seller doesn't know the history? Tune in next week to find out!
Yes, zey do zat on purpose. Absolutely. If it was about swapping CPUs inside datacenters to prevent attackers/hostile hardware you couldn't swap CPUs between motherboards at all, if I need to find another HP to swap with this HP umm ok it's $20 on ebay? It's only between brands because that's the MVP to kill secondhand sales, and it's permanent.
> I am absolutely sure that the overall ROCm architecture was not designed to achieve user lock-in. To the extent I can see an overarching design, it looks like doing the simplest thing we can think of that works OK on clusters.
Yeah that's fair. I can buy that ROCm is the minimum viable product for getting to National Labs HPC sales. I also just think you've got a huge problem with ROCm in general, it's not mature and it's not even a path that will lead you to something viable.
People don't want to distribute source, ever. Windows user application stories that start with "install WSL2" are no bueno, let alone when the stack then breaks because the user bought the wrong die. Compiling sucks; that's why Docker is a thing now - distribute a whole userland, because compiling sucks and dependencies suck.
Static compilation on FSR may come back to bite AMD sooner or later - games are already dropping off the treadmill and because AMD didn't distribute it as libraries... welp sux, can't even swap DLLs. And they're going to wish they had, when FSR 3.x and FSR 4.x come out. AMD has their own improvements they will need to make and they've ruled out modularity or microdeploys, everyone has to compile and revalidate and submit the whole damn game again.
> I believe there are commercial reasons why we can't implement CUDA (and thus created HIP, which looks pretty similar and I'm told runs on amdgpu or on nvptx).
Microsoft funded GPU Ocelot lol, but maybe that's a less direct conflict of interest. Maybe join Intel in oneAPI? I've heard good things about the concept (SYCL) although I haven't looked at specifics.
HIP is dependent on ROCm. Tied at the HIP, if you will. Don't have a $5k commercial datacenter card? wow sucks. Actually you will also need to compile that as a different slice so that will be even more complicated.
You gotta fix ROCm before you can lean on the "but we have HIP" thing. You aren't going to win adoption without widespread prosumer support, which ROCm doesn't really have. You kinda need to fix the support story with binary slices - why can't targeting be family-wide ("RDNA2") instead of die-by-die? That sucks. CUDA does that, and there's a well-defined CUDA compute capability table too, which makes feature-targeting code easy to write: umm, this compiles on Turing and up.
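For contrast, the CUDA-style feature gate being asked for is easy to express precisely because compute capabilities are ordered and publicly tabulated. A hypothetical sketch (the helper is mine, not a real CUDA API; the capability numbers are from NVIDIA's public table):

```python
# Sketch of feature-gating against CUDA compute capability (the helper is
# hypothetical; the (major, minor) values are from NVIDIA's public CC table).
TURING = (7, 5)
AMPERE = (8, 0)

def supports(device_cc, required_cc):
    # Compute capabilities are ordered tuples, so a feature check is one compare.
    return device_cc >= required_cc

assert supports((8, 6), TURING)      # an Ampere consumer card runs Turing-era paths
assert not supports((6, 1), TURING)  # a Pascal card does not
```

That's the whole point: one ordered version axis per architecture family, instead of an open-ended list of die names.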
It's the college kids and the universities who win you the next 15 years. Know where I learned CUDA? University. What did we have? K40s. What did I do all my dev work on? My grad workstation with a GT 640 I bought for $60 (2 whole GB), and my Thinkpad. Is that the kind of thing that is supported by ROCm? No.
> There are evidently factions within the company who would claim it is the right thing to do and there's a lag on consequences to changes like that. Hopefully maximising HPC sales will imply open source software for a long time.
I know and I'm not ragging on you personally. I didn't know who you were either but it's been an interesting chat.
https://youtu.be/8ve5dDQ6TQE?t=974