> And worse, different binary slices are used between different dies of the same product line. As an example, for RDNA2, ROCm math libraries are compiled only for Navi21. This means that on a (smaller) Navi22 die (notably present in the 6700 XT), those components aren’t functional. The workaround is manually recompiling ROCm with support for more targets. Such a roadblock is very discouraging for adopters – and does complicate application distribution too.
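To make the quoted failure concrete, here's a toy Python sketch of per-die code-object lookup (the names and dict are hypothetical, not the actual ROCm loader): the library ships machine code only for gfx1030 (Navi 21), so a gfx1031 (Navi 22, e.g. the 6700 XT) device finds nothing, even though the two ISAs are nearly identical.

```python
# Toy model of per-ISA code-object dispatch (hypothetical, not real ROCm APIs).
# A library binary ships machine code only for the targets it was built with.
shipped_code_objects = {"gfx1030": b"<navi21 machine code>"}  # Navi 21 only

def load_kernel(device_isa: str) -> bytes:
    # Lookup is by exact ISA string: no family-level fallback, no JIT step.
    try:
        return shipped_code_objects[device_isa]
    except KeyError:
        raise RuntimeError(
            f"no code object for {device_isa}; "
            "rebuild the library with this target enabled"
        )

load_kernel("gfx1030")   # Navi 21: works
# load_kernel("gfx1031") # Navi 22: RuntimeError, despite near-identical hardware
```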
holy shit I didn't realize the portability story was that bad, it can't even target a whole uarch/family at once (eg target all RDNA2) and it has to know about every specific die it will ever run on?
yeah, I mean, that's the deep problem with ROCm: there's no equivalent to PTX. AMD just wants you to distribute source and recompile everything on the target machine; everything about ROCm pushes you towards source distribution rather than any kind of bytecode/IL, or gosh, even an executable file you could actually just run.
On NVIDIA the support story is simple: the driver does the final translation from PTX to machine code at runtime, so as long as there's at least one overlap between the PTX versions packaged in the app and the PTX versions supported by the driver, it runs.
In practice this means you can take a program that was compiled for a CUDA 1.0 GPU (Tesla uarch) and it'll run today on an Ada card, no questions asked. You might be leaving performance on the table by not fully exploiting the newer uarchs, but it'll run, just like x86. And you can take a program today and compile it against compute capability 1.0 targets (as long as it doesn't use any features that didn't exist back then) and the 2023-era software stack will magically work on your 8800 GTX even though driver support for that hardware has been dead for 10+ years. Because the driver knows how to run PTX 1.0 and the toolchain knows how to build PTX 1.0, and that's all that matters.
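The compatibility rule being described can be sketched in a few lines of Python (a simplification, and not real driver code; real fat binaries also carry prebuilt machine code per SM version, and the version numbers here are just illustrative): the app embeds one or more PTX versions, and the driver can JIT-compile any embedded version at or below the level it understands, so a single overlap is enough.

```python
# Simplified model of NVIDIA's PTX forward compatibility (not real driver code).
def can_run(embedded_ptx_versions, max_ptx_supported_by_driver):
    # The driver can JIT any PTX at or below the level it supports,
    # so the app runs if at least one embedded version is understood.
    usable = [v for v in embedded_ptx_versions if v <= max_ptx_supported_by_driver]
    return max(usable) if usable else None  # pick the newest usable PTX

# App built in the CUDA 1.0 era, modern driver: still runs.
assert can_run([1.0], max_ptx_supported_by_driver=8.2) == 1.0
# App that only embeds very new PTX on an ancient driver: fails.
assert can_run([8.2], max_ptx_supported_by_driver=1.0) is None
```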
ROCm has always really struck me as being aimed at the HPC market - only supporting pro-tier GPUs, focusing on users who are already running custom software, openly disdaining amateur support, etc. And frankly the distribute-as-source model makes ROCm just a complete nonstarter for any sort of commercial or other model where the vendor would absolutely be distributing libraries or a compiled application, let alone trying to wrangle dumb customers through getting a working ROCm environment. Which right now is the authentic "linux in 1995" level experience/ordeal.
Getting serious would mean restarting the GPU Ocelot project and going after PTX compatibility. But a corporation is never going to accept being a client on someone else's platform, just like they won't support the open-source Streamline framework because something something 'pluggable frameworks and library code are anti-user-freedom'. Same thing there: AMD will only support FSR statically compiled into the game, and they expect everyone to recompile, revalidate, and push updates for every single title every time AMD releases an update, because anything else is inconvenient for AMD's corporate strategy.
(oh and now that they finally have the ML hardware, the rumor is they're working on their own ML-based upscaler, quelle surprise that the "gosh we would never do anything that legacy users couldn't run" was just a bit, too.)
AMD doesn't want user freedom, they want to be the one with the leash. That's why they do the source-distribution-only model... you'll have to work with your code in the ROCm ecosystem and not NVIDIA's. You'll be compiling against HIP and not NVIDIA's stuff. Etc etc. It's not a zero-effort thing to leap the gap, and they want to keep you once you do it, you'll have the same leap to get back out.
>ROCm has always really struck me as being aimed at the HPC market - only supporting pro-tier GPUs, focusing on users who are already running custom software, openly disdaining amateur support, etc. And frankly the distribute-as-source model makes ROCm just a complete nonstarter for any sort of commercial or other model where the vendor would absolutely be distributing libraries or a compiled application, let alone trying to wrangle dumb customers through getting a working ROCm environment. Which right now is the authentic "linux in 1995" level experience/ordeal.
While I somewhat agree, I'd say they also fail at properly targeting the HPC market. The specific slice they're targeting is people who are already forced to work with them, such as developers whose software is destined, ahead of time, for AMD-based supercomputers. This means they still miss the portion of the HPC market that isn't bound to a specific supercomputer.
For example, with the software we work on, we currently benefit from our choice of CUDA due to the supercomputer we have access to being A100 based. But if working with ROCm were better and we could test it on more consumer hardware first, it could make a convincing case for gaining access to supercomputers with AMD GPUs, which would then influence the selection of their hardware in other machines. But since we can't, once our CUDA support is more mature it's essentially a given that our lab will be buying many more NVIDIA cards to allow other researchers to take advantage of it.
It's more subtle than that. AMDGPU doesn't have an equivalent to PTX to abstract over differences between hardware generations. So while CUDA had problems when Volta changed the warp intrinsics, it was mostly able to paper over them in the toolchain. For AMDGPU, changing from gfx1030 to gfx1031 probably means recompiling all the machine code.
Because there are _lots_ of different cards and they all have their own machine code, distributing libraries is a pain. You either distribute N copies or do some packaging effort to hide that you've distributed N copies, or you ship raw LLVM IR and cross your fingers that patching it up on the fly works out, in defiance of LLVM not really supporting that.
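The "N copies" cost described above, sketched in Python (the sizes and names are made up for illustration, not real tooling): every supported gfx target adds another full copy of the compiled code to the shipped bundle, and anything not on the list at build time simply isn't covered.

```python
# Toy model of AMDGPU library distribution (illustrative sizes, not real tooling).
KERNEL_SIZE = 4 << 20  # pretend each build of the kernels is ~4 MiB of machine code

def bundle_size(targets):
    # No shared IL layer means one full copy of the compiled code per gfx target.
    return KERNEL_SIZE * len(targets)

targets = ["gfx906", "gfx908", "gfx90a", "gfx1030", "gfx1100"]
print(bundle_size(targets) // (1 << 20), "MiB")  # 5 targets -> 20 MiB

# And any die not on the list at build time simply isn't covered:
assert "gfx1031" not in targets
```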
AMD as a company is in favour of open source. How they feel about user freedom is less obvious, but as long as I can rebuild their stack from source when it falls over, I'm going to stick with it over nvidia. I especially like that the driver ships in the linux kernel.
I don't know man, did you watch the video I linked from AMD's FSR lead?
"we won't support [an open source API/framework] because it might be used to plug things we don't like" is pretty explicitly hostile to both user freedom and open source as a whole. The freedom to do things that you might not want me to or that you won't do is really the only user freedom in this sense, right?
What happens when some game won't update to FSR 2.2 and you're stuck on FSR 2.1 forever? That's the thing open-source user freedom is supposed to fix, right? AMD doesn't get to determine that "our software is the best and alternatives offer no benefits" either, that's the end-user's choice, and when that's followed by "therefore we will work against adoption of pluggability and interoperability standards" that crosses into openly hostile.
The lead is being very diplomatic and careful but that's a very coached way to say that "this interoperability standard isn't good for us and we will work to kill it and prevent adoption, regardless of its open-source nature. Our product is better than theirs and we are going to work to deny you the freedom to choose otherwise."
That's some microsoft level embrace-extend-extinguish shit right there: this is them embracing upscaling tech, extending it with their own proprietary implementation and deliberately kneecapping interoperability, with the goal of eventually extinguishing DLSS. They're just saying the quiet bit out loud and generally being delusional to think they could ever make this happen.
--
Look, big-picture here: AMD did the open-source driver on linux thing because it was a way to get a force-multiplier on their dev time - the community does the work instead of employees that AMD has to pay for. It's a niche community with a DIY mentality and AMD (and Intel before them) leaned on that to get more work done.
AMD's not open-sourcing drivers on Windows. They slap down open APIs like Streamline when they don't fit their business strategy.
The same thing is happening with ROCm - everyone else does software this way, but AMD wants you to do it that way, even though it's more work and more cumbersome for end-users, because it results in more lock-in for their software ecosystem. Is that not fundamentally the same thing people accuse NVIDIA of doing? And that's how they've handled FSR too - user freedom doesn't really matter that much as a principle; actually they've stated they're explicitly against users having these freedoms. They want people statically compiling it, because if they embraced user freedom it would help users move more freely between these ecosystems, and they don't want that; they are fine with lock-in once you're locked into their ecosystem. What they mean by "user freedom" is that they want to prevent users from plugging in a library that interfaces with their competitor's hardware accelerators. That's actually the opposite of user freedom.
And where does PSP fit into user freedoms - a whole closed processor running underneath your user experience doing god knows what? Wasn't that something people flipped an absolute fucking shit over with Intel ME? Yeah it's got closed-source elements that make it a problem to open, but isn't that also true of the NVIDIA drivers people constantly whinge about? What's the reason for giving AMD a pass on external IP but not NVIDIA? Blobs don't matter anymore if it's AMD?
Or the platform lock - literally preventing secondhand resale of server cpus (and now desktop CPUs too) if they're ever used in a branded system. Note that it's not locked to a motherboard - it's not about preventing parts swapouts. It's locked to a brand, so you can swap any other HP-locked cpu into a HP-locked system. Clearly 100% targeted at killing the secondhand market, and that's pretty damn anti-user-freedom as well, why shouldn't I have the freedom to buy a used CPU if I want? Because it would impact AMD's bottom line I guess?
Like, at the end of the day AMD is rolling in the anti-user-freedoms shit same as everyone else. It's a bit, when you're the underdog you need an angle to get people to buy you. When the incentives align and you and AMD are both seeing a benefit from the open-source strategy it's great, but it's not something they "are in favor of as a company", they're happy to say no to open-source when it gives them a strategic advantage.
I didn't watch the youtube link. I have now and looked up who the speaker is. He's not aligned with my philosophy. In a past life I made proprietary tools for games studios releasing proprietary products. Most games dev is decidedly hostile to open source, and most windows development is likewise, so I'm not hugely surprised to see that position stated. There may be similar factors at play with the drivers on windows, not my sandbox.
I remember the epyc processor lock story breaking. I don't know how that played out in practice. I do know I'm going to be _really_ angry if it turns out my chip only works in asrock motherboards since that was the first one I put it in. I'd forgotten about that when buying it :(
It seems plausible that whoever is presently the underdog makes nice with open source and whoever is presently on top is not. See e.g. Microsoft over time.
I don't think ROCm will go proprietary because the commercial pressure is in the direction of supercomputers. Specifically, customers of these computers have their own engineers working directly on the open source upstream of the ROCm stack. Both writing optimisations targeting the applications they care about and fixing bugs that trouble them. I mostly get my code reviewed by people outside of AMD. That sort of dynamic doesn't work if you tell customers they need to connect through your VPN and deal with the internal bug tracking systems in order to request changes, as opposed to patching the toolchain themselves.
I am absolutely sure that the overall ROCm architecture was not designed to achieve user lock-in. To the extent I can see an overarching design, it looks like doing the simplest thing we can think of that works OK on clusters. I do see what looks like scar tissue from an initial bring-up during the period where AMD was flirting with bankruptcy. OpenCL was implemented first and then apparently ignored by industry. The current compute model (HSA) was designed in collaboration with various other companies, of which I think Qualcomm is still using it but no one else is. I believe there are commercial reasons why we can't implement CUDA (and thus created HIP, which looks pretty similar and I'm told runs on amdgpu or on nvptx). Clang's OpenMP is converging on identical implementations for amdgpu and nvptx and will do the same for intel if they ever show up. That'll already compile a program that runs on either GPU arch if you ask it to.
Obviously I can't totally rule out a change in policy - some corporate mandate may come down that GPU compute is done with this open source model and customers will just have to accept what they're given. There are evidently factions within the company who would claim it is the right thing to do and there's a lag on consequences to changes like that. Hopefully maximising HPC sales will imply open source software for a long time.
> Most games dev is decidedly hostile to open source, and most windows development is likewise,
Yes but there's no problem with incorporating BSD/MIT license tools into proprietary games - that's the point of BSD/MIT. I know in general open-source isn't how games roll but games can use MIT tools without issue.
Windows drivers being open isn't something I realistically expect AMD to do; there's not really a demand for it (and maybe not even a delivery pipeline for it in this age of anticheat), but it still sucks. Y'all also still have plenty of proprietary lock-ins, like Infinity Fabric Coherent Link vs CXL - they're just in areas where "everyone does that". And yup, so do you. Building a hardware+software stack is a large expense that nobody wants to give away. Gasp, proprietary. Is G-Sync different from Infinity Fabric Coherent Link? Someone paid a lot to build them both.
The lack of third-party chipsets nowadays is a bummer. Nobody's gonna allow Nforce chipsets anymore. Although to be fair on AMD you can boot it as X300 (and epyc is SOC) and then do whatever you want as a chipset, I guess. It's just an I/O expander, not tied to the bringup like Intel.
> I remember the epyc processor lock story breaking. I don't know how that played out in practice. I do know I'm going to be _really_ angry if it turns out my chip only works in asrock motherboards since that was the first one I put it in. I'd forgotten about that when buying it :(
Asrock isn't the problem, it's putting a Dell/HPE/etc-locked chip into your Asrock. Essentially AMD destroyed the threat of any competition from secondhand corporate server sales, or at least significantly complicated the issue. There will never be the kind of flood (like is currently happening with LGA 2011-3) where you can drop a server chip into your whitebox build at 1/100th the price it was originally sold for.
Or at least you will have to be very very careful about what you buy - and ebay sellers won't deal in that level of detail most of the time. Is that CPU that doesn't say it's brand-locked really unlocked, or just the seller doesn't know the history? Tune in next week to find out!
Yes, zey do zat on purpose. Absolutely. If it was about swapping CPUs inside datacenters to prevent attackers/hostile hardware you couldn't swap CPUs between motherboards at all, if I need to find another HP to swap with this HP umm ok it's $20 on ebay? It's only between brands because that's the MVP to kill secondhand sales, and it's permanent.
> I am absolutely sure that the overall ROCm architecture was not designed to achieve user lock-in. To the extent I can see an overarching design, it looks like doing the simplest thing we can think of that works OK on clusters.
Yeah that's fair. I can buy that ROCm is the minimum viable product for getting to National Labs HPC sales. I also just think you've got a huge problem with ROCm in general, it's not mature and it's not even a path that will lead you to something viable.
People don't want to distribute source, ever. Windows user application stories that start with "install WSL2" are no bueno, let alone when the stack then breaks because the user bought the wrong die. Compiling sucks; that's why Docker is a thing now - distribute a whole userland, because compiling sucks and dependencies suck.
Static compilation on FSR may come back to bite AMD sooner or later - games are already dropping off the treadmill and because AMD didn't distribute it as libraries... welp sux, can't even swap DLLs. And they're going to wish they had, when FSR 3.x and FSR 4.x come out. AMD has their own improvements they will need to make and they've ruled out modularity or microdeploys, everyone has to compile and revalidate and submit the whole damn game again.
> I believe there are commercial reasons why we can't implement CUDA (and thus created HIP, which looks pretty similar and I'm told runs on amdgpu or on nvptx).
Microsoft funded GPU Ocelot lol, but maybe that's a less direct conflict of interest. Maybe join Intel in oneAPI? I've heard good things about the concept (SYCL) although I haven't looked at specifics.
HIP is dependent on ROCm. Tied at the HIP, if you will. Don't have a $5k commercial datacenter card? wow sucks. Actually you will also need to compile that as a different slice so that will be even more complicated.
You gotta fix ROCm before you can lean on the "but we have HIP" thing. You aren't going to win adoption without widespread prosumer support, which ROCm doesn't really have. You kinda need to fix the support story with binary slices - why can't targeting be family-wide ("RDNA2") instead of die-by-die? That sucks. CUDA does that, and there's a well-defined CUDA compute capability table too, which makes feature-targeting code easy to write: umm, this compiles on Turing and up.
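For contrast, the CUDA-style feature gate being asked for is easy to express precisely because compute capabilities are ordered and publicly tabulated. A hypothetical sketch (the helper is mine, not a real CUDA API; the capability numbers are from NVIDIA's public table):

```python
# Sketch of feature-gating against CUDA compute capability (the helper is
# hypothetical; the (major, minor) values are from NVIDIA's public CC table).
TURING = (7, 5)
AMPERE = (8, 0)

def supports(device_cc, required_cc):
    # Compute capabilities are ordered tuples, so a feature check is one compare.
    return device_cc >= required_cc

assert supports((8, 6), TURING)      # an Ampere consumer card runs Turing-era paths
assert not supports((6, 1), TURING)  # a Pascal card does not
```

That's the whole point: one ordered version axis per architecture family, instead of an open-ended list of die names.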
It's the college kids and the universities who win you the next 15 years. Know where I learned CUDA? University. What did we have? K40s. What did I do all my dev work on? My grad workstation with a GT 640 I bought for $60 (2 whole GB), and my Thinkpad. Is that the kind of thing that is supported by ROCm? No.
> There are evidently factions within the company who would claim it is the right thing to do and there's a lag on consequences to changes like that. Hopefully maximising HPC sales will imply open source software for a long time.
I know and I'm not ragging on you personally. I didn't know who you were either but it's been an interesting chat.
https://youtu.be/8ve5dDQ6TQE?t=974