Hacker News | subharmonicon's comments

Hotels in Dallas have, for at least 15 years, dissuaded guests from walking even a few blocks downtown because they equate all the homelessness with crime.

I found this funny because by far the biggest danger I've seen there is the endless electric scooters littering the sidewalks.


Second strongest I remember in my 16 years here; the 2014 Napa quake produced notably stronger shaking.


I was in SF on a trip at the time and was only woken by a few friends texting to check that I was OK.


Saw an exhibit with some of her work, I think in Albuquerque. Was surprised/delighted to see weavings of circuits.


I’ve also been curious to see actual users compare/contrast their experiences with other options, but so far haven’t seen that.

There seem to be enthusiasts who have experimented a bit and like what they see but I haven’t seen much else.


TLDR: to get good performance you need to use vendor-specific extensions, which results in the same lock-in Modular has been claiming they will let you avoid.


Correct. There is too much architectural divergence between GPU vendors. If they really wanted to avoid vendor-specific extensions in user-level code, they would have gone with something loosely inspired by tinygrad (which isn't ready yet).

Basically, you need a good description of the hardware, and the compiler automatically generates a state-of-the-art GEMM kernel.

Maybe it's 20% worse than NVIDIA's hand-written kernels, but you can switch hardware vendors or build arbitrary fused kernels at will.
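To make the idea concrete, here is a toy Python sketch of "generate the kernel from a hardware description": tile sizes are derived from a declarative description rather than hand-picked per vendor. The field names (`smem_bytes`, `vector_width`, `warp_size`) and the two example descriptions are made-up illustrations, not real specs; actual compilers use far richer hardware models.

```python
from dataclasses import dataclass

@dataclass
class HardwareDesc:
    smem_bytes: int      # shared/local memory available per workgroup
    vector_width: int    # elements per vectorized load
    warp_size: int       # threads per SIMD group

def pick_gemm_tiles(hw: HardwareDesc, dtype_bytes: int = 4) -> tuple[int, int]:
    """Choose the largest square tile whose A and B slabs fit in shared
    memory, rounded down to a multiple of the vector width."""
    # Two slabs must fit: 2 * tile * tile * dtype_bytes <= smem_bytes
    tile = int((hw.smem_bytes / (2 * dtype_bytes)) ** 0.5)
    tile -= tile % hw.vector_width   # keep loads vectorizable
    tile = max(tile, hw.vector_width)
    return tile, tile

# The same generator yields different tilings for an NVIDIA-ish and an
# AMD-ish description (numbers are illustrative only):
a100ish = HardwareDesc(smem_bytes=164 * 1024, vector_width=4, warp_size=32)
mi300ish = HardwareDesc(smem_bytes=64 * 1024, vector_width=4, warp_size=64)
print(pick_gemm_tiles(a100ish))    # (144, 144)
print(pick_gemm_tiles(mi300ish))   # (88, 88)
```

The point of the sketch is only that one generator plus N hardware descriptions replaces N hand-written kernels, at some cost in peak performance.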


I don’t follow your logic. Mojo can target multiple GPU vendors. What is the Modular-specific lock-in?


Not OP but I think this could be an instance of leaky abstraction at work. Most of the time you hand-write an accelerator kernel hoping to optimize for runtime performance. If the abstraction/compiler does not fully insulate you from micro-architectural details affecting performance in non-trivial ways (e.g. memory bank conflict as mentioned in the article) then you end up still having per-vendor implementations, or compile-time if-else blocks all over the place. This is less than ideal, but still arguably better than working with separate vendor APIs, or worse, completely separate toolchains.
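The bank-conflict example above can be made concrete with a toy Python sketch: avoiding shared-memory bank conflicts typically means padding a tile's row stride, and the right padding depends on the bank count of the specific architecture, which is exactly the kind of detail that leaks through a portable abstraction. The bank counts below are illustrative assumptions, not authoritative specs for any real GPU.

```python
def padded_stride(tile_cols: int, banks: int) -> int:
    """Pad the row stride so consecutive rows don't all start in the
    same shared-memory bank."""
    stride = tile_cols
    if stride % banks == 0:   # every row hits the same bank -> conflicts
        stride += 1           # the classic +1 padding trick
    return stride

# The same 96-wide tile needs different treatment depending on bank count:
print(padded_stride(96, banks=32))   # 97: 96 is a multiple of 32, so pad
print(padded_stride(96, banks=64))   # 96: already conflict-free here
```

If the compiler doesn't fold this in for you, per-vendor constants like these end up scattered through "portable" kernel code.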


Yes, it looks like they have some sort of metaprogramming setup (nicer than C++) for doing this: https://www.modular.com/mojo


I can confirm, it’s quite nice.


Just wondering: why do you use Mojo here over Triton or the new Pythonic CuTe/CUTLASS?


Because I was originally writing some very CPU-intensive SIMD stuff, which Mojo is also fantastic for. Once I got that working and running nicely, I decided to try getting the same algo running on GPU since, at the time, they had just open-sourced the GPU parts of the stdlib. It was really easy to get going with.

I haven't used Triton/CuTe/CUTLASS though, so I can't compare against anything other than CUDA, really.


The blog post is about using an NVIDIA-specific tensor core API that they have built to get good performance.

Modular has been pushing the notion that they are building technology that allows writing HW-vendor neutral solutions so that users can break free of NVIDIA's hold on high performance kernels.

From their own writing:

> We want a unified, programmable system (one small binary!) that can scale across architectures from multiple vendors—while providing industry-leading performance on the most widely used GPUs (and CPUs).


They allow you to write a kernel for NVIDIA or AMD that takes full advantage of either one's hardware, then throw a compile-time if-statement in there to switch which kernel to use based on the hardware available.

So you can support either vendor with performance on par with the vendor libraries. That's not lock-in, to me at least.

It's not as good as the compiler being able to just magically produce optimized kernels for arbitrary hardware, fully agree there. But it's a big step forward from CUDA/HIP.
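The dispatch pattern described above can be sketched in Python. In Mojo the branch would be a compile-time `@parameter if`, so only the selected kernel lands in the binary; here it is ordinary runtime dispatch, and the kernel names and bodies are invented placeholders for vendor-tuned implementations.

```python
def gemm_nvidia(a, b):
    # stand-in for a kernel tuned for tensor cores / 32-wide warps
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gemm_amd(a, b):
    # stand-in for a kernel tuned for MFMA / 64-wide wavefronts
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gemm(a, b, vendor: str):
    # The if-statement the comment describes, modeled at runtime here.
    if vendor == "nvidia":
        return gemm_nvidia(a, b)
    elif vendor == "amd":
        return gemm_amd(a, b)
    raise ValueError(f"no tuned kernel for {vendor}")

print(gemm([[1, 2]], [[3], [4]], "nvidia"))  # [[11]]
```

The caller sees one portable `gemm` entry point; the vendor-specific code is contained behind the branch rather than spread through user code.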


The market tends to be pretty efficient for things like these. We’ve seen significant rapid adoption of several different ML solutions over the last decade, yet Mojo languishes. I think that’s a clear sign they aren’t solving the real-world pain points that users are hitting, and are building a rather niche solution that only appeals to a small number of people, no matter how good their execution may be.


Cancelled my membership many years ago over their refusal to support open access.


Time to resubscribe! Show your support for this change with your wallet!


Windows machine with NVIDIA CPU & 5070 class GPU?


It’s a MediaTek CPU with standard Arm cores, in partnership with NVIDIA, who provide the GPU.


I think for a lot of people in the LLM scene a 5070 is good enough if it has a lot of RAM.


Yes, though with the proviso that it has decent memory throughput; otherwise the lots of RAM isn't useful.


Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.

I have made the same point several times, online and in person, that the famous quote is misunderstood, and I often suggest people take the time to go back to the source and read it, since it's a wonderful read.


I recall the original post about Lungy.

Having had an incentive spirometer prescribed for post-surgical use after being on bypass, my experience was that it seemed boring and like a waste of time, so anything that makes breathing exercises more engaging and feel more worthwhile is a win.


Yes, exactly: that's what I saw working on post-op wards. The spirometer would just sit on the bedside table and collect dust.

