With Apple devices you get very fast token generation once it gets going, but it is inferior to Nvidia precisely during prefill (processing the prompt/context) before it really gets going. Prefill is compute-bound, where Apple's GPUs lag Nvidia's, while generation is memory-bandwidth-bound, where Apple's unified memory shines.

For our code-assistant use cases, local inference on Macs will tend to favor workflows where there is a lot of generation and little reading, and this is the opposite of how many of us use Claude Code.

Source: I started getting Mac Studios with max RAM as soon as the first Llama model was released.


> With Apple devices you get very fast token generation once it gets going, but it is inferior to Nvidia precisely during prefill (processing the prompt/context) before it really gets going

I have a Mac and an Nvidia build, and I'm not disagreeing.

But nobody is building a useful Nvidia LLM box for the price of a $500 Mac Mini.

You're also not getting as much RAM as a Mac Studio unless you're stacking multiple $8,000 Nvidia RTX 6000s.

There is always something faster in LLM hardware. Apple is popular because its price points are within reach of average consumers.


Not many are getting useful inference out of a $500 Mac Mini, since it only has 16GB of RAM.

It depends. This particular model has larger experts with more active parameters so 16GB is likely not enough (at least not without further tricks) but there are much sparser models where an active expert can be in RAM while the weights for all other experts stay on disk. This becomes more and more of a necessity as models get sparser and RAM itself gets tighter. It lowers performance but the end result can still be "useful".
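
Roughly the trick, as a hedged Python sketch (the file name, dtype, and shapes are all made up for illustration; real runtimes like llama.cpp get this behavior from mmap'ing the model file):

    # np.memmap maps the weight file without reading it into RAM up front;
    # only the pages for experts you actually index get faulted in, so
    # inactive experts effectively stay on disk.
    import numpy as np

    N_EXPERTS, D_MODEL, D_FF = 64, 4096, 14336  # illustrative MoE shapes

    experts = np.memmap("experts.bin", dtype=np.float16, mode="r",
                        shape=(N_EXPERTS, D_MODEL, D_FF))

    def run_expert(idx, x):
        # Touching experts[idx] pulls just that expert's pages off disk.
        w = np.asarray(experts[idx], dtype=np.float32)
        return x @ w

The OS page cache then acts as the "active experts stay in RAM" layer for free.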

This. It's awful to wait 15 minutes for an M3 Ultra to start generating tokens when your coding agent has 100k+ tokens in its context (at roughly 100 tokens/s of prefill, 100k tokens is on the order of 15 minutes). This can be partially offset by adding a DGX Spark to accelerate that phase. An M5 Ultra should ideally be like a DGX Spark for prefill and an M3 Ultra for token generation, but who knows when it will show up and for how much? And it will still be at around 3080-level GPU performance, just with 512GB of RAM.

All Apple devices have an NPU which can potentially save power on compute-bound operations like prefill (at least if you're OK with FP16 FMA / INT8 MADD arithmetic). It's just a matter of hooking up support in the main local AI frameworks. This is not a speedup per se, but it gives you more headroom w.r.t. power and thermals for everything else, so it should yield higher overall performance.

AFAIK, only CoreML can use Apple's NPU (the ANE). PyTorch, MLX, and the other kids on the block use MPS (the GPU). I think the limitations you mentioned relate to that (but I might be missing something).
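
For reference, routing a model toward the ANE today means converting it with coremltools and asking for the Neural Engine via compute_units. A minimal sketch (the tiny traced model is just for illustration; Core ML still decides per-op whether the ANE actually runs it):

    import coremltools as ct
    import torch

    class TinyMLP(torch.nn.Module):
        def forward(self, x):
            return torch.relu(x @ x.T)

    traced = torch.jit.trace(TinyMLP().eval(), torch.randn(8, 8))

    # CPU_AND_NE requests CPU + Neural Engine instead of the GPU/MPS path.
    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(8, 8))],
        compute_units=ct.ComputeUnit.CPU_AND_NE,
    )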

vllm-mlx with prefix caching helps with this.
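
Roughly, against the upstream vLLM API (whether vllm-mlx exposes the same enable_prefix_caching flag is an assumption on my part, and the model name is a placeholder):

    from vllm import LLM, SamplingParams

    llm = LLM(model="some/local-model", enable_prefix_caching=True)

    # Two requests sharing a long prefix: with prefix caching on, the second
    # reuses the cached KV blocks instead of re-running prefill over it.
    repo_context = "<imagine 100k tokens of repo context here>"
    shared = "You are a coding assistant.\n" + repo_context
    params = SamplingParams(max_tokens=64)

    out1 = llm.generate([shared + "\nExplain main()"], params)
    out2 = llm.generate([shared + "\nExplain the tests"], params)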

I like Nix as well. You can use this one-liner on macOS or Linux to try out arcan/durden/cat9. As Arcan matures, you can expect its applications to be made available the way HTML pages/apps are, and this kind of Nix derivation would let you run the "browser":

nix run --impure 'git+https://codeberg.org/ingenieroariel/arcan?ref=nix-flake-buil...'


To add to lproven's point.

An article called "A Spreadsheet and a Debugger walk into a Shell" [0] by Björn (letoram) is a good showcase of an alternative to cells in a Jupyter notebook (Excel-like cells!). Another alternative, a bit more similar to Jupyter, that also runs on Arcan is Pipeworld [1].

[0] https://arcan-fe.com/2024/09/16/a-spreadsheet-and-a-debugger...
[1] https://arcan-fe.com/2021/04/12/introducing-pipeworld/

PS: I hang out on Arcan's Discord server; you are welcome to join: https://discord.com/invite/sdNzrgXMn7


The scipy/numpy-versus-Matlab case is a good example. In my opinion it is on its way, but in many places the timing is more like 2010-2013, when a lot of people knew Python was the future but universities still taught only Matlab.


I think the answer depends on the country: In places where the government uses QGIS it is like Blender. In places where ESRI has a stronghold it is like LibreOffice.


Mark Knol is a great generative coding artist: https://github.com/markknol/

Chris Randall is pretty awesome too: https://www.instagram.com/chris.randall.art/


Hey AJ, this is almost on topic: do you know of a more up-to-date version of the dataset you used in the blog post for the H3 v4.0.0 release [1]? They stopped updating it in Oct 2023. Thanks!

[1] https://data.humdata.org/dataset/kontur-population-dataset


I don't. And maybe I should have emphasized "and have a data source" more, since it's doing a lot of the heavy lifting in my statement :)


It is a GUI framework that lets you create terminals where you can detach any running process into another terminal.

Since it is a complete toolkit, you can have detachable applications where you send both code and state to a server and retrieve them from another device (like Apple's Continuity).

In the end it is just a bunch of Lua scripts talking to other components via /dev/shm and to other computers using a new protocol called a12://


I did not believe you and just typed it on macOS; half a minute later the app was ready for me to use.

nix run nixpkgs#pyspread
[0/1 built, 3/113/132 copied (1311.8/1721.6 MiB), 280.4/300.7 MiB DL] fetching llvm-16.0.6 from https://cache.nixos.org

https://pasteboard.co/P1eh7B7W8C9R.png


I'll let Kyle chime in, but I tested it a few months ago with millions of polygons on an M2 laptop with 16GB of RAM and it worked very well.

There is a library by the same author called lonboard that provides the JS bits inside JupyterLab. https://github.com/developmentseed/lonboard
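
Quick hedged example of the Jupyter side (assuming a GeoParquet file of polygons; viz() is lonboard's high-level entry point):

    import geopandas as gpd
    from lonboard import viz

    gdf = gpd.read_parquet("polygons.parquet")  # hypothetical input file
    viz(gdf)  # returns a deck.gl-backed map widget in JupyterLab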

<speculation>I think it is based on the Kepler.gl / Deck.gl data loaders that go straight to GPU from network.</speculation>

