LLMs don't "hallucinate" or "lie." They have no intent. They're Weighted Random Word Generator Machines. They're trained mathematically to produce series of tokens. Whenever they get something "right," it's literally by accident. Get that rate of accidental rightness up to 80%, and people suddenly think the random word generator is some kind of oracle. It's not. It's a large model with an embedding space, tokens, and a whole series of computationally expensive perceptron and attention blocks that generate output.
The title/introduction is pure bait, because it implies some "physical" connection to hallucinations in biological organisms, when it's really just trying to single out certain parts of the model. LLMs are absolutely nothing like a biological system; our brains are orders of magnitude more complex than even the machines we've built that we no longer fully understand. Believing these LLMs are some next stage in understanding intelligence is hubris.
Up to a hair over 60% utilization, the queuing delays on any work queue remain essentially negligible. At 70% they become noticeable, and at 80% they've doubled. From there on it just turns into a shitshow.
The rule of thumb is 60% is zero, and 80% is the inflection point where delays go exponential.
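For intuition, here's a rough sketch assuming a simple M/M/1 queue, where the mean wait is rho/(1-rho) times the service time:

for rho in 0.5 0.6 0.7 0.8 0.9; do
  echo "$rho: $(echo "scale=2; $rho/(1-$rho)" | bc)"
done

That prints roughly 1.0, 1.5, 2.3, 4.0, 9.0: barely anything up to 60%, doubling between 70% and 80%, and blowing up after that.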
The biggest cluster I ran, we hit about 65% CPU at our target P95 time, which is pretty much right on the theoretical mark.
"B" just means "billion". A 7B model has 7 billion parameters. Most models are trained in fp16, so each parameter takes two bytes at full precision. Therefore, 7B = 14GB of memory. You can easily quantize models to 8 bits per parameter with very little quality loss, so then 7B = 7GB of memory. With more quality loss (making the model dumber), you can quantize to 4 bits per parameter, so 7B = 3.5GB of memory. There are ways to quantize at other levels too, anywhere from under 2 bits per parameter up to 6 bits per parameter are common.
There is additional memory used for context / KV cache. So, if you use a large context window for a model, you will need to factor in several additional gigabytes for that, but it is much harder to provide a rule of thumb for that overhead. Most of the time, the overhead is significantly less than the size of the model, so not 2x or anything. (The size of the context window is related to the amount of text/images that you can have in a conversation before the LLM begins forgetting the earlier parts of the conversation.)
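For a rough ballpark, though, the KV cache grows linearly with context length. Assuming a hypothetical 7B-class model with 32 layers and full multi-head attention (32 KV heads of dimension 128) in fp16:

# bytes = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value * tokens
echo $(( 2 * 32 * 32 * 128 * 2 * 4096 / 1024 / 1024 ))   # ~2048 MB for a 4096-token context

So a couple of gigabytes at typical context sizes, consistent with the "less than the size of the model" rule above (models using grouped-query attention need proportionally less).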
The most important thing for local LLM performance is typically memory bandwidth. This is why GPUs are so much faster for LLM inference than CPUs, since GPU VRAM is many times the speed of CPU RAM. Apple Silicon offers rather decent memory bandwidth, which makes the performance fit somewhere between a typical Intel/AMD CPU and a typical GPU. Apple Silicon is definitely not as fast as a discrete GPU with the same amount of VRAM.
That's about all you need to know to get started. There are obviously nuances and exceptions that apply in certain situations.
A 32B model at 5 bits per parameter will comfortably fit onto a 24GB GPU and provide decent speed, as long as the context window isn't set to a huge value.
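If it helps, the whole rule of thumb fits in a line of shell (a rough weights-only estimate that ignores the context/KV overhead mentioned above):

# GB of weights ~= billions of params * bits per param / 8
vram_gb() { echo "scale=1; $1 * $2 / 8" | bc; }
vram_gb 7 16   # 14.0 (fp16)
vram_gb 7 4    # 3.5  (4-bit quant)
vram_gb 32 5   # 20.0 (5-bit quant, leaves headroom on a 24GB card)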
One thing I found really confusing about systemd was its treatment of ExecStop in a service unit. ExecStart is the command to run when systemd starts the service at system boot (or when a user tells systemd to start the service). However, by default ExecStop runs as soon as the ExecStart command has finished. You have to set RemainAfterExit=yes to get the desired behavior of running the stop command on system shutdown or when a user stops the service. ExecStop is basically the "on-cleanup" event rather than the "shut-down-the-service" event.
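A minimal sketch of what that ends up looking like (the /usr/local/bin/my-service paths are just placeholders):

[Unit]
Description=Example service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/my-service start
# without RemainAfterExit=yes, this would fire as soon as ExecStart exits
ExecStop=/usr/local/bin/my-service stop

[Install]
WantedBy=multi-user.target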
> I installed Wine in Ubuntu running in WSL on a Windows 11 machine, and the game runs in that environment! Never thought I would run an old game in such a convoluted way.
Just drop them in the game folder and the game should load them instead of the real DirectX.
There are also other implementations of old APIs that keep old video games running; some of them are even used by Linux users who run Wine, like dgVoodoo:
https://github.com/FunkyFr3sh/cnc-ddraw (fixes all the issues you can have with DirectDraw, an old 2D API; useful both for Windows users and for people who use Wine on Linux)
This, along with Windows's own compatibility-mode tweaks, should run almost any game that has ever been released on Windows, without the heavy overhead of a VM (as far as I know, WSL doesn't even know how to free memory it has claimed).
I got a Samsung S21 FE 5G as an award at a programming competition.
OEM unlocking wasn't even an option in the developer settings until I connected the phone to the internet and set the date to one month in the past (I assume this has something to do with the warranty -- you can't even unlock the bootloader right away).
An internet connection was required before even using the phone on the Android initial setup screen.
Apart from that, developer settings can't even be enabled before agreeing to the Samsung EULA. The initial setup screen can be weirdly manipulated into opening Settings (Accessibility, Additional apps, Live Transcribe, Connectivity settings (only shows if there's no internet connection), back), but spamming the Build number does not enable developer settings.
Granted, I did not buy the phone, but it's still disgusting that such devices are being sold.
Something I try to spread awareness of whenever someone posts a local-account workaround:
The easiest and most foolproof workaround to get a local account is to attempt to use a banned account. Someone conveniently got "no@thankyou.com" banned, so if you attempt to log in with it (use anything for the password; you're not actually trying to succeed in logging in), it'll dump you straight to the local-account screen, since they don't want a disabled user to create a new account.
Unlike other workarounds this is intended behavior so not only is it easy to trigger but it's far less likely to stop working when the installer is updated again.
So, the rules of copyright are conceptually very simple:
- Authors own a copyright over their work for a limited amount of time, then it is escheated to the public domain. While a work is under copyright, you need to get permission to copy it.
- You can't copyright ideas - that requires owning a patent, which has a far higher bar[0] to clear in order to get. You only get copyright over expressions of ideas - at a minimum some combination of uncopyrightables that itself can be considered to have a "thin copyright".
- Works that are "based on" another work are called derivative works. If a work is under copyright, you also need permission to make derivative works. If you got permission, then the new work gets its own separate copyright owned by the new artist.
Now, you would assume that whatever is in the public domain is public domain, right? Well, only sort of. Because derivative works get a fresh shiny new copyright, that casts a shadow on the public domain. So I can publish the original text of Shakespeare's Romeo and Juliet, but that doesn't mean that I can perform West Side Story just because it's a derivative of Shakespeare. But at the same time Jerome Robbins can't sue me for performing Shakespeare. The exact shape of a derivative's copyright is the amount of creativity added, and no more.
Therefore, I can still make my own twist on Shakespeare. But I have to be careful. If I decide "hey let's make our own 1950s New York gang warfare take on Romeo and Juliet", then I'm getting closer to just ripping off West Side Story. In fact, there's even a term-of-art for the minimum quanta of copyright: "thin copyright", which is applied to creative combinations of uncopyrightable elements.
Sherlock Holmes is a series of detective stories published as serial fiction[1]. Notably, the series was ended by the creator killing off Sherlock[2], and then brought back about a decade later. This is known by Sherlock fans as "The Great Hiatus"; and after Sherlock was brought back the author started writing him with a lot more emotion.
Let's go forward about one life plus 70 years ahead of time. You're a descendant of Arthur Conan Doyle and your gravy train is about to end, because people can just use Sherlock Holmes and not pay you anymore. Except that only part of the franchise is in the public domain. Specifically the part before the Great Hiatus. And afterwards, Sherlock is arguably a different character. So obviously, if someone makes a Sherlock Holmes adaptation where he acts like post-Hiatus Sherlock, then clearly it's infringing the copyrighted stories!
And if it weren't for those meddling kids, the estate of Arthur Conan Doyle would have gotten away with it, too. Actually, I'm kinda toning down the original argument. They thought that they could recopyright all of Sherlock Holmes by just owning one of the stories with him in it, which is not how copyright works. "Only copyrighted Sherlock is allowed to emote" was their second argument, which they abandoned when settling with Netflix.
It just occurred to me: next year, Mickey Mouse is public domain. I REPEAT: THERE IS ONLY ONE YEAR UNTIL THE COPYRIGHT ON STEAMBOAT WILLIE EXPIRES. So we're going to see all sorts of litigative fireworks as Disney tries to hold onto that cartoon mouse for dear life.
[0] Copyright is automatic, patents require a filing fee. Please stop laughing.
[1] Specifically, as part of a larger magazine. Victorian Brits subscribed to magazines to read Sherlock Holmes in the same way that Japanese teenagers subscribe to Shonen Jump today to read One Piece.
[2] Which was just as controversial and shocking as, say, a manga ending its run in a magazine today. Sherlock Holmes is basically the ur-fandom that all other fandoms were cloned from.
Spider-Man is almost entirely owned by Marvel. Spider-Man's live action movie rights are owned by Sony as long as they produce at least one film with the character every 5 years and 9 months. This includes co-producing them with Marvel. Universal owns the film rights to both solo Hulk and Namor films. Finally, Fox owned the right to make 13 X-Men films, Fantastic Four, and Deadpool films over 20 years, but Disney bought those rights back.
tmux (on the workstation) integrates naturally with the system clipboard, and after finally getting it into my workflow, none of the annoyances in the readme apply anymore. It Just Works.
A pipeline approval tool (internal at Amazon) that counts metrics.
I was a fairly fresh college-hire SDE1 at Amazon. And I was annoyed, because I'm lazy. Every time I was oncall, I had to manage the deployment pipeline for my team's software: the UI for the tool used by Pickers inside Amazon warehouses. On Monday, deploy the latest changes to the China stack (small). On Tuesday, check if anything bad happened, and then deploy to the Japan stack (small-ish). On Wednesday, Europe (big). Thursday, North America (biggest). Repeat each week.
And I thought "why am I doing this? There are APIs for all of this stuff!". So I made an automated workflow that hooked into the pipeline system. You gave a metric to look for, a count of how many times the thing should have happened, and an alarm to monitor. If everything looks good, it approves. I hooked it up for my pipeline, and then it usually finished the entire weekly push before Tuesday afternoon. I made it in about 2 weekends on my own time.
And I left it open for anyone in the company to configure for their own pipelines. A few weeks later I was checking if it was still operating normally and realized there were something like 50 teams using it. Then 100. Then a lot more.
The last I heard, it's considered a best practice for all teams within the company to use it on their pipelines. Before I left in 2021, it was running something like 10,000 approval workflows per day.
I named it after the BBQ/grilling meat thermometer in my kitchen drawer- "RediFork". Given the overlap of "people who read HN" and "devs who worked at Amazon", I probably saved someone reading this an aggregate hour or two of work.
During a centralisation of public school local servers to a data centre, I created a consolidated library enquiry system. It served over 2,000 libraries, had 330 million titles, and had about a million users. It was efficient enough to run off my laptop, if need be.
AFAIK it was one of the top five biggest library systems in the world at the time.
I was asked to add some features that would have been too difficult in the old distributed system. Things like reading competitions, recommended reading lists by age, etc…
I watched the effect of these changes — which took me mere days of effort to implement — and the combined result was that students read about a million additional books they would not have otherwise.
I’ve had a far greater effect on the literacy of our state than any educator by orders of magnitude and hardly anyone in the department of education even knows my name!
This was the project that made me realise how huge the effect-to-effort ratio can be when computers are involved…
The legacy back-end system being migrated was Clipper + dBase III on DOS, which is reminiscent of COBOL.
The part I added was built with ASP.NET 2.0 on top of Microsoft SQL Server 2005, and was eventually upgraded to 4.0 and 2008 respectively.
The only magic sauce was the use of SQLCLR to embed a few small snippets of C# code into the SQL Server database engine. This allowed the full-text indexing to be specialised for the high level data partitioning. Without this, searches would have taken up to ten seconds. With this custom search the p90 response time was about 15 milliseconds! I believe PostgreSQL is the only other popular database engine out there that allows this level of fine-tuned custom indexing.
You need to sign up (with email address only) for an API key. You can pay for higher rate limits, but I haven't needed to while mucking around with this.
#!/bin/bash
# Make an account at https://textsynth.com/ get API key and put it in below
curl -s https://api.textsynth.com/v1/engines/gptneox_20B/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOURAPIKEYGOESHERE" \
-d '{"prompt": "Using the Linux shell\n\nQ: how do I '"$*"'?\nA: ", "max_tokens": 200, "temperature":1.0 }' | jq -r .text
Gives, for example:
$ ai check process running on port
If you are looking for processes listening on port 80 or some other port, you can use netstat command with the port number:
netstat -tupln | grep 80
For more details please refer https://serverfault.com/a/538945/192371
# Cat version with progress meter
cat image.iso | pv >/dev/sdb
The progress meter here is mostly meaningless. You'll see the initial progress go very quickly (because you're writing to in-memory cache), and once it's "done", you'll have to wait some amount of additional time for a final `sync` to complete (and if you forget to do that, you might remove the drive while writes are still in progress).
The best way to write an image to a drive is like so:
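dd if=foo.iso of=/dev/sdx bs=4M status=progress oflag=dsync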
`oflag=dsync` bypasses the write cache, so your progress bar is actually meaningful. It also guarantees that writes are actually completed[1]. Yes, that 4M block size may be improved by manual tweaking, and it would be nice if that happened automatically. I'm sure tools to do this exist, but they're not installed ubiquitously by default. Older versions of dd do not support `status=progress`, and as a workaround you can do:
pv foo.iso | dd of=/dev/sdx bs=4M oflag=dsync
(alternatively, you can set up a bash for-loop that periodically sends SIGUSR1 to dd)
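Something like this, reusing foo.iso and /dev/sdx from the examples above (GNU dd prints I/O stats when it receives SIGUSR1):

dd if=foo.iso of=/dev/sdx bs=4M oflag=dsync & pid=$!
# poke dd for a status line every 5 seconds until it exits
while kill -USR1 "$pid" 2>/dev/null; do sleep 5; done
wait "$pid"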
[1] Unless the drive has an onboard DRAM cache etc., but this is rare for removable media
P.S. If you use "/dev/sdx" in example commands, it will fail when someone blindly copy-pastes without reading anything, instead of erasing their whole OS
On an ad-supported site, you're not the customer. The advertiser is the customer; you're the product. Making that even worse is social media, where being good on a technical level only matters when we're talking about handling scale -- the rest is whether the people you care about are using the platform.
If you want to see how bad it gets, look at how Facebook abuses CSS in order to have sponsored posts say "Sponsored" at the top while avoiding a straightforward place in the generated html that says "Sponsored" and could be matched by adblockers. It's nightmarish.
Not the parent, but this is exactly what happens. ArgoCD is pointed to an "application" chart which just points to a path of a helm chart (of your app/service) in Git. So when your CI changes the image hash in the values file of your helm chart, ArgoCD will notice that and change the deployment resource's image hash. It's just a nice system for keeping resources in sync between your git repo and what you have live in the cluster. Of course, if you change anything else, ArgoCD will change it too.
You're also right: rolling back just means changing the hash back to whatever you had, or to a new hash that resulted from a git revert or whatever. Another good thing is that if the newly deployed service is very broken (so broken that it doesn't pass the k8s health check), ArgoCD will hold on to the old ReplicaSet and will not let your service die because of it.
What's also nice about ArgoCD is that you can play a bit with some service/application in a branch. Say you have some live service and you want to adjust its configuration. Third-party services usually have a lot of options to set, and you're not 100% sure how to get what you want, so doing a pull request for every small change can be very slow and exhausting. To work around that, you can point your ArgoCD application chart at a chart in a branch, test/dev/fiddle by just pushing to that remote branch, and when you're satisfied, merge your branch and at the same time point your ArgoCD application manifest back at master/HEAD for that chart. In effect, at this step only your Git repo will be updated; your service already has all the changes, so ArgoCD will do nothing. That way you can iterate faster, and undo whatever regression you've introduced just by pointing ArgoCD back at master instead of your branch (or you can just reset your branch to be identical to master).
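For reference, that branch-vs-master switch is just the targetRevision field in the Application manifest; a rough sketch (the repo URL, chart path, and names below are made up):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-repo.git
    path: charts/my-service
    targetRevision: HEAD   # point this at a branch while experimenting, then back to HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated: {}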
Yes! If you want to maintain a full history (and you should!), you can make bash consistently log it:
promptFunc() {
# right before prompting for the next command, save the previous
# command in a file.
echo "$(date +%Y-%m-%d--%H-%M-%S) $(hostname) $PWD $(history 1)" \
>> ~/.full_history
}
PROMPT_COMMAND=promptFunc
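Then finding, say, everything you ever ran in a particular directory is just a grep (the path here is arbitrary):

grep -F " $HOME/src/myproject " ~/.full_history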
If you are rooted, you can also force your own add-ons into stable firefox like so:
USER=16201230
COLLECTION=What-I-want-on-Fenix
cd /data/data/org.mozilla.firefox/files
curl -o mozilla_components_addon_collection_*.json "https://addons.mozilla.org/api/v4/accounts/account/$USER/collections/$COLLECTION/addons/?page_size=50&sort=-added"
touch -a -m -t 203012300130.00 mozilla_components_addon_collection_*.json
edit: removed Fennec F-Droid because TIL that it already has the same add-on override that FF Nightly has. So there is no need for this hack if you use Fennec.
So every time my PC is on @ 20:00, a shell window pops up, asks me for the password, and runs the backup :). Since the backups are incremental, it takes maybe 10-15 minutes tops.
For network observability I'm using Cilium's Hubble, which I will soon figure out how to get into a Graylog setup or something. For container image vulnerability scanning I'm running Harbor with Trivy enabled; the initial motivation was to have an effective pull-through cache for multiple registries because I got rate limited by AWS ECR (due to a misconfigured CI pipeline, oops), but it ended up killing two birds with one stone.
Next on my list is writing an admission controller to modify supported registry targets to match my pull through cache configuration.
My home k8s cluster is now "locked down" using micro-VMs (kata-containers[0]), pod-level firewalling (cilium[1]), permission-limited container users, mostly immutable environments, and distroless[2] base images (there isn't even a shell inside!). Given how quickly I rolled this out, the tools for hardening a cluster environment seem far more accessible now than they were when I last researched this a few years ago.
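As a taste of what that looks like per pod, here's a rough sketch (the kata RuntimeClass name and the image are just examples):

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  runtimeClassName: kata            # micro-VM isolation via kata-containers
  containers:
    - name: app
      image: gcr.io/distroless/static-debian12
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532            # the distroless "nonroot" user
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]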
I know it's not exactly a production setup, but I really do feel it's at least the most secure runtime environment I've ever had accessible at home. Probably more so than my desktops, which you could argue undermines most of my effort, but I like to think I'm pretty careful.
In the beginning I was very skeptical, but being able to just build a docker/OCI image and then manage its relationships with other services with "one pane of glass" that I can commit to git is so much simpler to me than my previous workflows. My previous setup involved messing with a bunch of tools like packer, cloud-init, terraform, ansible, libvirt, whatever firewall frontend was on the OS, and occasionally sshing in for anything not covered. And now I can feel even more comfortable than when I was running a traditional VM+VLAN per exposed service.
- /var/ is read-write state data and will persist between boots. /var/log/ is for log files.
- /run/ is read-write scratch on a tmpfs, and will be cleaned up between boots. So, UNIX sockets and other temporary files go here.
- /etc/ is per-system configuration overrides. There is a goal to move to "empty /etc/", meaning you should be able to remove all files in /etc/ and still have a functioning system.
- /opt/ is "vendor space". The OS should never touch /opt/.
The names are weird, but that's the intention behind the new scheme. If you want to change configuration, the default config might be shipped read-only in /usr/, but it should be overridable in /etc/. This is why systemd ships read-only unit files in /usr/, which are symlinked into /etc/ to be "enabled".
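You can see that directly (nginx here is just a stand-in; on some distros the shipped unit lives under /lib/systemd/system/ instead):

systemctl enable nginx.service
ls -l /etc/systemd/system/multi-user.target.wants/nginx.service
# ... -> /usr/lib/systemd/system/nginx.service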