
I'm the author of https://jalammar.github.io/illustrated-transformer/ and have spent years since introducing people to Transformers and thinking of how best to communicate those concepts. I've found that different people need different kinds of introductions, and the thread here already mentions some often-cited resources, including:

https://peterbloem.nl/blog/transformers

https://e2eml.school/transformers.html

I would also add Luis Serrano's article here: https://txt.cohere.com/what-are-transformer-models/ (HN discussion: https://news.ycombinator.com/item?id=35576918).

Looking back at The Illustrated Transformer, when I introduce people to the topic now, I find I can hide some complexity by omitting the encoder-decoder architecture and focusing on just one of the two stacks. Decoders are great because a lot of people now come to Transformers having heard of GPT models (which are decoder only). So my canonical intro to Transformers now only touches on a decoder model. You can see this narrative here: https://www.youtube.com/watch?v=MQnJZuBGmSQ
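
To make that decoder-only framing concrete, here is a rough NumPy sketch of the core operation, single-head causal self-attention. This is just my own minimal illustration, not code from the article or the video; the single head, the weight names, and the toy sizes are placeholder assumptions.

  import numpy as np

  def softmax(x):
      x = x - x.max(axis=-1, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=-1, keepdims=True)

  def causal_self_attention(X, Wq, Wk, Wv):
      # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices.
      Q, K, V = X @ Wq, X @ Wk, X @ Wv
      scores = Q @ K.T / np.sqrt(Q.shape[-1])              # (seq_len, seq_len)
      # Causal mask: token i may only attend to tokens 0..i.
      mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
      scores = np.where(mask, -1e9, scores)
      return softmax(scores) @ V                           # (seq_len, d_k)

  # Toy example: 5 tokens, model width 8, head width 4.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(5, 8))
  Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
  out = causal_self_attention(X, Wq, Wk, Wv)               # shape (5, 4)

A real GPT-style decoder wraps this in multiple heads, feed-forward layers, residual connections, and layer normalization, but the causal mask is the piece that makes it a decoder.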


Here is my list, which is biased towards Linux. Almost all the books mentioned here are dated and primarily written for 2.6-based kernels. Many concepts are still applicable, though, and in certain subsystems the code is essentially the same with only minor changes. So, despite being old, they are still good references.

* The Design and Implementation of the FreeBSD Operating System: A good and thorough deep dive into FreeBSD. Must have.

* FreeBSD Device Drivers: A Guide for the Intrepid: Didn't quite read it all but looks fine for FreeBSD.

* Mac OS X Internals: A Systems Approach by Amit Singh: It was good back in the day but is now outdated.

* Linux Device Drivers, 3rd ed.: Very dated but still good for grasping Linux device drivers in general. The code examples are a bit silly but good enough. There are some GitHub repos with updated code for the latest kernels.

* Essential Linux Device Drivers by S. Venkateswaran: Complements the LDD3 book quite nicely. Has some real device examples and is very exhaustive. Must have.

* Linux Kernel Development by Robert Love: Very good for a short/quick intro. Best for preparing before interviews ;)

* Professional Linux Kernel Architecture by Wolfgang Mauerer: Like the other books, it's dated, but some of the explanations of interrupts, PCI, etc. are good. His call-graph approach was very handy in understanding things.

* The Linux Programming Interface by Michael Kerrisk: Not really about the OS itself but the next closest thing - system programming (the real system programming). Must have.

* Understanding the Linux Virtual Memory Manager by Mel Gorman: Like the other books, it's dated, but still one of the best available for getting a handle on memory management under Linux. Must have.

* Understanding the Linux Kernel: Dated, but still my go-to book for refreshing certain subsystems. Must have.

* Linux Kernel Programming by Michael Beck: Mentioned for historical reasons; otherwise the most outdated book here (2.4-based). Horrible editing and English, but heck, I loved it back in the day :)

* Linux kernel networking by Rami Rosen: Never read it but quite dated.

* Understanding Linux Network Internals by Christian Benvenuti: A real bible of Linux networking. If nothing else, your jaw drops at the effort the author put into writing this book. Dated, but unlike more generic Linux kernel topics, the network stack is still the same at its core. Must have.

* The Linux Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures: Very dated book but good to read for the PPC perspective. A lot has changed since its publication.

* See MIPS Run by Dominic Sweetman: Dated, but gives a good idea of MIPS internals and how Linux uses them.

* IA-64 Linux Kernel: Design and Implementation: It's dated not just w.r.t. the code but also w.r.t. the hardware. Nonetheless, it gives good insight into the IA-64 architecture and Linux from a non-x86 perspective.

* The Definitive Guide to the Xen Hypervisor: This is the only book on virtual machines that isn't written just from a user perspective. While the best way to learn VMs is by reading the architecture spec and the code, this book still satisfied me w.r.t. virtual machine internals.

Every other book on Amazon about the Linux kernel is more or less useless. For a more academic book, "Operating Systems: Three Easy Pieces" is good.


I would question all these Raspberry Pi-ish NAS attempts, especially when they involve extra power adapters and milling out cases. It all feels so fiddly and sluggish while still being "not that cheap". Storing my important personal data on a USB drive feels somewhat risky. It probably wouldn't burn a house down, but still...

The real benefit is the small form factor and the "low" power consumption. At 43 bucks for the whole thing, I'm now asking myself whether it is worth saving a few bucks and living with 100Mbit network speed, instead of spending 150 bucks and getting 2.5Gig.

There are so many (also "used") alternatives out there:

- Fujitsu Futro S920 (used < 75, ~10W)

- FriendlyElec NanoPI R6C (< 150, ~2W, https://www.friendlyelec.com/index.php?route=product/product...)

- FriendlyElec Nas Kit (< 150, ~5W, https://www.friendlyelec.com/index.php?route=product/product...)

- Dell T20 / T30 (used < 100, ~25W)

- Fujitsu Celsius W570 (used < 100, ~15W)

My personal NAS / Homeserver:

  Fujitsu D3417-B
  Intel Xeon 1225v5
  64GB ECC RAM
  WD SN850x 2TB NVMe
  Pico PSU 120
  
More expensive, but reliable, powerful, and drawing <10W idle.

A little bit of history about the book series may help understand what is in it.

In 1956, Knuth graduated high school and entered college, where he encountered a computer for the first time (the IBM 650, to which the series of books is dedicated). He took to programming like a fish to water, and by the time he finished college in 1960, he was a legendary programmer, single-handedly writing several compilers on par with or better than professionals (and making good money too). In 1962 when he was a graduate student (and also, on the side, a consultant to Burroughs Corporation), the publisher Addison Wesley approached him with a proposal to write a book about writing compilers (given his reputation), as these techniques were not well-known. He thought about it and decided that the scope ought to be broader: programming techniques were themselves not well-known, so he would write about everything: “the art of computer programming”.

This was a time when programming a computer meant writing in that computer's machine code (or in an assembly language for that machine) — and some of those computers were little more than simple calculators with branches and load/store instructions. The techniques he would have to explain were things like functions/subroutines (a reusable block of assembly code, with some calling conventions), data structures like lists and tries, how to do arithmetic (multiplying integers and floating-point numbers and polynomials), etc. He wrote up a 12-chapter outline (culminating in "compiler techniques" in the final chapter), and wrote a draft against it. When it was realized the draft was too long, the plan became to publish it in 7 volumes.

He had started the work with the idea that he would just be a “journalist” documenting the tricks and techniques of other programmers without any special angle of his own, but unavoidably he came up with his own angle (the analysis of algorithms) — he suggested to the publishers to rename the book to “the analysis of algorithms”, but they said it wouldn't sell so ACP (now abbreviated TAOCP) it remained.

He polished up and published the first three volumes in 1968, 1969, and 1973, and his work was so exhaustive and thorough that he basically created the (sub)field. For example, he won a Turing Award in 1974 (for writing a textbook, in his free time, separate from his research job!). He has been continually polishing these books (e.g. Vols 1 and 2 are in their third edition that came out in 1997, and already nearly the 50th different printing of each), offering rewards for errors and suggestions, and Volume 4A came out in 2011 and Volume 4B in 2023 (late 2022 actually).

Now: what is in these books? You can look at the chapter outlines here: https://en.wikipedia.org/w/index.php?title=The_Art_of_Comput... — the topics are low-level (he is interested in practical algorithms that one could conceivably want to write in machine code and actually run, to get answers) but covered in amazing detail. For example, you may think that there's nothing more to say about the idea of “sequential search” than “look through an array till you find the element”, but he has 10 pages of careful study of it, followed by 6 pages of exercises and solutions in small print. Then follow even more pages devoted to binary search. And so on.

(The new volumes on combinatorial algorithms are also like that: I thought I'd written lots of backtracking programs for programming contests and whatnot, and “knew” backtracking, but Knuth exhausted everything I knew in under a page, and followed it with dozens and dozens of pages.)

If you are a certain sort of person, you will enjoy this a lot. Every page is full of lots of clever and deep ideas: Knuth has basically taken the entire published literature in computer science on each topic he covers, digested it thoroughly, passed it through his personal interestingness filter, added some of his own ideas, and published it in carefully written pages of charming, playful prose. It does require some mathematical maturity (say at the level of a decent college student, or a strong high school student) to read the mathematical sections, or you can skim through them and just get the ideas.

But you won't learn about, say, writing a React frontend, or a CRUD app, or how to work with Git, or API design for software-engineering in large teams, or any number of things relevant to computer programmers today.

Some ways you could answer for yourself whether it's worth the time and effort:

• Would you read it even if it wasn't called “The Art of Computer Programming”, but was called “The Analysis of Algorithms” or “Don Knuth's big book of super-deep study of some ideas in computer programming”?

• Take a look at some of the recent “pre-fascicles” online, and see if you enjoy them. (E.g. https://cs.stanford.edu/~knuth/fasc5b.ps.gz is the one about backtracking, and an early draft of part of Volume 4B. https://cs.stanford.edu/~knuth/fasc1a.ps.gz is “Bitwise tricks and techniques” — think “Hacker's Delight” — published as part of Volume 4A. Etc.)

• See what other people got out of the books, e.g. these posts: https://commandlinefanatic.com/cgi-bin/showarticle.cgi?artic... https://commandlinefanatic.com/cgi-bin/showarticle.cgi?artic... https://commandlinefanatic.com/cgi-bin/showarticle.cgi?artic... are by someone who read the first three volumes in 3 years. For a while I attended a reading group (some recordings at https://www.youtube.com/channel/UCHOHy9Rjl3MlEfZ2HI0AD3g but I doubt they'll be useful to anyone who didn't attend), and we read about 0.5–2 pages an hour on average IIRC. And so on.

I find reading these books (even if dipping into only a few pages here and there) a more rewarding use of time than social media or HN, for instance, and wish I could make more time for them. But everyone's tastes will differ.


I spent three years reading the first three and (attempting, at least) every exercise. I enjoyed it, but I don’t know if it made me a better programmer. I definitely did learn a lot, although maybe not that much that was really practical.

First, I disagree that early employees of startups are "probably" getting screwed, but it definitely can happen, and often does to people who don't know their real value.

The part of this that resonates with me isn't the mathematics. The math isn't very relevant because there's a really large unknown: the eventual value of the company. One percent could be a lot of money, or it could be nothing. There's also the matter of dilution: is he protected against dilution from investor and employee stock grants over the next N years? I would guess not. His 1% could be 0.2% or less by the time an exit happens.

What is obvious is the emotional undercurrent to this very common anti-pattern. It sounds like he's not a real co-founder, he's "just a coder". They seem to be trying to sell him on a rotten deal because they think it's just such a privilege to work on their golden idea that they don't need to compensate properly. He's going to bust his ass to make the code work, for a salary half of his market rate, and in return he gets a tiny sliver of the company that gives him no real control, on a 4-year vesting cycle. I'm sorry, but these two guys are not (after 4 years, after he's done some real work) worth 79 times what he is just because they had the connections to raise money.

Prospective employees tend to view equity grants in a pre-employment context, when a 1% share seems extremely generous because the employee hasn't done anything yet. But that's what vesting's for! Vesting allows companies to compensate based on future contributions, with the knowledge that if the employee quits or is fired before the 4-year period is up, they won't have to pay for all 4 years of work.
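
As a concrete (and hypothetical) illustration of those mechanics: the 4-year term is from the offer discussed above, but the monthly accrual and 1-year cliff below are common defaults I'm assuming, not terms of this particular deal.

  def vested_fraction(months_employed, total_months=48, cliff_months=12):
      # Nothing vests before the cliff; afterwards, vesting accrues monthly.
      if months_employed < cliff_months:
          return 0.0
      return min(months_employed, total_months) / total_months

  # Leave after 10 months: 0% vested. After 18 months: 37.5%. After 4 years: 100%.
  print(vested_fraction(10), vested_fraction(18), vested_fraction(48))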

At the least, if he still thinks it's an "exciting" opportunity worth pursuing, he should recognize that he probably can't value the company better than the market, that we are in frothy times, and that the equity is worth more to an investor than to him (different risk profiles). So the value of 1% (post-money) of a $2.5 million company is $25,000 at most. That's $6,250 per year, far less than what he's giving up.

The first employee of a startup is not necessarily getting screwed. If that employee gets appropriate respect for his skill set, and reasonable compensation for the risks inherent in a startup, then it's a fair trade. A lot of people go into startups as early employees knowing the risks and upsides and that's fine.

What he should do if he actually wants to work on the startup: First, he needs to value his contribution to the company over the next 4 years appropriately and put a number on his "sweat equity". Let's say his market salary is $100,000 and he's being paid $50,000. Now add to his base salary: benefits (15% for health insurance, 401k matching), job-loss risk (25%, since typical severance offers are 1/4 tenure at current salary), career risk and opportunity cost (15%), and overage hours (30%, assuming a 50-55 hour work week). That's $185,000 per year. Take that, less the $50,000 he's making, and his sweat equity is $135,000 per year. Over 4 years, that's $540,000. The company's valuation is $2.5 million, "pre" to his contributions. He should be getting about 16% of the company, assuming he remains for 4 years. This number seems high, but if he's there after 4 years he will have been there almost as long as the founders, so it's about right.
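
Spelled out, that arithmetic looks like this (the percentages are the illustrative assumptions above, not universal figures):

  market_salary = 100_000
  paid_salary = 50_000
  # Loadings on top of market salary: benefits, job-loss risk, career risk, overage hours.
  loadings = 0.15 + 0.25 + 0.15 + 0.30
  effective_value = market_salary * (1 + loadings)   # 185,000 per year
  sweat_per_year = effective_value - paid_salary     # 135,000 per year
  sweat_equity = 4 * sweat_per_year                  # 540,000 over 4 years

  pre_valuation = 2_500_000
  # Treating the sweat equity like a cash investment into the pre-money valuation:
  share = sweat_equity / (pre_valuation + sweat_equity)
  print(round(share, 3))                             # ~0.178, the mid-to-high-teens range argued for above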

First action: he needs to ask for 20% and settle for no less than 12%. If they say, "but you haven't done anything yet", he should point out that the equity grant is subject to vesting and that he won't get anything if he doesn't do any work.

Second action: he needs to demand the right to listen in on investor and client meetings. Otherwise, the other two founders will hold all the power in the organization because they, and they alone, hold that special knowledge of what investors want. If they think he's "just a coder", they'll show it by saying (in effect) that no, he's not "good enough" to be in the investor meetings.

The most likely outcome of his making these two demands is that they'll tell him to get lost. If that's the outcome, it's also the best outcome because it means the startup's a tarpit.


Imagine three twenty-something guys working on a startup that has more lines of code than dollars in the bank. They're working out of an apartment and spend most evenings eating ramen noodles from the same MSG-laden box. They work approximately equal hours (too many). They suffer approximately equal stress (more than they ever expected). They bear approximately equal responsibility for not tanking the company through poor performance. They each accept dramatic pay-cuts relative to easier, better jobs which they could sleepwalk their way into.

Next door, there are another three guys, eating ramen, etc etc.

Now, it seems to me like the three guys behind Door #1 are very similar to the three guys behind Door #2. However, in one case they're all co-founders, and in one case they are two co-founders and a first employee. Those are very, very different statuses for the third guy. The third co-founder gets mentioned in press hits about the company. The third co-founder can call himself a co-founder, a status of value in an industry (and society) which is sometimes obsessed with status. The third co-founder cannot get excised from the cap table without that being mentioned as a subplot in the eventual movie.

The first employee will not usually get mentioned. The first employee gets no social status of particular esteem. The first employee will not have a seat at the table -- literally or figuratively -- when the eventual disposition of the first employee's equity is decided. The first employee's equity stake is approximately 1/6 to 1/40th (or less!) of what the third co-founder's was. Well, theoretically. 0.5% is 1/40th of 20% in engineering math, not in investor math, because investors can change the laws of mathematics retroactively. 0.5% of millions of dollars is sometimes nothing at all. (This is one of the least obvious and most important things I have learned from HN.)

If you're good enough to be a first employee, you're probably epsilon away from being good enough to be a third co-founder. There may be good reasons to prefer being an employee... but think darn hard before you make that decision.


We usually encourage folks to give at least 10% of the equity to the first 10 employees. That's what Stripe did and it worked well.

That number is going up over time, so I can report at least a little bit of improvement, though certainly not fast enough.


1-2%. A lot of people say 5%. 5% is what you'll give to an executive down the road. First employees are very important, but probably not as important as your second CEO. 6% is what you give to YC. First employees are very important, but a first employee who is worth almost as much as YC should probably be your cofounder, not an employee.

This is also for your first employee. You may live or die based on the work this person does, and more importantly they may be accepting extraordinary risk (if your runway is less than 6 months, say). The same rules do not apply to employee #3.

Finally, if you don't pay the person, or if you pay them bare subsistence wages (say, $20-30k) and they're senior talent, you've introduced an additional level of risk, and you should pay a premium for that. However, there may be better ways to compensate that risk; for instance, you can defer salary at a premium rate.

A really important thing to remember about employee options is that employees don't know how to value them. You need to be really engaged with the strategic course of the business (not just the product: the business and its future cash flows) to really grok what options mean in rational terms. This is a good reason not to grant huge amounts to early employees; it costs you dearly and doesn't make them much happier.


If by "real" financing you mean a series A round, then you're probably mistaken. People who join a startup with just angel funding might get roughly 4-5x as much stock as they'd get post series A. It's only a worse deal if a series A round makes a company 4-5x less risky.

There's a market price for all the different options, from founder to early hire to later hire. It would not make sense for there to be points on the continuum that would suddenly be a much worse (or better) deal, and in my experience there aren't.

You're also mistaken in saying that an early hire will get 1 percent tops. This number varies by 30-40x, depending on where the company is and how early the hire is. 1 percent would be typical for a very good hacker fairly soon after a series A. You can get much more pre series A.


You can write an article claiming that anyone doing x is probably getting screwed, if you choose numbers that make it a bad deal. In my experience (which at this point is pretty extensive) the numbers he uses here are extreme outliers.

I'd expect a startup that was only able to raise money at $2m pre to be giving the first employee way more than 1%. How much more depends on how good he is (a factor that's not even considered in this article). Someone as good as the founders could reasonably expect 15%.


On the YC SAFE model, there's no stock, so it isn't about that. You give away equity using the SAFE document.

We faced this issue in my company and I definitely get the fear of giving away something to someone unproven. I started out with the employee just on salary. In the written agreement, we offered ownership interest over time. In practice, I wound up giving him the full equity and the co-founder title in 3 months because of his contributions.


If someone joins you within the first 30 days, contributes as much as you do and is taking as much risk as you do then they are an equal co-founder.

If someone joins you later, gets paid from day #1 (and you don't), does a 40 hour work week while you work 80 then they're an employee.

In between you can vary the percentage to reflect the difference.


0.2% seems really low for a first employee. I have no equity with my current employer (employee #10 or so), but my boss also doesn't expect more than 8 hours and pays a competitive salary. My other offer was about 0.01% as employee #22. In a previous startup, I had about 0.1% as employee #13, but that was straight out of high school (the two recent ones have been as a graduate of a top college). I've heard that 2.5% or so is typical for the first employee, and it decreases exponentially from there.

I'd give some thought to what you want to get out of a startup. If you're just looking for experience (as I am, mostly), the equity isn't too important, and the real criteria should be whether you're learning stuff and are involved in decisions. If you want to get rich, be aware that 0.2% of a typical $40M exit is only $80K - not chump change, but you can't look at it as any more than a nice bonus. It will take a $500M exit to make you a million dollars, which basically means you either need a really hot product (Facebook/MySpace/YouTube) or you need to be able to go public. Salary also factors in - if you're being underpaid by $20K relative to market rates, your stock payout would get eaten up in 4 years.
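
To put rough numbers on that trade-off, using the figures above (real-world dilution, liquidation preferences, and taxes would only push the payout down):

  stake = 0.002             # 0.2%
  print(stake * 40_000_000) # 80,000.0 -> payout at a typical $40M exit
  print(1_000_000 / stake)  # 500,000,000.0 -> exit size needed for a $1M payout
  print(4 * 20_000)         # 80,000 -> salary given up over 4 years at $20K under market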

If you're in it for money, I would suggest trying to renegotiate. Losing the first employee would be a tragedy for most startups, so they'd probably be willing to go up to 1-2% equity if they're smart. They may not like it, but part of being a successful entrepreneur is doing stuff you don't like for the good of your company.

If they aren't smart or their ego gets in the way, you want to leave now anyways, because they aren't going anywhere. Also, try to have another job offer in hand when you renegotiate - it helps your bargaining power significantly.

The micro-managing is a bit of a red flag. It happens a lot with technical cofounders. My current boss has a bad case of it. I had a bad case of it the first time I managed a software project (at a volunteer nonprofit), and letting go and actually trusting the other developers to do things right was one of the hardest things I had to do on that project. The usual result is that the business stagnates, then folds when the employees all leave because it isn't going anywhere.


Very cool! Here is the paper: https://zenodo.org/record/7816398. It uses the well known Immix heap layout/algorithm. https://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi...

The old gencgc was pretty cool for the single core era, and it sounds like it still holds up well. If I recall correctly, it was based on the Bartlett Mostly Copying paper, which is an elegant and practical GC design. https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-TN-12.pdf. I miss these old papers that described this stuff in a way you didn’t have to be a math major to understand. I think the first version of that paper had the C++ code as an appendix: https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf.

Clarity in your technical communications matters. The Immix papers are similarly well written and clear. I don’t think it’s a surprise that both GC designs have also been independently implemented over and over. The Chaitin-Briggs register allocator is another example where I’d attribute at least some of the success in widespread industrial implementation to Briggs’ excellent and approachable PhD thesis describing the algorithm: https://www.cs.utexas.edu/users/mckinley/380C/lecs/briggs-th...


You can take the MIT sequence of courses on edX (taught by, I believe, the CEO of edX, so in a sense this is the original flagship edX course): https://www.edx.org/course/circuits-electronics-1-basic-circ... https://www.edx.org/course/circuits-electronics-2-amplificat... https://www.edx.org/course/circuits-electronics-3-applicatio...

To the folks I've seen in this thread despairing about how this seems really, really hard at first brush: it is. This is not really a page of intro-level electronics concepts. It doesn't mention a whole lot about voltage, current, time vs. frequency response, or how to get a grasp on any of those concepts.

If you want a gentler intro that walks you through some of the foundational concepts, Analog Devices has some great free courses on Circuit Theory:

https://wiki.analog.com/university/courses/circuits#circuits...

And another great one on electronics:

https://wiki.analog.com/university/courses/electronics/text/...

They are the best in the business at analog circuit design and A/D conversion. Worthy of your attention if you're serious about learning this stuff.


You can host PhotoPrism[1] on PikaPods[2], upload your pictures using PhotoSync[3], and then sync (back up) all your pictures to Hetzner using their Nextcloud hosting[4]. PikaPods is in LA and Hetzner is in Germany. This provides a good, reliable setup for your invaluable pictures. This is what I use and I am very happy with it; I don't mind paying more than for iCloud knowing that I am supporting open source products.

[1] https://www.photoprism.app/

[2] https://www.pikapods.com/

[3] https://www.photosync-app.com/

[4] https://www.hetzner.com/storage/storage-share

For the average systems programmer, 90% of the really useful CPU information can be found in two places that change infrequently:

- Agner Fog's instruction tables for x86[1], which has the latency, pipeline, and ALU concurrency information for a wide range of instructions on various microarchitectures.

- Brief microarchitecture overviews (such as this one for Skylake[2]), that have block diagrams of how all the functional units, memory, and I/O for a CPU are connected. These only change every few years and the changes are marginal so it is easy to keep up.

Knowing the bandwidth (and number) of the connections between functional units and the latency/concurrency of various operations allows you to develop a pretty clear mental model of throughput limitations for a given bit of code. People that have been doing this for a long time can look at a chunk of C code and accurately estimate its real-world throughput without running it.

[1] https://www.agner.org/optimize/instruction_tables.pdf

[2] https://en.wikichip.org/wiki/intel/microarchitectures/skylak...
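
As a toy example of that kind of mental arithmetic: a dot-product loop does two loads and one FMA per vector of elements, so its throughput can be bounded from the port counts and instruction latencies in those tables. The numbers below are Skylake-ish figures as I recall them from Agner Fog's tables; treat them as assumptions and look up the real ones for your target microarchitecture.

  FMA_LATENCY = 4   # cycles before an FMA result can feed the next FMA
  FMA_PORTS = 2     # independent FMAs that can issue per cycle
  LOAD_PORTS = 2    # loads that can issue per cycle

  loads_per_vector = 2                           # load a[i] and b[i]
  load_bound = LOAD_PORTS / loads_per_vector     # 1.0 vector/cycle (data in L1)
  fma_bound = FMA_PORTS                          # 2.0 vectors/cycle, given enough independent chains
  one_accumulator = 1 / FMA_LATENCY              # 0.25 vectors/cycle: each FMA waits on the last
  accumulators_needed = FMA_LATENCY * FMA_PORTS  # 8 independent partial sums to hide the latency

  print(min(load_bound, fma_bound))  # ~1 vector/cycle peak, limited by the load ports
  print(one_accumulator)             # what a naive single-accumulator loop gets

The point is not the exact figures but that a handful of numbers from those tables already tell you whether a loop is latency-bound or port-bound before you ever run it.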


Some Ulrich Drepper papers:

- on memory access/caches/memory hierarchy: https://www.akkadia.org/drepper/cpumemory.pdf

- on DSO: https://www.akkadia.org/drepper/dsohowto.pdf

- Agner Fog on CPU optimization, latencies, vectors etc: https://www.agner.org/optimize/


Jerry is a phenomenal chess instructor. His latest video on tactical awareness[1] explains more in 2 hours than you might learn in a thousand games at the board.

[1] https://www.youtube.com/watch?v=fzGKPxJ5NYI

