
I can't claim to be at the level of the Joyent guys presented here, but I think taking an Operating Systems class and a Computer Architecture class, or reading the respective textbooks, helps. At the same time you have to be familiar with the particular OS you happen to use, probably up to the point of reading and having a basic understanding of the source code of the most important subsystems (virtual memory, process scheduling, filesystem handling, TCP/IP stack), and understanding what the system calls are and what they do.

Then you need to know the wide range of tools the given OS offers for examining things, so that you do not get hopelessly stuck in the face of an emergency. You often have to investigate a crash while it is happening just to learn enough to reproduce it, so you need to know how to examine a running process etc. For Linux this means knowing stuff like:

http://en.wikipedia.org/wiki/Strace

http://en.wikipedia.org/wiki/Lsof

http://en.wikipedia.org/wiki/Vmstat

http://en.wikipedia.org/wiki/Netstat

http://en.wikipedia.org/wiki/DTrace

http://en.wikipedia.org/wiki/Tcpdump

http://en.wikipedia.org/wiki/Magic_SysRq_key

https://perf.wiki.kernel.org/index.php/Main_Page

...

There is a big bunch of tools in the OS that very few developers know. Sysadmins know more of them, but they often don't understand the OS itself, so they use the tools without understanding the output too well.
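To give a flavor of what "understanding the output" means: on Linux, much of what a tool like lsof reports comes from the /proc filesystem, and you can read it yourself. Here is a minimal sketch of that idea, not a replacement for the real tools. The function names `parse_status` and `open_files` are made up for this example, and it assumes a Linux system with /proc mounted and permission to inspect the target process.

```python
import os
import sys


def parse_status(text):
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def open_files(pid):
    """Map each open file descriptor of a process to what it points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for fd in os.listdir(fd_dir):
        try:
            targets[int(fd)] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


if __name__ == "__main__":
    # Inspect the given PID, or this script's own process by default.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/status") as f:
        status = parse_status(f.read())
    print(status.get("Name"), status.get("State"), status.get("VmRSS"))
    for fd, target in sorted(open_files(pid).items()):
        print(fd, target)
```

Run it with a PID as the only argument to look at a live process. Once you have seen where the numbers come from, fields like the process state or resident set size in the tools' output stop being magic.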

What I have written here is partly confirmed by the fact that Joyent forked OpenSolaris to create an OS precisely to make this kind of investigation easier:

http://wiki.smartos.org/display/DOC/Why+SmartOS+-+ZFS%2C+KVM...

In 2005, Sun Microsystems open sourced Solaris, its renowned Unix operating system, eventually to be released as a distribution called OpenSolaris. Among the earliest adopters and most effective advocates of OpenSolaris was Ben Rockwood, who wrote The Cuddletech Guide to Building OpenSolaris in June, 2005 – the first of his many important contributions to the nascent OpenSolaris community. Meanwhile, Joyent's CTO Jason Hoffman was frustrated by the inability of most operating systems to answer seemingly-simple questions like: "Why is the server down? When will it be back up? ... Now that it's back up, why is my database still slow?"

Jason knew that these questions would be a lot easier to answer on Solaris-based systems, and recognized Sun's open-sourcing initiative as a huge opportunity.


