Learn to read the source, Luke (codinghorror.com)
170 points by mgrouchy on April 16, 2012 | 78 comments


I find it a bit ironic that Jeff Atwood is advocating reading the source when he's built his company on top of a closed-source stack. Can he read the source of ASP.Net when he encounters an issue with the web frontend? Can he debug into MSSQL Server when he has an issue with database performance?

Jeff can talk about the importance of having source available, but his actions speak louder than his words. He's built a very successful startup on top of a closed-source stack. Having the source isn't as important as it seems, then.


Can he read the source of ASP.Net when he encounters an issue with the web frontend?

Yes. Recently they even went a step beyond the traditional "shared source" thing by releasing it under the Apache 2.0 license.

http://aspnetwebstack.codeplex.com/

Less so with MSSQL. But it's less of an issue there, because MSSQL provides a very good view of what's going on under the hood to begin with, and Microsoft has done an extremely good job of documenting the whole thing.


Having worked on the MSSQL source code -- even if you could see it, it would not help. It's an incredibly complicated monolith with a lot of historical baggage. Not kind to fresh eyes. Remember that SQL Server was originally Sybase SQL Server.


I remember Sybase ads from my teens. I did not know they were assimilated to generate MSSQL, interesting!


Actually, MS bought the rights to the Sybase software, not the company. There are still quite a few similarities.


That apocryphal last comment is irrelevant though, as the code bases entirely split many, many versions ago. (pre 7.0)


Why wouldn't it be relevant? It emphasizes that SQL Server has been a product longer than Google has been a company. Moreover, there's still Sybase code in the project.

Is the Sybase code the biggest problem? Hell no. How about the fact that there is no "standard library" and there are no less than four different hash table implementations written by different people at different times -- only one of which you should probably use, although you wouldn't know it from the "documentation"? That's a pretty big one and almost definitely what I'd characterize as "historical baggage."


And if memory serves it's been through at least one, possibly two ground up rewrites since then.


It's been through massive rewrites, but they didn't throw away the code and start from scratch. They took five years (between SQL 2000 and SQL 2005) and completely revamped the offering. Looking at SQL 2012, it's really quite impressive how far they've come along, albeit in 12 years.


Whoa. That is actually pretty cool. I didn't realize that Microsoft would open-source such a core piece of technology.


Depending on how you want to define "open source", they've been doing it for a long time. They often make the source available to look at, but don't permit the creation of derivative works.

That said, one of the nicer things about developing in .NET is that the library source is so accessible. I can easily step into library code from the debugger if I need to. If I haven't already downloaded the source code for that component then the IDE will automatically grab it for me before stepping in. So for my purposes (just wanting to figure out WTF is happening under the hood), the source code generally feels much more accessible on Microsoft's platform than it does on more orthodox open source ones.


That depends on the library, of course. Yes, if the library is open source or shared source, you can step into it with no trouble. But if you're using a closed source library (a phenomenon that's far more common in the Windows world than in the Linux world), you're eventually going to run into problems where the library returns an unexpected value and you just don't know why.


He's referring to the .NET base class libraries (System.* and others). Microsoft releases the source so you can debug right into it.[1] The BCL is far from "open source", it's a "read only, nothing else, not even compile" type license IIRC. But you can step right into the C# implementation of most of the shipped-with-.NET libraries.

1: http://weblogs.asp.net/scottgu/archive/2007/10/03/releasing-...


But even then, it depends on the library. For the most part .NET assemblies are so easy to decompile that even when you don't have the source that doesn't end up being much of an impediment. (The only thing you really don't get is comments.) There are some vendors who insist on obfuscating their assemblies, but they're easy enough to avoid.


Even if they don't obfuscate, the library's license will often have an anti-reverse-engineering clause.


Seriously? I can see a license that prevents you from creating a clone by inspecting decompiled source.. but a license that doesn't allow you to attach a debugger?


Wouldn't be enforceable, I'm guessing, as per the protections for reverse engineering that the DMCA codified in 17 USC § 1201(f):

. . .a person who has lawfully obtained the right to use a copy of a computer program may circumvent a technological measure that effectively controls access to a particular portion of that program for the sole purpose of identifying and analyzing those elements of the program that are necessary to achieve interoperability of an independently created computer program with other programs. . .


If I recall correctly, I read a while ago they have even opened up their kernel code to Univ programs. Don't know if that is still true.


They have done that for quite a while; at least it was happening ~8 years ago when I took an OS course at my school. Windows source was available, can't recall which version.


Windows Research Kernel. It's an almost-complete source code of Windows ~2003. There's a few omissions, like the TCP/IP stack.

https://www.facultyresourcecenter.com/curriculum/pfv.aspx?ID...


That argument _always_ emerges somewhere though… Do you have access to the sourcecode of the bios running your "open-from-top-to-bottom-linux stack"? How about the firmware in the RAID card or ethernet device? How about the CPU microcode? (anyone else here old enough to remember the F00F bug?)

I might not agree with Jeff's choice to demand source code only down as far as his database API, but I've got to admit I've read no more of the source code to MySQL (upon which a _lot_ of my work relies) than I've read of Oracle's source code. (And I'm pretty sure I've not looked at the Apache httpd source more recently than 1.3 or so.)


You can always read the source "INSIDE" your company.

Many commercial products also have source code available as part of certain deals.

As much as I like open source, there are also other business models where reading the source is possible.


Atwood wrote: "The idea that you'd settle down in a deep leather chair with your smoking jacket and a snifter of brandy for a fine evening of reading through someone else's code is absurd."

Not so absurd, really. Reading well-written source code is a great way to learn the finer points of the art of programming; it's not just for fixing bugs. In fact, entire books have been published that consist of annotated source code, the most famous probably being Lions' Commentary on Unix and Knuth's "TeX: The Program".


The hard part is finding the "well-written source code". Sure there are plenty that have been battle-tested for years, which we can refer to, but every day we come across tens to hundreds of github links. Some are legible. Some are clever but maybe interesting. Some are in need of serious refactoring. Some are first-attempts. Some are "drunken rants" (usually < 10 checkins, with every file marked as 6+ months ago). And once in a great while, some are worth some serious attention.

I "grew up" in PHP, and while the majority of the open source code I've read has been difficult to parse, at best, I've read some amazing code. Most from intelligent peers for whom I still hold a deep respect.

It's what has kept me inspired. It's what brought me to eventually appreciate Javascript (server and browser), Actionscript (3), Java, Python, C, Ruby, a few others, most recently including Clojure. And when it comes to amazing code (regardless of language), I will gladly sit and read with more intrigue than the best fiction has to offer.

It's the massive amount of mediocre or less that defines the divide making source code painful. I skim a lot. I look for whitespace, the occasional well-written comment block, proper variable / method naming, things to hint that this was written to be read by another human who likes to read, and from there I dig in deeper with the hopes that I have something worth reading.

That is to say, I LOVE to read code, but it can be difficult to find the code worthy of the smoking jacket and voting-age scotch.


Entire books have been published that consist of annotated source code, the least famous probably being Lions' Commentary on Unix and Knuth's "TeX: The Program", since they are pretty much the only widely distributed books consisting primarily of annotated source.


I have at least three others on my bookshelves. (Two Knuths -- METAFONT and the Stanford GraphBase -- and Fraser & Hanson's "A Retargetable C Compiler".)

For sure, there aren't a large number of books with this structure. So what?



Just glancing over at my bookshelf, how about Tanenbaum's OSDI and Knuth's MMIXware?


How can they be the least famous if they are the only ones?


If they're the only ones, then they're the least famous and the most famous at the same time.


In the same way they're the most famous.


Yeah, I do this frequently with the source code to Lua, which is an astonishing piece of software engineering that constantly inspires me. I can hardly think of another open source project that does so much with so little code.


I'm sure I'm not the only one that wants a good tool that would convert arbitrary source-code to Lulu-sized PDFs that could be printed on demand.


Try the command line program "enscript". It has many options that control the output formatting. For example, you can print pages in landscape orientation, two-up, with source highlighting and line numbers.


If you also want coloring (I know I would), Pygments - http://pygments.org/ - outputs PDF via LaTeX.


Yes, it's educational, but generally not entertaining or relaxing, as is implied by the scenario of settling down in a comfy chair, etc. If code is entertaining, it's probably not the kind of code to learn good habits from.


  Nobody reads other people's code for fun.
Not true for me. I /love/ reading code from great engineers. I've learned a lot doing so.


Yeah I think he's really off-base there, and the rest of the paragraph is wrong too: "The idea that you'd settle down in a deep leather chair with your smoking jacket and a snifter of brandy for a fine evening of reading through someone else's code is absurd."

Absurd? It's basically what I did recently with ClojureScript One, replacing brandy with beer and deep leather chair with kitchen table and chair. I found it very enjoyable and enlightening. And I'm not trying to brag, I really don't think this is a mark of anything special.

IIRC publishing and highlighting code that was interesting to read was one of the goals that Peter Seibel wanted to tackle with Code Quarterly (which didn't pan out, but still, he didn't think it was an absurd suggestion). I also seem to recall reading code being something a lot of the people interviewed in Coders At Work described as being valuable. And one of the stock questions Seibel asked everyone in that book was if they had tried literate programming ("a la Knuth"), which is really just a way to make large pieces of code easier to be read by someone else. All's to say, absurd it clearly is not.


I like reading code from great engineers too. But most of the code I read is by pretty good to mediocre engineers. Nobody likes reading code by mediocre engineers.


Yeah, agreed.

Writing good source code despite not liking to read source is about as likely as writing a great novel despite not liking books.

And like reading books, it only becomes enjoyable when you reach a level of fluency. Sadly lots of developers never get there, and so they go through convolutions to avoid reading code.


Same here. If I find a project that upon inspection has some interesting source code I will often sit down and just read through it. I often end up learning a trick or two and by reading it I typically understand the system better and so when using it can take full advantage of what is there.


Something he misses is that good documentation isn't just an English translation of what the source code does, it's also a contract that describes what the source code may or may not do now and in the future.

If "the source code is the ultimate truth", then your source code is indelible---you can never change your implementation because you've given users freedom to depend upon the behaviour produced by any line of it. If you don't want people depending on implementation details, then you need documentation to hide those away.


Again, go to the source... Design by contract, a way to embed and execute the contracts in the source code, is a great way to increase code quality. Take a look at the Eiffel programming language; it's impressive what having contracts in the code gives you.
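
For readers who have not used Eiffel, here is a rough Python sketch of the idea: contracts written in the code and checked at run time. The `contract` decorator and `integer_sqrt` are made up for illustration and are not part of any library. Unlike Eiffel's built-in `require`/`ensure` clauses, these are plain assertions, which Python skips when run with `-O`.

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical decorator: check a precondition on the arguments
    and a postcondition on the return value."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition of {func.__name__} violated"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), f"postcondition of {func.__name__} violated"
            return result
        return wrapper
    return decorate


@contract(pre=lambda n: n >= 0, post=lambda r: r >= 0)
def integer_sqrt(n):
    """Largest integer r with r * r <= n (illustrative example only)."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r
```

Calling `integer_sqrt(-1)` fails at the boundary with an `AssertionError` naming the violated precondition, so the contract is both readable in the source and enforced when the code runs.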


Hum, in my world (Python) things that are implementation details and subject to change are clearly marked by the underscore prefix. Test cases do show the interfaces and probe them. Thus reading the code gives the clearest idea of the intended usage of the code.
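
A small sketch of that convention, with a made-up `RateLimiter` class and `public_names` helper (both hypothetical): names with a leading underscore are internal by convention only, since Python does not enforce it, and the public surface is whatever is left.

```python
class RateLimiter:
    """Illustrative class: names without a leading underscore form the public interface."""

    def __init__(self, limit):
        self.limit = limit   # public: callers may read it
        self._count = 0      # internal detail, may change without notice

    def allow(self):
        """Public method: True until the limit has been reached."""
        if self._count >= self.limit:
            return False
        self._bump()
        return True

    def _bump(self):
        # Internal helper; the underscore says "don't depend on me".
        self._count += 1


def public_names(obj):
    """List the attribute names a reader would treat as the intended interface."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))
```

Here `public_names(RateLimiter(2))` lists only `allow` and `limit`, which is roughly what a reader skimming the source would infer as the intended usage.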


This post seems to take the attitude that "documentation will always suck, so just go right to the source". I think this attitude can impede software projects with changing code (that is, all of them).

The problem is that the source can only tell you what a program does, not what it is supposed to do. If you don't know what it is supposed to do, it can be difficult for consumers of the code to know whether some behavior is intended or a side-effect of the current implementation. Likewise, code maintainers can be prevented from changing the implementation when they don't know if consumers are relying on undocumented behavior. It's more difficult to file bugs against undocumented code; how do you know it's a bug if you don't know what the code is supposed to do?

In brief, good documentation and good code are a virtuous cycle. Reading the source is often necessary but it should be viewed as a failure of documentation.


Sigh. This is a rather myopic view of documentation.

Of course the source code is the ultimate arbiter of truth. But having a few roadmaps to that source code is _incredibly_ valuable. And as long as the underlying code does "what it says on the box", there's no reason to read the code.

Reading your stack's source should be a last recourse, not the default mode of operation. (Yes, I do read source code of my stack. Plenty of it. Which is why I appreciate any occasion where I don't have to.)

And when I see his "brilliant HN post" mention that suggests that "sometimes, you recompile your compiler", I'd like to smack some sense into people. You really don't. I've been working on low-level software for a loooong time, and I find about one compiler bug a year. I even do have a bit of a background in compiler writing. And yet, the sane choice is to write a small repro case, file it with the maintainers, and write your code to work around that bug, at least in most cases.


I didn't read source as the "default mode of operation", but rather a necessary fallback that you 1) shouldn't shy away from and 2) you should demand lest you be unable to access the ultimate truth of what you're building on top of.

I definitely see a hallmark of experienced/skilled coders as not being afraid to follow the trail of code farther than I sometimes have patience for.


It's a very thin line to walk. I know I've spent days reading Linux or Mach sources when I could just have coded around the issue, and I wouldn't list that as something that makes me a skilled coder. It just means for me sometimes shiny outweighs expedient ;)


Reading the source gives you the confidence to tackle bigger projects. Once you realize what a complete and utter hack job most of the projects you use are it gives you the confidence to just build your own hackjob, or take their project and fix it.

For me the biggest one was an FTP library, all it did was figure out when the server stopped sending data for a particular command and then run a Regex over it, populate an array of objects and return them.
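
To make that concrete, a minimal sketch of the "run a regex over the response and populate a list of objects" step. It assumes the server returns a Unix `ls -l` style listing, which is common but not guaranteed (FTP does not fix the format of `LIST` output). `Entry`, `LINE`, `SAMPLE` and `parse_listing` are hypothetical names, not taken from the library described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    size: int
    is_dir: bool


# Assumed format: type+permissions, link count, owner, group, size,
# three date/time fields, then the file name.
LINE = re.compile(
    r"^(?P<type>[-dl])\S{9}\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(?P<name>.+)$"
)

SAMPLE = (
    "drwxr-xr-x   2 ftp  ftp      4096 Apr 16  2012 pub\n"
    "-rw-r--r--   1 ftp  ftp      1234 Apr 16 09:30 README.txt\n"
)


def parse_listing(text):
    """Turn a raw listing into Entry objects, silently skipping lines that don't match."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            entries.append(
                Entry(match["name"], int(match["size"]), match["type"] == "d")
            )
    return entries
```

Running `parse_listing(SAMPLE)` yields one directory entry and one file entry; anything the pattern does not recognise is dropped, which is exactly the kind of shortcut you only discover by reading the source.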

Unread source is like a David Copperfield trick, it's magic, once you read the source and know how it's done the magic is lost and you understand what is really going on behind the hand waving.


this is a good skill to have. however, there are a lot of problems / domains where reading the source isn't enough. debug the operation of compiler logic, for example. what's really important is to know how the algorithm has been implemented and how the algorithm, as implemented, is interacting with your current problem / use case.

if understanding the algorithm involves boning up on two semesters of type theory or graduate-level courses in algorithms, number theory and abstract algebra, as debugging problems in modern databases, compilers and high-performance integer libraries would, then having the source code is probably not going to help you as much as you think it would...


Long term, I think having a solid grounding in (to continue your example) real world compiler infrastructure and the ability to fix bugs in your tool chain is going to "help" you an awful lot more than getting whatever instantaneous problem you have fixed.

I mean, sure: for everyone there are some problems that are so obscure as to be near-impossible. But if you go through life always deferring those solutions (by calling tech support, or giving up, or playing voodoo games until the problem goes away), that set of problems will never shrink. You'll end your career, broadly, just as incompetently as you started it.

If, on the other hand, you make a practice of always digging for bugs, even across library boundaries into "other people's" code, you'll find over time that things like compiler bugs stop looking so scary.


In my view, documentation -- not just any documentation, but correct, complete documentation -- is at least as important as code. I don't release any of my own personal projects for general consumption until the documentation is done.

Not having good documentation demonstrates a lack of respect for the user's time. To be a successful project, people of varying skill levels should be able to use it.

In order to compete successfully with closed-source Unix variants, GNU had to have as good documentation as its competitors and the result was excellent documentation (even if Info files were a bit baroque). The result was comprehensive and useful manuals for GNU projects such as GCC, Bash, Emacs and so forth. It's a real shame developers today haven't followed in their footsteps.


Serious question for Jeff - If I gave you a 500,000 line app (any language) with zero documentation and asked you to start adding features and fixing bugs, you'd be cool with that because you had "the source"?

Also, how did you get so far in your career as an MS developer with such limited access to source code?


Serious question for Jeff - If I gave you a 500,000 line app (any language) with zero documentation and asked you to start adding features and fixing bugs, you'd be cool with that because you had "the source"?

I'm not Jeff - but that situation has occurred multiple times in my career. Along with the more problematic one of there being documentation, and there being serious discrepancies between the docs and the code.

I like to have both by preference, but if I had to pick one I'd pick the source. I can figure out what it does from the code. I can't figure out the bugs from the docs.

Both of these situations outnumber the times I've had large code bases with good accurate documentation.

Also, how did you get so far in your career as an MS developer with such limited access to source code?

I'm not an MS developer, but from those I know there seems to have been pretty wide access to lots of source for some years now - you just can't fix and re-distribute it :-)


I absolutely agree with Jeff. Often times I catch myself choosing open source over better features so that I can fix the damn thing on my own if it goes wrong.


I really like reading source code, that's why I put together some advice on how to do it: http://himmele.blogspot.de/2012/01/how-do-you-read-source-co...

From reading source code both for fun and for a purpose (e.g. Android, Minix, QNX, Linux and NetBSD, network protocol stacks, filesystems, web frameworks, CouchDB etc.) I got a lot of insights into interesting software technologies and architecture patterns. Good software engineers and architects should be good and fast at reading code.


Everyone thinks they're pretty good at reading source until they encounter a Spring application context spread over multiple XML files.


We had a 2-tier application: a rich client in a Qt3-based framework and PL/SQL for the "server side".

We moved to J2EE using EJB3, Hibernate, Eclipse RCP. Our application was meant to be 3-tier, but actually it's more like 10-tier. We have Hibernate mappings, a Java model, XML files specifying possible queries and reports, Java EJB3 beans wrapping these XML files, Java classes for DTOs, XML files specifying possible views and editors in RCP, and XML files specifying how to map from a query to a view or editor. And Java classes for custom code in views/editors.

When I want to see which database column is shown in a view, I need to start with the view class and descend through all those layers down to the Hibernate mapping.

In our previous Qt framework we had one XML file per client view, specifying columns/sorts/filters/etc. just for this view. Our consultants understood these files and changed them when they needed to. Now they would need to understand all those layers.

Now I think more than 2 layers in an application is an antipattern.


Brandon Bloom here. Glad you liked my post, Jeff!

Shameless plug: My startup, http://www.thinkfuse.com is hiring developers who already know how to read the source! Email me at brandon@thinkfuse.com if you're in Seattle and looking to join a bunch of great developers who know how to build cool stuff and have a fun time.


With great power comes great responsibility. Sure, read the source. But don't think that because you followed some internal code path and figured out that it's "safe" to pass `null` to a function that it will always be safe. If the docs for a project don't say it's safe, ask them to clarify.
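
A toy illustration of that trap, using two hypothetical versions of the same function. Both honor the documented contract (a `str` in, a normalized `str` out), but only the first happens to tolerate `None`, and nothing in the docs promises that it will keep doing so.

```python
def normalize_v1(tag):
    """Documented contract: takes a str, returns it stripped and lower-cased."""
    if not tag:  # incidental: this early return also lets None through
        return ""
    return tag.strip().lower()


def normalize_v2(tag):
    """Same documented contract, refactored implementation."""
    # The early return is gone, so None now raises AttributeError.
    return tag.strip().lower()
```

A caller who read `normalize_v1` and concluded that `None` was "safe" breaks on the upgrade to `normalize_v2`, even though the maintainer changed nothing that was documented.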


>>"That project is too big, I'll never find it!" or "I'm not smart enough to understand it"

That rings so true here. When I started Python web development, I needed to understand some concept related to middleware and handlers (somewhat foreign coming from PHP). My first thought was to look for blog posts explaining how it works in Django, but that wasn't satisfactory. I took a chance and dove into the Django source code— going against the voice in my head telling me, "You'll never understand it!"— and found myself learning so much. It was great!

In software development, we're taught to abstract everything and only think of the smallest problem, but this sometimes forces us to think of libraries as magic. This was a problem for me as a beginner, but it's been getting better as I've progressed.


Depending on the problem, there might be other tools that lie between 'source code' and 'documentation' on the readability-versus-accuracy scale.

I'm specifically thinking of strace, which I've used to diagnose problems that I was having with apache and chromium, among others. I don't think I would have got anywhere from reading the source. lsof is another, though I can't offhand think what I've used it for.

If you do find yourself in the source code, being willing to play around with it is invaluable for working out what's going on. If nothing else, you can insert a printf to confirm that you're looking in the right place.


> That's why, when it comes to code, all the documentation probably sucks. And because writing for people is way harder than writing for machines, the documentation will continue to suck for the foreseeable future.

I don't think it sucks because it's harder to write. I think it sucks because it's not strictly necessary to document your code in order to compile/ship it, and it's easy to justify putting it off. Poorly documented code is a form of technical debt, a compromise between getting it done right and getting it done right now.


Some environments make it easier to browse the source than others. See for example the way you would figure out how to customize indentation in Emacs, by asking Emacs for the source code to the command that is run when you press TAB: http://david.rothlis.net/emacs/customize_c.html#style

(Note that the above refers to browsing the source of Emacs itself, not using Emacs to browse any arbitrary source -- unfortunately the tools for that are still very primitive).


There's something to be said for a language that's written in itself. Been picking up Clojure over the weekend; just for fun I clicked "go to definition" on defn and there[1] I was looking at the source, all Clojure. Not that I think I'd be spending a lot of time looking for bugs in defn, but it's a neat feeling to be able to go under the hood like that.

[1] https://github.com/clojure/clojure/blob/master/src/clj/cloju...


FYI, it's not actually all Clojure; stuff like the parser and the core data types are still written in Java. (Of course, it's still open-source.)

https://github.com/clojure/clojure/tree/master/src/jvm/cloju...

I think translating the rest is a slow ongoing project (as performance, etc. gets up to par.)


Aware of that; still feels like you can get pretty deep in the language without hitting the Java. It's still IMHO worlds away from always seeing C (vs. the language you're actually working in at the moment) whenever you open up the source.


Note that this applies to hiring. A candidate who can write FizzBuzz given a spec but can't derive a spec given FizzBuzz is not a programmer. Do not hire them.
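
For anyone who wants to try the reverse exercise, here is one conventional FizzBuzz in Python, followed by the spec a reader should be able to recover from it. The function name and the `SPEC_EXAMPLES` table are just illustrative.

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for a positive integer n."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# The spec derived by reading the code above:
#   multiples of both 3 and 5 -> "FizzBuzz"
#   other multiples of 3      -> "Fizz"
#   other multiples of 5      -> "Buzz"
#   everything else           -> the number itself, as a string
SPEC_EXAMPLES = {1: "1", 3: "Fizz", 5: "Buzz", 15: "FizzBuzz"}

for n, expected in SPEC_EXAMPLES.items():
    assert fizzbuzz(n) == expected
```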


Basically bollocks. Be kind to your users. Provide complete trivial working examples for everything whenever possible.


At first I thought it was going to be about this hilarious response from Jon Corbet, to a newbie asking about kernel development: https://lkml.org/lkml/2012/4/15/114


One of the main reasons I hang out on programming help channels on IRC is to hone my skills in reading other peoples' messy code when helping them solve a problem. Plus, there's the bonus good feeling from helping people :)


My only problem with reading source code of larger projects is architecture. It can be quite difficult sometimes to understand the architecture of some of the larger projects in order to actually read the source.


He should get into Ruby. 99% of the time all you get is an uncommented generated API doc that is more confusing than helpful; the only choice you HAVE is to "read the source".



Sorry, link got hosed.

http://www.zazzle.com/read_the_source_tshirt-235248361224605...

This one should do the trick.


"The transformative power of "source always included" in JavaScript is a major reason..."

Unless it's minified, in which case you might as well be running cat on a binary.


Next time my boss razzes me for not writing documentation about something, I am going to point him to this article :)

Seriously though, the article does seem to belittle documentation. Documentation gives an API context, it is often invaluable. Even if you have the source with some comments, this often does not give a clear picture right away.


Maybe I'm unique, but this is regularly my MO in my own work or projects. If something feels like a bug in Django, or an edge case isn't documented, I do two things: ask if anyone knows off the top of their head in an IRC channel, and then go and read the source while I wait to see if anyone has any sagely advice.

Sure beats trying to post an unformattable block of text into an MSDN forum and then, well, nothing happens after that.

I also agree with the implication of the other comment here: reading source code from others is a great way to get insight into how to use APIs, how to write idiomatic code, and ways to avoid pitfalls.



