"Programs must be written for people to read, and only incidentally for machines to execute." - SICP
If you believe this (and I do), then it follows that reading fluently is absolutely critical. One cannot judge writing without being a good reader.
You don't learn to write novels without reading a whole lot of them first. And someone who hates to read is never going to be a great writer.
I disagree that any special tool support is needed. If you depend on any open source library or platform at all, just don't stop bug hunting at the API boundary. Go down inside and see what's really going on.
Absolutely agree with this. Starting at the base with small libraries in your language of choice and working your way up (similar to beginning with picture books and moving on to novels) is a fantastic way to become more familiar with both a language and various coding styles.
When you come across pieces of code that seem esoteric, I find that doing a "rubber ducky" debug on them (talking through the code line by line with an inanimate object) helps tremendously with understanding both what's going on and the author's mental approach to the problem.
Even if the author turns out not to be a great programmer, you can still learn something from the way they approached the challenge.
This is a great idea. And I think that leads to a broader question I've had.
I completed a 10-week programming course (Hackbright) in SF this summer and am currently waiting for my visa to come through to come back to the Bay Area to start a job at a YC company.
There are lots of resources out there for picking up a language (or coding, to begin with), eg, codecademy, treehouse, udemy, even offline courses like Hackbright, DevBootCamp. Beyond that, what are some of the best ways to go from a programmer to a good programmer?
I find writing code a couple hours a day and working on my personal projects helpful, but only to a certain degree as I feel I am re-using a lot of my existing knowledge (or hitting some kind of roadblock).
Hence, reading more quality code has been one of my priorities for keeping up the learning. The major challenge though is to find good code to read. Perhaps a weekly curated "reading list" akin to StartupDigest but for coders: instead of startup-related articles, it could be handpicked "code of the week" from github / other sources? Or do these things already exist out there?
> The major challenge though is to find good code to read.
In my experience, reading and working with bad code has proven to be extremely valuable too. Until you realize why something is bad (e.g. extremely long methods), you won't appreciate "good code". So, I suggest picking a medium-sized open source project in a language of your choice and contributing some bug patches. You will come across plenty of both good and ugly code.
> I suggest picking a medium-sized open source project in a language of your choice
Thanks. What's the best way to find a medium-sized open source project (and what is considered medium-sized)? I code mostly in Python and have several repos on github/michellesun.
Use the language all day, every day. Usually this means being full-time employed in the language.
Read all you can about the language. Especially, "best practices" and idioms.
Join a users group to talk with others about the language and what they do with it.
Work with other people's code! There is no faster way to learn what not to do in a language than to have to clean up after someone did something awful.
Support the code you write - every bug becomes a tour of your worst decisions!
Study computer science and languages in general
Learn a very different language. A great complement to C would be a functional language like Lisp. This will turn the way you think about your procedural language inside out.
Learn to use the frameworks and APIs available for that language.
Take the time to do your own experiments with the language. SICP is not applicable to C, but the attitude of learning a language by testing its limits is a very productive one.
Read the history of the language to learn why it was made the way it is.
Attend conferences to hear the language authors speak, or to hear what industry leaders are doing with the language.
We're in the middle of doing some exercises in pintos for school (it's a toy operating system where you're supposed to implement part of the functionality) and I can really recommend the code in it. The documentation is good as well. Operating systems might not be for everyone, but the code is good.
I think that, except for the pure functional case, code can't be read without running it with a debugger. In imperative and OO code there is too much "spooky action at a distance": with global variables, dependency injection or, god forbid, aspect-oriented design, things always seem to happen elsewhere. Oh, and the solution to this problem is the same as the one for the "cat -v considered harmful" problem: every program should do one thing and do it correctly, programs should be easy to compose, and interfaces between programs should be easy to inspect.
It is a vicious cycle: if a program's source code is bloated, I won't be able to read and understand it. If I don't understand it, I won't be able to use it. Not being able to use it, I will reimplement its functionality in my own program. Then my program becomes bloated.
It's a really interesting idea. That said, talk is cheap, and the github repo seems to be nothing more than a 100b README at the moment. I'm looking forward to seeing a HN submission in a few months with a release.
There is a lot of fluff in this article, but the idea he leads up to and presents at the end is pretty fascinating.
I have tried to understand a large Linux kernel patch series by starting at the beginning and trying to go through the commits. I didn't make it far. Maybe I would have, if I had had a good tool to help (besides just git).
I disagree with a number of points in this article.
Businesses have been built on reading code. We (Architexa) are one such example, but there are lots more companies trying to help you browse/understand code. One of the older companies in this area is SciTools, which makes the 'Understand' product.
Beyond that trivial point, the problem with books is that they are often designed to be started from only one place (the first chapter). Blogs overcome this by letting you start reading at any post. If a tool really is needed, it is one that helps you pick a way to start reading code, NOT one that gives you the single anointed way to read it (from where the writer started writing it). You sometimes care about the code from start to finish, but MUCH more likely you want to read it in a manner that aligns with your task: perhaps by seeing how the code responds to some event (by looking at the event hook), or by seeing how it uses a library or subsystem like the filesystem (by finding dependencies on the appropriate library).
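As a rough illustration of the "find dependencies on the appropriate library" entry point, here is a minimal Python sketch that lists which top-level modules a piece of source imports, using the standard `ast` module. The function name `imported_modules` and the sample source are made up for this example, and it is not how Architexa or any other tool mentioned here works.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names that a Python source imports."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hypothetical source text, just to show the call
example = "import os.path\nfrom shutil import copy\nfrom . import sibling\n"
print(imported_modules(example))
```

Running something like this over every file in a project and filtering for, say, `os` or `shutil` would give one possible list of starting points for reading the filesystem-related code.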
Code reading/understanding is a big problem. And I am biased, but there is a lot of work that needs to be done here.
By the way, I would love your thoughts on what we are doing. Please check the video on our homepage (http://www.architexa.com/) and let me know either here or via e-mail what you think.
Every time I've set out to spend a good 20 minutes "code reading" random projects, I've learnt something. It might not be something big or even something positive, but I've always picked up something, and I strongly recommend it to anyone I tutor.
I don't mean to be disparaging but I am lost trying to understand the tool. I can see it is cool, and am fascinated by the different project styles (the tight development of git versus drupal or RoR is clearly indicative of something)
But...
How do you use that? Once you start looking at a specific function, you need to read that function, not see a prefix.
Gource doesn't tell you a lot about history, but on larger projects with many people I could see how it could aid in coming up to speed with team dynamics. Maybe you could see that Module A gets updated by the newest hire once every 6 months, and that explains why it is a mix of many different coding styles. Or Module B has been maintained solely by Bob for the last 3 years, so if you have any questions about it he's probably the one to ask.
By default it creates a time-based view showing which people edited which files, along with an overview of the files at that point in time.
It does not put the emphasis on the code changes but on the dynamics of the participants.
To generate an overview of the participants' commits, the generated logs were parsed. The source code is in this file: https://github.com/educs/Documentation/blob/master/gource/ma...
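The script linked above is not reproduced here, so the following is only a sketch of what parsing such logs could look like. It assumes Gource's pipe-delimited custom log format (`timestamp|username|type|path`, optionally followed by a colour) and counts file changes per participant; the function name and the sample entries are invented for illustration.

```python
from collections import Counter


def changes_per_author(log_lines):
    """Count file changes per user in Gource custom-log lines.

    Each line is assumed to look like: timestamp|username|type|path
    (an optional colour field may follow).
    """
    counts = Counter()
    for line in log_lines:
        fields = line.strip().split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


# Hypothetical log entries, not taken from any real repository
sample = [
    "1350000000|alice|A|/src/main.py",
    "1350000060|bob|M|/src/main.py",
    "1350000120|alice|M|/README",
]
for author, n in changes_per_author(sample).most_common():
    print(author, n)
```

Note that this counts changed files, not commits; grouping entries by timestamp and user would be one way to approximate commits.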
I've come to realize that writing good code also involves good storytelling. There may be a thousand ways to get the job done, but the better coder will arrange and cluster things in ways that "flow" for the next human reader. Good coders also keep the next reader in mind, and don't use fancy-obscure ways to accomplish what could be done with more common techniques.
Programs must be written for people to read, and only incidentally for machines to execute. (Abelson & Sussman, Structure and Interpretation of Computer Programs)
I like the idea of replaying commits. Never really thought about it, because I tend to look at the current code without bothering with the way it was developed or following the commit history, even when I'm the developer myself.
I think the main benefits would actually be in the area of teaching. Just have people who are learning to program watch replays of stuff in the language they want to learn and pick up on certain things via pattern matching :)
At the very least even pretending to record your commits (rubber duck version) could probably improve your development process quite a bit. I think I'll actually pretend to record all my commits from here on out and give a running commentary to myself :)
Entering a new field is a fun experience. You start off by thinking "Wow, why has no-one done this before?" Sometimes, you've hit a new idea, and become a pioneer. Sometimes, you find a few similar attempts, but are still in unexplored territory. But what often happens is that you find there are entire conferences dedicated to the idea, and you simply hadn't heard of them. The world's a pretty big place, so this happens a lot.
Google for "program comprehension tool," and you'll see there's no shortage of tools to help people read code. Rather than create another one, someone should really just look at a bunch of those and create a "Top 10" list.
When reading code, I like to break projects down into the chunks I don't understand. Then I Google those bits to gain a broader understanding of what's going on (which in turn increases my understanding of the language itself). I find this more fun than reading tutorials that never seem to get the point across. I also love watching people code. Most people only see the finished code, that is, the code the author committed. Few get to see the mistakes, typos, and general flow of the code in real time. That's what I like to read/watch.
As the article somewhat alludes to, code isn't like a sequential book, but more like a conceptual system or a complex model. Rather than reading from top to bottom in a literal sense, the top you need to start from is the overview from 1000 feet (maybe the API or the design docs/specs), and you work your way down to the conceptual bottom or end of the story, which is way down in the implementation.
I'm not sure that the solution in the article achieves this in a coherent manner, and it is only of relevance to one particular code repository (albeit a very popular one!), although it's nevertheless useful.
There are a class or two of tools that already exist and do help in this top-to-bottom process: doc generators and profilers (and sometimes debuggers). These work across all types of code repositories and all variations in quality of code. For instance, see the chap recently on HN who was trying to read and get to grips with a large code base [1] and my comments to him [2].
Granted these tools are versatile and go beyond just "reading" the code, perhaps there is some space for derived products which use these to create a "story" of the code from a top to bottom perspective.
Books are by default sequential, but they also offer a lot of different means of indexing to get what you want: the table of contents, the index, or just flipping through. And if you have no idea where you start, you know where to begin: at the beginning of the book. Software projects are often not linear, but trees where the leaves are files. Even in a reasonably modular code base, there's no real beginning. But I'd disagree that code is inherently unsequential when it comes to understanding it. One would have to do exactly as you suggest. They'd have to start with an overview, tell the conceptual story, then write the implementation. They'd have to write a literate program.
Great idea for a post and great idea for a project. I have thought about something similar. Encouraging people to read code, by making it easier to do, is a worthy objective.
When doing code reviews, I go through pull request commits in chronological order. It sometimes takes more time (eg. if the author didn't know exactly how to approach the problem from the start), but gives a much clearer picture of what was going on inside their mind when they wrote it.
That's one of the main reasons I dislike pull req squashing (or doing large compound "clean" commits in the first place). It destroys this information.
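For anyone who wants to try the chronological walk outside a review UI, here is a small Python sketch built on `git log --reverse`, which lists commits oldest first. The helper names and the `base`/`head` arguments are placeholders chosen for this example; it assumes `git` is on the PATH and that both refs exist in the local repository.

```python
import subprocess


def parse_log(text):
    """Split tab-separated 'hash<TAB>subject' lines into (hash, subject) pairs."""
    pairs = []
    for line in text.splitlines():
        if line:
            commit, _, subject = line.partition("\t")
            pairs.append((commit, subject))
    return pairs


def commits_oldest_first(base, head, repo="."):
    """Return (hash, subject) pairs for commits in base..head, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--reverse",
         "--format=%H%x09%s", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling `commits_oldest_first("main", "feature-branch")` (both branch names hypothetical) and then running `git show` on each hash in turn reproduces the order in which the author worked, provided the branch has not been squashed.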
how much time do people spend reading other people's code?
am i unusual (lucky?) in having to do very little of this? usually i am developing new projects or maintaining things i wrote. even when extending existing work, reading the existing code is only a small part (largely at the start).
Even working mostly alone, I read other people's code frequently. We all depend on libraries and platforms written by others and none of them are perfect. There are always bugs, performance problems, or missing features down there.
You get the simplest code in the least effort if you solve each problem at the appropriate layer. People who never look down into the layers below them end up with lots of unnecessary duplication and cruft.
> You get the simplest code in the least effort if you solve each problem at the appropriate layer. People who never look down into the layers below them end up with lots of unnecessary duplication and cruft.
Well said. True in general, but especially when you're working on some middle layer of a large scale project, it really pays off to understand what's going on at least on the adjacent layers.
This does really sound unusual. For me, the ratio of reading to writing is about 4:1. Are you sure you are not writing too much new code, because you didn't invest the time to properly understand the existing one?
When you mostly work with your own code, do you still have that feeling of constantly improving your coding style and code quality (like looking at code that is X months old and knowing you would write it differently today)?
that's probably a good point, in that i have been trying to get the people i work with to think more about code re-use. i sometimes worry we have too much of a culture of "make our own version" (but see below).
but also, i think it's just the nature of the work we do - typically i am asked to implement something new for a third party (i'm more a hired gun or "consultant" than part of a development team).
so probably it was a dumb comment of mine to post without reflecting more first. sorry.
ps to answer your last question - when i work on my own projects i do a lot of rework. at work i rarely get the chance, because one contract finishes and another starts. but not sure what you were getting at there.
In my last question I was just wondering how or when you (or others) tend to improve your coding style. For me it's mostly when reading other people's code and noting that certain aspects are a) great or b) not so great. But maybe that just comes naturally to me because I don't have that much experience under my belt.
"Lucky" is in the eye of the beholder. Where I work, code review is a regular event, so there's always another set of eyes on your work to help you catch issues that aren't the purview of unit tests. It's also a great way to learn from more experienced developers.
which got me thinking. is there any place where you can get code reviews? something like a book club, but each week you review someone else's code. could be online or local meetups...
This is rather late, but http://codereview.stackexchange.com/ was made for this purpose. Post code for improvements, or review other people's code to suggest improvements. Working code only.
* Express - https://github.com/visionmedia/express - I don't like TJ's coding style, and general trolling of CoffeeScript, but this is a really great, clean and tiny codebase. (As well as all the connect middlewares - https://github.com/senchalabs/connect)
* Async - https://github.com/caolan/async - is very readable. Async is a great library for async ops handling in JS, and it's really neat seeing the implementation patterns.
* Ray - is a games DSL in Ruby - https://github.com/Mon-Ouie/ray - I love it, and it's a great read, especially if you are as interested in DSLs as I am.
* Jekyll - https://github.com/mojombo/jekyll - really helped me move forward with my Ruby skills. It's nice, clean, commented, not too big, and useful.
* Redis - https://github.com/antirez/redis - lots of data structure implementations. Antirez just writes C so nicely. anet.c has networking, TCP bits, ae.c is an event loop. Lots of great code there.