I find it useful to consciously separate input data structures from intermediate data structures (the accidental complexity). I try to structure my code so it doesn't rely on intermediate data structures; instead, it knows how to recompute them. When this works it can be very pleasing: just input data structures, with caching all over the place.
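Here's a minimal sketch of what I mean, in Python for illustration (the data and names are all made up): the input is the only structure the code stores, and every derived structure is just a memoized function over it.

    import functools

    # Input data structure: the only ground truth the code stores.
    POSTS = [
        {"author": "alice", "words": 120},
        {"author": "bob", "words": 80},
        {"author": "alice", "words": 40},
    ]

    # An "intermediate data structure" that's never stored explicitly:
    # word counts per author. The code knows how to recompute it, and
    # the cache makes recomputation cheap.
    @functools.lru_cache(maxsize=None)
    def words_by_author(author):
        return sum(p["words"] for p in POSTS if p["author"] == author)

    print(words_by_author("alice"))  # 160; computed once, served from cache after

The derived data is always safe to throw away (words_by_author.cache_clear()), because the code can rebuild it from the input.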
Sometimes I think I'm chasing pure functional programming, but in a more pragmatic form and separated from type-checking.
That's very interesting. We're doing something similar. I'm curious as to how you reconcile your "intermediate data structures" with one of the principles in the OP, that of minimizing the transformations you have to do on your data in the first place. The latter is a profound insight that I am slowly digesting. One thing it throws out the door, for example, is layered architectures. Not a small deal! Yet it makes sense to me, because my experience with layered architectures has been that the more nicely modular and well-defined you make each layer, the more bloated and nasty the mappings between layers become.
"Sometimes I think I'm chasing pure functional programming"
No question this is more suited to FP than OO.
Edit: this really is a rich subject. It's interesting that a lot of this discourse is coming out of the game dev world, because that's a section of the software universe which is relatively free of pseudo-technical bullshit (probably because it's so ruthlessly competitive and the demands on the apps are so high).
I usually don't think about performance. nostrademons sounds right: "minimize transformations when you productionize." I don't have experience with productionizing.
So far when I've found myself scaling one piece of my pipeline up, I do one of two things: 1) I switch key pieces from arc to scheme or C. 2) I add a periodic precompute stage to reduce cache misses.
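A hypothetical sketch of option 2, again in Python (the names and the period are invented); the point is just that an expensive result gets rebuilt on a timer, so readers always hit a warm, fully built snapshot:

    import threading
    import time

    _summary = None  # precomputed result; readers only ever see a finished snapshot

    def expensive_summary():
        time.sleep(1)  # stand-in for a slow recomputation over the input data
        return {"built_at": time.time()}

    def precompute_loop(period_secs=60):
        global _summary
        while True:
            _summary = expensive_summary()  # atomic rebind: no partial state is visible
            time.sleep(period_secs)

    # Run the precompute stage in the background for the life of the process.
    threading.Thread(target=precompute_loop, daemon=True).start()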
Like pg said somewhere (update: http://ycombinator.com/newsnews.html, /15 Jan), the goal isn't optimizing but keeping performance mediocre as you scale up.
Update: After reading http://news.ycombinator.com/item?id=1005145 I see 'productionize' isn't a big-bang, stop-the-world step. By the new definition I find I do rewrite things for performance fairly often.
The image in my head now: a ball of mud (http://www.laputan.org/mud) with rewrites as layers. Older layers that have proven themselves harden and fossilize as subsequent rewrites focus more on their performance without changing semantics. But even they aren't immune to the occasional tectonic upheaval.
I was wrong then. We're not doing something similar :)
"Like pg said somewhere (update: http://ycombinator.com/newsnews.html, /15 Jan), the goal isn't optimizing but keeping performance mediocre as you scale up."
Where did he use the word, or the concept, "mediocre"?
I found that minimizing transformations on your data is a principle you apply when you productionize. For most of the development cycle, you want to keep things as debuggable as possible (at the possible expense of performance), and intermediate data products + debugging hooks are a good way to do this.
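To make that concrete, here's one way to read "intermediate data products + debugging hooks", sketched in Python (the stages and file names are invented): each stage of the pipeline can dump its output somewhere inspectable, which costs performance but lets you debug any step in isolation.

    import json

    def run_pipeline(records, debug=False):
        # Stage 1: filter out bad records.
        cleaned = [r for r in records if r.get("ok")]
        if debug:  # debugging hook: persist the intermediate data product
            with open("stage1_cleaned.json", "w") as f:
                json.dump(cleaned, f, indent=2)

        # Stage 2: derive a score from each cleaned record.
        scored = [dict(r, score=len(r)) for r in cleaned]
        if debug:
            with open("stage2_scored.json", "w") as f:
                json.dump(scored, f, indent=2)

        return scored

Minimizing transformations later means collapsing those stages; while you're still developing, the intermediate files are what tell you which stage broke.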
This brings up a much bigger question of when to productionize, though. Most programs are never actually "done", but at some point you have to release to the public and hopefully get millions of users. You need to make the performance/maintainability tradeoff sometime. The later you push it off, the more productive you can be in the critical early stages, and the better a product you can bring to market. But if you push it off too long, you miss the market window entirely and don't get the benefit of user feedback.
But these are fundamental design issues. You can't change fundamental design when you "productionize"; coming up with that design and implementing it is the development cycle.
Productionize usually means "rewrite". I think that software engineers in general have become too averse to rewriting code; as long as you do it with the same team that wrote the prototype, it's often a good idea to throw away everything and start from scratch.
The development cycle for me is much more about collecting requirements than coming up with a design that satisfies those requirements. That's what iterative design is about - you try something out, see if it works for the user, see what other features are really necessary for it to work for the user, and then adjust as necessary. Once you know exactly what the software should do, coming up with a design that does it is fairly easy.
My current project is nearing its 3rd complete rewrite since September, plus nearly daily changes that rip out large bits of functionality and re-do them some other way.
I try not to have a 'fundamental design'. If you rely wholly on caching, you have no intermediate data structures, and code becomes easier to change in dramatic ways. This is the ideal I've been striving for.
Check out arc's defmemo macro. Given the ability to memoize (or cache) function invocations, changing your data structures can become simply a matter of refactoring your function boundaries and deciding which ones perform caching.
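For a rough Python analogue (I'm paraphrasing Arc rather than quoting it; the names below are made up):

    import functools

    def defmemo(f):
        # Rough equivalent of Arc's defmemo: cache the function's result
        # per argument list (arguments must be hashable), so callers can't
        # tell a recomputed value from a stored one.
        return functools.lru_cache(maxsize=None)(f)

    @defmemo
    def canonical_name(raw):
        return raw.strip().lower()  # stand-in for an expensive derivation

Moving the decorator from one function to another changes which intermediate results get materialized without changing what the code computes - that's what makes refactoring those boundaries cheap.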