Safe: Robust programming in Haskell via types, testing, debugging (liamoc.net)
39 points by dons on April 3, 2010 | hide | past | favorite | 30 comments


Functional language advocates have been making these kinds of claims for years, but I don't think they have yet produced any hard evidence that languages like Haskell actually increase programmer productivity in typical real-world scenarios. Perhaps they do, but I wouldn't be surprised to find that the kinds of errors they minimize aren't the kind that often impede development in practice, and that the strictures of FP languages, particularly lazy ones, make some common programming tasks significantly more difficult.

People also like to claim that functional languages are particularly well suited for concurrent programming, but, as Simon Peyton Jones has himself said, just getting rid of mutable state doesn't automatically make your code parallelizable.

People used to hold up Darcs as a Haskell success story, but as far as I can tell, people have hacked together DVCS systems in "inferior" languages like C and Python that are at least as robust and featureful.


I use Haskell for financial ETL processes. What I've found is that the upfront development time is about the same as using Perl, but debugging time is much less. Once my process starts running, it runs to completion. It never dies in the middle with "can't call method foo on undefined value" as is pretty common in Perl. It starts running, and it finishes running... always.
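
To make that concrete (a hypothetical sketch, not my actual code): in Perl the missing value blows up at the call site at runtime, while in Haskell it shows up in the type, and the compiler rejects code that ignores it:

  import qualified Data.Map as Map

  -- hypothetical: conversion rates keyed by currency code
  lookupRate :: Map.Map String Double -> String -> Maybe Double
  lookupRate rates ccy = Map.lookup ccy rates

  -- the Maybe forces us to say what happens when the rate is missing,
  -- so there is no runtime "can't call method foo on undefined value"
  convert :: Map.Map String Double -> String -> Double -> Either String Double
  convert rates ccy amt = case lookupRate rates ccy of
    Nothing   -> Left ("no rate for " ++ ccy)
    Just rate -> Right (amt * rate)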

Haskell is also nice because I occasionally have to interact with poorly-written third-party C/C++ libraries. The FFI works fine with ghci, so I can play with their functions interactively and learn about the bugs without having to write a lot of code. And, when it comes time to write the application, I can use Haskell's type system to make sure I don't get the library into a state where it starts messing things up.
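
As a rough sketch of that last point (the foreign function names here are made up): hide the raw handle behind an abstract type whose only producer is the initialisation call, and "library used before init" simply won't compile:

  {-# LANGUAGE ForeignFunctionInterface #-}
  import Foreign.C.Types (CInt)

  -- hypothetical third-party entry points
  foreign import ccall "tp_init"  c_init  :: IO CInt
  foreign import ccall "tp_query" c_query :: CInt -> IO CInt

  -- don't export the Session constructor: the only way to get one is
  -- initSession, so query can never see an uninitialised library
  newtype Session = Session CInt

  initSession :: IO Session
  initSession = fmap Session c_init

  query :: Session -> IO CInt
  query (Session h) = c_query h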

It's nice to hack for a few hours, have the compiler find a few type mistakes, and then have the process run and produce the correct result. Much better than type, run, wait, fix error, run, wait, fix error, ...

I don't know about "everyone", but Haskell has sure saved me a lot of time: both in waiting for programs to run (Perl is slow), and in waiting for me to get all the bugs worked out.

I also really like Haddock. I never liked generated documentation until I started using Haddock. I'm not really sure why, but I never bother doing "perldoc MyCode.pm" on my own code... but with Haddock, I always have my own docs open in a nearby web browser. I also like the syntax for in-code annotation. It's lightweight, easy, and the output is great!


I'd be interested to see a skeleton of your Haskell ETL scripts. I've been looking to Clojure to clean up my current processes, and I sure would like to see how you approach it in Haskell.


Nothing special. Everything happens in some instance of MonadError and MonadState; error for aborting when there is some problem, and state for logging. (Why not writer for logging? After a week of playing with various levels of strictness over Writer, with the help of #haskell, I couldn't get any variant of writer to not use all my stack space. Dropping in State fixed the problem. I know why State fixed the problem, but I don't know why I couldn't make Writer strict enough. Oh well.)
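
The shape is roughly this (a minimal sketch using ExceptT/StateT from mtl; the names are mine, not from the real scripts):

  import Control.Monad.Except (ExceptT, runExceptT, throwError)
  import Control.Monad.State  (StateT, runStateT, modify)

  type Warning = String

  -- error for aborting a dataset, state for accumulating the log
  type ETL a = ExceptT String (StateT [Warning] IO) a

  logWarn :: Warning -> ETL ()
  logWarn w = modify (w:)   -- the Writer version of this blew the stack

  abort :: String -> ETL a
  abort = throwError

  runETL :: ETL a -> IO (Either String a, [Warning])
  runETL m = runStateT (runExceptT m) []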

Unlike with most logging, I don't really care to see the results as the process is running; the log is just a list of failed sanity checks that may indicate problems with the data. It just gets printed at the end, and I can check out the datasets that are suspicious when I feel like it.

Thanks to Catch and some wrappers I have that turn IO errors into Lefts, I know that my process isn't going to die halfway through. My only worry is possible data corruption because some assumption is violated, so if I see anything weird, I make a note and continue. If something is clearly wrong, then that dataset is Left instead of Right, and is skipped.

I also have a function that downgrades Lefts to Right + a warning, so I can have a few failures in the ETL run and worry about those after the non-error sets are processed.
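
Roughly, building on the sketch above (tryIO and downgrade are illustrative names, not my real ones):

  import Control.Exception (IOException, try)

  -- run an IO action, turning exceptions into Lefts instead of deaths
  tryIO :: IO a -> IO (Either String a)
  tryIO act = do
    r <- try act
    return $ case r of
      Left e  -> Left (show (e :: IOException))
      Right x -> Right x

  -- downgrade a failed dataset to a warning plus a fallback value,
  -- so the rest of the run keeps going
  downgrade :: a -> Either String a -> ETL a
  downgrade fallback (Left err) = logWarn err >> return fallback
  downgrade _        (Right x)  = return x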

The actual scripts just read config data from a file or the command-line, and run some function that actually does the work. These are named something like theFooProcess/Main.hs, and are built into executables like "theFooProcess.exe" with Cabal. (I also build the library code as a separate Cabal target, but I don't have any projects that refer to each other yet, so this really just wastes my time during compilation. Heh.)

(Oh yeah, did I mention that my processes are all Windows-based because the third-party libraries are Windows-only? Yeah. But GHC handles it great! Emacs + ghci + cabal on Windows is exactly like Emacs + ghci + cabal on UNIX. With Haskell, Windows is not much of a problem... except for the security updates that reboot my machines in the middle of ETL runs :)


Thanks for this. As a man with only a cursory understanding of Haskell, I found your comment sheds light on how things actually fit together. It also provides a good starting point for expanding my Haskell understanding.


I believe in advocacy by doing, and yes, arguments without dollar figures attached are less convincing.

Studies on programmer productivity are rare, but a different kind of argument can be found in where Haskell (for example) is used when correctness matters: hydraulic control of vehicles at Eaton, or secure cryptographic algorithms at Galois. Python isn't going to fly when you need to convince people of correctness, while Haskell does have tools and approaches that work.

I wrote a post last year on the collective experiences of engineers at Galois after using Haskell for the past decade, which has some nice anecdotes, http://www.galois.com/blog/2009/04/27/engineering-large-proj...

The other data point is the way finance has jumped on functional programming -- often because they can see a (huge) dollar figure in (small?) improvements in productivity or correctness. Jane Street, Barclay's, Credit Suisse, ABN Amro, Standard Chartered, JP Morgan and others (http://cufp.org) don't use FP because it is a fad.


FP is also catching on at BAML; we have a bit of production Haskell, and are starting to hire Scala developers instead of Java developers for new projects. (Java developers are too hard to find. Never thought I'd be saying that...)

Of course, most of the code that actually makes us money is Excel spreadsheets :)


Great to know! You should add a small entry to http://haskell.org/haskellwiki/Haskell_in_industry#Haskell_i... ...


The correctness argument is interesting, but certainly plenty of critical-to-be-correct software has been written in imperative languages too. I'd like to see something closer to hard numbers on bug counts, man-hours, etc. on similar projects in Haskell or ML vs. C++ or Java, for instance. I realize it's hard to find a truly apples-to-apples comparison here.

The uptake of FP in finance is interesting. I'd like to hear more about exactly how FP is being used in these contexts. Is it used in a narrow, specialized domain or broadly? To what extent are the advantages of FP in this domain realizable in others? Is FP only suitable for small, elite teams or can mainstream programmers pick it up with the right training? To what extent does the less mature or less conventional toolchain offset the advantages of the language itself?


Check the experience reports from CUFP for such examples:

* http://haskell.org/haskellwiki/Haskell_in_industry#Haskell_i...

Barclay's Capital has written a journal article on why they use FP (Haskell in this case): http://lambda-the-ultimate.org/node/3331


Thanks for the links. Looks like some good afternoon reading.

It's interesting that the Barclay's article discourages overuse of the point-free style. In my attempts at learning Haskell I've also found heavy use of this style can make Haskell code hard to understand.
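
A contrived example of what I mean (not from the article):

  import Control.Arrow ((&&&))

  -- heavily point-free: correct, but you simulate it in your head
  mean :: [Double] -> Double
  mean = uncurry (/) . (sum &&& (fromIntegral . length))

  -- the pointful version says the same thing directly
  mean' :: [Double] -> Double
  mean' xs = sum xs / fromIntegral (length xs)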


> I don't think they have yet produced any hard evidence that languages like Haskell actually do increase programmer productivity in typical real-world scenarios.

I know of at least one such study: http://www.haskell.org/papers/NSWC/jfp.ps There might be more.


    reverse (x:xs) = xs ++ [x]
Does that really reverse a list in Haskell?


It's a "buggy" implementation (no recursive call) for the purpose of illustrating code coverage and quickcheck testing...


That's what I guessed, but the text doesn't make that clear at all.


Good catch. I think it has to be:

    reverse (x:xs) = reverse xs ++ [x]

But then again this is Haskell so there may be some mind boggling property of the language that is only understood by math postdocs and the more enlightened monks in Shaolin monasteries that makes the original line ok.


> But then again this is Haskell so there may be some mind boggling property of the language that is only understood by math postdocs and the more enlightened monks in Shaolin monasteries that makes the original line ok.

Nope, and there never is. Haskell is just programming!


Nope, your revised code is correct.


http://www.reddit.com/r/programming/comments/bm1u8/safe_robu...

Ironic, considering the title, if you ask me.


I'm glad to see such a rational argument about why Haskell matters. After reading this I wonder why more people aren't using Haskell.

What are the reasons to stick with Java over Haskell in the long run?


In my experience, Haskell takes a bit more to "get." If Java is good enough and managers are able to find multiple, interchangeable developers, Haskell doesn't make a lot of business sense.

Though I imagine that if Haskell were given more of a chance in enterprise settings, the level of software quality would go up, even if it cost a bit more to write.


The Haskell community shouldn't try to do better than Java. Enterprise is what it is because it has particular motivations. We should play to the strengths of our language and our community.

It's easy to agree that there's lots of software floating around, but that not a lot of it is good. That's particularly true about enterprise software (I used to work in the field). It doesn't have to be good. You could have an excessively complicated UI and do your numerics in Python (for the love of God, numerics and statistics in Python?) and people will still use it. The damn app can take 12 seconds to load, and people will still use it. How are you going to sell Haskell to the enterprise community when that's their bar?

If people want to write code in Haskell for a living, if they're serious about this freeing-software-from-the-von-Neumann-paradigm thing, they should seek out problem spaces where software quality matters. This is hardly the sole core competency of Haskell, IMO. When one of my friends was introduced to it, he noted that it was a very "scientific" language. Which is to say, he thought it was impractical for regular software. The flip side to this is that we can very easily do things like check for gene sequences in Haskell (Bryan O'Sullivan was doing something like this).

The fact that things like sets and maps are treated as values means that they are very natural and easy to work with (albeit a little slower than their mutable counterparts in other languages), so why aren't we seeing Haskell used to analyze social networks? Haskell is a powerful language. We just need to see it for what we could do with it, rather than redoing the old things that have already been done.
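
To make the sets-and-maps point concrete (a trivial sketch): "updating" a Map gives you a new map and leaves the old one intact, so you can keep or share both versions freely:

  import qualified Data.Map as Map

  m0, m1 :: Map.Map String Int
  m0 = Map.fromList [("alice", 1), ("bob", 2)]
  m1 = Map.insert "carol" 3 m0   -- a new map; m0 is untouched

  main :: IO ()
  main = print (Map.size m0, Map.size m1)   -- prints (2,3)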


> (for the love of God, numerics and statistics in Python?)

See http://numpy.scipy.org/ ;) But don't worry (so much), it's actually just a C library exposed to Python.


Near the end of the article, the author mentions Literate Haskell (.lhs). In my opinion, LHS is not at all literate -- it's nothing more than verbose commenting. \begin{code} and \end{code} are semantically equivalent to -} and {-. Anybody interested in literate programming with Haskell should look into NoWeb or Leo.


Tex-style .lhs isn't "classic" .lhs (Bird-style): http://www.haskell.org/haskellwiki/Literate_programming#Bird...


They're just a different syntax for the same purpose. Consider:

  some comments
  \begin{code}
  foo :: Int -> Int
  \end{code}
  more comments
  \begin{code}
  foo = (+ 1)
  \end{code}
and

  some comments

  > foo :: Int -> Int

  more comments

  > foo = (+ 1)
and

  -- some comments
  foo :: Int -> Int
  -- more comments
  foo = (+ 1)
Now, how would you do the following using .lhs?

  some comments
  <<Foo.hs>>=
  <<type of foo>>
  <<definition of foo>>

  comments on definition of 'foo'
  <<definition of foo>>=
  foo = (+ 1)

  comments on type of 'foo'
  <<type of foo>>=
  foo :: Int -> Int


Negative; you should look up literate programming (http://en.wikipedia.org/wiki/Literate_programming). Basically you compile the lhs into a pdf before reading it ;-)


I'm quite aware of what literate programming is -- it's more than just writing a lot of comments. If you'd like to see the PDF of a literate library, check out the source code to my Haskell DBus implementation at <http://ianen.org/haskell/dbus/core.pdf>.

Just because the source code can be rendered to PDF, or contains a lot of comments, does not make it "literate". Literate programming is designed to abstract the compiler's strict syntactic requirements from the reader. LHS does not do this -- whether you use LaTeX or Bird-style comment markers, they're still just comments. You can't use them to re-arrange or repeat source code.


Curiosity: I have made the argument a few times that literate programming has failed to take off in large part because Knuth-style literate programming set out to solve two problems simultaneously: poor documentation, and poor ability to structure code. Poor documentation remains a problem, but the poor structuring was largely solved. Nowadays, if a program is poorly structured, it is most likely because the author wasn't going to structure it well no matter what tools you handed him. With one of the two pillars of Knuth-style LP removed, it didn't have enough vitality to capture a large chunk of programmer mindspace.

Please note that A: I'm just referencing the argument so we are all on the same page and B: I intend it strictly as an analysis of why it did not take off, it is not a normative statement about whether it is a good idea.

A language like Haskell thoroughly obviates the structural part of LP; it arguably has radically more powerful composition capabilities built into the language than LP does. Yet here you are writing Haskell with LP. I am curious why you are doing this, and whether you plan on continuing to do so, or if this was a one-off.

(Incidentally, I have come to loathe the style of documentation that Hackage affords, in the UI sense of "afford". I'm generally skeptical of LP, but I'm not asking this because I think the Haskell community has great docs that remove the need for LP documentation. Still, I would be interested in your thoughts.)


Haskell has a more liberal structure than the languages early LP research focused on, but it's still not as flexible as an LP file. All exports must come before imports, which must come before definitions; modules can't be interleaved; and when pattern matching, all patterns must be on consecutive lines.

I've found the usefulness of LP to be directly related to (1) how large a particular library is and (2) how much original thought went into it. Most of my Haskell libraries, so far, are relatively small -- a few hundred lines, at most. Furthermore, many are bindings to C libraries -- GNU SASL, CPython, YAJL -- and there's not much complex code involved. For Haskell, the tipping point seems to be around 1000 LOC for "native" code.

dbus-core is just over 2500 LOC, and is a reasonably complete implementation of DBus in pure Haskell. Using literate programming (specifically, NoWeb) has been tremendously helpful. It's the only library I use LP for, but that's only because it's the largest I've written in Haskell.

Notably, another of my libraries (network-protocol-xmpp) is approaching the 1000 LOC mark, and I've noticed some problems keeping track of the code. I plan to convert it to LP / NoWeb after the next release.

If you have any large, complex libraries, I highly recommend at least experimenting with LP. It's relatively easy to convert existing libraries to LP, and because NoWeb is so flexible, you can convert them a little bit at a time as you become more comfortable with the tools.



