
Yes on all counts. I never understood why the TDD culture was happy to write down a complicated function and then only verify that on input 2 the output was 4. It always seemed backwards to me, especially when you could just as easily have verified that in the REPL and called it a day. To me, TDD on its own is just glorified documentation and tells me nothing about the actual properties of the software. Formal proofs and verification, on the other hand, are definitely something everyone should be striving towards, especially for foundational components of computing, e.g. compilers, virtual machines, kernels, security/network protocols, etc.


Your description of TDD is a pretty blatant misrepresentation of the actual philosophy and techniques.

Furthermore, I think that you're ignoring that TDD tries to be a practical solution to certain real-world problems. This means that trade-offs are made in the name of creating software that works sufficiently well, while keeping expenses in check.

Sure, extensive formal verification of all software would probably bring some benefit in terms of correctness and perhaps security. But you're not considering the cost involved.

So far, such verification has only proven to be possible in rather limited situations, namely where there are significant resources available, or in academic exercises. Maybe this will change in the future, but in the present it's generally far more cost-effective to use a statically-typed programming language along with some form of automated testing. That gives many of the benefits of more formal methods of verification, but while also avoiding many of the costs, even if the end result doesn't conform to some theoretical idea of "perfection".


The QuickCheck model is at least as good as unit tests and is more likely to actually find bugs and corner cases. The resulting code is no more complex either. The only drawback I see is that it requires you to design your code in a very modular way that makes things amenable to automated input generation.
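The QuickCheck idea can be sketched in a few lines of plain Python (all function names here are my own invention, not any library's API): generate many random inputs and check a property on each. Real tools like Haskell's QuickCheck, or Python's `hypothesis`, add shrinking of counterexamples and much smarter input generation on top of this.

```python
# Minimal property-based testing sketch in plain Python (illustrative only).
import random

def run_length_encode(s):
    """Toy function under test: run-length encode a string."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1][1] += 1
        else:
            out.append([ch, 1])
    return out

def run_length_decode(pairs):
    """Inverse of the encoder: expand [char, count] pairs back to a string."""
    return "".join(ch * n for ch, n in pairs)

def check_property(prop, gen, trials=200):
    """Run `prop` on `trials` generated inputs; return a counterexample or None."""
    for _ in range(trials):
        x = gen()
        if not prop(x):
            return x
    return None

def random_string():
    # A small alphabet makes runs (the interesting case) likely.
    return "".join(random.choice("ab") for _ in range(random.randint(0, 20)))

# Property: decoding an encoding yields the original string, checked on
# many generated inputs rather than one hand-picked example.
assert check_property(
    lambda s: run_length_decode(run_length_encode(s)) == s,
    random_string,
) is None
```

The modularity requirement mentioned above shows up here too: the encoder and decoder are pure functions of their inputs, which is what makes input generation possible at all.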


Yes, but unit tests and TDD are not synonymous. TDD is a design technique first and a testing tool second. Its goal is to get you to a good design place, not produce a unit test suite - let alone one with 100% coverage.

I love declarative testing tools like QuickCheck. They're excellent. I still test-drive my code though.


Really confused why this comment is being so harshly downvoted. It seems fairly accurate, possibly downplaying how useful formal verification may be in the future, but the parent is most definitely misrepresenting the practicality and usefulness of TDD.


I also agree, have some upboats.

The reality is that programming is an engineering endeavor, and in the face of limited resources, we have to choose the tools that can offer realistic benefits now.

Formal verification is not that tool for most projects right now. Would Instagram or even Facebook be better off? Most people would, correctly, say no.

And ultimately in the end, nearly all software (including quite a bit of open source software!) is written in pursuit of business reasons, and has to answer to that.


I'd think the security of Facebook's privacy settings would be improved by, for example, lightweight modelling using Alloy. Alloy would catch bugs like this one:

http://mashable.com/2011/12/06/facebook-bug-zuckerberg-photo...

When I read about that bug in the past, I thought "this would be a great example to use for Alloy".


I just had a quick look at Alloy ([0]), and I'm interested in how it could be integrated into the software development process -- once you're confident about your Alloy model, you have to transfer it to working code somehow. I'd try to use both "fact" and "assert" statements from Alloy code as assertions in my production code (e.g. Python or Java). How is this usually done?

[0] http://www.doc.ic.ac.uk/project/examples/2007/271j/suprema_o...
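One common approach is exactly what you describe: restate the model's invariants as runtime assertions in the production language. A sketch of what that might look like for a photo-privacy invariant (the model, names, and access rule here are all invented for illustration; Alloy itself only checks the abstract model, so the production code has to restate the property by hand):

```python
# Hypothetical restatement of an Alloy-style invariant as a Python assertion.
# The Alloy model would `check` this property over all small instances; the
# production code re-asserts it at runtime on the concrete state.

def can_view(viewer, photo, owner_of, shared_with):
    """Access rule: a viewer may see a photo iff they own it or it was shared."""
    return viewer == owner_of[photo] or viewer in shared_with.get(photo, set())

def serve_photo(viewer, photo, owner_of, shared_with):
    # Runtime mirror of the model's `assert`: never serve a photo the
    # access rule forbids.
    assert can_view(viewer, photo, owner_of, shared_with), "privacy invariant violated"
    return f"photo-bytes:{photo}"

owner_of = {"p1": "zuck"}
shared_with = {"p1": {"friend"}}
serve_photo("zuck", "p1", owner_of, shared_with)    # ok: owner
serve_photo("friend", "p1", owner_of, shared_with)  # ok: shared
# serve_photo("stranger", "p1", owner_of, shared_with)  # would raise AssertionError
```

The gap, of course, is that nothing forces the hand-written assertion to stay in sync with the Alloy model as both evolve.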


Could you elaborate on that? I looked at the article and at a description of Alloy, and it doesn't seem to me like a good example to use for Alloy.


From Daniel Jackson's book on Alloy [1], here are a few of the major examples they provide:

* Leader election in a ring

* Hotel room locking (this one is really cool)

* Media asset library management

* Memory abstractions

[1] http://books.google.com/books?id=DDv8Ie_jBUQC&lpg=PP1&dq=sof...


I did not say all software, just key components as it is pretty clearly laid out in the original article. I disagree with the significant resource aspect as well. I recommend Leslie Lamport's recent talk for some very lightweight ways to get started with formal verification: http://channel9.msdn.com/Events/Build/2014/3-642.


I find it strange to think that TDD and formal verification are at odds with each other. For it to be backwards to write dynamic tests, you seem to be suggesting that it is actually worse than doing neither. I also don't see how it can be documentation, glorified or otherwise, and not tell you anything about the properties of the software.

Really there are perhaps three practical levels of knowledge about what a piece of code does:

* It does something.

* If I give it X it gives me Y (for some finite set of X).

* It will never do Z (for some finite set of Z).

The first is the state most software is in most of the time. The second is achievable with tests and some kinds of static analysis. The last is probably only achievable with formal analysis and code that fits the constraints of that formal analysis.

But the latter two levels are at least an improvement on nothing at all. Having functional and documenting tests does bring meaningful knowledge about some subset of what the code does, even if it isn't the be-all and end-all of code analysis.

So I don't see how you can dismiss it so easily, when to me it's just a step on that striving you mention in your final sentence. For the moment it is perhaps true that the good is the enemy of the great on this, but that will become less true as the tools get better.

After all, even this article talks about only formally verifying part of the code of a web browser. Until and unless formally verifying the entire thing becomes possible, you still probably need the Acid tests to demonstrate its capabilities and help prevent regression.


Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z? You can sample as many points as you want and you'll still be no wiser as to how + behaves and how it interacts with * or that the implementation of + even respects any of those properties. If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

All this is just a roundabout way of saying I don't write unit tests. There is no value in it when I have convinced myself each subcomponent is correct and that composition of those subcomponents preserves correctness. You can write all the unit tests in the world and if you don't verify that the composition of your components preserves the correctness properties those tests are meant to verify then you might as well not have written those tests because now you have created a false sense of security.

If you can verify the kernel pieces of your code and then the composition mechanisms then you're done. You don't need to verify the entire thing as long as the entire thing is built with the kernel pieces and verified composition mechanisms that preserve correctness.

I also suspect we are still talking about separate classes of software. For your run of the mill twitter clone if it makes the corporate overlords happy then write as many unit tests as you want because it doesn't make a difference either way. But if we're talking about actual fundamental software like OS kernels and compilers then I don't think you can have enough formal verification.


Your test is intended to convey to the reader that your function does addition and provide an example of how to access that function. The functional part of the test is to verify that your claim is true, rather than the stories of yore when the documentation often didn't match what the code actually did.

As a nice side effect, when you refactor your function in the future and you mistakenly turn it into multiplication, the test will give you a sanity check that you meant for it to do addition. Perhaps not a big deal in your contrived example, but it is huge in the real world where functions are not quite so simple.

Testing is not for verifying a function is mathematically correct.
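To make that regression-check role concrete, a toy sketch (the function and test names are invented): the test documents that the function means addition, and would catch a refactor that accidentally turned it into multiplication.

```python
def combine(x, y):
    """Documented (by its test) to perform addition."""
    return x + y  # a refactor that changed this to x * y should fail the test

def test_combine_is_addition():
    # Note 2 + 2 == 4 but also 2 * 2 == 4, so choose inputs that
    # actually distinguish addition from multiplication.
    assert combine(2, 3) == 5   # multiplication would give 6
    assert combine(0, 7) == 7   # multiplication would give 0

test_combine_is_addition()
```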


> If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

At the moment this "a bit further" is a bit like saying if you don't like the price of milk at your corner store, why don't you go to a farm to get a quart? Sure, you might be able to do that, but it's not exactly a comparable effort.


> Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z?

I gained some evidence. Do you refuse to accept the theory of gravity because we've only measured some examples where F = G m_1 m_2 / r^2, not proven it from first principles?


Yeah, but unit testing suites don't actually reason inductively. Scientists do. Big difference.


> Suppose I tell you 3 + 2 = 5 and 2 + 3 = 5. Did you learn anything non-trivial about + from those two examples? Did you learn that x + y = y + x for all x and y or that x + (y + z) = (x + y) + z? You can sample as many points as you want and you'll still be no wiser as to how + behaves and how it interacts with * or that the implementation of + even respects any of those properties. If you're going to put in the effort to generate unit tests then why not go a bit further and formally verify the correct functionality?

One could take the middle road and express commutativity, distributivity etc. through invariants and use a fuzz-testing tool to generate tests for them. Generating 100 tests sounds a lot better than a single handwritten one, though of course it's not as good as actually proving the property for all inputs.
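That middle road can be sketched in plain Python (standing in for a proper generator-based tool; `add` and `mul` here are toy stand-ins for the implementation under test):

```python
# Check algebraic laws of an implementation on many random inputs,
# instead of a handful of fixed examples like 3 + 2 == 5.
import random

def add(x, y):
    """Implementation under test (a stand-in here)."""
    return x + y

def mul(x, y):
    return x * y

def rand():
    return random.randint(-10**6, 10**6)

for _ in range(100):
    x, y, z = rand(), rand(), rand()
    assert add(x, y) == add(y, x)                          # commutativity
    assert add(x, add(y, z)) == add(add(x, y), z)          # associativity
    assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))  # distributivity over *
```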


One big advantage is regression testing - which is why you don't just test it in a REPL and forget about it.


Totally agree here, manually verifying a chunk of code once, for example early on in the dev lifecycle, then having it exist inside a big system is really the "EE" way of doing things.

But the reality is you've got to dig deep and change stuff. After all, how often does an engineer dive in and change how a transistor works (laws of physics be damned!)?

I feel like overall software quality has been harmed by the emphasis on mathematical proof of code. The problem is that it's not currently possible (for a variety of reasons, including underspecified languages, math, research and many others) to formally verify, end-to-end, a real world system (e.g. Oracle Database, Salesforce.com CRM, etc).

Over promising on software proof, and under delivering has diverted attention away from relatively simple and "boring" practices to increase code quality.

While people in this thread are mocking TDD, the reality is that most code I have seen in a production setting is really bad: unit tests that aren't, test runs that take 4 hours, a complete lack of tests, opaque tests that are hard to debug and hard to fix. A lot of TDD and agile/clean-code advice says "let's make the code easy to read, have real unit tests for everything, and refactor to get rid of cruft often". While these are tools to be applied judiciously, in a production-code scenario there is often no justification, other than ignorance (always fixable!), to disregard good advice!

To summarize the good advice:

- make code easy to read at a glance

- code should be self-documenting (refinement of the above)

- units should be testable independently

- tests should exist at higher levels, such as integration and end-to-end tests

Just like the 10 commandments, you don't have to believe in Agile or XP to agree with these basic things.


I'm sorry, but I don't believe this point has any merit with regards to this discussion. Formal verification was seen as the future for large systems for some short time in the latter half of the 20th century, and since then no one has seriously suggested verifying programs of millions of lines of code. The point that is made here (quite sensibly) is that in many cases, by verifying small parts of your codebase (a few hundred lines, perhaps), you can rule out whole classes of bugs.

To give some useful real-life examples, in OCaml (or at least in Standard ML), if a program compiles one can know that it will never crash (i.e. leave the control of the system; it might still abort) unless the compiler has a bug (and there's now a verified ML compiler for x64). Other useful properties might be to prove that a driver terminates (Microsoft SLAM/T2) and so will never hang the system, or that a compiler produces code semantically equivalent to the spec of the language.

You say that software quality has been harmed by the emphasis on mathematical proof of code - I just don't see how this can be the case. In a typical education, a student will see two types of proving code - in an algorithms class, where it is required in order to show that the algorithm solves the problem (and totally separately from actually building software systems), and in the formal verification/modelling class, which I doubt is taught at many universities and is almost certainly optional.

If someone enters the workforce without sufficient knowledge of test driven development, isn't taught at their place of work and writes bad code, there's a nice long list of people they can blame (their bosses, the people reviewing their code, their style book, their examiners, their lecturers, etc.) but it's not valid to blame formal verification for this, no more than it is valid to blame the hypothetical course on FPGA programming they took at some point.

Yes, test driven development is great, but it is totally complementary to this sort of thing.


> if a program compiles one can know that it will never crash

that is not true, you need much stronger guarantees than just typechecking gives you. you also have a large wodge of native code for the runtime that is not written in the strongly typed language.


Ok, so in retrospect I didn't mean that for OCaml. Certainly it is true for Standard ML [1, 2, 3], or rather, the parts of the language that use the underlying native code (IO etc) are generally smaller pieces of code for which the correctness properties are hopefully more easily tested and verified. In any case, I believe my point still stands on the whole :)

[1] http://www.cl.cam.ac.uk/teaching/1314/FoundsCS/fcs-notes.pdf [2] http://gallium.inria.fr/~fpottier/publis/fpottier-regis-gian... [3] https://en.wikipedia.org/wiki/Type_safety#Standard_ML


friends of mine have still segfaulted mlton-compiled SML programs. as I understand it, you need to have a bisimulation-based proof that shows equivalence in states between ML and x86, and as far as I know that just hasn't been done. until then you have the risk of a bug in some aspect of the compiler, which has happened, emitting code that produces a crash or something else bad.


My first opportunity to whip this one out :) https://cakeml.org/popl14.pdf


nice!


It has been done: https://cakeml.org/.


Does Safe Haskell guarantee enough, or do we need more?


I don't think so, there have been bugs in ghc before, the problem would be a bug in ghc that emits code that crashes.


The problem with proofs is that you need to know what to prove, and that is hard. Even formalizing something as seemingly simple as sorting is tough for most programmers. If your formalization of whatever you wanted to implement is wrong, then your proof is worthless -- you have only proved that your program does something other than what it was supposed to do.
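As an illustration of how easy it is to get a formalization wrong, a sketch in Python: "the output is ordered" alone is satisfied by a function that always returns the empty list, so a usable sorting spec must also require the output to be a permutation of the input.

```python
# A weak spec vs. a full spec for sorting.
from collections import Counter

def is_sorted(xs):
    """Weak spec: adjacent elements are in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def satisfies_sort_spec(inp, out):
    """Full spec: ordered AND a permutation (multiset equality) of the input."""
    return is_sorted(out) and Counter(out) == Counter(inp)

def bogus_sort(xs):
    """A 'sort' that provably meets the weak spec but is obviously wrong."""
    return []

assert is_sorted(bogus_sort([3, 1, 2]))                          # weak spec: passes!
assert not satisfies_sort_spec([3, 1, 2], bogus_sort([3, 1, 2])) # full spec: fails
assert satisfies_sort_spec([3, 1, 2], sorted([3, 1, 2]))         # real sort: passes
```

A proof against `is_sorted` alone would be exactly the worthless kind described above: a correct proof of the wrong property.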

It's not that TDD is perfect, but it is a pragmatic approach to reducing bugs and it helps a lot in preventing regressions.


Why are imperfect programs and test suites useful, yet imperfect specifications are worthless?

It's not all black-and-white. Specifications, like test suites, contain many properties. The problems of specifying slightly-wrong behaviour on edge-cases might be more than offset by the security guarantees that are gained.

Also, note that many properties are built on each other and specifications will grow and change along with the software. Incorrect properties are often incompatible, but unlike tests which may never hit the incompatible cases, a verification tool will refuse to accept such incompatibilities.


The article was talking about security. Security errors in specifications can be edge-case-like stuff. Or...

Back in the day, Microsoft very deliberately designed Outlook so that it could execute a file by having the user click on it. That was in the spec. It was also an enormous security hole. It wasn't just an edge case - the security hole was the essence of what the spec required.

If you get a spec like that, formal verification can do two things. It can tell you that yes, your code does what the spec says, without introducing any (additional) security bugs. That's somewhat useful, but it won't save you if the spec itself is the security bug.

Or, formal verification might be able to tell you that the spec itself is a security problem. That might not be called "verification", though - it might be called "theorem proving" or some such. It's in the same neighborhood, though.


Writing a specification for sorting sounds a lot easier than actually writing a sorting algorithm. Though admittedly I have never used a programming language where I could try to encode this property in a type.


It can still be useful documentation. Four years later and after a couple rounds of heavy refactoring, it's nice to verify that 2+2 still equals 4.


More like glorious documentation: verified to be up to date... and has examples.

As others mentioned, regression testing is about the only really useful correctness step. There's also motivation, to keep on track for what is actually needed. Also it helps you write an API, rather than just an implementation: the tests are a client. (On the downside, these extra clients' dependence on your API also makes it more work to change that interface, giving it more inertia.) Plus: who shall test the tests?

NB: I'm not a TDD adherent, never used it, so take what I say with a grain of salt: it's just my own reasoning + what I've heard from people who've tried it.


It's a lot easier to be sure that you are correctly checking that 2 + 2 = 4, than to verify that you're proving the right thing. If you have your proof in hand, and unit tests, you should be more confident than if you have just your proof. Vice versa as well, of course (and probably more so) but that doesn't mean you shouldn't test properties of your code.


Perhaps Test Driven Design is more about producing a certain kind of design, rather than about guarantees about correctness.


Perhaps because formal verification is hard & very expensive?

For starters, where does an open source project come up with an entire compute farm...


You can do formal verification with tools like Coq using a laptop. The real issue is that writing a formal specification is hard even for small problems; writing a formal spec for something like TLS would be harder than actually implementing TLS.


Cool! I do hardware, I guess software is probably easier to compute.


Ah, you might have been thinking of techniques based on SAT and ATPG solving. What makes software different is this:

https://en.wikipedia.org/wiki/Curry-Howard_Correspondence

Basically, checking a proof of correctness for software is equivalent to type checking (for a very fancy type system).


While you are correct, I'm not really sure how I feel about you mocking documentation really.

I guess communicating with your coworkers is pretty passe?


I am not mocking documentation.


"just glorified documentation"

I think it's pretty well understood that calling something "glorified" is making fun of it. I guess you meant to mock TDD, but it also seems like you are mocking writing documentation.

While it could cut either way I guess, I decided to take a snap judgement (like 99% of your readers) and call you out on it. Sorry.


Don't make the mistake of thinking the pattern "[something perceived as greater than X] is just glorified X" calls into question the value of X. Rather, it's the perception of being greater that is being called into question.

If I say "soda is just glorified water" do you think I'm mocking water and that I don't think water is vital for all life on this planet?



