There is formal specification in the large if you count formal mathematics. Mizar's library of formalized mathematics is easily millions of lines, if not tens of millions, at this point. There are individual formalized-mathematics repositories in other languages (Coq and Agda among them, I think; it's been a while since I looked into this) that run into the hundreds of thousands of lines of code.
So there are pointers as to how to organize formal specs for extremely large projects.
The reason, as far as I can tell, why there aren't large software verification projects (as opposed to pure formal math) is that the cost-benefit analysis is wildly out of whack. It makes essentially no business sense to try to formally verify things past several thousand lines. This doesn't seem like a bright line to me, though; i.e., there isn't a technological inflection point at several thousand lines that prevents further scaling. Again formal mathematics repositories are a counterexample. It's just that, because of the high cost of formal verification, there's a business ceiling somewhere in the vicinity of the man-hours you'd be willing to spend on hundreds of thousands to millions of lines of unverified code, and you can rapidly exhaust that budget with full formal verification (which can easily mean an orders-of-magnitude increase in time spent).
Ah, thanks for the TLAPS repo. I forgot about the library, but even there I'm not seeing a lot of reuse. There's some, but not a lot. Maybe that's just because TLAPS has far fewer users than TLC.
Indeed you can choose specific Isabelle tactics in your BY line. In practice, I'm not sure I've ever seen it in any TLAPS proof. I've only very briefly played with that functionality, and at first glance it seems much more difficult than using Isabelle directly, because of the added layer of indirection. If you have an example in the wild, I'd love to take a look and learn from it.
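(For concreteness, what I mean is something like the following, with the caveat that I'm reconstructing the pragma from memory, so the exact spelling may be off:)

```tla
\* In a module that EXTENDS Naturals:
THEOREM DoubleEven == \A n \in Nat : (2 * n) % 2 = 0
  BY IsaM("auto")  \* hand the obligation to Isabelle's "auto" method
```

That extra layer is exactly the indirection I mean: you're steering Isabelle through TLAPS's translation of the obligation rather than working on the goal directly.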
I definitely agree there is no one way to do things, and I've left out all the things that I think TLA+ is fantastic at. I think it's the closest thing to mathematics that can be practically used in industry. Despite the rhetoric you usually hear about FP languages, TLA+ is far more immediately familiar to anyone with a pure-mathematics background, both syntactically and semantically. Unlike something like Agda, TLA+ also has a fairly straightforward semantics and mathematical grounding in its theory. And a benefit of writing each spec from scratch is that specs are independently understandable, with no need to learn the idiosyncrasies of arcane libraries or frameworks.
But I do think it has problems with modularity. My state example wasn't to do with the difficulty of verification or the explosion of the state space (which honestly don't bother me; my TLA+ specs are often exponential or worse in model-checking time with respect to the size of my constants, but I still get a lot of mileage out of them), but rather the difficulty in even expressing that idea. When I want one state to stand in for two, I often have to resort to two separate specs, one with fewer states and the other being the original spec, and then showing my new spec fulfills both. I think the modularity problems mostly stem from tooling and language choices, but a little bit from theory (although Mizar is also set-theory based). For example, I don't think I've ever used a third-party module in a TLA+ spec I've written. Even expressing the equivalence between two specs that just use different strings for the same states can be tricky. Wouldn't it be cool if people could import Lamport's Paxos spec directly into specs of their own systems? Or any other spec someone has written for one part of their system? It would be pretty awesome if there were something akin to an NPM repository of specs of various algorithms that you could import into your own spec. The tools for this exist in Mizar. They do not for TLA+.
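For what it's worth, the closest I can get to expressing the state-collapsing idea directly is an INSTANCE substitution; a hypothetical sketch (both modules and all state names are made up):

```tla
---------------------- MODULE ThreeStateImpl ----------------------
VARIABLE state   \* this spec distinguishes "running" from "waiting"

Init == state = "running"
Next == \/ state = "running" /\ state' = "waiting"
        \/ state = "waiting" /\ state' = "done"
Spec == Init /\ [][Next]_state

\* Collapse the two concrete states into the abstract "active" state.
Abs == INSTANCE TwoStateSpec
         WITH state <- IF state \in {"running", "waiting"}
                       THEN "active"
                       ELSE state

\* The implementation refines the coarser spec.
THEOREM Spec => Abs!Spec
===================================================================
```

where TwoStateSpec is the coarser spec with just "active" and "done". It works, but every such mapping is bespoke and written by hand, which is part of what I mean by the modularity problem.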
Turning back to deductive verification rather than the expression part of the equation, and more specifically to TLAPS: I'm actually not a fan of the imperative tactics found in Coq. But an error message better than what amounts to red highlighting and "no" would be nice in TLAPS. Something that let you mix YOLOing with more hands-on proof building, via tactics, functions, or even just a trace of the constraint search, would be great.
I think the idea that you have some limited thousands of lines to "spend" is a useful bit of intuition. One knob you're leaving out, though, is the richness of the properties you're trying to prove. As you decrease the richness of the properties you prove, you can drastically decrease your burden. This starts bleeding into static analyzers and then, if you dial the knob even further down, into mainstream type systems. But the knob can be fairly low and still be useful.
For example, absence of crashing/abnormal exits is something that is feasible to prove on very large codebases. It's hard to do completely end-to-end (even CompCert's front-end isn't proven), mainly because of the difficulty of fully modeling things like filesystems, but it can be done for modules spanning hundreds of thousands of lines of code. It is, of course, a weak property: that the program does something is nowhere near the same thing as doing the correct thing. But it is nonetheless a very useful one.
Absence of SQL injection and CSRF attacks is another example (it can be done either by external analysis or by leveraging language-level features), although that feels a bit more like cheating, since you basically cleverly restrict the amount of code that "matters."
That would be the closest to an alternative to formal specifications I could suggest. I still think formal specs are lower risk, because this approach requires coding in styles amenable to the static check (or languages that enforce it).
> There is formal specification in the large if you count formal mathematics.
Sure, but TLA+ is not aimed at research. The biggest problem, by far, plaguing not just proof assistants but formal logic in general, pretty much since its inception, is complexity. Which is ironic, because the entire purpose of formal logic -- from Leibniz's original dream to Frege's early success -- is to be a simple, clear, and easy tool for practitioners. Compared to the dismal failure of formal logic in achieving that (which was acknowledged and lamented by the likes of von Neumann and Turing, but has since been shrugged off even though the problem has gotten considerably worse), I think the fact that people who are not trained logicians use TLA+ to do real work makes it a crowning achievement not only in software verification but in the history of formal logic. To achieve that, TLA+ had to shed everything that isn't absolutely necessary. Lamport once said after hearing a lecture about some proposed addition to TLA+: "it's taken you 6 months to add something to TLA+ that it had taken me 15 years to remove".
> It makes essentially no business sense to try to formally verify things past several thousand lines.
Well, we don't know how to do that in a way that would be feasible; I consider that a "limit" because the software we use and need is very commonly three orders of magnitude ahead of what deductive verification can handle. And the gap has been constant or even growing for the past 40 years or so.
> Again formal mathematics repositories are a counterexample.
I disagree. Formal mathematics is about taking years to prove a proposition that could be written in one or a couple of lines. I think that the resemblance between deductive mathematical proofs and deductive software verification is like that between painting the Sistine Chapel and painting a large hospital. The similarity is superficial.
> If you have an example in the wild, I'd love to take a look and learn from it.
> But I do think it has problems with modularity. My state example wasn't to do with the difficulty of verification or the explosion of the state space ... but rather the difficulty in even expressing that idea.
Can you DM me on Twitter (get there from the about page of my blog: https://pron.github.io) or send me an email (my name at Google's mail service)? I'm curious to see the problem you're referring to.
> Wouldn't it be cool if people could import Lamport's Paxos spec directly into specs of their own systems?
Well, I don't yet understand the exact issue you're having, but no, I don't think that would be particularly helpful, for the following reason: if people want to modify Paxos itself, they can get their hands on Lamport's spec and tinker with it. If they want to use it as a component, they can write an abstraction of it in a few lines. All of TLA+ is about the abstraction/refinement relation (which is, IMO, what makes its theory cooler than Coq et al. for software).
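To put a face on "a few lines": the component-level abstraction of Paxos is roughly Lamport's own Consensus spec, which in its entirety is about this big (a sketch from memory):

```tla
------------------------- MODULE Consensus -------------------------
CONSTANT Values            \* the set of values that may be proposed
VARIABLE chosen            \* the set of values chosen so far

Init == chosen = {}

\* A value is chosen at most once, and a choice is never revoked.
Next == /\ chosen = {}
        /\ \E v \in Values : chosen' = {v}

Spec == Init /\ [][Next]_chosen
====================================================================
```

A Paxos spec then implements this by exhibiting a refinement mapping from its ballots and votes down to chosen; anyone using consensus as a component only needs these few lines, not Paxos itself.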
> But an error message better than what amounts to red highlighting and "no" would be nice in TLAPS.
Yep.
> One knob you're leaving out, though, is the richness of the properties you're trying to prove. As you decrease the richness of the properties you prove, you can drastically decrease your burden. This starts bleeding into static analyzers and then, if you dial the knob even further down, into mainstream type systems. But the knob can be fairly low and still be useful.
Right, but we know more than to say it's just richness. We know exactly what kind of properties scale (through abstract interpretation, of which both type systems and static analyzers are examples): inductive (= compositional) ones. I.e., a property such that if it holds in any state (whether or not reachable by the program), then any step of the program preserves it; or, alternatively, a property P such that P(x) and P(y), where x and y are terms, implies P(x ∘ y). The problem is that most functional correctness properties are not inductive/compositional for the language (i.e., the above two descriptions do not hold for arbitrary terms of the language).
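A toy TLA+ illustration of the distinction (module and names invented):

```tla
-------------------------- MODULE Parity --------------------------
EXTENDS Naturals
VARIABLE x

Init == x = 0
Next == x' = x + 2

\* Inductive: preserved by Next from ANY state satisfying it,
\* reachable or not, so checking it needs no reachability analysis.
Inv == x \in Nat /\ x % 2 = 0

THEOREM Init => Inv
THEOREM Inv /\ Next => Inv'
===================================================================
```

By contrast, x # 3 also holds in every reachable state of this spec, but it isn't inductive: the unreachable state x = 1 satisfies it, yet one step lands on x = 3. Proving it deductively requires strengthening it to something inductive (here, the parity invariant), and that strengthening step is where the cost of verifying non-compositional properties lives.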
> That would be the closest to an alternative to formal specifications I could suggest.
I think that's one additional tool. But a knob that's easier to turn is soundness. The difference between verifying a property to 100% certainty and to 99.999% certainty can be 10x in cost, and since -- unlike in mathematics -- an engineer is not interested in the correctness of an abstract algorithm but in that of a physical system, which can always fail with some nonzero probability, absolute confidence is never strictly a requirement. This is not just my opinion; it also seems to be that of many formal methods researchers, which is why concolic testing is such a hot topic right now.
I think that specification and verification must be separate -- i.e., you specify at any "depth" and then choose a verification strategy based on cost/benefit (so, no to dependent types) -- and in many cases it's possible that concolic testing would be enough.
1. The very large Coq/Agda texts are usually large because of the proofs, not the specification/propositions. There's a 5-10x ratio, if not more.
2. I don't think that TLA+'s theory is intrinsically more "straightforward" than Agda's, even though it's certainly more familiar (I know some people who would insist that type theory is more straightforward than set theory). I think that a lot of work has gone into making TLA+ relatively easy to grasp; I think it's more design than a specific choice of theory, although the choice of theory certainly affects the design. In addition, as I've already mentioned, I think that TLA itself (i.e. not the set theory in the "+") is a particularly beautiful theory for talking about computation and discrete systems, that has some obvious advantages over the lambda calculus; in particular the abstraction/refinement relation that's the core of TLA (the semantic domain of TLA is a Boolean lattice where the less-than relation is behavior refinement).