Tarsnap had a CTR nonce collision. It's a bad bug that's fairly common and easy to explain.
CTR mode turns AES into a stream cipher, meaning it can encrypt a byte at a time instead of 16 bytes at a time. It does this by using the block cipher core to encrypt counters, which produces a "keystream" that you can XOR against plaintext to use as a stream cipher.
For this to be secure, as with any stream cipher, it is crucial that the keystream never repeat. If you encrypt two plaintexts under the same keystream, you can XOR them together to cryptanalyze them; even easier, if you know the contents of one of the plaintexts, you can XOR the known plaintext against the ciphertext to recover the keystream!
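Concretely, the attack is nothing more than a couple of XORs. Here's a toy sketch in Common Lisp (the keystream and messages are made up; a real keystream would come out of AES):

(defun xor-bytes (a b)
  "XOR two byte vectors of equal length."
  (map '(vector (unsigned-byte 8)) #'logxor a b))

(let* ((keystream #(19 244 7 88 131 42 9 201)) ; unknown to the attacker
       (p1 (map '(vector (unsigned-byte 8)) #'char-code "SECRET#1"))
       (p2 (map '(vector (unsigned-byte 8)) #'char-code "SECRET#2"))
       ;; Both messages encrypted under the SAME keystream: the bug.
       (c1 (xor-bytes p1 keystream))
       (c2 (xor-bytes p2 keystream))
       ;; Known plaintext p1 plus its ciphertext c1 recovers the keystream...
       (recovered (xor-bytes c1 p1)))
  ;; ...which decrypts the other message outright.
  (map 'string #'code-char (xor-bytes c2 recovered)))
;; => "SECRET#2"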
To avoid repeating keystreams, CTR mode uses a nonce, which is a long cryptographically secure random number concatenated to the counter before encrypting.
To avoid that catastrophic security bug, CTR mode users have to make sure the nonce never repeats (and also that the counter never repeats, e.g. by wrapping). We have found both bugs multiple times in shipping products, and now Colin found it in his product.
And so I come to the moral of my story: Colin is clearly a gifted crypto dev. He can talk lucidly and at length about the best ways to design crypto-secured protocols. He has found crypto flaws in major systems before. He is as expert as you could expect anyone to be on any product.
And Colin didn't get it right; what's more, the manner in which he got it wrong was devastating (in cryptographic terms).
Colin handled this well, largely due to the fact that he's an expert and knows how to handle it.
How likely is it that anyone less capable than Colin could have handled it so well? Moreover, if Colin can make a devastating mistake with his crypto code, how many worse mistakes would a non-expert make?
You should avoid writing crypto code if at all possible. Nate Lawson is fond of saying, "you should budget 10 times as much to verification as you do for construction of cryptosystems"; I would amend that only to add a price floor to it, because you cannot get real validation of a cryptosystem for less than many tens of thousands of dollars --- if your system is simple.
What should you do if you can't avoid writing crypto code?
I found myself in this situation when I tried to find a bcrypt implementation for Common Lisp. There wasn't one. Folks in #lisp suggested I adapt the blowfish implementation in Ironclad, since 'bcrypt is just blowfish anyway'.
1) Both the current C implementations are designed to be integrated into libc. The Openwall implementation does have the code factored out into its own file, but there is no support structure for building a shared library. (Python's bcrypt bundles a modified version of the Openwall C source directly with it, for example.) Common Lisp's FFI is intended for working with installed shared libraries.
2) There appears to be a bias in the Lisp community towards pure-Lisp implementations, for (hopefully obvious) reasons, so an implementation as hacky as what I came up with is unlikely to see much use.
If I do go back to trying to write a webapp in Common Lisp, I think I will find myself having to reimplement bcrypt in Common Lisp. First, I'll have to find a sufficiently portable method of getting cryptographically secure random numbers; as of the writing of that blog post, there wasn't one that I could find anyone recommending. The more difficult part will be to convert the C code into Lisp code without missing any places where operations on the C types don't precisely correspond to the same operations on the Lisp types (due to, say, overflow).
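For the overflow issue, the plan I have in mind (just a sketch, not tested against the real bcrypt code) is to mask every arithmetic result back down to the width of the corresponding C type:

;; Emulating C's uint32_t wrap-around semantics in Lisp, where integers
;; are bignums and never overflow on their own.
(defmacro u32 (form)
  "Truncate FORM's result to its low 32 bits, like unsigned 32-bit C arithmetic."
  `(ldb (byte 32 0) ,form))

;; In C, 0xFFFFFFFF + 1 wraps to 0; in Lisp it quietly becomes 4294967296
;; unless every operation is masked.
(u32 (+ #xFFFFFFFF 1))   ; => 0
(u32 (ash #x80000001 1)) ; => 2, the high bit falls off as it would in C

The danger is exactly the one described above: forget the mask in one place and the Lisp code silently diverges from the C code.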
I'm worried I might get something wrong, but I can't just use the crypto code written by wiser folks than I, because, at least in the Common Lisp community, that code doesn't seem to exist.
First, don't obsess too much about your password digest. I know this is head-explodey considering the source, but all I'm trying to do by ranting about it is to get people to stop using SHA1 (or SHA256 or Whirlpool or whatever) hashes. The risk to your users for doing that bit wrong is not very high.
Second, my advice about how to do crypto security is very simple:
* Use PGP for data at rest.
* Use TLS for data in motion.
Do not trust your own judgement (say, by using OTR because it "feels" like most of what you need, or trusting that you'll use Keyczar safely) on anything else without a formal external review. In practice, you will almost never need anything more than TLS or PGP.
Okay, but in the more general case, where something literally does not exist for a particular platform/language and my choices are "write it myself" or "don't use that platform/language", is there any way to feel confident that a choice to write it myself will not be a hideously wrong decision?
I'm not Thomas Ptacek, but I think his answer would be something to the effect of "to be really confident, get tens of thousands of dollars worth of code review before shipping." You may be prominent enough in the CL/hypothetical crypto-less platform community to get most of that for free, but that's the value of the validation needed.
If you're trying to make something in pure Lisp, your odds of failure are less if you take an existing hashing algorithm (e.g. SHA-256) and just iterate it a bunch of times. Ironclad has SHA-256, so this is really easy:
(defun slow-hash (password salt &key (iterations 10000))
  "Produces a 256-bit hashed value of password and salt, slowly. Uses
a tweakable number of iterations, which should not be less than
1000, and which defaults to 10000."
  (let ((hash (ironclad:make-digest :sha256)))
    ;; First, hash the salt and password
    (ironclad:update-digest hash
                            (ironclad:ascii-string-to-byte-array salt))
    (ironclad:update-digest hash
                            (ironclad:ascii-string-to-byte-array password))
    ;; Repeatedly hash the hash, to slow things down
    (dotimes (x iterations)
      (ironclad:update-digest hash (ironclad:produce-digest hash)))
    (ironclad:produce-digest hash)))
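A hypothetical call would look something like this (hex output via Ironclad's byte-array-to-hex-string; the salt and password here are placeholders):

;; Example usage; returns a 64-character hex string.
(ironclad:byte-array-to-hex-string
 (slow-hash "correct horse battery staple" "per-user-salt"))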
Even after one round, repeatedly hashing the hash knocks your plaintext space down to 256-bit strings (or however long the output of your hash function is). Tom/Colin, is this actually a problem?
That's the case for every hash function. If it's a good hash function, the outputs will be evenly distributed across the entire 256 bit space, which comes out to 2^256 possible outputs. I don't think this is a problem; bcrypt has a much smaller output space.
1. It should be fairly trivial to test that your implementation is giving exactly the same output as the C libs (once you have chosen a particular random number that feeds into the algorithm). It seems like the trickiest part of testing will be ensuring that you are using the same character set everywhere.
2. Why is it important to have a "cryptographically strong" PRNG? Doesn't this just turn into a salt? Does a salt generator really need to be cryptographically strong?
Someone please correct me if I am being naive here.
1. I worry I might have a bug that returns the proper output for some inputs, but improper output for other inputs.
2. Cryptographically strong random numbers aren't strictly required for a bcrypt salt, I guess. But if I'm building something which I plan to share with other people, I'd rather err on the side of too strong (rough sketch of what I'd use below).
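The rough sketch: just read the salt bytes from /dev/urandom, which assumes a Unix-like host and so isn't fully portable:

;; Read N random bytes from /dev/urandom for use as a salt.
;; Assumes a Unix-like system; not portable to every CL platform.
(defun random-salt (n)
  (with-open-file (urandom #P"/dev/urandom" :element-type '(unsigned-byte 8))
    (let ((salt (make-array n :element-type '(unsigned-byte 8))))
      (read-sequence salt urandom)
      salt)))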
In cryptography you should always use a "cryptographically strong" PRNG, even (especially) if in doubt. There have been too many mistakes with lousy random number generators undermining what would otherwise have been a strong security mechanism.
But we're talking about generating a salt here. As I understand it, the reason you use a salt is to make it much harder to brute-force a dictionary of passwords ahead of time. I don't see how the use of a cryptographically strong PRNG is going to provide any additional security here.
Nonces aren't cryptographically secure random numbers. They merely have to be different for each encryption, which is why even a counter suffices. The problem was, the counter was being incorrectly reset to zero.
It's just as secure to concatenate a string that is a function of the time of day with the counter. Another scheme would be to start out with a cryptographically strong random number that is incremented each time.
The problem was, the counter was being incorrectly reset to zero.
The counter was being correctly reset to zero. The nonce was being incorrectly not set to non-zero. (In CTR mode, there is a 64-bit nonce which is different for each message and a 64-bit counter which starts at zero for each message and increments as you move through the message.)
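To illustrate the layout (a toy sketch of the general scheme, not Tarsnap's actual code):

;; A 16-byte AES-CTR input block built from a 64-bit per-message nonce
;; and a 64-bit block counter, both big-endian here for illustration.
(defun ctr-block (nonce counter)
  (let ((blk (make-array 16 :element-type '(unsigned-byte 8))))
    (dotimes (i 8)
      (setf (aref blk i)       (ldb (byte 8 (* 8 (- 7 i))) nonce)
            (aref blk (+ 8 i)) (ldb (byte 8 (* 8 (- 7 i))) counter)))
    blk))
;; The bug class: if the nonce never changes between messages, two messages
;; both use the blocks (nonce, 0), (nonce, 1), ... and hence the same keystream.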
The mistake he made is not specifically a "crypto code" problem, it was merely forgetting to increment the new variable, and that kind of thing happens in a lot of non-crypto contexts. I think that the reason people should use tested and well-known crypto code is because the stakes are high and not necessarily because cryptography is super complex; a mistake like this can have serious consequences since most encrypted data is serious by nature. As such, the maxim "don't write your own crypto" could also be applied to any high stakes programmatic endeavor; if you can't tolerate any downtime on your server, "don't write your own server", etc.
The great thing about widely used open-source utilities is the extensive vetting they receive. I was a bit uncomfortable using tarsnap's custom client and now I'm happy I went with duplicity, which is a Python script combining rsync, tar, and gpg to create encrypted archives of your data and only send the differences.
Also, perhaps Colin could look into writing a test suite for tarsnap that would automatically test for mistakes like this. It doesn't sound like the particular applicable exploit is too hard to automate.
This is a head-explodey subthread. Sure, Colin, O.K., it wasn't a "crypto mistake", it was a "refactoring mistake that happened to basically turn off your crypto". The fact that perfectly innocent and benign seeming refactoring changes can do that to crypto is exactly why generalist developers should not be writing crypto code.
I've seen major security vulnerabilities result from losing a single '=' (turning "if (uid == 0)" into "if (uid = 0)") or adding a single '=' (turning "for (...; i < N; ...)" into "for (...; i <= N; ...)"). That's half the typo size of a missing '++'.
Sure, writing crypto code is dangerous. And writing user-authentication code is dangerous. But are you seriously going to say that writing loops is dangerous and generalist developers shouldn't do it?
If the underhanded C contest taught us anything, it's that perfectly innocent and benign seeming changes can introduce security vulnerabilities anywhere.
If you ask me, "should developers avoid writing web applications in C because it's virtually always unnecessary and practically guarantees memory corruption vulnerabilities", what do you think my answer is going to be?
Are we to understand that "crypto developers" never forget to increment something or never have copy-paste errors? It seems most of the difference between a "generalist developer" and a "crypto developer" is the amount of money his employer puts into QA.
You are right. I use duplicity for remote backups too, I can really recommend it :)
However, in crypto the 'NIH' syndrome is especially prevalent because of the inherent secrecy and paranoia. Especially so as there are still a lot of people on the obscurity side of the security-versus-obscurity debate. A good recent example of this would be Sony...
I was pretty mean to Colin in that post. In the intervening year I've come to respect Colin's practical experience a lot more than I did before, when I thought he was hopelessly academic. I'd like to make it clear that Colin doesn't look incompetent (actually, he seems to be earning accolades from the Twitterverse for how he wrote it up).
But yeah, I hope it's obvious that I see this as a very strong vindication for my argument that generalist devs shouldn't build crypto. At all, ever. Use TLS for data in motion; use PGP for data at rest. Systems much bigger and heavier than yours have gotten away with this.
Colin is wildly competent, but I was always uncomfortable with how much he was willing to do his own work without review. He is probably one of the best people in the world to do this work, but even the best person in the world will occasionally make mistakes, as happened here.
> Expert devs shouldn't build crypto without review.
I was surprised that Colin's solution is to personally re-review his code. Good writers know--don't rely on yourself for proofreading. Usually the mental lapse that caused the problem will manifest itself during your review as well.
In my experience, I'm very good at proofreading my own writing/code, as long as I wait long enough that I've forgotten it. In this case, I was looking at code which I wrote over a year ago.
But please, go ahead and give the code another read. :-)
I sense a great justification for an alcohol budget.
Edit: Totally willing to continue burning karma on this comment if the HN community continues to vote it down. I've tried reviewing my own code in a different state of intoxication than when I wrote it, and I'm not joking that it can help. I'm still trying to pull resources together for a study on the benefit of different mindframes for peer review. We haven't tried alcohol yet, but frankly it wouldn't be a half bad idea if we could get anyone not to laugh too loudly at the proposal.
"they are wont to deliberate when
drinking hard about the most important of their affairs, and whatsoever
conclusion has pleased them in their deliberation, this on the next day,
when they are sober, the master of the house in which they happen to be
when they deliberate lays before them for discussion: and if it pleases them when they are sober also, they adopt it, but if it does not please them, they let it go: and that on which they have had the first deliberation when they are sober, they consider again when they are drinking."
I also agree with you in general, that checking things in different mental states is a good practice. With alcohol, I suspect the benefit is outweighed by the difficulty of spotting bugs when drunk -- but who knows?
Well, I am actually a bit more curious about the difference between coding drunk and checking sober vs. coding sober, checking sober... But I suppose checking drunk would be amusing too.
Cute idea, but I don't drink (for medical reasons).
I suppose I could try reviewing code in both caffeinated and decaffeinated states, but being decaffeinated gives me enough of a headache that I don't think I'd be much use that way.
There's some sort of analogy in here about how the way alcohol 'loosens you up' could be related to getting in the proper state of mind to 'code fearlessly,' but I can't seem to find it.
How do you assess your effectiveness at proofreading your own code?
I know that I find more mistakes in my code when time reveals the code as it is rather than as it was intended. But I can't say this makes me good enough at proofreading myself. What about the code I've conceived and written in ignorance?
> Use TLS for data in motion; use PGP for data at rest.
That's useful advice, if you need and _want_ the guarantees given by TLS or PGP. If you have other needs then a look at, say, off-the-record messaging may be useful.
I think it's a bad idea to recommend the OTR protocol to people looking for a simple encrypted transport (or simple encrypted record storage). How do you judge whether the guarantees TLS offers are "needed" or not?
That judgment is partly outside the more mechanical parts of cryptography. You have to see what your application domain demands.
OTR is just the first example I could think of, that gives different guarantees than most normal cryptosystems. I don't particularly recommend it for anything apart from instant messaging. And I wouldn't recommend implementing your own.
If I speak to you in private (and we know each other), you can be sure you are speaking to me, but you won't be able to prove to any third party anything I said. OTR can give you something like that. PGP can't.
For most applications you will be well served with PGP or TLS. But be aware of what baggage they bring. For some areas, losing deniability via PGP can be worse than plain text.
This is a counterfeit argument. PGP loses "deniability" (and "forward secrecy") if by PGP you mean "the PGP user interface". But if what you mean is simply "the PGP cryptosystem" and "the PGP message format of packets and bulk encryption and signatures", then you can grant your system most any property OTR gives you.
This is a moot point, because most systems would never care enough to intricately position all their features just-so to compose OTR-like features out of PGP primitives. What they need is to be able to encrypt anything without implementing trivially exploitable crypto vulnerabilities that were discovered and solved decades ago.
This is a textbook case of everyone's good being strangled by someone's opinion of the perfect.
I don't disagree with you. And OTR is just one example, and may even be a straw-man by now. Just be aware that there are other valid choices for cryptosystems, while you still don't have to roll your own.
Just curious:
For a similar system to tarsnap where local data is encrypted via PGP and stored on a remote system, what extra benefit is there to using TLS for the data transfer / data in motion?
No one is qualified to implement crypto without review.
cperciva is extremely qualified to implement crypto, but not without review. I think it is wise that he has implemented a bug bounty procedure. He should make sure it applies to unreleased versions too, so maybe someone will put an RSS feed of his SCM checkins into their RSS reader and try to catch bugs as he's making them. :)
It would be nice if he had the money to spring for paying someone else to look at all his changes, but alas... that stuff is expensive!
No professional is going to undertake a review on spec. The demand for software security is too high; most of us have our pick of interesting projects that will pay whether we find something or not. We're not unique in that respect; top iPhone developers won't work for you on spec either, not because spec is evil, but because the economics don't work.
Furthermore, you can pay $1000 for XSS bugs and random memory corruption flaws in browsers because fuzzers can find them, because they're luck-of-the-draw findings, and because people are hammering those things whether you pay them or not. But $1000 doesn't pay for a day of qualified review, and no qualified reviewer would suggest less than two weeks for something like Tarsnap.
However, since Colin presumably doesn't want to raise his prices to pay for actual review, it is encouraging that he is at least going with bug bounties. These, at the very least, give us a good excuse to assign them as fun things for graduate students to do, with some hope that one will want to procrastinate so hard that they actually look at the code.
Also I think any reviewer who wanted to get paid would not start with Colin's code as an easy place to find bugs.
This is one reason why I'm going to be providing bounties for more than just security bugs. Even if people don't expect to find security bugs, they might find a typo in a comment and earn themselves a $1 tarsnap account credit -- and as demonstrated here, simply looking at source code can result in finding bugs even if you weren't originally looking for them.
Clever. Especially since "normal" bugs occasionally have that nasty habit of turning into security relevant bugs.
It will be interesting to see how close you can manage to get something resembling good review on a budget. Hopefully other people who are in similar low margin code businesses will keep an eye on your experiment to see how it works out.
Thanks for being so open about how you're trying to make things work. I hope you'll be publishing all the awarded bounties? (I suppose I should just wait for your follow-up entry.)
I think there's a limited market that would pay for this now: you say you would, other users say they would. But this is not something he can offer to only some users as an extra feature, so all his users would have to be comfortable with it. From a business perspective, I imagine he has done testing to figure out what price is right, and from a security perspective it would probably not be unreasonable if he just waited until he had enough volume to allow this to happen with only very limited price increases, or at his current margins.
In the meantime, the bug bounty + very qualified developer strategy seems like a reasonably sensible option while the service is presumably, still in its growth phase. I guess we'll find out.
This is a perfect example of why we shouldn't have (or shouldn't allow the use of) ++ in languages. If, instead of being buried in a long statement, the increment had been explicit on its own line, there would have been a much smaller chance of this bug happening.
For better or worse I've been working on the same codebase for almost a decade; now whenever I patch the code I've got half an eye on how likely it is that I or someone else could accidentally break this code in the future.
If you've got to write code that's not allowed to fail, you can't afford to set up little traps like this for yourself.
That's fast. I got the email two minutes earlier :)
I meant to say this in an email, but big props to Colin for being transparent about this and responding to the issue the way he did. I'm sure it wasn't an easy weekend.
It was remarkably fast. I put up the blog post, updated the website, sent out the emails, tweeted, updated the /topic in the #tarsnap IRC channel, and then came here to submit only to find that it already had 5 votes.
Colin, one question: let's say I'm not paranoid about my backups' security. So I don't want to re-encrypt anything (even though it's only about 1GB). How does upgrade to 1.0.28 affect existing and new backups (and deduplication among those)?
Tarsnap 1.0.28 is completely compatible with earlier versions (except for key file format changes in 1.0.22). The only nontrivial change was to make new uploaded data get encrypted correctly.
This is an excellent example of why you shouldn't write your own crypto code - Colin is awesome at it and even he makes mistakes.
Edit: To be clear, this isn't aimed at Colin but meant to point out that if he still occasionally gets it wrong there's a pretty good chance that your fancy custom encryption method does too.
This may be a naive question, but are unit tests for encryption code a reasonable idea? E.g., each revision is tested against a known set of data and a known set of common attacks.
It's hard to test the behavior of crypto code (even insecure crypto code can still generate data indistinguishable from random noise), so your unit tests end up tightly coupled to the actual implementation. You can do it, but it's not as clear a win as it is in other settings.
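Known-answer tests against published vectors do catch gross breakage, though. For instance, with Ironclad and the FIPS SHA-256 test vector for "abc" (a sketch; swap in whatever primitive you're actually wrapping):

;; Known-answer test: Ironclad's SHA-256 output for "abc" must match the
;; published FIPS test vector.
(assert (string= (ironclad:byte-array-to-hex-string
                  (ironclad:digest-sequence :sha256
                   (ironclad:ascii-string-to-byte-array "abc")))
                 "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"))

The limitation is the one above: a known-answer test tells you the primitive is wired up correctly, not that the surrounding protocol uses it safely (Tarsnap's AES would have passed every vector while the nonce sat at zero).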
It's good that users were notified to upgrade, but I am surprised he revealed this many details this soon. Does that not further reduce the security of data that end users have stored with the old version?
I did consider that, but the NSA knows perfectly well how to attack CTR nonce collisions, and anyone looking at the diff between version 1.0.27 and version 1.0.28 can see what the change was; so I didn't disclose anything which potential attackers couldn't already figure out easily.
When even someone like Colin introduces a crypto bug like this, it makes you wonder. Are we ever going to get to a place where crypto engineering is something the open source community can take on? How long did it take to push people to stop writing C programs with trivial vulnerabilities? And that's something you can write a static analyzer for. Not so with crypto.
I think that second sentence is somewhat wrongheaded. Crypto bugs aren't like normal bugs. Thousands of eyes aren't likely to surface them. Open source does not have a particularly excellent track record with exposing crypto flaws.
Simultaneously, we routinely find crypto flaws on black-box reviews of commercial products, sometimes even in firmware and hardware settings.
To my eyes, it's not the availability of source code that smokes out flaws like this, it's simply the incentive structure. Colin's project gets the attention of someone like Taylor Campbell, but Colin has made a name for himself and for Tarsnap. Even if your project becomes popular, if you aren't shouting from the mountaintops about your use of cryptography, you may be unlikely to garner the specific kind of attention you need.
I should clarify. I'm not picking on the open source community. I'm differentiating the open source community from the private sector because the incentives are different. There are crypto guys in the private sector that can build secure crypto systems for $600/hour. Now, crypto is devilishly hard to do, so there's no guarantee their system would be secure either. But if you have nation-state levels of funding, you certainly can buy a system that would take serious talent and funding to break. On the other hand, open source communities are motivated by intrinsic incentives. Clearly this is enough to implement state-of-the-art operating systems, but is intrinsic motivation enough to implement secure crypto? It may well be that the bar is too high in this area and I think the next decade will yield some interesting results here. Even if we count OpenSSL as a point for open source (generous), that's one reasonably secure system over the course of a decade.
> I'm differentiating the open source community from the private sector because the incentives are different.
The incentive in the private sector is to maximize profit, which means minimizing costs.
> But if you have nation-state levels of funding, you certainly can buy a system that would take serious talent and funding to break.
You might be able to build such a system, or you can buy a system that just passes all acceptance tests, which is where the incentive is (since this minimizes costs). Given that testing a cryptosystem for correctness is just about impossible, what do you suppose happens?
The best assurance that I get is when I'm told which standard implementation a product uses. If a private entity without a reputation in cryptography told you that they rolled their own, would you trust them? How many cryptographers would you trust? I know whom I would, and I don't even need a full hand to count them.
Colin Percival told you that he uses RSA-2048, AES-256 in CTR mode, and HMAC-SHA256. None of that information helps you with a one-line implementation error that incorrectly handles CTR nonces. That's 'poet's point.
By "standard implementation", I mean something like "OpenSSL 0.9.8o". This helps me more, since I can be fairly certain that >0 experts have reviewed that code. Given that absolute verification is just about impossible, it's a question of reducing the probability of failure wherever possible. With a private, closed implementation, the number of reviewers is almost certain to be lower.
By "standard implementation", I mean something like "OpenSSL 0.9.8o". This helps me more, since I can be fairly certain that >0 experts have reviewed that code.
It's a bit more complicated than that. Yes, >0 experts have reviewed OpenSSL code. But <1 experts have reviewed all of the OpenSSL code. Did the bits which matter to you get reviewed? Who knows...
Are we ever going to get to a place where crypto engineering is something the open source community can take on?
I'm not sure I understand the question - are you suggesting that authors of open source security code are less qualified or more bug prone than those who work on closed source software?
One of the promises of open source code is fewer bugs through exposure to many eyes. That seems to be exactly how this security bug was found, according to the blog post. How long do you suppose this bug would have stayed hidden if the source were not available? Personally, I'd guess a lot longer.
I'm a massive fan of tarsnap, even though I have no need for it and probably don't have the technical competence to use it anyway. Given that using the product isn't why I'm a fan, I can only put that down to Colin - and this post, with its technical openness, easy to understand (for a layman-of-sorts) crypto and code explanation, and humility demonstrates why.
All that, plus explaining how to delete and offering a refund will probably cost only a small number of picodollars, and is worth a lot more to tarsnap's credibility.
I planned on modifying tarsnap to work on local files and upload to different resources (dropbox, local, etc) but as I dug in and looked at the license I noticed this little gem in COPYING: "Redistribution and use in source and binary forms, without modification, is permitted for the sole purpose of using the "tarsnap" backup service". Why even provide source if the license doesn't allow me to do anything with said source? I can't create a patch without risking litigation, and so I won't.
This is silly. Colin clearly isn't going to "litigate" against you. He provides the source precisely so that people can find bugs in it. It's a commercial product, though, not a GNU project. It's a good thing that he provided source code, and it's disingenuous and petty to try to punish him for doing that.
Why not just have a non-compete clause within the license? At least legally this would allow for internal use with local backups. I highly doubt execution of a similar service using tarsnap would even happen, if that's what the worry is.
Presumably because he has better things to do with his time. His commercial competitors universally do not release their source code at all. Stop picking on him for his license.
"No matter how secure Tarsnap's design is, however, you don't run the design on your computer — you run the code. For this reason, all of the source code to the Tarsnap client is available. You don't need to simply trust that Tarsnap does things right (and that it isn't a trojan planted by the US government): You can read the source code and check for yourself."
I planned on modifying tarsnap to work on local files and upload to different resources (dropbox, local, etc) but as I dug in and looked at the license I noticed this little gem in COPYING...
Exactly. That is deliberate. I don't want to end up competing with my own code.
Why even provide source if the license doesn't allow me to do anything with said source?
So that people can audit it if they wish to do so.
"Tarsnap compresses its chunks of data before encrypting them. While the compresion is not perfect (there are, for instance, some predictable header bits), I do not believe that enough information is leaked to make such a ciphertext-only attack feasible."
That part is very important: compress, then encrypt. Here you see a competent crypto application playing it safe, covering for unexpected problems. I say well done Colin! Full disclosure and best practices.
Compression prior to encryption is generally a good practice, but as Colin points out, it doesn't actually do much to mitigate this bug; in a bulk encryption setting, you're going to find known compressed plaintext to back keystream out of.
It's true, and Colin's right to point it out, that it's unlikely that this bug will be exploited (you have to be Colin to do it, and it's a general PITA to deal with), but I wouldn't want anyone to have the impression that CTR mistakes are survivable just because you compress.
Colin, thanks for the explanation. I suggest an additional change to your pre-release process: all code must be peer reviewed. This is by far the most effective quality control measure you can implement, much more so than unit testing or "double-checking". I wouldn't trust any mission-critical production code that hasn't been peer-reviewed, much less crypto code.
Code review is always good, but some code deserves more checking than other code. There are some parts of Tarsnap where the worst that could happen is that you'll get some mangled messages printed to the terminal -- that code is clearly not as deserving of testing as the core cryptographic functionality.
I didn't word what I said as well as I could have, but what I meant was to emphasize the "require review before submission" part, not the "all code" part. Do you have mandatory peer review on security-critical code already?
This seems like it would be a fantastic reason to develop a crypto-linter. Not that I think such a thing would be easy - but it could be immensely simplified by defining things like a "must change" attribute on parameters like that "encr_aes" pointer in library code, to flag potential incorrect use.
Make sure that it can also divide. My favourite Coverity glitch is when it looked at some IPv6 code and announced "assuming i % 8 != 0" and "assuming i == 128" at the same time. (And then claimed that we would access ipv6addr[16].)
I applaud the openness of this submittal. Nobody's perfect and the topic is difficult and the implementation is tricky to get absolutely right.
At AltDrive, we use a nonce generated w/ secure random and that is used for encrypting an entire file in CTR (EAX) mode. The issue with 64k chunks does not apply. The mature and well-respected BouncyCastle AES-256 libraries are used from the low level API. Usage of the API was independently reviewed by the BouncyCastle organization. I can share that on the AltDrive blog if anyone is interested. http://altdrive.com
Sort of but not really. Apart from the fact that Sony broke DSA (a wildly different crypto primitive than AES-CTR), the mistake Sony made was more fundamental to the design. I think you're right to notice that in both cases a nonce wasn't actually a nonce.