In case you are looking for the important information, it seems to be MD5 hash w...

chillydawg · on Dec 14, 2016

Bloody hell. Sloppy and incompetent.

dopamean · on Dec 14, 2016

I'm genuinely curious how the decision to use MD5 gets made. Who says, "hey, maybe we should use MD5." And then who responds, "that sounds like a great idea Bob." Seriously. I've known for years that MD5 is insufficient for hashing passwords and I'm just some random guy. This kind of thing really baffles me.

stanleydrew · on Dec 14, 2016

Yahoo has been a company for a long time. I imagine your conversation happened round about 1999 when using MD5 wasn't insane. And then they were just slow to upgrade.

It's still bad, I'm just saying the conversation about what hash algo to use didn't happen yesterday.

throwaway34916 · on Dec 14, 2016

I'd like to believe that. However, I was recently asked to test a new website for an organization I volunteer for, and discovered their "forgot password" flow emailed me my plaintext password. I wrote an explanation of why this was bad, and how it could be fixed, to a non-technical friend of mine who works there; he passed my email to the (Bay Area based!) consulting shop that did their website. The shop sent this response:

"We do not store passwords as a plain text in database. We have functionality which encrypts and decrypts passwords. We have only ecnrypted passwords in the database.

Almost all other servers use one-way encryption. In this case, passwords cannot be decrypted from hashing."

Again, this is a Bay Area based shop. For code written in 2016.

I was shocked to receive this, but it (among other things) leads me to suspect that there are a lot of people out there, in positions of power, who aren't just ignorant, but who actively cling to password-storage anti-patterns.

I'm at a loss for how to fix this.

crazypyro · on Dec 15, 2016

Just for clarity, the "forgot password" flow emailed you the current password of the account (not a temporary one)?

That's insane...

throwaway34916 · on Dec 15, 2016

Yes, the current password.

wtfishackernews · on Dec 15, 2016

submit the website to http://plaintextoffenders.com/

syncsynchalt · on Dec 15, 2016

Ironically, hosted on a Y! site.

cm2187 · on Dec 14, 2016

But it's not as if we haven't had a pretty much continuous stream of major data leaks for the past 5 years. Surely Yahoo engineers occasionally open a newspaper...

Endy · on Dec 14, 2016

From everything I've read, the engineers did. The problem was that the security team had to go head-to-head with the budget team. And unfortunately, the budget team won - since the upper levels didn't feel that the IT security salaries were a necessary expenditure. And beyond that, there was concern that making people actually change their passwords regularly and requiring anything like security in said passwords was going to discourage users from using Yahoo and send them over to GMail.

Unfortunately... that argument wasn't wrong.

pbhjpbhj · on Dec 15, 2016

> The problem was that the security team had to go head-to-head with the budget team. //

Wouldn't engineers at such a big corp whistle-blow such incompetent decision making?

Apparently [1] they had a $1.37B net income in 2013. Given using bcrypt with a Blowfish hash and salting was pretty much a de facto standard by that point (I think that's what Wordpress were doing, hardly revolutionary security work) it seems the relative cost for Yahoo was approximately zero.

All I can imagine is that those in control were asked to leave the system open for government snooping? Why else would engineers working there not [anonymously] bring this to press attention - "hey, Yahoo security amounts to a piece of sticky tape holding a bank-vault shut".

- - -

[1] http://www.marketwatch.com/investing/stock/yhoo/financials#

danielweber · on Dec 15, 2016

It's not that hard to implement something at the start. It's more work to retrofit it on top of an existing system in a way that doesn't reduce the total security.

cm2187 · on Dec 14, 2016

But would it require users to change their password?

The way I would have implemented it, but would be keen to know how secure it is, is that you start with the md5 of the password (md5(password)). You then bcrypt or scrypt that md5 (bcrypt(md5(password))) and replace the md5 in your database with the bcrypt hash.

When a user logs in, all you need to do is to calculate the md5 first then check that md5 against the bcrypt hash you have stored.

I am not a crypto expert but intuitively it doesn't look like I would have weakened the security that way. You can't really attack bcrypt(md5(password)) much more than bcrypt(password). Can you?
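
A minimal sketch of the wrap-the-old-hash idea described above, in Python. PBKDF2 from the standard library stands in for bcrypt/scrypt (an assumption, so the example runs without third-party packages), and the function names and iteration count are hypothetical choices, not anything Yahoo is known to have used.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def md5_hex(password: str) -> str:
    """The legacy unsalted MD5 hash, as already stored in the database."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def strong_hash(data: bytes, salt: bytes) -> bytes:
    """Stand-in for bcrypt/scrypt: a salted, deliberately slow PBKDF2."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)


def wrap_legacy_hash(legacy_md5_hex: str) -> tuple[bytes, bytes]:
    """Bulk-migration step: needs only the stored MD5, not the password."""
    salt = os.urandom(16)
    return salt, strong_hash(legacy_md5_hex.encode("ascii"), salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Login check: recompute strong_hash(md5(password)) and compare."""
    candidate = strong_hash(md5_hex(password).encode("ascii"), salt)
    return hmac.compare_digest(candidate, stored)
```

The point of the sketch is that `wrap_legacy_hash` never sees a plaintext password, so every row can be converted in one offline pass and no user has to change anything.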

lotyrin · on Dec 14, 2016

The method I've used is to add a column for the new stronghash, then update the old column to stronghash(<oldhash>), where <oldhash> is dumbhash(password). On login, check against stronghash(dumbhash(password)); while you have the plaintext password in memory, generate plain stronghash(<password>), update the row to add the new hash (simple and interoperable, not dependent on dumbhash), and drop the stronghash(<oldhash>). After a <longtime> limit (to cap the maintenance overhead of the additional column and behavior, and to limit exposure to the minority of users who haven't logged in for <longtime>), you drop the stronghash(<oldhash>) from everyone and do a "we sent you a reset email" for anyone who tries to log in but has no <stronghash> password hash.
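
The workflow above could be sketched roughly as follows. The `dumbhash` and `stronghash` names come from the comment; the row layout (a dict with `md5`, `salt`, `wrapped` and `strong` fields) is a hypothetical stand-in for database columns, and PBKDF2 is only a standard-library placeholder for whatever strong hash is actually chosen.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor


def dumbhash(password: str) -> bytes:
    """Legacy hash: hex MD5 of the password, as bytes."""
    return hashlib.md5(password.encode("utf-8")).hexdigest().encode("ascii")


def stronghash(data: bytes, salt: bytes) -> bytes:
    """Placeholder for bcrypt/scrypt/etc."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)


def bulk_wrap(row: dict) -> None:
    """Offline step: replace the raw MD5 with stronghash(<oldhash>)."""
    row["salt"] = os.urandom(16)
    row["wrapped"] = stronghash(row.pop("md5"), row["salt"])
    row["strong"] = None


def login(row: dict, password: str) -> bool:
    pw = password.encode("utf-8")
    if row["strong"] is not None:
        # Already upgraded: only the direct strong hash is checked.
        return hmac.compare_digest(stronghash(pw, row["salt"]), row["strong"])
    if row["wrapped"] is None:
        return False  # past the <longtime> cutoff: caller forces a reset
    expected = stronghash(dumbhash(password), row["salt"])
    if not hmac.compare_digest(expected, row["wrapped"]):
        return False
    # Plaintext is in memory right now, so upgrade and drop the wrapped hash.
    row["salt"] = os.urandom(16)
    row["strong"] = stronghash(pw, row["salt"])
    row["wrapped"] = None
    return True
```

Setting `wrapped` to `None` for every remaining row after the cutoff is the "drop it from everyone" step; the reset email itself is left out of the sketch.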

danielweber · on Dec 15, 2016

This is a fine workflow, but keep in mind:

> and do a "we sent you a reset email" for anyone that's trying to log in but has no <stronghash> password hash.

Yahoo is an email provider so many of these users won't have an external provider to refer to.

sah2ed · on Dec 15, 2016

This workflow is much better than the other proposals I've read up-thread.

user5994461 · on Dec 15, 2016

It's one way to do it, which is okay sometimes.

The other way is to add a new empty column for bcrypt. The next time the user logs in, you save the bcrypt hash and you remove the MD5 hash.

Over time, the active users will be migrated to the new scheme. The only issue is the abandoned accounts, they'll keep the old weak scheme.

danielweber · on Dec 15, 2016

There are other migration techniques. If you know md5(password), you can create bcrypt(md5(password)).

syncsynchalt · on Dec 15, 2016

That's what I do, though care should be taken that you can't then log in against the old passwords by putting md5(password) in the password field.

Usually you do this by decorating the bcrypt(md5(p)) entries in some way so you can recognize which ones are tested with bcrypt() vs bcrypt(md5()).
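
One way to picture the "decorating" idea: store a scheme tag next to the salt and digest, and let the tag alone decide how the submitted password is processed. The tag names, the `$`-separated layout and the use of PBKDF2 in place of bcrypt are all assumptions made for this sketch.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor


def _slow(data: bytes, salt: bytes) -> bytes:
    """Stand-in for bcrypt()."""
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS)


def _encode(scheme: str, salt: bytes, digest: bytes) -> str:
    return f"{scheme}${salt.hex()}${digest.hex()}"


def wrap_md5(legacy_md5_hex: str) -> str:
    """Migrated record: slow(md5(p)), tagged so it is never checked as slow(p)."""
    salt = os.urandom(16)
    return _encode("md5+pbkdf2", salt, _slow(legacy_md5_hex.encode("ascii"), salt))


def hash_password(password: str) -> str:
    """Fresh record: slow(p) directly."""
    salt = os.urandom(16)
    return _encode("pbkdf2", salt, _slow(password.encode("utf-8"), salt))


def verify(password: str, stored: str) -> bool:
    scheme, salt_hex, digest_hex = stored.split("$")
    salt = bytes.fromhex(salt_hex)
    if scheme == "md5+pbkdf2":
        material = hashlib.md5(password.encode("utf-8")).hexdigest().encode("ascii")
    elif scheme == "pbkdf2":
        material = password.encode("utf-8")
    else:
        return False  # unknown tag: refuse rather than guess
    return hmac.compare_digest(_slow(material, salt), bytes.fromhex(digest_hex))
```

Because a wrapped record is only ever tested through the MD5 path, typing a stolen MD5 hex string into the password field gets hashed again and does not match.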

cm2187 · on Dec 15, 2016

I am not sure I agree. Your way will leave all the non active users exposed in the case of a leak. They may not be active on your website but are likely active on another website using the same password.

user5994461 · on Dec 15, 2016

As I said, that's an option among others, it has drawbacks.

For a website like Yahoo with billions of abandoned accounts, that's a serious drawback ^^

noonespecial · on Dec 14, 2016

The problem is in collisions. Md5(password) can yield the same result for many different values of password so simply bcrypting that result means that you start with a restricted possibility space. So less secure. Punts the question to how much less secure. Seems to me it would still be worth it to do and then all new passwords going forward are done correctly.

cm2187 · on Dec 14, 2016

Agree, but a collision even for md5 is a relatively rare event. When brute-forcing the bcrypt hash, this would reduce the attempts you would need to try against a given hash, but only by a very small factor. With a reasonable work factor, I would assume it would still make a brute force attack impractical at scale.

I didn't do the test, but I'd expect that there wouldn't be more than a handful of collisions for the md5 of the 100m most common passwords.

[edit] I actually just did the test on this 10m password list and found no collisions

https://xato.net/today-i-am-releasing-ten-million-passwords-...

jsjohnst · on Dec 15, 2016

I've done it before on a 1 billion word / password list and didn't get any collisions.

cm2187 · on Dec 15, 2016

That being said, md5 does generate collisions. I was playing with the IMDB movie database that you can download. They use a combination of the title and the year as a primary key. I tried using an md5 instead to save space (but giving a reproducible ID instead of an identity column), and got many collisions. No collision with SHA256.

schoen · on Dec 15, 2016

Wait, what? No MD5 collisions at all were publicly known until Xiaoyun Wang disclosed one in 2004 using a new cryptographic technique she invented (explained in Wang and Yu's "How to Break MD5 and Other Hash Functions").

MD5 has a 128-bit output so collisions that occur by chance should require about 2⁶⁴ inputs (18 exa-inputs). Surely your database didn't contain over 2⁶⁴ different movie records.
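
The arithmetic behind that estimate can be checked with the usual birthday approximation, P ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n random inputs and a b-bit hash. A small sketch; the five-million-record figure is a made-up, deliberately generous size for a movie database, not a number from the thread.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday approximation for at least one chance collision."""
    exponent = -n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(x) == 1 - exp(x), but stays accurate when x is tiny.
    return -math.expm1(exponent)


# Roughly 2^64 inputs are needed before a chance collision becomes likely.
print(collision_probability(2**64, 128))

# A hypothetical 5-million-row table is nowhere near that.
print(collision_probability(5_000_000, 128))
```

The first value comes out a bit under 0.4; the second is vanishingly small, which is why accidental collisions in a real table point to a bug rather than to MD5.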

Could you take a look at what you were doing again? Your description doesn't really make sense mathematically.

cm2187 · on Dec 15, 2016

You must be right. I can't reproduce it. I must have fucked something up then.

danielweber · on Dec 15, 2016

You likely goofed something up. No one has demonstrated two strings that are conceivably used as passwords that users type in -- and that includes the tuple {movie title:year} -- that have MD5 collisions.

The security problem with MD5 isn't collisions.

cm2187 · on Dec 15, 2016

I think you are right, I can't reproduce it.

b2600 · on Dec 15, 2016

What you're describing is not possible given the database you tested. Are there more details that would clarify your post?

jsjohnst · on Dec 15, 2016

Oh, of course md5 has collisions. It's relatively easy (not computationally easy, but there are known methods) to find two random strings that hash to the same value, it's just very difficult to find a string that hashes to the value of a specific other string.

schoen · on Dec 15, 2016

Not "relatively easy" by chance: it should require 2⁶⁴ entries in your database to see a single collision happen at random! It's only "relatively easy" following cryptographic research in the early 2000s that exploits structure in MD5 to produce collisions deliberately.

Yes, collisions are easier than preimages, but they still shouldn't occur by chance in real applications!

jsjohnst · on Dec 15, 2016

Realized my wording was way too ambiguous, clarified. Thanks!

noonespecial · on Dec 15, 2016

Very nice. Thanks for that. So yes, this is likely the thing to do in this situation.

schoen · on Dec 15, 2016

Unfortunately, this isn't an accurate description of the nature of the collision problem with MD5, which involves carefully crafted inputs using a sophisticated cryptographic attack -- not arbitrary user inputs that don't intend to collide with each other. See my and danielweber's comments about this down-thread.

(Yes, susceptibility to collisions was recognized as a problem with MD5 leading to a reason not to use it, but the collisions in question were constructed, not encountered accidentally. There isn't any evidence to date that the probability of a collision given two randomly chosen inputs is higher than the expected 1/2¹²⁸. You could test this yourself by hashing 2⁴⁰ random strings under MD5: you won't see a collision among the outputs!)
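
A scaled-down version of that experiment is easy to run. This sketch hashes random 16-byte strings and counts distinct inputs that share an MD5 digest; the function name and the sample size are arbitrary choices, and 2⁴⁰ inputs would need far more memory than a dict on one machine.

```python
import hashlib
import os


def count_md5_collisions(n: int) -> int:
    """Hash n random 16-byte strings; count distinct inputs sharing a digest."""
    seen = {}
    collisions = 0
    for _ in range(n):
        data = os.urandom(16)
        digest = hashlib.md5(data).digest()
        if digest in seen and seen[digest] != data:
            collisions += 1
        seen[digest] = data
    return collisions


print(count_md5_collisions(100_000))
```

With inputs that are not crafted to collide, the count should be zero at any size a single machine can hold.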

pbhjpbhj · on Dec 15, 2016

>Md5(password) can yield the same result for many different values of password //

Not "many different" using the normal constraints of text/numbers/typographical-marks and with maximum password lengths of 32 or so (I'll bet Yahoo's was shorter than that in 2013).

Are there any MD5 collisions in [:graph:]{,32} ?

danielweber · on Dec 15, 2016

I really doubt it. When people demonstrate MD5 collisions, they use hex strings like

0e306561559aa787d00bc6f70bbdfe3404cf03659e70 4f8534c00ffb659c4c8740cc942feb2da115a3f4155c bb8607497386656d7d1f34a42059d78f5a8dd1ef

hawkice · on Dec 15, 2016

Yes, because MD5 digests are much shorter than 32 characters, even if it's just ASCII, so by the pigeonhole principle there must be. If you're asking if there are _known_ collisions between two messages with fewer than 32 printable ASCII characters -- the answer is likely yes, but they are not known to me and likely not publicly known at all yet.

pbhjpbhj · on Dec 15, 2016

I thought md5 were 32 characters. But you're right every md5 hash would be in that space, so there must be collisions.

jsjohnst · on Dec 15, 2016

bcrypt(md5(password)) is what Yahoo! did when they switched.

tempestn · on Dec 14, 2016

Especially about it being a bad idea to make people regularly change their passwords!

lotyrin · on Dec 14, 2016

https://en.wikipedia.org/wiki/Salt_(cryptography)#1970s-1980...

x0 · on Dec 14, 2016

And nobody ever seemed to say "hey, maybe we should be using something more secure". Yahoo's been around for how many decades, and the fact they were still using MD5 in 2013 is just shameful. Yeah if it was some legacy code from 1993 you can probably excuse it, but I just can't believe after 20 years nobody thought it was a problem.

I'm not really a software developer but I really can't imagine it being a huge change. Instead of md5(pass) you could probably just change that to secure_hash(md5(pass), salt), add another column in the database for the salt, and rehash all the passwords. Customers wouldn't notice. Rehashing the databases would take a while, but otherwise that's really not a huge amount of work.

ashark · on Dec 14, 2016

Well, you can only rehash if you have the plaintext password. So you have to wait until they login again, or force a password reset for everyone. In the former case you're stuck with a bunch of md5 passwords hanging around for any account that's not very active, and for the latter you'll lose some percentage of active accounts whose reset process is for some reason no longer functional. You could mix-and-match the two methods (start with the former, force the latter on any stragglers after, say, a few weeks) to minimize the damage, but that's more work and a number that someone somewhere in the organization finds very important is still probably gonna go down.

(I've never had to do this myself, so these are just the most obvious options I came up with. Possibly there are others.)

manarth · on Dec 14, 2016

> You can only rehash if you have the plaintext password

There are techniques to rehash, even without the plain-text password, and without the user having to login to trigger a rehash.

Drupal 7 used such a technique for upgrades from Drupal 6, migrating from MD5 to a salted sha512 hash, but it's not an uncommon technique.

The old passwords are stored as MD5 hashes in the databases. The MD5 hash is processed through the same techniques as new passwords: a salt and the new sha512 hash. Provide a way to identify whether the origin was a password, or an MD5 hash.

Either way, you end up with a hash. You can identify whether the origin was a password, or an MD5 hash, but you can neither determine the origin MD5 hash, nor the origin password, as the new hash is secure. So even if the original MD5 hash was insecure, the new hash is secure.

When someone attempts to login, you still need to determine which password-validation to use: hash = sha512(salt + password), or hash = sha512(salt + MD5(password)), but the security level is the same.

leereeves · on Dec 15, 2016

> hash = sha512(salt + MD5(password))

Passing the password through MD5 reduces the complexity to 128 bits, you can't get that back.

So the security level is not the same, though it may be resistant to some attacks on MD5.

And it's probably not important for most people, since there are less than 2^56 eight character ASCII passwords.
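
The size of that password space is quick to confirm. A small sketch; the helper name is made up, and 95 is the count of printable ASCII characters (space through tilde).

```python
import math


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to index every string of `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)


print(keyspace_bits(128, 8))  # all 7-bit ASCII values: 128**8 == 2**56
print(keyspace_bits(95, 8))   # printable ASCII only: roughly 52.6 bits
```

Either way an eight-character password sits far below 128 bits, so the MD5 stage is not the limiting factor for it.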

manarth · on Dec 15, 2016

  > "Passing the password through MD5 reduces the complexity to 128 bits, you can't get that back."

Assuming that the new hash is secure (and sha512 is generally agreed to be secure), then, given a specific sha512 hash, the original MD5 hash can only be determined via rainbow tables, which is a Big-O operation. Even though entropy is reduced, it's still a significant work to determine the original MD5 hash (significant in this instance being longer than the heat-death of the Sun, given current extrapolations of computing performance).

Attacks against MD5 are based around knowing the original MD5 hash. In this instance, the original MD5 hash is unknown, so there is no mathematical shortcut to finding a collision.

leereeves · on Dec 15, 2016

In this case an attacker isn't looking for a collision (which would mean creating two passwords with the same hash, and what hash that is doesn't matter).

The attacker needs a password with a specific hash, and the best reported attack for that is around 2^128.

manarth · on Dec 15, 2016

Agreed, that the best reported rainbow-table attack on MD5 is 2^128 (i.e. the complete range of possible MD5 hashes).

Personally, I'm willing to chance that my password will be discovered via a brute-force attack within the next 0.65 billion billion years [1]

[1] http://bitcoin.stackexchange.com/questions/2847/how-long-wou...

leereeves · on Dec 15, 2016

I think it does make sense to be cautious.

A new preimage attack could be discovered - or might already have been, secretly.

danielweber · on Dec 15, 2016

> Passing the password through MD5 reduces the complexity to 128 bits

No, this is not the problem with MD5. You are not going to find two user-memorizeable-and-typeable passwords with an MD5 collision.

If you are bringing a password with more than 128 bits of complexity to the party, any password storage scheme better than plaintext will have your password safe.

leereeves · on Dec 15, 2016

For passwords, there is no known problem with MD5, unless you know about a preimage attack.

Collisions are a problem for digital signatures, not for passwords.

But some people do want and use passwords with more than 128 bits of entropy, for whatever reason, and an MD5 intermediate stage limits that.

rev_bird · on Dec 14, 2016

I was doing all kinds of mental gymnastics trying to figure out how this would work; thanks for explaining it so clearly.

dsacco · on Dec 14, 2016

I have been in this situation, and you're correct.

Somewhere in the organization, a product team is going to throw a fit about usability and churn over the decision to reset user passwords en masse, or to force users to change them when they first log in. This isn't a slight against product managers, but one of the clearest indications of a company's overall security culture "health" is how the security, engineering and product teams choose to compromise and "pick their battles." Risk accepting vulnerabilities has a legitimate place when you have to balance product development and usability, but so does pushing back on egregious issues.

I don't have privileged insight into Yahoo's organization, but in this case it's pretty clear the security team should have either been more diligent in conveying the ramifications or less kneecapped by the surrounding org units, depending on the circumstance. More importantly, Yahoo should have "migrated" their passwords in the manner a parallel comment explains in this thread. This is what Facebook and other companies did after maturing their security programs (see "Facebook Onion" on how Facebook transitioned away from MD5).

Also good to note - there is evidence Yahoo's security culture improved over the years. The decision to go with MD5 almost certainly happened in the 90s, and when Tumblr suffered a breach all users were forced to reset their passwords. The capability and awareness was clearly there.

sleet · on Dec 14, 2016

x0's algorithm was secure_hash(md5(pass), salt), you already have md5(pass) so this can be done in one bulk update.

rz2k · on Dec 14, 2016

Does an insecure algorithm mean that you effectively have the plain text passwords?

manarth · on Dec 14, 2016

Not necessarily, because of collisions.

The password "foo" may encrypt to the hash "12345". If an attacker were to discover that the hash is "12345", they would look for a password that hashes to "12345", which could, hypothetically, be the password "bar". They don't know the original password "foo", they've simply discovered an alternative, which happens to match the algorithm enough to unlock access.

In general, rainbow tables are used for identifying and attacking common passwords, but that doesn't mean that the algorithm is insecure.

Insecure algorithms can be attacked through collisions, which don't necessarily give you the original password, they just provide an alternative password which is accepted by the algorithm. The distinction matters when it comes to password reuse, because if Site A uses MD5, but Site B uses sha512, finding a collision that grants access on Site A doesn't necessarily give you a password that will grant access on Site B.

gottam · on Dec 14, 2016

Having worked with the kind of monolithic legacy codebase they likely have: it has gone through hundreds of developers who don't work for the company anymore and who created a bunch of spaghetti code, which means a huge effort is required to make sure that none of their other services break when they implement such changes. Also, management HATES when dev teams do this because it isn't "new stuff" that's immediately visible to their bosses or the end user.

If anything goes wrong with the password update, users get angry, lose faith in the services, stress, a few people get fired maybe, etc etc. On the other hand, if it stays old and crappy, everything stays just peachy, and nobody is the wiser that the entire system is a house of cards. Until the day someone hacks the database of course... which happened, so it's "now" a problem.

They're not going to begin to take security seriously even after this incident. They'll do what they need to right now but there's no auditing and their users don't normally care about this sort of thing, therefore the management won't care either.

haser_au · on Dec 14, 2016

There are likely to be a lot of identity systems using the password in the database, all of which have been coded to look for an MD5 hash, not a salted hash. This means code in a number of applications has to be updated at the same time.

The typical way around this is to create your new destination column (e.g. sha256 with salt), and progressively have applications reference this column rather than the MD5 unsalted column.

It's a huge amount of work, and if the applications were made in the 1990s, the code is likely legacy. If Yahoo are doing regular code security reviews, this will likely have been put in the pile of "we need to fix, but it's too costly to do".

flukus · on Dec 14, 2016

> It's a huge amount of work, and if the applications were made in 1990's, the code is likely legacy.

Which begs the question, can legacy code survive in an international network?

haser_au · on Dec 15, 2016

That's the right question to ask. The answer is no, because new security vulnerabilities are disclosed every hour.

A large organisation will implement layered security (otherwise known as layers of the onion) to prevent this type of attack. This means; more secure passwords to access the password database, fewer people with access, rotation of access passwords, auditing of backup storage and encryption, etc etc. Clearly Yahoo's layers of security were all broken to allow this type of theft.

pbhjpbhj · on Dec 15, 2016

>It's a huge amount of work //

Really? Moving from doing md5(password) to bcrypt(password,salt)? I see organisations make things hard and legacy code-base, yadda, yadda but surely if Yahoo couldn't do this then they couldn't manage scratching their own butt; it really seems like quite a small change in the scheme of things. Like one senior engineer, one afternoon of work (then testing, etc., OK, sure) ... ?

haser_au · on Dec 15, 2016

"It Takes 6 Days to Change 1 Line of Code" https://news.ycombinator.com/item?id=13119138

I'm going to go out on a limb and guess you've never worked as a software engineer in a large organisation.

Given MD5 hashes are currently stored, how do you propose users' passwords get converted to SHA256/512? Should Yahoo brute force the passwords, and then store them in the new algorithm? Or should they wait for the user to log on, verify their password, and store it in the new hash algorithm (given some users rarely log on, this could take over 12 months to complete 80% of users)?

flukus · on Dec 15, 2016

Yes it could take months or years to complete the process, but they've had at least a decade.

Even if it never completes (abandoned accounts), it would still have saved most active accounts from being breached.

haser_au · on Dec 15, 2016

100% agree. Yahoo should have started the process a long time ago.

I was just replying to the comment it could be completed in an afternoon.

pbhjpbhj · on Dec 15, 2016

You're right on the first count. It wasn't sarcasm, it was a question.

On the storing of hashes though the standard protocol has been to pass the hash in as if it were a password.

_asummers · on Dec 14, 2016

Hashing the hash isn't a good idea, you're reducing the domain of your secure_hash function to the range of md5. The way to do it is to have a "password hash algo version" column and when the user puts in their password, you verify against the hash[algo](password) and rehash with the later version, changing the algo column for that user.
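
A rough sketch of the version-column approach. The `ALGOS` table, the version numbers, the row fields and the use of PBKDF2 as the "later version" are all hypothetical; a real system would plug in its own algorithms.

```python
import hashlib
import hmac
import os


def _md5(password: str, salt: bytes) -> bytes:
    # Version 1: legacy unsalted MD5 (the salt argument is ignored).
    return hashlib.md5(password.encode("utf-8")).digest()


def _pbkdf2(password: str, salt: bytes) -> bytes:
    # Version 2: salted, slow hash (a stand-in for bcrypt or similar).
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


ALGOS = {1: _md5, 2: _pbkdf2}  # the hash[algo] table from the comment
CURRENT = 2


def login(row: dict, password: str) -> bool:
    """Verify with the row's recorded algorithm, then rehash if it is outdated."""
    check = ALGOS[row["algo"]]
    if not hmac.compare_digest(check(password, row["salt"]), row["hash"]):
        return False
    if row["algo"] != CURRENT:
        row["salt"] = os.urandom(16)
        row["hash"] = ALGOS[CURRENT](password, row["salt"])
        row["algo"] = CURRENT
    return True
```

Rows only move forward when their owner logs in, so accounts that never log in again stay on version 1.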

aidos · on Dec 14, 2016

You could do both though. Give much more security in the short term and upgrade anyone else who logged in later.

I did ask about the hash of hash thing some time ago and ptacek claimed that's a reasonable thing to do.

danielweber · on Dec 15, 2016

> you're reducing the domain of your secure_hash function to the range of md5.

Oh no, only 128 bits. The NSA will be able to brute force one of those passwords in 80 years.

nodamage · on Dec 15, 2016

You need to do both. If you only do the latter, then stale accounts which never log in again will never have their passwords upgraded to the more secure hash. Hashing the hash allows you to replace the md5 hashes immediately, and then you can perform the upgrade if/when the user logs in again.

cuckcuckspruce · on Dec 14, 2016

>I'm not really a software developer but...

If I had a nickel for every time I've heard this statement then I'd have enough to comfortably retire.

Yes, in theory, changing a column in a database (which in this case, happens to be a password) seems simple, but in practice, it's not.

rconti · on Dec 15, 2016

You're assuming engineering is just sitting on their thumbs, reviewing their code once a week, thinking of ways to optimize it.

In reality, they're constantly under pressure to develop new features, fix reported bugs, move on to the next project, keep the site from falling over, etc etc.

And the ones who choose NOT to work hard aren't sitting around reviewing old code either.

ozten · on Dec 14, 2016

For an IdP at the scale of Yahoo, they can adopt something as complicated as supporting versioned passwords and migrating credentials to the latest secure algorithm upon successful login. You have the clear text password at that point. You can store metadata such as the version (or algorithms) used to hash the credential.

mschuster91 · on Dec 14, 2016

Complex?!

It's easy as hell. Even PHP, so often flamed for "bad security" these days, supports EASY functions for this (and polyfills are available, if you're running PHP < 5.5, which you shouldn't do anyway):

- password_hash, which creates a salted hash (the returned value consists of a type/strength spec, the hash, and the salt)

- password_verify, which verifies a password with a hash in a timing-safe manner

- password_needs_rehash, which tells you if you should update the hash in the database

password_hash and password_needs_rehash take a parameter for the hash function (currently only bcrypt is supported, quite likely to keep people from using md5/sha1), and for the cost (the amount of hash function calls).

I believe any reasonable programming language these days has such functions.

What I am NOT so sure about is how the various LDAP server implementations, which many people use for SSO and "normal" account management (because it's easier to connect a new software to LDAP than to migrate existing user db's into LDAP), handle password storage. I mean, having an LDAP server for the credentials prevents any form of password leakage, but in case someone breaches both servers/the LDAP daemon is running on the same host as the webserver?

ozten · on Dec 14, 2016

Nothing is "easy as hell" at scale.

xxs · on Dec 14, 2016

Normally you'd =not= store the salt separately; the usual way is keeping the salt and the password together in the same 'blob'

Rehashing can be safely implemented as long as the auth. process can handle both md5 and some composite hash [i.e. shash(md5(pwd))]

It's really a trivial operation.

bpicolo · on Dec 14, 2016

I doubt that decision was made in the last decade. It's surely just something that's been around for a long time and was never upgraded.

Still neglectful, but I sincerely doubt it was just a recent engineer's bad decision-making.

lomnakkus · on Dec 14, 2016

It gets/got made ~10-15 years ago. (I don't understand the "no salt" thing, though. That was common practice even ~20 years ago on Linux machines, so I'm mildly surprised that it wasn't implemented in this case.)

lucb1e · on Dec 15, 2016

> I'm genuinely curious how the decision to use MD5 gets made.

You assume a formal decision was made? I think a manager just went "make them secure" and history was made. That's how it usually seems to happen if it's not a user-facing thing.

MorePowerToYou · on Dec 14, 2016

I think the organization as a whole is just indifferent. Does this breach really matter to Yahoo's bottom line? They were already sold to Verizon. Most of the active users probably won't read this news. It's sad to say, but I think Yahoo as a whole just doesn't care about their users.

kuschku · on Dec 14, 2016

[deleted]

chillydawg · on Dec 14, 2016

No, sorry. They're borderline criminally negligent. When you have 1bn passwords stored in raw md5, a decade after the first rainbow tables were published, then you don't deserve anyone's business or your freedom.

kuschku · on Dec 14, 2016

Sure, it's borderline negligent.

But it's already a godsend compared to what many banks do, storing passwords in plaintext, sending reset passwords via plaintext email, requiring 4-8 character passwords that can only contain digits and a limited set of characters, etc.

I'd be more than happy if any bank would follow Yahoo!'s password standards.

Kadin · on Dec 15, 2016

Most banks don't have a billion customers. (There are probably a few that do, but not many.)

scrollaway · on Dec 14, 2016

It's really not. Unsalted MD5 has been shameful for a long, long time.

pluma · on Dec 14, 2016

As a data point: when I was a teenage code monkey in 2004 writing PHP I already understood that unsalted MD5 is unsafe.

According to Wikipedia:

* 2004 it became possible to find MD5 collisions at a rate of one per hour on a cluster

* 2005 it became possible to do this within "a few hours" on a consumer laptop

* 2006 it became possible to do this within one minute

* nowadays it's possible to do this "within seconds"

Plus, as others have mentioned, it's now possible to find collisions instantly by using widely available rainbow tables, e.g. https://md5db.net/decrypt

yuhong · on Dec 14, 2016

MD5 collisions are probably not important for passwords.

user5994461 · on Dec 15, 2016

To put it in layman terms.

The MD5 collision attack usually done by researchers: They want to generate 2 files with the same MD5 hash (they can put anything they want in these files).

This kind of attack doesn't affect passwords. The user picked one file (i.e. the password), you don't know it, you can't change it, you can't choose it.

funnyfacts365 · on Dec 14, 2016

Care to explain? The hashes are what is compared so it seems it's important.

duskwuff · on Dec 14, 2016

The existence of crafted collisions -- being able to create a pair of M1 and M2 such that MD5(M1) = MD5(M2) -- is primarily relevant to situations where MD5 is being used as a signature algorithm, such as in certificate issuance. In these applications, being able to generate a pair of documents with the same hash is catastrophic.

Being able to generate a pair of passwords that are treated as equal, on the other hand, is useless from a security perspective. It's a neat party trick, but it's not dangerous.

Now, if there were a preimage attack -- being able to take MD5(M1) and come up with an M2 such that MD5(M2) = MD5(M1) -- that'd be a much bigger deal, and it'd break MD5 password hashing wide open. But nobody's done that yet.
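
To make the distinction concrete, here's a toy sketch. It truncates MD5 to 16 bits so that both searches finish quickly by brute force; real MD5 collision attacks rely on cryptanalysis rather than brute force, so only the shape of the two problems carries over. The names `tiny_hash`, `find_collision`, and `find_preimage`, and the sample password, are made up for illustration.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """Toy hash: the first 2 bytes (16 bits) of MD5. Deliberately weak."""
    return hashlib.md5(data).digest()[:2]


def find_collision():
    """Collision search: the attacker is free to choose BOTH messages."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg)
        if h in seen:
            # Any two messages that happen to share a hash will do.
            return seen[h], msg, i + 1
        seen[h] = msg


def find_preimage(target: bytes):
    """Preimage search: the attacker is handed one hash and must hit it."""
    for i in count():
        guess = b"guess-%d" % i
        if tiny_hash(guess) == target:
            return guess, i + 1


m1, m2, collision_tries = find_collision()

# Pretend this hash came from a leak and the password behind it is unknown.
target = tiny_hash(b"hunter2")
m3, preimage_tries = find_preimage(target)

print("collision:", m1, m2, "after", collision_tries, "tries")
print("preimage: ", m3, "after", preimage_tries, "tries")
```

With a 16-bit hash, a collision typically turns up after a few hundred tries (birthday effect), while hitting one specific target takes on the order of tens of thousands. The gap grows enormously at full hash width, which is why a collision break does not imply a preimage break.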

rev_bird · on Dec 15, 2016

I'm a total greenhorn when it comes to cryptography, but the difference between these two situations was totally lost on me until I read this comment. When I see, "It's easy to create MD5 collisions," my first thought is, "If you give me a hash, it's easy to find a string that results in an identical hash." If I'm understanding this right, that would be a "preimage attack," and would be bad for all the reasons being discussed in this thread.

However, it seems like "It's easy to create MD5 collisions," at least as it is true today, actually means something different: That, given a string, it's easy to find a second string that shares the same hash. If that's the case, I have two questions:

* I am totally lost as to how these are different scenarios. There's no difference I can see between "Here's string A" and "here's the hash of string A," if the goal is to find a "string B" that shares the hash. Are these "crafted collisions" generated by modifying string A and string B, until a collision pops out?

* If that's the case... what's everyone freaking out about? Why were people saying MD5 is unsafe 20 years ago, if even now, we can't achieve a preimage attack that can get you into an account based on the valid password's hash? Yahoo could have printed these hashes out and hung them up on posters in the mall and no one would have been able to get into accounts from it. There are dozens of comments lamenting how stupid this was, but... it seems like there's no actual problem?

duskwuff · on Dec 15, 2016

> However, it seems like "It's easy to create MD5 collisions," at least as it is true today, actually means something different: That, given a string, it's easy to find a second string that shares the same hash.

Very early MD5 collision attacks were even weaker, actually: given nothing, it was possible to find a pair of arbitrary garbage strings which had the same hash as each other. It wasn't until later that it became possible to pick what the strings would "look like".

> Are these "crafted collisions" generated by modifying string A and string B, until a collision pops out?

Generally speaking, yes.

> If that's the case... what's everyone freaking out about?

The issue with using MD5 as a password hash function actually has nothing to do with collisions. That's a red herring. :) The real problem is that using any fast and/or unsalted hash function for passwords is unsafe!

A fast hash function is unsafe because it makes it easy to generate a bunch of potential passwords, calculate their hashes, and look for a match.

An unsalted hash function is unsafe because it makes it possible to build a "rainbow table" of all possible passwords and their hashes, and look up password hashes in that table.

As used in this situation, MD5 is both fast and unsalted.
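
A minimal sketch of why that combination hurts, assuming a hypothetical leak of unsalted MD5 hashes and a tiny wordlist (all names and data here are invented). It uses a plain precomputed lookup table; a true rainbow table uses hash chains to trade time for memory, but the effect on unsalted hashes is the same.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Unsalted MD5, as described in this thread (do not use for passwords)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical leaked rows: user -> unsalted MD5 of their password.
leaked = {
    "alice": md5_hex("password"),
    "bob": md5_hex("letmein"),
    "carol": md5_hex("Tr0ub4dor&3"),
    "dave": md5_hex("password"),
}

# A tiny stand-in for a real wordlist of common passwords.
wordlist = ["123456", "password", "qwerty", "letmein"]

# Built once, reusable against every unsalted MD5 leak ever published.
table = {md5_hex(word): word for word in wordlist}

# "Cracking" is now just a dictionary lookup per row.
cracked = {user: table[h] for user, h in leaked.items() if h in table}
print(cracked)
# {'alice': 'password', 'bob': 'letmein', 'dave': 'password'}
```

Note that alice and dave share a hash because they share a password, so cracking one cracks both. Because MD5 is fast, extending the wordlist to billions of candidates is cheap as well.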

leereeves · on Dec 15, 2016

Most people here don't seem to understand the difference between collision and preimage attacks. So they're overreacting to the fact Yahoo used MD5.

Storing unsalted passwords, however, would be a huge mistake, if Yahoo did so as someone here claimed.

There are precomputed lookup tables for the unsalted hashes of many, many passwords (both MD5 and more secure hashes) and cracking unsalted passwords is simply a database lookup.

rev_bird · on Dec 15, 2016

Ah ha! There's the weakness I was missing, thank you so much for responding. I hadn't even thought of it that way---I knew salts shook up the resulting hashes, but an actual benefit of it is that it makes it pretty much impossible to do any "homework" (rainbow tables) ahead of time.
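
For what it's worth, the salting idea can be sketched like this. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt or scrypt (which others in the thread mention), and the iteration count and function names are assumptions for illustration, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest) using a per-user random salt and a slow KDF."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users pick the same password but end up with different stored values,
# so a table precomputed without the salt matches neither of them.
salt_a, hash_a = hash_password("password")
salt_d, hash_d = hash_password("password")

print(hash_a != hash_d)                             # True
print(verify_password("password", salt_a, hash_a))  # True
print(verify_password("letmein", salt_a, hash_a))   # False
```

The salt is stored alongside the hash and is not secret. Its job is to force an attacker to redo the guessing work per user, and the iteration count makes each guess expensive.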

supergreg · on Dec 15, 2016

Google(MD5(M1)) = MD5(M2) is more than enough for most users.

Buge · on Dec 15, 2016

That website does not find collisions. It uses rainbow tables (or some other type of table) to crack passwords that it already knows.

Collisions are irrelevant for password cracking.

camus2 · on Dec 14, 2016

> Sure, SHA1, scrypt or bcrypt with salt were already common back then, but it's an entirely different story than if they had used it today.

Not an excuse. This is Yahoo, not a PHP shop in India doing some low-budget contracting. They should have a top-of-the-line security team enforcing the most recent secure practices. Furthermore, I got no email from Yahoo telling me that my account may have been hacked. Both incompetent and irresponsible at the same time.

By the way, I did some PHP dev back in 2011. bcrypt hashing was already common practice. How can you come up with that argument in good faith?

normaljoe · on Dec 15, 2016

> Furthermore I got no email from Yahoo telling me that my account may have been hacked

Then your account was most likely not on the list of accounts compromised.

> By the way I did some PHP dev back in 2011

Well, Yahoo is a tad bit older than that, by about 17 years. This is not an excuse, but really, comparing your 2011 coding to 1994... Go ahead and boot up your old 486. I'll get back to you when this page loads up in an hour. :)

Yahoo's code base is old and huge, like billions of lines huge. Yahoo's engineers have modernized it at a massively rapid pace. I'm not sure of the current state, but when I left, Yahoo Finance was written in something like 10 languages, including serving pages in C, because that's all they had back then.

Current tech is NodeJSish and others. They have their own hardened versions. But still, migrating millions of lines of C to something other than C isn't a walk in the park.

kuschku · on Dec 14, 2016

> How can you come up with that argument in good faith?

Let's say I've seen far worse in 2016, from companies storing far more sensitive data.

Like a bank, with no 2FA support, emailing me my plaintext password after clicking "Password forgotten", in 2016.

This story is problematic, but I'd be grateful if that bank would implement even the same stuff as Yahoo.

mtgx · on Dec 14, 2016

Also malicious (allowing NSA to search through everyone's emails).

bpicolo · on Dec 14, 2016

Current law seems to dictate that if the NSA wants that, it's what they're getting. Blame the government.

iopq · on Dec 14, 2016

They actually fought in court about it, so I commend them for it.