You are aware that these haven’t all been decrypted? (Or is there some
news I’ve missed?)

The passwords were encrypted, unsalted, using 3DES in ECB mode. But
the actual encryption key is unknown.

So the way that passwords have been “decrypted” is on a case by case basis.
For example, if we have, say, 100,000 users using the same password, and
one of them credibly ‘fesses up to what their password was, then we 
know what that password was for all of those users.
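
To make that concrete, here is a rough sketch of the bookkeeping in Python. The record layout (user, ciphertext), the placeholder ciphertext labels, and the function name are all made up for illustration; this is not the format of the actual dump.

```python
from collections import defaultdict

def propagate_known_password(records, known_user, known_password):
    """Map every user who shares known_user's ciphertext to the known password.

    Because the encryption is unsalted and deterministic, identical
    ciphertexts imply identical passwords.
    """
    by_ciphertext = defaultdict(list)
    for user, ciphertext in records:
        by_ciphertext[ciphertext].append(user)
    ciphertext_of = dict(records)
    shared = by_ciphertext[ciphertext_of[known_user]]
    return {user: known_password for user in shared}

# Hypothetical records: the ciphertexts are placeholder labels, not real data.
records = [
    ("alice@example.com", "ct-1"),
    ("bob@example.com", "ct-1"),
    ("carol@example.com", "ct-2"),
    ("dave@example.com", "ct-1"),
]

# Suppose Alice credibly tells us her password was "snoopy1".
print(propagate_known_password(records, "alice@example.com", "snoopy1"))
```

One confession labels every record in the same ciphertext group, and nobody else.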

These inferences are reinforced by the fact that many of the records included
password hints, often simply saying what the password was.

We also can work out what some of the more popular passwords are by comparing
with other breaches. For example if Alice at example.com is known to use
the password snoopy1 in both the Sony and LinkedIn breaches, and gives
the same hint in the Adobe data, that is a big clue. If we find a few dozen
other reusers that way we can say with high confidence what that particular
password is.

The ECB mode and small block size of 3DES have also been helpful. So suppose
we have about 6700 people corresponding to this password

6682 /NpNslkFN4nioxG6CatHBw==

and 3402 corresponding to this one

3402 /FkacZU/hWrioxG6CatHBw==

Even with the base64 encoding, you can see that
the second block of each of those encrypted passwords is the same:
in both cases it encrypts to ioxG6CatHBw

(I really should convert the base64 to hex)

So if we find (and I haven’t correlated what I’m
working on with actual passwords, so now this is hypothetical)
that ioxG6CatHBw appears for the last block of the encryption
of “password1”, then we know that that is the encryption of “1”
plus padding.
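
Since base64 characters don't line up with 8-byte block boundaries, it is safer to compare the decoded bytes than the base64 text. A minimal sketch in Python, using the two ciphertexts quoted above and assuming the 8-byte block size of 3DES (the function name is just for illustration):

```python
import base64

BLOCK_SIZE = 8  # 3DES operates on 64-bit (8-byte) blocks

def blocks_hex(b64_ciphertext):
    """Decode a base64 ciphertext and return its 8-byte blocks as hex strings."""
    raw = base64.b64decode(b64_ciphertext)
    return [raw[i:i + BLOCK_SIZE].hex() for i in range(0, len(raw), BLOCK_SIZE)]

first = blocks_hex("/NpNslkFN4nioxG6CatHBw==")
second = blocks_hex("/FkacZU/hWrioxG6CatHBw==")

print(first)
print(second)
# In ECB mode, equal ciphertext blocks mean equal plaintext blocks.
print("second blocks match:", first[1] == second[1])
```

This only shows that the two passwords share their final plaintext block; it says nothing about what that block contains until it is tied to a known password.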

Even a cursory glance at the data and you see penguins.

My project is on relative password frequency, so I’m not
actually trying to figure out the plaintext.

Several people have noticed that the popularity of passwords
resembles a power law distribution. David Malone and colleagues
have specifically looked at this.
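
For a rough sense of what “resembles a power law” means here: sort the password counts from most to least popular and see whether log(count) falls roughly on a straight line against log(rank). The sketch below does a plain least-squares fit of that line in Python. It is a crude eyeball check, not a rigorous test of the distribution, and the function name and the synthetic counts are made up for illustration.

```python
import math

def zipf_exponent(counts):
    """Estimate s in count ~ C / rank**s via least squares on the log-log data."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two nonzero counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # the slope is negative, so flip the sign

# Synthetic counts that follow an exact power law with exponent 0.8.
synthetic = [1000.0 / rank ** 0.8 for rank in range(1, 51)]
print(zipf_exponent(synthetic))
```

A straight line on the log-log plot is only suggestive, since other heavy-tailed distributions can produce a similar picture.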

And I’ve seen similar in my own work. The “problem” is that if
the power law distribution holds up with a “big” exponent (near or above 1)
then that would indicate a situation where popularity contributes to 

So I want the resemblance to a power law distribution to be
superficial. There are other distributions that can look similar.
Either that, or I want an explanation for why the popularity of
a password would make it more attractive to others. Are people
really being influenced in their password choices by others?

I think that high password reuse might be able to account for some
of the power lawish distribution, but I haven’t quite worked that

At any rate, this data dump is perfect for me. I’ve only just begun
working on it, but unsalted ECB encrypted passwords allow me to count

> I tried finding a list and was not successful.

There isn’t a list of these decrypted. Jeremi Gosney has, in
collaboration with others, worked out what the passwords are for
the 100 most frequent.  Troy Hunt is doing some excellent work
on correlating with other breaches.



