[cryptography] A small public key encryption program

Elias Yarrkov yarrkov at gmail.com
Tue Aug 7 14:59:17 EDT 2012

A couple of embarrassing cases of email incompetence later, let's see if I can
manage send this one without breaking something! (I'm voting no.) ;)

On Tue, Aug 7, 2012 at 2:36 AM, Solar Designer <solar at openwall.com> wrote:
> Please do write about this.

Will do.

> It is curious how you deliberately use a narrow pipe for the expensive
> component of the KDF (but not for the KDF as a whole).  You'd need to
> include some explanation/proof that this is safe (given certain threat
> models).

Here is the reasoning. After the initial hashing, we have k = H(input). Now we
must compute H(k || S(k)); one way is to actually compute S(k). The second way
is to just compute the outer H for all possible values of S(k), as you mention:

On Tue, Aug 7, 2012 at 4:34 AM, Solar Designer <solar at openwall.com> wrote:
> Here's an attack:
> This puts an upper bound on the cost, regardless of the (possibly
> configurable) cost of the expensive component.  It also lets an attacker
> avoid the memory cost altogether (but pay for this with the processing
> cost of the algorithm above).

Indeed, and the function is specified with the intention of reaching this
bound. This why I claim the construction's AT cost as being "similar to 2**32
iterations of SHA-1" on the webpage. Notice that the upper bound in fact is
2**32, and not 2**31. You did about (2**32 / 2) * input_space hash function
calls. Compare to a hash that is iterated 2**32 times: you would need to do
about 2**32 * (input_space / 2) calls. No difference.

> Overall, I think that having a narrower-pipe expensive component in a
> KDF might be OK (especially if there are multiple invocations of such
> components per a full KDF computation), but 32 bits for the only
> expensive component is likely too narrow for some otherwise reasonable
> uses (cost settings).

It is true that 32 bits is quite limiting. This structure works fine with
64-bit words as well, but I want to avoid uint64_t in this program.

Going back.

On Tue, Aug 7, 2012 at 2:36 AM, Solar Designer <solar at openwall.com> wrote:
> Also, I am curious how you derive your area*time cost estimate, and

Here are some back-of-the-envelope numbers. SHA-1 in ASIC is 8k GEs and hashes
a short input in 85 cycles.[1] 2**32 iterations of this would be about 2**51.4
GEs*cycles, plus unknown overhead.

One bit of memory is 1.5 GEs in eDRAM. zen32 uses 2**23 * 32 bits of memory.
Let's be pessimistic and assume that the attacker has zero memory latency. A
straightforward implementation would take about 2**23 * 4 cycles. This comes
down to about 2**53.6 GEs*cycles, plus unknown overhead.

There are obvious ways to reduce the latter cost slightly, at least if we stick
to the "zero memory latency" story, but it probably cannot be much cheaper than
the iterated SHA-1.

> whether you considered other metrics of cost (those more relevant for
> non-ASIC attacks).  Your zen32() does use RAM, but it only uses a tiny
> fraction of the CPU's logic (little more than a 32-bit ALU - no SIMD),
> which is suboptimal against attacks with CPUs.  It looks like

CPUs and other general-purpose hardware are only useful against passphrases
that are too weak for this purpose in the first place.

Granted, some users will use stupid passphrases despite being told how to do it
securely, but being harder on the CPU is, to me, not a goal in itself if the
attacker isn't constrained to general-purpose hardware. I try to go for
cryptography that not even the best-equipped attackers can break.

> SIMD-enhanced attacks won't even require scatter/gather addressing
> (which, by the way, might become widely available, such as with Intel
> MIC) since your array indices are not data-dependent.  Also, such
> attacks would use the available memory bandwidth more fully.  When you
> make a 32-bit memory access, an entire cache line is fetched anyway, but
> your second loop only uses the 32-bit value.  With SIMD or/and similar
> grouping of memory accesses for multiple keys, the entire cache line
> would be actually made use of.

I considered data-dependent array lookups in the manner of ISAAC. It's a good
idea in some circumstances, and definitely forces one to face the issue of
latency. I decided against it because CPUs leak information via side channels
about those lookups.

Consider this, for example: attacker gathers a small amount of noisy
information about memory lookups near the beginning of this type of random
access function. He then runs brute force on the full construction; he can
compare the known right memory lookup pattern to the one he ends up doing
near the beginning of the random access function, detect gross mismatches and
skip to the next candidate passphrase without computing the entire function.

> In general, your algorithm appears to be memory bandwidth bound, whereas
> scrypt avoids that (by including some non-trivial computation) - it is
> memory-hard, but not memory bandwidth bound.
> With your algorithm, an attacker with the same amount of RAM, but with
> higher memory bandwidth for random 32-bit accesses (such as with many
> independent 32-bit buses), would have an advantage.  This is probably
> not what you wanted.

Specialized hardware is indeed advantageous. One generally wants to increase
attack cost per user inconvenience in deriving a key from a passphrase; using
longer blocks of memory may indeed help slightly. It would also make the
program more complex.

> It is entirely possible that I am missing something or/and that I missed
> a more serious attack.

In short, I mostly care about the asymptotically best brute force attacks,
which work by special-purpose hardware.

I'm intending to write more formally about this structure when I have the time.

[1]: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1428517

More information about the cryptography mailing list