[cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

Dr Adam Back adam at cypherspace.org
Wed Oct 3 10:19:41 EDT 2012

I infer from your comments that you are focusing on the ZFS use of a hash
for dedup?  (The forward did not include the full context).  A forged
collision for dedup can translate into a DoS (deletion), so 2nd pre-image
resistance would still be important.

However, 2nd pre-image resistance is typically offered at higher
assurance than resistance to chosen pairs of collisions (because with pairs
you can use the birthday effect to roughly square-root the search space).
So to that extent I agree your security reliance on hash properties is
weaker than for integrity protection.  And SHA1 is still secure against 2nd
pre-image attacks, whereas its collision resistance has been demonstrated to
be below its design strength.
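
To make the birthday effect concrete, here is a minimal sketch in Python.  It
assumes a toy hash (SHA-1 truncated to 16 bits, chosen only so that both
searches finish quickly); the function names and message prefixes are
illustrative and not from any real tool.  Finding any colliding pair should
take on the order of 2^8 tries, while matching one fixed target should take on
the order of 2^16.

```python
import hashlib
from itertools import count

TRUNC_BITS = 16  # toy digest size (assumption), far too small for real use


def toy_hash(data: bytes) -> int:
    """SHA-1 truncated to TRUNC_BITS bits; a stand-in for a full-size hash."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - TRUNC_BITS)


def find_collision():
    """Birthday search: any two distinct inputs with the same toy hash."""
    seen = {}  # toy hash value -> first message that produced it
    for i in count():
        msg = b"pair-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # the pair and the number of tries
        seen[h] = msg


def find_second_preimage(target: bytes):
    """Second pre-image search: a different input matching one fixed target."""
    want = toy_hash(target)
    for i in count():
        msg = b"guess-%d" % i
        if msg != target and toy_hash(msg) == want:
            return msg, i + 1  # the match and the number of tries


if __name__ == "__main__":
    a, b, pair_tries = find_collision()
    m, target_tries = find_second_preimage(b"target block")
    print("collision pair found after", pair_tries, "tries")
    print("second pre-image found after", target_tries, "tries")
```

The exact counts depend on the inputs tried, but the gap between the two
searches grows as the square root relationship suggests when TRUNC_BITS is
increased.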

Incidentally a somewhat related problem with dedup (probably more in cloud
storage than local dedup of storage) is that the dedup function itself can
lead to the "confirmation" or even "decryption" of documents with
sufficiently low entropy as the attacker can induce you to "store" or
directly query the dedup service looking for all possible documents.  E.g. say
a form letter where the only blanks to fill in are the name (known or
suspected) and a figure (<1,000,000 possible values).

Also if there is encryption there are privacy and security leaks arising
from doing dedup based on plaintext.

And if you are doing dedup on ciphertext (or the data is not encrypted), you
could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
fact I would suggest for encrypted data, you really NEED to base dedup on
MACs and NOT hashes or you leak and risk bruteforce "decryption" of
plaintext by hash brute-forcing the non-encrypted dedup tokens.
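
A minimal sketch of MAC-based dedup tokens, using HMAC-SHA1 from the Python
standard library.  The key handling here is an assumption for illustration
(a single secret key held by whoever is allowed to dedup against each other);
the function name is hypothetical.

```python
import hashlib
import hmac
import os


def mac_dedup_token(key: bytes, block: bytes) -> str:
    """Keyed dedup token: equal blocks match only under the same secret key."""
    return hmac.new(key, block, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    key = os.urandom(32)  # secret dedup key (assumed to be kept from attackers)
    block = b"some low-entropy plaintext block"

    # Same key and same block give the same token, so dedup still works.
    print(mac_dedup_token(key, block) == mac_dedup_token(key, block))

    # Without the key, a plain hash of a guessed block does not match the token.
    print(mac_dedup_token(key, block) == hashlib.sha1(block).hexdigest())
```

The trade-off is that dedup only happens among parties sharing the key, which
is the price of not exposing brute-forceable tokens.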


On Wed, Oct 03, 2012 at 03:41:27PM +0200, Eugen Leitl wrote:
>----- Forwarded message from Sašo Kiselkov <skiselkov.ml at gmail.com> -----
>From: Sašo Kiselkov <skiselkov.ml at gmail.com>
>Date: Wed, 03 Oct 2012 15:39:39 +0200
>To: zfs at lists.illumos.org
>CC: Eugen Leitl <eugen at leitl.org>
>Subject: Re: [cryptography] [zfs] SHA-3 winner announced
>User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
>Well, it's somewhat difficult to respond to cross-posted e-mails, but
>here goes:
>On 10/03/2012 03:15 PM, Eugen Leitl wrote:
>> ----- Forwarded message from Adam Back <adam at cypherspace.org> -----
>> From: Adam Back <adam at cypherspace.org>
>> Date: Wed, 3 Oct 2012 13:25:06 +0100
>> To: Eugen Leitl <eugen at leitl.org>
>> Cc: cryptography at randombit.net, Adam Back <adam at cypherspace.org>
>> Subject: Re: [cryptography] [zfs] SHA-3 winner announced
>> User-Agent: Mutt/1.5.21 (2010-09-15)
>> (comment to Saso's email forwarded by Eugen):
>> Well I think it would be fairer to say SHA-3 was initiated more in the
>> direction of improving on the state of art in security of hash algorithms
>> [snip]
>> In that you see the selection of Keccak, focusing more on its high security
>> margin, and new defenses against existing known types of attacks.
>At no point did I claim that the NIST people chose badly. I always said
>that NIST's requirements need not align perfectly with ZFS' requirements.
>> If the price of that is slower, so be it - while fast primitives are very
>> useful, having things like MD5 full break and SHA-1 significant weakening
>> take the security protocols industry by surprise is also highly undesirable
>> and expensive to fix. To some extent for the short/mid term almost
>> unfixable given the state of software update, and firmware update realities.
>Except in ZFS, where it's a simple zfs set command. Remember, Illumos'
>ZFS doesn't use the hash as a security feature at all - that property is
>not the prime focus.
>> So while I am someone who pays attention to protocol, algorithm and
>> implementation efficiency, I am happy with Keccak.
>ZFS is not a security protocol, therefore the security margin of the
>hash is next to irrelevant. Now that is not to say that it's entirely
>pointless - it's good to have some security there, just for the added
>peace of mind, but it's crazy to focus on it as primary concern.
>> And CPUs are getting
>> faster all the time, the Q3 2013 ivybridge (22nm) intel i7 next year is
>> going to be available in 12-core (24 hyperthreads) with 30MB cache.  Just
>> chuck another core at it if you have problems. ARMs are also coming out in
>> more cores.
>Aaah, the good old "but CPUs are getting faster every day!" argument. So
>should people hold off for a few years before purchasing new equipment
>for problems they have now? And if these new super-duper CPUs are so
>much higher performing, why not use a more efficient algo and push even
>higher numbers with them? If I could halve my costs by simply switching
>to a faster algorithm, I'd do it in a heartbeat!
>> And AMD 7970 GPU has 2048 cores.
>Are you suggesting we run ZFS kernel code on GPUs? How about driver
>issues? Or simultaneous use by graphical apps/games? Who's going to
>implement and maintain this? It's easy to propose theoretical models,
>but unless you plan to invest the energy in this, it'll most likely
>remain purely theoretical.
>> For embedded and portable
>> use, a reasonably fast and energy frugal 32-bit or 64-bit processor is
>> really cheap these days.  OK I know there's always a market for 8-bit
>> processors, on the extreme cheap/low power end, but the validity of the cost
>> complaint is diminishing even there.
>My point in recommending a higher-speed hash is not for low-power
>systems - these wouldn't even come near dedup for the foreseeable future.
>----- End forwarded message -----
>Eugen* Leitl http://leitl.org
>ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
>8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
