[cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

Eugen Leitl eugen at leitl.org
Wed Oct 3 16:29:43 EDT 2012

----- Forwarded message from Sašo Kiselkov <skiselkov.ml at gmail.com> -----

From: Sašo Kiselkov <skiselkov.ml at gmail.com>
Date: Wed, 03 Oct 2012 22:19:19 +0200
To: Dr Adam Back <adam at cypherspace.org>
CC: Eugen Leitl <eugen at leitl.org>, cryptography at randombit.net
Subject: Re: ZFS dedup? hashes (Re: [cryptography] [zfs] SHA-3 winner announced)
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1

On 10/03/2012 04:19 PM, Dr Adam Back wrote:
> I infer from your comments that you are focusing on the ZFS use of a hash
> for dedup?  (The forward did not include the full context).

Correct, though that's not the whole picture: ZFS also uses the hash for
data integrity.

> A forged
> collision for dedup can translate into a DoS (deletion) so 2nd pre-image
> collision resistance would still be important.
> However 2nd pre-image collision resistance is typically offered at higher
> assurance than chosen pairs of collisions (because you can use birthday
> effect to roughly square root the search space with pairs).  So to that
> extent I agree your security reliance on hash properties is weaker than for
> integrity protection.  And SHA1 is still secure against 2nd pre-image
> whereas its collision resistance has been demonstrated being below design
> strength.

Due to the design of ZFS, it's fairly hard to pull off a successful
collision attack even with a hash function that's badly broken. Also,
the mitigation for this kind of problem is fairly simple: turn on dedup
verification (many dedup deployments already use this), so that blocks
with matching hashes are compared byte for byte before being shared. So
while this clearly doesn't amount to a rigorous analysis, by my estimate
this attack is highly improbable.
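
To make the verification point concrete, here is a rough Python sketch.
It is a toy model, not the ZFS code: the DedupStore class, its methods
and the deliberately weak "hash" are all made up for illustration. It
only shows why a byte-for-byte compare defuses a colliding block.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, not ZFS's logic)."""

    def __init__(self, verify=True, hash_fn=None):
        self.verify = verify
        # Default to SHA-256; a weak function can be injected to force collisions.
        self.hash_fn = hash_fn or (lambda b: hashlib.sha256(b).digest())
        self.blocks = {}    # checksum -> list of distinct blocks with that checksum
        self.refcount = {}  # (checksum, index) -> number of references

    def write(self, data):
        key = self.hash_fn(data)
        bucket = self.blocks.setdefault(key, [])
        for i, existing in enumerate(bucket):
            # Without verify, a checksum match alone is trusted.
            # With verify, the stored block must also match byte for byte.
            if not self.verify or existing == data:
                self.refcount[(key, i)] += 1
                return (key, i)
        # No usable match: store the block as new data.
        bucket.append(data)
        ref = (key, len(bucket) - 1)
        self.refcount[ref] = 1
        return ref

    def read(self, ref):
        key, i = ref
        return self.blocks[key][i]


def weak_hash(block):
    """Stand-in for a broken hash: only the first byte counts."""
    return block[:1]
```

With weak_hash and verify=False, writing b"aaaa" and then b"abcd" hands
back the first block for both references, i.e. the second write is
silently lost. With verify=True the second block is stored separately.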

> Incidentally a somewhat related problem with dedup (probably more in cloud
> storage than local dedup of storage) is that the dedup function itself can
> lead to the "confirmation" or even "decryption" of documents with
> sufficiently low entropy as the attacker can induce you to "store" or
> directly query the dedup service looking for all possible documents.  E.g.
> say a form letter where the only blanks to fill in are the name (known or
> suspected) and a figure (<1,000,000 possible values).

This would require the attacker to be able to inject specific blocks
into the dedup machinery and to have intimate knowledge of, and access
to, the storage system anyway.
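
For reference, the enumeration described in the quoted paragraph would
look roughly like the Python sketch below. It assumes exactly the
precondition noted above: the attacker can observe or query an unkeyed
dedup token for the target data. The template, the names and the use of
SHA-256 over a whole document are illustrative assumptions (ZFS dedups
fixed blocks, not documents).

```python
import hashlib

# Hypothetical form letter with two low-entropy blanks.
TEMPLATE = "Dear {name}, your balance is {amount} EUR."


def token(document):
    """Unkeyed dedup token: a plain hash of the plaintext."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def confirm(observed_token, names, max_amount):
    """Try every (name, amount) pair until one hashes to the observed token."""
    for name in names:
        for amount in range(max_amount):
            candidate = TEMPLATE.format(name=name, amount=amount)
            if token(candidate) == observed_token:
                return name, amount
    return None  # no candidate matched
```

The search space here is len(names) * max_amount hashes, which is
trivial for a million-valued figure and a handful of suspected names.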

> Also if there is encryption there are privacy and security leaks arising
> from doing dedup based on plaintext.
> And if you are doing dedup on ciphertext (or the data is not encrypted), you
> could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
> fact I would suggest for encrypted data, you really NEED to base dedup on
> MACs and NOT hashes or you leak and risk bruteforce "decryption" of
> plaintext by hash brute-forcing the non-encrypted dedup tokens.

As noted before, ZFS (in Illumos) currently doesn't support encryption.
Oracle's implementation does, and once you enable encryption, it
automatically switches to HMAC-SHA256. If we get around to implementing
encryption in Illumos, we would most likely go the same route. Thanks
for your insights, though, they are certainly valuable.
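
As a generic illustration of why a keyed MAC helps here (this is only a
sketch using Python's standard hmac module, and says nothing about how
Oracle's code is actually structured; the function names and sample
block are made up):

```python
import hashlib
import hmac
import os


def plain_token(block):
    """Unkeyed token: anyone who can guess the block can recompute it."""
    return hashlib.sha256(block).digest()


def keyed_token(key, block):
    """HMAC-SHA256 token: recomputing it requires the secret key."""
    return hmac.new(key, block, hashlib.sha256).digest()


key = os.urandom(32)        # secret held by the storage system (assumed)
attacker_key = os.urandom(32)
block = b"Dear Alice, your balance is 4242 EUR."
guess = b"Dear Alice, your balance is 4242 EUR."

# A correct guess is confirmed by a plain hash without any secret.
assert plain_token(guess) == plain_token(block)

# The key holder still sees equal tokens for equal blocks, so dedup works.
assert keyed_token(key, guess) == keyed_token(key, block)

# Without the right key, even a correct guess yields a different token.
assert keyed_token(attacker_key, guess) != keyed_token(key, block)
```

Equal blocks still map to equal tokens under the same key, so dedup
keeps working, but offline brute-forcing of the tokens needs the key.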


----- End forwarded message -----
Eugen* Leitl http://leitl.org
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
