[cryptography] ZFS dedup? hashes (Re: [zfs] SHA-3 winner announced)

Nico Williams nico at cryptonector.com
Wed Oct 3 11:11:05 EDT 2012

On Wed, Oct 3, 2012 at 9:19 AM, Dr Adam Back <adam at cypherspace.org> wrote:
> Incidentally a somewhat related problem with dedup (probably more in cloud
> storage than local dedup of storage) is that the dedup function itself can
> lead to the "confirmation" or even "decryption" of documents with
> sufficiently low entropy as the attacker can induce you to "store" or
> directly query the dedup service looking for all possible documents.  E.g., say
> a form letter where the only blanks to fill in are the name (known or
> suspected) and a figure (<1,000,000 possible values).
> Also if there is encryption there are privacy and security leaks arising
> from doing dedup based on plaintext.
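
The form-letter case in the quoted text can be illustrated with a
minimal sketch.  It assumes the dedup token is an unkeyed SHA-256 of
the plaintext block and that the attacker knows the letter template;
the template, the name, the figure, and the function names are all
hypothetical and are not taken from any real dedup service.

```python
import hashlib

# Hypothetical form letter; template, name and figure are illustrative only.
TEMPLATE = "Dear {name},\nYour settlement amount is ${figure}.\n"


def dedup_token(block: bytes) -> bytes:
    """Unkeyed dedup token: anyone who can guess the block can compute it."""
    return hashlib.sha256(block).digest()


def confirm_figure(observed_token: bytes, name: str, max_figure: int = 1_000_000):
    """Try every candidate figure; return the one whose token matches, else None."""
    for figure in range(max_figure):
        candidate = TEMPLATE.format(name=name, figure=figure).encode()
        if dedup_token(candidate) == observed_token:
            return figure
    return None


# The victim stores a letter; the attacker only observes (or can query) its token.
victim_letter = TEMPLATE.format(name="Alice Example", figure=271828).encode()
observed = dedup_token(victim_letter)

# Fewer than 1,000,000 hashes are enough to "decrypt" the one unknown field.
recovered = confirm_figure(observed, "Alice Example")
```

With only a name and a six-digit figure unknown, the search space is
small enough that the token confirms the document's exact content.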

Compression at lower layers tends to leak.  We've seen this in VoIP
(packet sizes under variable-bit-rate codecs reveal spoken content),
and now in CRIME (TLS-level compression reveals cookies).  Dedup is a
compression function running at a lower layer (i.e., lower than the
application writing the file contents).  Of course, dedup is not a
compression function that is easily applied at the application layer,
so if you really need dedup, then you need it at lower layers.  The
question is: do you need dedup and confidentiality protection for the
same data?  I think most would answer "no".
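
A rough sketch of why compression below the application leaks,
assuming a secret and attacker-chosen data end up in the same
compressed stream and the attacker can observe only the output
length.  The secret value and the helper name are hypothetical, and
the guess covers the whole token at once to keep the effect obvious;
real attacks such as CRIME refine guesses a little at a time.

```python
import zlib

# Hypothetical secret that the lower layer compresses together with attacker data.
SECRET = b"session=9f86d081884c7d659a2feaa0c55ad015"


def compressed_len(attacker_data: bytes) -> int:
    """Length an observer sees when the secret and attacker data share one stream."""
    return len(zlib.compress(SECRET + b"\n" + attacker_data))


# A correct guess repeats the secret, so it collapses into one back-reference.
right = compressed_len(b"session=9f86d081884c7d659a2feaa0c55ad015")
# A wrong guess only shares the "session=" prefix; the rest is new literal data.
wrong = compressed_len(b"session=3b1c5e7a9d2f4068b7e1c3a5d9f20648")

leak = wrong - right  # positive: the correct guess compresses better
```

Dedup leaks in the same way, except that the observable is "was this
block already stored?" rather than a byte count.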

> And if you are doing dedup on ciphertext (or the data is not encrypted), you
> could follow David's suggestion of HMAC-SHA1 or the various AES-MACs.  In
> fact I would suggest for encrypted data, you really NEED to base dedup on
> MACs and NOT hashes or you leak and risk bruteforce "decryption" of
> plaintext by hash brute-forcing the non-encrypted dedup tokens.
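
The hash-versus-MAC distinction in the quoted text can be sketched as
follows, assuming HMAC-SHA-256 as the keyed token and a key that the
attacker does not hold.  The block contents, the candidate range, and
the function names are hypothetical.

```python
import hashlib
import hmac
import os


def hash_token(block: bytes) -> bytes:
    """Unkeyed token: computable by anyone."""
    return hashlib.sha256(block).digest()


def mac_token(key: bytes, block: bytes) -> bytes:
    """Keyed token: computable only by holders of the key."""
    return hmac.new(key, block, hashlib.sha256).digest()


store_key = os.urandom(32)      # held by the storage layer, unknown to the attacker
block = b"figure=271828"        # low-entropy plaintext block (hypothetical)
candidates = [b"figure=%d" % n for n in range(271000, 272000)]

# Unkeyed: the attacker recomputes the token for every guess and finds the match.
hash_hits = [c for c in candidates if hash_token(c) == hash_token(block)]

# Keyed: without store_key the attacker can only try some other key, which fails.
attacker_key = os.urandom(32)
target = mac_token(store_key, block)
mac_hits = [c for c in candidates
            if hmac.compare_digest(mac_token(attacker_key, c), target)]

# Dedup still works for the key holder: equal blocks give equal tokens.
same = mac_token(store_key, block) == mac_token(store_key, b"figure=271828")
```

This only blocks offline guessing against stored tokens.  An attacker
who can induce writes under the same key and observe whether they
dedup can still test guesses one at a time.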

Encrypted ZFS hashes and authenticates ciphertext.  The attacker is
presumed to observe all on-disk data, including ciphertext, block
pointers (which contain authentication tags and hashes), ...  The
attacker can observe dups just as well as ZFS can, and can attempt
passive and active attacks.  Dedup certainly adds not only to the
attacker's traffic analysis capabilities, but also to the attacker's
active attack capabilities (e.g., if the attacker can mount a chosen
plaintext attack).  Note that encrypted ZFS can only dedup within sets of
datasets that share the same keys.

What difference does it make if dedup uses an authentication tag or a
hash of ciphertext?  Assuming no collisions, none that I can see; and
if dups are verified, then collisions make little difference anyway as
far as dedup is concerned.  I think the harm is done first by
compressing and encrypting at a layer lower than the application;
encryption can be done at lower layers, but compression is best left
to the application.
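
To make the point about verified dups concrete, here is a toy dedup
table that compares block contents before sharing storage.  It is a
sketch under stated assumptions, not ZFS's implementation: the class
name and layout are hypothetical, and the token is deliberately
truncated to one byte so that collisions are guaranteed.

```python
import hashlib


class VerifyingDedupStore:
    """Toy dedup table that compares block contents before sharing storage."""

    def __init__(self, token_bytes: int = 1):
        # A 1-byte token is deliberately weak so that collisions are easy to trigger.
        self.token_bytes = token_bytes
        self.buckets = {}  # token -> list of distinct blocks with that token

    def token(self, block: bytes) -> bytes:
        return hashlib.sha256(block).digest()[: self.token_bytes]

    def write(self, block: bytes):
        """Store a block; return (token, index, deduped)."""
        tok = self.token(block)
        bucket = self.buckets.setdefault(tok, [])
        for i, existing in enumerate(bucket):
            if existing == block:      # verification: byte-for-byte comparison
                return tok, i, True
        bucket.append(block)           # first write or token collision: keep separately
        return tok, len(bucket) - 1, False

    def read(self, tok: bytes, index: int) -> bytes:
        return self.buckets[tok][index]


store = VerifyingDedupStore()
# 1000 distinct blocks into at most 256 tokens: collisions are unavoidable.
refs = [store.write(b"block-%d" % n) for n in range(1000)]
dup = store.write(b"block-7")          # a genuine duplicate is still deduped
```

Even with a token this weak, every block reads back intact; a
collision costs only a missed dedup opportunity, which is the sense
in which collisions make little difference once dups are verified.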

