[cryptography] rolling hashes, EDC/ECC vs MAC/MIC, etc.
zooko at zooko.com
Sat May 21 14:50:19 EDT 2011
Dear Nico Williams:
Thanks for the reference! Very cool.
What I would most want is for ZFS (and every other filesystem) to
maintain a Merkle Tree over the file data with a good secure hash.
Whenever a change to a file is made, the filesystem can update the
Merkle Tree this with mere O(log(N)) work in the size of the file plus
O(N) work in the size of the change. For a modern filesystem like ZFS
which is already maintaining a checksum tree the *added* cost of
maintaining the secure hash Merkle Tree could be minimal.
Then, the filesystem should make this Merkle Tree available to
applications through a simple query.
This would enable applications—without needing any further
in-filesystem code—to perform a Merkle Tree sync, which would range
from "noticeably more efficient" to "dramatically more efficient" than
rsync or zfs send. :-)
Of course it is only more efficient because we're treating the
maintenance of the secure-hash Merkle Tree as free. There are two
senses in which this is legitimate and it is almost free:
1. Since the values get maintained persistently over the file's
lifetime then the total computation required is approximately O(N)
where N is the total size of all deltas that have been applied to the
file in its life. (Let's just drop the logarithmic part for now,
because see 2. below.)
Compare this to the cost of doing a fast, insecure CRC over the whole
file such as in rsync. The cost of that is O(N) * K where N is the
(then current) size of the file and K is the number of times you run
rsync on that file.
The extreme case is if the file hasn't changed. Then for the
application-level code to confirm that the file on this machine is the
same as the file on that machine, it merely has to ask the filesystem
for the root hash on each machine and transmit that root hash over the
network. This is optimally fast compared to rsync, and unlike "zfs
send|recv" it is optimally fast whenever the two files are identical
even if they have both changed since the last time they were synced.
2. Since the modern, sophisticated filesystem like ZFS is maintaining
a tree of checksums over the data *anyway* you can piggy-back this
computation onto that work, avoiding any extra seeks and minimizing
extra memory access.
In fact, ZFS itself can actually use SHA-256 for the checksum tree,
which would make it almost provide exactly what I want, except for:
2. a. From what I've read, nobody uses the SHA-256 configuration in
ZFS because it is too computationally expensive, so they use an
insecure checksum (fletcher2/4) instead.
2. b. I assume the shape of the resulting checksum tree is modified by
artifacts of the ZFS layout instead of being a simple canonical shape.
This is a show-stopper for this use case because if the same file data
exists on a different system, and some software on that system
computes a Merkle Tree over the data, it might come out with different
hashes than the ZFS checksum tree, thus eliminating all of the
performance benefits of this approach.
But, if ZFS could be modified to fix these problems or if a new
filesystem would add a feature of maintaining a canonical,
reproducible Merkle Tree, then it might be extremely useful.
Thanks to Brian Warner and Dan Shoutis for discussions about this idea.
More information about the cryptography