[botan-devel] Understanding authentication in two-pass authenticated encryption algorithms

Jack Lloyd jack at randombit.net
Tue Dec 3 18:52:37 EST 2019


On Tue, Dec 03, 2019 at 01:15:52AM +0100, Paul Jones wrote:
> Hopefully I am posting in an appropriate place with a relevant question. I am new to this and want to be sure I am using the library correctly.
> 
> In my understanding, two pass authentication algorithms (I'm currently trying out "AES-256/EAX") provide two passes on decryption where the first pass validates the authentication tag and the second pass decrypts only if the message is authenticated. With this, as I understand, usage where authentication fails should not give an attacker any visibility into any decryption output before the tag is authenticated. That seems highly desirable and worth a penalty compared to single-pass algorithms that are faster if the application allows for this.
> 
> Now I'm trying to reconcile this understanding with the Botan AEAD_Mode API that has start(), update(), and finish() with large messages that do not comfortably fit in memory. I see a note in the documentation that touches on this and explains that start() can be followed with finish() and the memory buffer can be immediately discarded if finish fails with a tag authentication error. However, what about the case where I want to decrypt in chunks using update(), read from disk, where the content is very large and don't want to attempt loading it all into one memory buffer? Is there a way to authenticate such large content before streaming out decryted data for an attacker to see only to discover the error on the finish() call?
> 
> Hopefully that is understandable. Thanks for your time.

Hi Paul,

Yes, a completely reasonable and coherent question.

The start/update/finish API exists because some existing formats rely on being able to encrypt/decrypt arbitrarily large
ciphertexts which as you say may be too large to fit into memory, and one of Botan's design constraints is that it be
usable to implement any useful or widely adopted protocol or format. This unfortunately requires sometimes providing
APIs which are unsafe or prone to misuse. It is certainly preferable if possible to use start+finish to do the
decryption in one fell swoop.

If the message was encrypted as a single large block, then there isn't any way of detecting corruption until the MAC is
checked. Since the MAC depends on all of the bits of the ciphertext, even with the MAC in front of the message the
entire message must have been processed to verify it. So the best you can do is to dump the decrypted (supposed)
plaintext to a temporary file, and then at the point of verifying the MAC, if it fails, delete the file without trying
to parse it.

Under *no cicumstances* should you ever attempt to act on the decrypted plaintext before finish has returned
successfully, because until that happens you have no idea if the ciphertext was sent by a legitimate peer, or some
attacker, and in many very plausible settings it is possible for an attacker to construct ciphertexts which decrypt to
semantically reasonable plaintexts but which fail to authenticate.

One excellent way of handling this situation - if you control the format at least - is to split the message up into
several smaller chunks. For example, split the message into 4K blocks, and encrypt each block as its own EAX
ciphertext. This will cause a slight ((4096+12+16)/4096 =~ .7%) space overhead, assuming 12 byte nonces and 16 byte
authentication tags. However there are ways for this approach to go badly wrong; you have to account for issues such as
reordering or repeating blocks, or early truncation, where an attacker simply removes some number of trailing blocks,
and the receipent decrypts and acts on the message in some way, being completely unaware that some trailing portion of
the message, which may change the context entirely, has been removed.

The STREAM and CHAIN protocols given in https://eprint.iacr.org/2015/189.pdf are somewhat popular for this purpose. That
I have to give you a reference to a 42 page academic paper for solving this simple problem is absolutely a failure on
the part of the library; Botan should have an out of the box solution for this.

One minor nit, which I mention only because you may come across documentation elsewhere that causes confusion - in fact
EAX (and GCM, OCB, etc) are one pass or "online" in the sense that it is possible to encrypt or decrypt data on the fly
without being able to see the entire message at once. In contrast CCM and SIV are truly two-pass, and when decrypting a
message in those modes, the result isn't available at all until the MAC has been verified. The advantage of one-pass
modes is they are generally faster, but lose some of the good properties that two-pass modes can provide.  (CCM does not
actually provide those properties, it is only included in the lib because CCM is a widely used standard in IoT. In
constrast SIV is two-pass, and has much more robust properties wrt nonce reuse than other modes such as EAX or GCM.)

Hope this helps, certainly feel free to send any followup questions on this topic.

Best,
  Jack


More information about the botan-devel mailing list