[Botan-devel] Problem with ECB mode
lloyd at randombit.net
Mon May 24 12:50:35 EDT 2010
On Mon, May 24, 2010 at 05:31:24PM +0200, Rickard Bellgrim wrote:
> I am having some problem with the ECB mode. The problem is that I do not get the correct length of data back from Botan. I only get 16B of encrypted data compared to expected 48B. The code does work for both AES (in CBC mode (have not implemented OFB and CFB)) and DES (in CBC, OFB, and CFB mode).
> Do you know what the problem could be? (Using Botan 1.9.7)
> You can find an example code below. I have simplified it and it should not compile because of the extra text.
I think the problem lies in this assumption:
> // Final
> int bytesRead2 = cryption->read(output2, "This is the maximum size we can get: getBlockSize()");
You may well get back more than just a single block in the final
output, depending on the implementation. And, in this particular case,
you should see 48 bytes here with 1.9.7, with 0 bytes in the first
read. (After the second read finishes, I would guess that calling
remaining() on your Pipe object would return 32).
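To illustrate reading until nothing is left, rather than assuming the final read yields at most one block, here is a minimal sketch. ToyPipe is a hypothetical stand-in written only for this example; it is not Botan's Pipe class, it only mimics a read()/remaining() pair. The drain() helper is likewise an assumption, not something from the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for the output side of a pipe; NOT Botan's Pipe.
class ToyPipe {
public:
    explicit ToyPipe(std::vector<std::uint8_t> pending)
        : pending_(std::move(pending)) {}

    // Copy up to `len` bytes into `out` and return how many were copied.
    std::size_t read(std::uint8_t* out, std::size_t len) {
        const std::size_t n = std::min(len, pending_.size() - pos_);
        std::copy_n(pending_.begin() + static_cast<std::ptrdiff_t>(pos_), n, out);
        pos_ += n;
        return n;
    }

    // Number of bytes still waiting to be read.
    std::size_t remaining() const { return pending_.size() - pos_; }

private:
    std::vector<std::uint8_t> pending_;
    std::size_t pos_ = 0;
};

// Keep reading block-sized pieces until the pipe reports nothing left.
std::vector<std::uint8_t> drain(ToyPipe& pipe, std::size_t block_size) {
    std::vector<std::uint8_t> result;
    if (block_size == 0) return result;  // avoid looping forever
    std::vector<std::uint8_t> buf(block_size);
    while (pipe.remaining() > 0) {
        const std::size_t got = pipe.read(buf.data(), buf.size());
        result.insert(result.end(), buf.begin(),
                      buf.begin() + static_cast<std::ptrdiff_t>(got));
    }
    return result;
}
```

With 48 bytes pending, a single 16-byte read leaves 32 bytes behind, and drain() picks up the rest.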
Why is this? ECB mode (and some others, including XTS, and CBC in the
decryption direction) will buffer multiple blocks together and
encrypt/decrypt them in one go. This improves cache efficiency in some
cases, and additionally exposes multiple blocks to the cipher
implementation which is very important for vector optimizations. That
probably doesn't apply in this particular case; the only AES
implementation that processes multiple blocks natively in botan
currently is the one that uses the AES-NI x86 instructions, which are
only available on the latest Westmere processors right now. In the
future there might be others, such as a bitsliced implementation of
AES that processes 8 blocks, or even 128, in parallel. Or a CUDA
implementation that might process in 4K or larger chunks.
However, by default 4 blocks are buffered even for the iterative block
ciphers, for cache reasons. (That was the primary reason: I measured
noticeable speedups even with the base AES implementation. It also has
the nice side benefit of ensuring that code is capable of handling
this situation; otherwise it might work when written and then break
later when used against an implementation compiled to use AES-NI or
SSE4 or CUDA or Larrabee or some other vector instruction set.)
So for AES, the output will be chunked with granularity 4*16=64 bytes.
At the point you've added all 48 bytes of your data, no encryption has
actually occurred and no ciphertext is available; the filter waits
until you call end_msg() [or add another 16 bytes to round it out to 4
blocks] and then encrypts everything at that point.
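The chunking described above can be modelled with a small toy filter. This is only a sketch of the buffering behaviour, not Botan's implementation: the class name ToyChunkedEcb and its members are hypothetical, the "cipher" is a placeholder XOR rather than AES, and padding is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a filter that only processes input in whole chunks.
// Hypothetical names; the XOR below is a placeholder, not a real cipher.
class ToyChunkedEcb {
public:
    static constexpr std::size_t kBlockSize = 16;
    static constexpr std::size_t kChunkSize = 4 * kBlockSize;  // 64 bytes

    void write(const std::uint8_t* in, std::size_t len) {
        buffer_.insert(buffer_.end(), in, in + len);
        // Output appears only once a full chunk has accumulated.
        while (buffer_.size() >= kChunkSize) {
            process(kChunkSize);
        }
    }

    // Flush every whole block still buffered (this toy does no padding).
    void end_msg() {
        process(buffer_.size() - buffer_.size() % kBlockSize);
    }

    // Bytes of "ciphertext" produced so far.
    std::size_t remaining() const { return output_.size(); }

private:
    void process(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            output_.push_back(static_cast<std::uint8_t>(buffer_[i] ^ 0x5A));
        }
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(n));
    }

    std::vector<std::uint8_t> buffer_;
    std::vector<std::uint8_t> output_;
};
```

Writing 48 bytes into this model leaves remaining() at 0 until end_msg() is called, at which point all 48 bytes become available; writing 64 bytes produces output immediately.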
Other ciphers will use different chunk sizes; for instance the IDEA
implementation that uses SSE2 processes 8 blocks at a time (though
since IDEA is a 64 bit cipher it also ends up as 64 byte chunks).
It looks like mirroring this behavior in PKCS #11 is OK; section 11.2
of the spec explicitly allows this:
1. If pBuf is NULL_PTR, then all that the function does is return (in
*pulBufLen) a number of bytes which would suffice to hold the
cryptographic output produced from the input to the function. This
number may somewhat exceed the precise number of bytes needed, but
should not exceed it by a large amount. CKR_OK is returned by the
function.
2. If pBuf is not NULL_PTR, then *pulBufLen must contain the size in
bytes of the buffer pointed to by pBuf. If that buffer is large
enough to hold the cryptographic output produced from the input to
the function, then that cryptographic output is placed there, and
CKR_OK is returned by the function. If the buffer is not large
enough, then CKR_BUFFER_TOO_SMALL is returned. In either case,
*pulBufLen is set to hold the exact number of bytes needed to hold
the cryptographic output produced from the input to the function.
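The two cases quoted above could be sketched roughly as follows. The helper return_output() is hypothetical and exists only for illustration; the CK_* aliases and CKR_* constants are local stand-ins that mirror the names used by PKCS #11, and a real module would take them from the standard pkcs11.h header instead.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Local stand-ins for PKCS #11 types and return codes (assumption:
// real code would get these from pkcs11.h rather than define them).
using CK_RV = unsigned long;
using CK_ULONG = unsigned long;
using CK_BYTE = unsigned char;
constexpr CK_RV CKR_OK = 0x00000000UL;
constexpr CK_RV CKR_BUFFER_TOO_SMALL = 0x00000150UL;

// Hypothetical helper: hand `produced` back to the caller following
// the output-buffer convention from section 11.2.
CK_RV return_output(const std::vector<CK_BYTE>& produced,
                    CK_BYTE* pBuf, CK_ULONG* pulBufLen) {
    const CK_ULONG needed = static_cast<CK_ULONG>(produced.size());

    if (pBuf == nullptr) {
        // Case 1: length query only, nothing is copied.
        *pulBufLen = needed;
        return CKR_OK;
    }

    // Case 2: *pulBufLen holds the caller's buffer size on entry and
    // is set to the exact output size in either outcome.
    const CK_ULONG available = *pulBufLen;
    *pulBufLen = needed;
    if (available < needed) {
        return CKR_BUFFER_TOO_SMALL;
    }
    if (needed > 0) {
        std::memcpy(pBuf, produced.data(), needed);
    }
    return CKR_OK;
}
```

A caller that expected at most one 16-byte block would get CKR_BUFFER_TOO_SMALL with *pulBufLen set to 48 here, and could then retry with a buffer of that size.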
This may break PKCS #11 applications which expect the same
block-at-a-time buffering scheme you expected and which most software
libraries use. However it's worth noting that if a hardware token or
module was used for encryption, it is very likely a similar buffering
system would apply (in that case, to amortize the cost of expensive
host<->device I/O transfers, rather than to be able to use vector
operations, but same effect as far as the application would
see). (Presumably this is why PKCS #11 allows buffering in the
first place.)
Hope this clears things up.