[Botan-devel] Failing RSA and ElGamal tests on most platforms in 1.7.10+ (was A question about the Debian builds of Botan)

Jack Lloyd lloyd at randombit.net
Mon Sep 15 15:24:23 EDT 2008

[CC'ing botan-devel, this is an interesting bug in BigInt in 1.7.10 onwards]

Background for botan-devel: I asked Daniel Baumann, who maintains the
Debian package of Botan, to enable the test suite (previously, it was
built, but not run). It turns out RSA, RW, and a few other tests were
failing on most machines (everything Debian has except x86, x86-64,
and m68k). See the build reports at
http://buildd.debian.org/pkg.cgi?pkg=botan-devel, for instance
PowerPC (http://buildd.debian.org/fetch.cgi?pkg=botan-devel;ver=1.7.11-1;arch=powerpc;stamp=1221395045)

On some platforms there are failures in Rabin-Williams and
ElGamal. However, most of them are in RSA, and those only occur in
private key operations. That is pretty insidious, since even a single
failed RSA computation (caused by software or hardware error, or by
active attack as often seen on smartcards/embedded systems) can
leak the entire private key. However, since version 1.2.2 (released in
May 2003), any time an RSA or RW private key operation is performed, the
result is checked against the public operation to make sure it
is valid. This check caught the error, which is fortunate (otherwise RSA
keys would easily be compromised), but it is also bad (because it is
very hard to find out exactly which computation went wrong!).
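To make the consistency check concrete, here is a minimal sketch of the idea using toy textbook RSA parameters (n = 61 * 53 = 3233, e = 17, d = 2753). This is not Botan's code: the names mod_pow and checked_private_op and the inject_fault flag are hypothetical, and real keys are far larger than a machine word.

```cpp
#include <cassert>
#include <cstdint>

// Square-and-multiply modular exponentiation. Intermediate products
// cannot overflow 64 bits here because the modulus is tiny.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

// Toy parameters for illustration only (assumption, not from Botan).
const std::uint64_t N = 3233;  // 61 * 53
const std::uint64_t E = 17;    // public exponent
const std::uint64_t D = 2753;  // private exponent, E * D = 1 mod 3120

// Performs the private operation, then re-applies the public operation
// and refuses to release the result unless it maps back to the input.
// inject_fault simulates a single-bit computation error.
bool checked_private_op(std::uint64_t input, bool inject_fault,
                        std::uint64_t& out) {
    assert(input < N);
    std::uint64_t result = mod_pow(input, D, N);
    if (inject_fault) result ^= 1;
    if (mod_pow(result, E, N) != input) return false;  // faulty: keep it private
    out = result;
    return true;
}
```

With these parameters, checked_private_op(2790, false, out) succeeds and sets out to 65, while the same call with inject_fault set returns false and leaves out untouched. The check says that something went wrong, but not where, which is exactly the debugging difficulty described above.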

I've looked at this further, and it is actually quite odd and surprising.
In 1.7.10 I made a change that allowed BigInt to cache its
significant word count: if anyone accesses the underlying word array
and gets a non-const reference or pointer, the cache is invalidated
(it would be nice to be smarter about this, but it solved the
immediate performance problem, IIRC dropping in valgrind/callgrind
reports from 5% to <0.5%, which seemed good enough).
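For readers who have not looked at the code, the scheme can be sketched roughly as follows. This is a hypothetical stand-in, not the real BigInt::Rep: the class name RepSketch, the use of std::vector instead of SecureVector, and the 32-bit word type are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical miniature of a Rep that caches its significant word count.
class RepSketch {
public:
    explicit RepSketch(std::vector<std::uint32_t> words)
        : reg_(std::move(words)) {}

    // Read-only access cannot change the words, so the cache stays valid.
    const std::vector<std::uint32_t>& get_reg() const { return reg_; }

    // Writable access: the caller may change anything, so drop the cache.
    std::vector<std::uint32_t>& get_reg() {
        cache_valid_ = false;
        return reg_;
    }

    // Count of words up to and including the highest non-zero word.
    std::size_t sig_words() const {
        if (!cache_valid_) {
            std::size_t n = reg_.size();
            while (n > 0 && reg_[n - 1] == 0) --n;
            cached_ = n;
            cache_valid_ = true;
        }
        return cached_;
    }

private:
    std::vector<std::uint32_t> reg_;  // least significant word first
    mutable std::size_t cached_ = 0;
    mutable bool cache_valid_ = false;
};

// Usage: a write made through a freshly obtained non-const reference
// is picked up, because obtaining the reference invalidates the cache.
std::size_t shrink_and_count() {
    std::vector<std::uint32_t> words = {5, 9, 0};
    RepSketch rep(words);
    std::size_t before = rep.sig_words();  // top zero word is not counted
    assert(before == 2);
    rep.get_reg()[1] = 0;                  // non-const access, cache dropped
    return rep.sig_words();                // recomputed from the words
}
```

The design only works if every write goes through a reference obtained after the most recent sig_words() call; nothing in the sketch enforces that.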

I have been using a 64-bit PowerPC (hazelnut.osuosl.org - I think
pretty much anyone with a plausible reason can get an account -
http://powerdev.osuosl.org/) to test what is going on here. A BigInt
value is decremented or otherwise reduced in size, but the stale cached
word count remains, and errors follow. This occurs with both 32-bit and
64-bit builds on PPC.

The array of words that represents the BigInt is guarded by the
BigInt::Rep class.  Rep guarantees (I thought!) that if a writable
reference is returned, then the cached size will be invalidated.

At first I thought it might be some code getting a const reference to
the register, and then calling some operation on the SecureVector that
had non-const consequences. In particular, MemoryRegion's members were
all mutable, mostly to support the grow_to/grow_by functions being
const. I thought perhaps some code was changing the size underneath
the Rep, so I removed the const-ness of those functions (which really
should never have been const, since they do mutate the object), and
removed the mutable marker from the members. I then changed
BigInt::grow_by / grow_to to also be non-const, since they were now
calling non-const members.

However, that did not fix the problem! The cache continued to get
off-by-one errors, usually when the value had reduced in size by one
significant word.

It appears that something is changing the underlying values without
Rep being made aware of it. Probably this happens via a sequence like:

  word* writable_ptr = rep.get_reg().begin(); // non-const access: cache invalidated
  rep.sig_words();   // recalculates and caches the count
  *writable_ptr = 0; // write through the old pointer: cache is now stale
  rep.sig_words();   // returns the stale cached value
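The suspected sequence can be reproduced in isolation with a small self-contained sketch. Again, this is not Botan's code: CachedRep, writable_begin, count and stale_pointer_demo are hypothetical names, and the point is only to show how a pointer that outlives a sig_words() call defeats the invalidation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of the caching scheme; not the real Rep.
class CachedRep {
public:
    explicit CachedRep(const std::vector<std::uint32_t>& words)
        : reg_(words) {}

    // Handing out a writable pointer invalidates the cache once, here.
    std::uint32_t* writable_begin() {
        valid_ = false;
        return reg_.data();
    }

    // Cached count of significant words.
    std::size_t sig_words() const {
        if (!valid_) { cached_ = count(); valid_ = true; }
        return cached_;
    }

    // Uncached count, always computed from the words themselves.
    std::size_t count() const {
        std::size_t n = reg_.size();
        while (n > 0 && reg_[n - 1] == 0) --n;
        return n;
    }

private:
    std::vector<std::uint32_t> reg_;  // least significant word first
    mutable std::size_t cached_ = 0;
    mutable bool valid_ = false;
};

struct Counts { std::size_t cached; std::size_t actual; };

Counts stale_pointer_demo() {
    std::vector<std::uint32_t> words = {5, 1};
    CachedRep rep(words);
    std::uint32_t* p = rep.writable_begin();  // cache invalidated
    std::size_t before = rep.sig_words();     // cache refilled
    assert(before == 2);
    p[1] = 0;                                 // top word cleared, Rep not told
    Counts c = { rep.sig_words(), rep.count() };
    return c;
}
```

In this sketch the cached count stays at 2 while the true count is 1, which is the same off-by-one pattern as above. Whether the real failure follows this exact path is still only a guess.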

Ideas definitely appreciated! I have been looking at this for much of
today and can't find out what is going wrong.

