[cryptography] RDRAND and Is it possible to protect against malicious hw accelerators?
marsh at extendedsubset.com
Sat Jun 18 20:16:39 EDT 2011
On 06/18/2011 03:08 PM, slinky wrote:
> Likewise, for accelerated component functions the hardware will know
> what is a key and what is input data - again, it needs this
> information in order to operate. Contrast this to a general purpose
> processor which can't really deduce what is a key and what isn't
> while processing code that happens to be AES.
As Peter Gutmann just said "They really have waaaay too much die space
to spare don't they?"
Intel bought McAfee a while ago. Speaking informally with some chip
people (not necessarily from Intel and definitely not under any
confidentiality) there's active research around the idea of building
instruction stream validation in support of antivirus directly into the
processor. Recognizing an intentionally-obfuscated virus seems no easier
than recognizing AES.
The Intel hardware RNG ("DRBG") is an example of how not to do it. It
has weird timings and magic numbers:
> There are two approaches to structuring RdRand invocations such that
> DRBG reseeding can be guaranteed: Iteratively execute RdRand beyond
> the DRBG upper bound by executing more than 1022 64-bit RdRands, and
> Iteratively execute 32 RdRand invocations with a 10us wait period
> per iteration. The latter approach has the effect of forcing a
> reseeding event since the DRBG aggressively reseeds during idle periods.
But on any kind of networked or multitasking system "idle periods" happen
only at the consent of the attacker.
> Within the context of virtualization, the DRNG's stateless design
> and atomic instruction access means that RdRand can be used freely
> by multiple VMs without the need for hypervisor intervention or
> resource management.
Hmm, are we sure none of this carries across any security boundaries?
In another place they say:
> After invoking the RdRand instruction, the caller must examine the
> Carry Flag (CF) to determine whether a random value was available at
> the time the RdRand instruction was executed.
So it's not stateless after all because it keeps a FIFO of numbers and
emits all zeroes when it "runs out".
To me, this only makes sense if one of the following might be true:
* the entropy pool is so small that it's in danger of being brute-forced,
* the pool's contents can be read out somehow, or
* the extraction process (NIST SP 800-90 CTR_DRBG AES) may not be trusted.
But I know there are other opinions :-). More than likely though,
they're doing this to follow "best practices".
At the very least this is going to disclose to an attacker on another
core how many random numbers you're consuming. Random numbers can often
be requested via the network, so he sends changing rates of SSL or IPsec
handshake requests while watching how fast the pool depletes. This
enables him to determine if he's running on the same processor as your
crypto thread. It may also create a covert channel for exfiltration. Of
course, there are other shared resources that might already be an easier
way to do this.
Still, we can predict how this story will turn out because we have
several examples of what happens in practice when RNGs decide that they
have "run out" of numbers: the client code continues running with
whatever garbage it gets because it's not a situation that the software
developer ever encountered in his debugger, or one which a QA team ever
noticed in the lab. At best, it will continue along an untested code path.
The thing _least_ likely to happen is for the operation actually needing
the CSRNs to fail, because that would be a conspicuous bug which would
have to be "fixed" somehow.
So the Intel "DRNG" has observable shared internal state and is shared
among multiple cores. Even worse, *an attacker running on one core has
the ability to cause the RDRAND instruction to write zeroes to the
destination register* !
Note that the carry flag isn't accessible from C. The RDRAND instruction
isn't either, but there will be inline assembler snippets floating
around any day now.
Just to pick on Peter Gutmann:
> How would you encode, for example, 'RdRand eax'?
> I'd like to get the encoded form to implement it as '__asm _emit 0x0F __asm
> _emit 0xC7 __asm _emit <something>' (in the case of MSVC).
Note that he's not asking about how to check the carry flag too. I'm
sure he of all people wouldn't forget this, but not so for your typical
developer.
It's possible to check the carry flag from inline asm:
So if you were a C programmer who didn't know x64 assembler *maybe* you
could find the right advice in that thread and get the carry flag out
reliably. But how would you test it? How would QA test it?
It's certainly possible to get everything right, but
* it requires significant skill and effort on the developer's part
* there's zero observable benefit in the normal case
* proper error handling reduces perceived reliability
* the failure mode is silent and vulnerable
* it's difficult to repro on the developer's box (masked by the debugger)
* it's difficult to repro on the QA box
* the behavior may change with new processor revisions
* it's relatively easy for the attacker to trigger under the right conditions
This is a recipe for fail.
For comparison, look at the RDTSC instruction. There's a nice helpful
page (I've referred to it myself) with code to read the processor
time-stamp counter. Microsoft's compiler even provides a compiler
intrinsic (__rdtsc) for it.
Note that neither of these two sources provide any code to determine the
processor support for and the validity of the values returned by this
instruction! In fact, the numbers it returns can be really wacky on
multi-cpu architectures. Intel and AMD behave differently, even across
chip models. You should probably set CPU affinity on any thread that's
going to use it and be prepared to handle a certain amount of wackiness
anyway.
> Now, put on your tinfoil beanie and suppose the hw accelerator is a
> Mallory. Suppose there is some kind of a built-in weakness/backdoor,
> for instance as a persistent memory inside the chip, which stores
> the last N keys.
Years ago, somebody ran "strings" on a Windows system DLL and found the
string "NSAKEY". http://en.wikipedia.org/wiki/NSAKEY There was a lot of
speculation and outright accusations that this was a backdoor to Windows.
Ah, those were innocent times. :-)
Since then we've learned more about exploiting ordinary software bugs
and there have been *thousands* of remote holes discovered in Microsoft
products.
The idea that the NSA (or any skilled attacker) would have needed
Microsoft's help to execute code remotely on Windows NT is now simply
laughable. The same goes for other commonly-used OSes.
Microsoft's code quality is vastly improved and now far ahead of most of
the rest of the industry. But we know there are still hundreds of
"trusted" root CAs, many from governments, that will silently install
themselves into Windows at the request of any website. Some of these
even have code signing capabilities.
I guess the point here is that unless you're talking about hardware
constructed specifically for a high-security environment, sneaking
special key detection logic into the microarchitecture seems like the
world's most overcomplicated way of pwning a commodity box.
> Having physical access to the machine would yield
> the keys (thus subverting e.g. any disk encryption). And even more
> paranoidly, a proper instruction sequence could blurt the key cache
> out for convenient remote access by malware crafted by the People
> Who Know The Secrets.
It wouldn't need to be anything that obvious. Just a circuit
configuration that leaks enough secret stuff via a side channel (timing,
power, RF, etc.) that it could be captured.
Things like data caches and branch prediction buffers have been shown to
do this as a natural consequence of how they operate. I think when an
incomprehensibly complex black-box system has things like subtle info
leaks happening by accident it's a good sign that intentional behaviors
could easily be hidden.
> My questions: 1. How can one ensure this blackbox device really
> isn't a Mallory?
Mallory lives on the communications link, which is neither Alice's nor
Bob's CPU by definition.
But it could certainly be malicious and there's probably no way to prove
that it isn't, particularly on a chip as dense as a modern x64 CPU.
Being in your chips is the ultimate in physical access to the hardware,
so you have no choice but to use only hardware that you "trust". But
there may be ways to split up the data such that the attacker would need
to have previously compromised multiple, disparate systems.
> 2. Are there techniques, such as encrypting a lot of useless junk
> before/after the real deal to flush out the real key, as a way to
> reduce the impact of untrusted hardware, while still being able to
> use the hw-accelerated capabilities? And if you know of any good
> papers around this subject, feel free to
> mention them :)
Encrypting a lot of stuff with the same key would probably just make the
side channel easier to read. Switching keys on every block might help,
but only if the attacker didn't expect and account for that when he
designed the backdoor.