[cryptography] New bit-fiddling instructions in Intel's Haswell
sneves at dei.uc.pt
Tue Jun 14 11:48:38 EDT 2011
On 14-06-2011 13:13, Jack Lloyd wrote:
> Intel has publicly described the new instructions that will be
> available in Haswell (their 22nm chip with ETA 2013). It will include
> integer AVX, and some interesting new bit fiddling instructions for
> GPRs, including bit-level gather/scatter instructions (pext/pdep),
> and an unsigned multiply instruction that doesn't set flags which
> seems intended for modexp.
> I suspect there are some interesting possibilities with
> pext/pdep. While it's about 15 years too late to matter, a table-less
> DES running entirely in registers seems possible. And last year I
> played around with a Serpent implementation using pshufb for the 4-bit
> sboxes, but couldn't find a way of doing the linear transformation
> quickly; doing the sboxes in the xmm registers and the linear
> operation in GPRs with these might work out, though.
> Anyone see other ways to use the new instructions in interesting ways,
> cryptographically speaking?
> The instruction reference (PDF) is posted on their forum:
I'm actually unhappy that VPSHUFB still doesn't allow 32-byte wide
shuffles (each half does its own 16-byte shuffle). It would seem Intel is
still unwilling to abandon their 128-bit wide microarchitecture just
yet... I'd very much like Intel to adopt the far more powerful VPPERM
instruction, part of AMD's XOP extensions. It would allow for 6-bit
table lookups, which would even allow for AES "bruteforce" table lookups
in a reasonable number of cycles.
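
To make the per-half limitation concrete, here is a small portable C
model of a 256-bit byte shuffle that works on two independent 16-byte
lanes. It is only a sketch: the name vpshufb256_model is made up for
illustration, and the semantics (low 4 bits of each control byte select
within the lane, high bit zeroes the output byte) are assumed to carry
over unchanged from the 128-bit PSHUFB.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 32-byte shuffle that, like the
 * instruction discussed above, never moves bytes across the 16-byte
 * lane boundary. dst must not alias src or ctrl. */
static void vpshufb256_model(uint8_t dst[32],
                             const uint8_t src[32],
                             const uint8_t ctrl[32])
{
    for (int i = 0; i < 32; i++) {
        int lane_base = i & 16;            /* 0 = low lane, 16 = high lane */
        if (ctrl[i] & 0x80)
            dst[i] = 0;                    /* high bit set: zero the byte  */
        else
            dst[i] = src[lane_base + (ctrl[i] & 0x0F)];
    }
}
```

With ctrl[i] = 31 - i, which one might hope reverses all 32 bytes, this
model only reverses the bytes inside each half, because the fifth index
bit is ignored.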
The new integer shift instructions are, of course, interesting (I
thought I had seen a rotation instruction in there, but apparently it
was wishful thinking). Since the shift count can now vary per element,
they allow for faster Salsa20/ChaCha20, and also speed Threefish/Skein up
by performing the various MIX operations of a round simultaneously.
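
As a rough sketch of the Threefish point, the scalar C below models four
MIX operations running in lockstep, each lane with its own rotation
count, which is the pattern a pair of per-element variable shifts (left
by r, right by 64 - r, then OR) would vectorise. The names rotl64 and
mix4_model are illustrative, and the rotation counts are supplied by the
caller rather than taken from the Skein specification.

```c
#include <stdint.h>

/* Rotate left by r, assuming 0 < r < 64. With no vector rotate, this
 * is built from one left shift and one right shift per lane. */
static uint64_t rotl64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> (64 - r));
}

/* Scalar model of four MIX operations done side by side:
 *   y0 = x0 + x1 (mod 2^64),  y1 = rotl(x1, R) ^ y0
 * Each lane uses its own rotation count rot[lane]. */
static void mix4_model(uint64_t x0[4], uint64_t x1[4], const unsigned rot[4])
{
    for (int lane = 0; lane < 4; lane++) {
        x0[lane] += x1[lane];
        x1[lane] = rotl64(x1[lane], rot[lane]) ^ x0[lane];
    }
}
```

In a vector version the loop body would become one add, two variable
shifts, an OR and an XOR over all four lanes at once.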
I can see the bit permutation instructions being used to do very fast
binary field squaring (a linear operation), which will probably be much
faster than resorting to PCLMULQDQ, which is not too fast (on current
hardware, at least).
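
To illustrate the squaring idea: squaring a polynomial over GF(2) just
spreads its coefficients to the even bit positions, which is a single
bit deposit with the mask 0x5555555555555555. The C below is a sketch
using a portable software model of PDEP rather than the real
instruction; pdep64_model and gf2_square32 are names invented here, and
the reduction modulo the field polynomial is left out.

```c
#include <stdint.h>

/* Portable model of a 64-bit parallel bit deposit: the low-order bits
 * of src are placed, in order, at the positions where mask has a 1. */
static uint64_t pdep64_model(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask ^= lowest;                        /* consume that position  */
    }
    return result;
}

/* Unreduced square of a 32-bit polynomial over GF(2): coefficient i of
 * the input becomes coefficient 2*i of the 64-bit result. */
static uint64_t gf2_square32(uint32_t a)
{
    return pdep64_model(a, 0x5555555555555555ULL);
}
```

For example, (x + 1)^2 = x^2 + 1, so gf2_square32(0x3) gives 0x5.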
The "Post-32nm" instructions also contain RDRAND, which gives access to
a hardware random bit generator. It would be interesting to see how
these bits are being generated.