[cryptography] 100 Gbps line rate encryption

Steve Weis steveweis at gmail.com
Mon Jul 1 18:35:12 EDT 2013


Are you assuming a single core?

I ran 'openssl speed' on an 8-core 2.9 GHz Intel Xeon E5-2690 with
hyperthreading enabled, which gives it 16 logical cores. It's an artificial
benchmark, but openssl is able to encrypt using AES-XTS with 128-bit keys
at 28 gigabytes / second for 8KB blocks, which is 225.2 gigabits per second.
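
As a rough check on that arithmetic, here is a minimal Python sketch. It assumes the 8192-byte figure from the output further down, the nominal 2.9 GHz clock (turbo is ignored), and that all 8 physical cores are busy; the helper names are illustrative only.

```python
# Figures copied from the message; helper names are illustrative only.
BYTES_PER_SEC = 28151824.38 * 1000  # openssl reports thousands of bytes/s
CLOCK_HZ = 2.9e9                    # nominal clock, turbo ignored (assumption)
PHYSICAL_CORES = 8


def to_gbps(bytes_per_sec):
    """Convert bytes per second to gigabits per second (decimal units)."""
    return bytes_per_sec * 8 / 1e9


def cycles_per_byte(bytes_per_sec, clock_hz, cores):
    """Aggregate clock cycles spent per byte across the given cores."""
    return clock_hz * cores / bytes_per_sec


gbps = to_gbps(BYTES_PER_SEC)
cpb = cycles_per_byte(BYTES_PER_SEC, CLOCK_HZ, PHYSICAL_CORES)
print(f"{gbps:.1f} Gbps, roughly {cpb:.2f} cycles/byte per physical core")
```

Under those assumptions the result lands close to the roughly 0.75 cycles / byte per core quoted earlier in the thread.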

This may not be relevant to whichever platforms the original post had in
mind. On the x86 platform I tested, memory bandwidth would become the
bottleneck before the crypto does.

------------------------------------------------------------------------
Edited output:

$ uname -a
Linux [hostname] 3.2.0-41-generic #66-Ubuntu SMP Thu Apr 25 03:27:11 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ openssl speed -multi 16 -evp aes-128-xts
OpenSSL 1.0.1 14 Mar 2012
built on: Mon Apr 15 15:27:18 UTC 2013
[some build output omitted]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
evp            3994884.76k 11902064.66k 21140865.02k 26338644.65k 28151824.38k
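
To read the 'evp' row above in network units, a small Python sketch along these lines converts each column to gigabits per second. It assumes, as the output states, that the 'k' suffix means thousands of bytes per second; the function names are hypothetical and not part of openssl.

```python
# Block sizes and result row copied from the output above.
SIZES = [16, 64, 256, 1024, 8192]
ROW = "evp 3994884.76k 11902064.66k 21140865.02k 26338644.65k 28151824.38k"


def parse_row(row):
    """Return bytes/second for each column, dropping the leading label."""
    fields = row.split()
    return [float(field.rstrip("k")) * 1000 for field in fields[1:]]


def gbps_by_block_size(row, sizes=SIZES):
    """Map each block size to throughput in gigabits per second."""
    return {size: rate * 8 / 1e9 for size, rate in zip(sizes, parse_row(row))}


for size, gbps in gbps_by_block_size(ROW).items():
    print(f"{size:>5} bytes: {gbps:6.1f} Gbps")
```

The small-block columns come out far lower than the 8192-byte one, which is why the block size matters when quoting a single headline number.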



On Sun, Jun 30, 2013 at 2:13 PM, <aortega at alu.itba.edu.ar> wrote:

> Oops, miscalculation. That should be a 6.25 GHz clock for 100 Gbps ((100
> Gbps/8)/2). Anyway, I don't think anybody has hardware that fast, except
> maybe IBM with the Power8.
>
> > The fastest hardware implementation of RC4 that I know is 2 bytes/clock.
> > I personally programmed a 1 byte/clock RC4 in a FPGA, it's quite simple.
> >
> > At 2 bytes/clock you still need a clock of 10 gigahertz to encrypt 100
> > Gbps. That's infeasible; the way it's done is with parallelism, and then
> > you can use any algorithm you want as long as you have silicon available.
> > Consider that there are 400 Gbps systems coming online.
> >
> > Using a PC for that kind of workload is a waste of money and power. FPGAs
> > are not that expensive nowadays.
> >
> >
> >
> >
> >> Just as a data point, on x86 processors with AESNI you can encrypt AES
> >> in, say, XTS mode with about 0.75 cycles / byte on each core.
> >>
> >> On an Intel Xeon E5-2690 'openssl speed -multi 4 -evp aes-128-xts' tops
> >> out at 13.5 GB/s for 8k blocks, which is 108 Gbps. That's only using
> >> half the physical cores and no hyperthreading.
> >>
> >> However, that's unlikely to be a realistic benchmark for whatever
> >> context the original question was referring to.
> >>
> >>
> >> On Sat, Jun 22, 2013 at 5:25 PM, Peter Maxwell
> >> <peter at allicient.co.uk> wrote:
> >>
> >>>
> >>>
> >>> On 22 June 2013 23:31, James A. Donald <jamesd at echeque.com> wrote:
> >>>
> >>>>  On 2013-06-23 6:47 AM, Peter Maxwell wrote:
> >>>>
> >>>>
> >>>>
> >>>>  I think Bernstein's Salsa20 is faster and significantly more secure
> >>>> than RC4, whether you'll be able to design hardware to run at
> >>>> line-speed is somewhat more questionable though (would be interested
> >>>> to know if it's possible right enough).
> >>>>
> >>>>
> >>>> I would be surprised if it is faster.
> >>>>
> >>>>
> >>>>
> >>>
> >>> Given the 100Gbps spec, I can only presume it's hardware that's being
> >>> talked about, which is well outwith my knowledge.  We also don't know
> >>> whether there is to be only one keystream allowed or not.
> >>>
> >>> However, just to give an idea of performance: from a cursory search on
> >>> Google, one can seemingly find Salsa20/12 being implemented recently
> >>> on GPU with performance around 43Gbps without memory transfer (2.7Gbps
> >>> with):
> >>> http://link.springer.com/chapter/10.1007%2F978-3-642-38553-7_11
> >>> Unfortunately I don't have access to the paper.
> >>>
> >>> On a decent 64-bit processor, the full Salsa20/20 is coming in around
> >>> 3-4cpb - http://bench.cr.yp.to/results-stream.html - and while cpb
> >>> isn't a great measurement, it at least gives a feel for things.
> >>>
> >>>
> >>> Going on a very naive approach, I would imagine the standard RC4 will
> >>> suffer due to being byte-orientated and not particularly open to
> >>> parallelism.  Salsa20 operates on 32-bit words and from a cursory
> >>> inspection of the spec seems to offer at least some options to do
> >>> operations in parallel.
> >>>
> >>> If I were putting money on it, I suspect one could optimise at least
> >>> Salsa20/12 to be faster than RC4 on modern platforms; whether this has
> >>> been done is another story.  Fairly sure Salsa20/8 was faster than RC4
> >>> out-of-the-box.
> >>>
> >>> As with anything though, I stand to be corrected.
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> cryptography mailing list
> >>> cryptography at randombit.net
> >>> http://lists.randombit.net/mailman/listinfo/cryptography