[Botan-devel] [Updated] RSA, DES on x86-64 (was Re: RSA, DSA in Botan and OpenSSL on x86-64)

Jack Lloyd lloyd at randombit.net
Fri Oct 17 09:22:05 EDT 2008


On Thu, Aug 28, 2008 at 12:14:32PM -0400, Jack Lloyd wrote:
> 
> Botan 1.7.9 vs OpenSSL 0.9.8 and 0.9.9 20080828 snapshot
[snip]

Update/retraction:

With net.randombit.botan head, I ran a set of benchmarks for Botan
with several compilers.

I modified the benchmark app slightly to use PKCS #1 v1.5 padding, to
match OpenSSL. (EME1, aka OAEP, in particular is noticeably slower than
PKCS #1 v1.5 padding or signatures using PSS; I need to profile
this.)

This is with a Core2 Q6600.

For RSA, in what I think/hope is a fair comparison:

RSA 1024 64 keygen per second; 15.52 ms/keygen (1 ops in 15.52 ms)
RSA 1024 EMSA-PKCS1-v1_5(SHA-1) 25417 verify per second; 0.04 ms/verify (127087 ops in 5000.02 ms)
RSA 1024 EMSA-PKCS1-v1_5(SHA-1) 1218 signature per second; 0.82 ms/signature (6106 ops in 5010.38 ms)
RSA 1024 EME-PKCS1-v1_5 23410 encrypt per second; 0.04 ms/encrypt (117053 ops in 5000.07 ms)
RSA 1024 EME-PKCS1-v1_5 1233 decrypt per second; 0.81 ms/decrypt (6170 ops in 5000.65 ms)
RSA 2048 5 keygen per second; 173.75 ms/keygen (1 ops in 173.75 ms)
RSA 2048 EMSA-PKCS1-v1_5(SHA-1) 8074 verify per second; 0.12 ms/verify (40372 ops in 5000.11 ms)
RSA 2048 EMSA-PKCS1-v1_5(SHA-1) 219 signature per second; 4.56 ms/signature (1099 ops in 5016.44 ms)
RSA 2048 EME-PKCS1-v1_5 7886 encrypt per second; 0.13 ms/encrypt (39432 ops in 5000.03 ms)
RSA 2048 EME-PKCS1-v1_5 220 decrypt per second; 4.54 ms/decrypt (1102 ops in 5000.16 ms)
RSA 3072 0 keygen per second; 2306.25 ms/keygen (1 ops in 2306.25 ms)
RSA 3072 EMSA-PKCS1-v1_5(SHA-1) 3169 verify per second; 0.32 ms/verify (15847 ops in 5000.01 ms)
RSA 3072 EMSA-PKCS1-v1_5(SHA-1) 58 signature per second; 17.22 ms/signature (292 ops in 5028.72 ms)
RSA 3072 EME-PKCS1-v1_5 3133 encrypt per second; 0.32 ms/encrypt (15667 ops in 5000.10 ms)
RSA 3072 EME-PKCS1-v1_5 58 decrypt per second; 17.18 ms/decrypt (292 ops in 5017.12 ms)
RSA 4096 0 keygen per second; 1981.99 ms/keygen (1 ops in 1981.99 ms)
RSA 4096 EMSA-PKCS1-v1_5(SHA-1) 2339 verify per second; 0.43 ms/verify (11698 ops in 5000.15 ms)
RSA 4096 EMSA-PKCS1-v1_5(SHA-1) 33 signature per second; 29.68 ms/signature (170 ops in 5045.19 ms)
RSA 4096 EME-PKCS1-v1_5 2323 encrypt per second; 0.43 ms/encrypt (11616 ops in 5000.14 ms)
RSA 4096 EME-PKCS1-v1_5 33 decrypt per second; 29.41 ms/decrypt (170 ops in 5000.37 ms)


$ openssl speed rsa
Doing 512 bit private rsa's for 10s: 73116 512 bit private RSA's in 9.96s
Doing 512 bit public rsa's for 10s: 780937 512 bit public RSA's in 9.97s
Doing 1024 bit private rsa's for 10s: 15101 1024 bit private RSA's in 9.95s
Doing 1024 bit public rsa's for 10s: 278641 1024 bit public RSA's in 9.94s
Doing 2048 bit private rsa's for 10s: 2456 2048 bit private RSA's in 9.96s
Doing 2048 bit public rsa's for 10s: 82109 2048 bit public RSA's in 9.96s
Doing 4096 bit private rsa's for 10s: 351 4096 bit private RSA's in 9.98s
Doing 4096 bit public rsa's for 10s: 22010 4096 bit public RSA's in 9.96s
OpenSSL 0.9.8h 28 May 2008
built on: Thu Oct  9 21:41:49 EDT 2008
options:bn(64,64) md2(int) rc4(1x,char) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(ptr2) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -DL_ENDIAN -DTERMIO -Wall -DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -march=core2 -O2 -pipe -momit-leaf-frame-pointer -Wa,--noexecstack
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
                  sign    verify    sign/s verify/s
rsa  512 bits 0.000136s 0.000013s   7341.0  78328.7
rsa 1024 bits 0.000659s 0.000036s   1517.7  28032.3
rsa 2048 bits 0.004055s 0.000121s    246.6   8243.9
rsa 4096 bits 0.028433s 0.000453s     35.2   2209.8

The single true advantage for Botan seems to be 4096 verifies.

This sudden drop/change is because of the benchmark changes I made in
1.7.10. In particular, padding is now used - previously it was just the
bare RSA operation. I realized the old approach was a) unfair and
b) not realistic.

So to summarize the RSA results: still a lot of work to do!
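
As a rough cross-check of the RSA comparison, the per-second figures
quoted above can be turned into Botan/OpenSSL ratios. This is only a
sketch: the numbers are copied from the two outputs in this message, the
dictionary layout and names are made up for illustration, and it assumes
Botan's PKCS #1 v1.5 sign/verify lines correspond to OpenSSL's
sign/verify columns.

```python
# Operations per second, copied from the benchmark output in this message.
# Keys are (modulus bits, operation); the layout is illustrative only.
botan = {
    (1024, "sign"): 1218, (1024, "verify"): 25417,
    (2048, "sign"): 219,  (2048, "verify"): 8074,
    (4096, "sign"): 33,   (4096, "verify"): 2339,
}
openssl = {
    (1024, "sign"): 1517.7, (1024, "verify"): 28032.3,
    (2048, "sign"): 246.6,  (2048, "verify"): 8243.9,
    (4096, "sign"): 35.2,   (4096, "verify"): 2209.8,
}

# A ratio above 1.0 means Botan completed more operations per second.
ratios = {key: botan[key] / openssl[key] for key in botan}
botan_wins = sorted(key for key, r in ratios.items() if r > 1.0)

if __name__ == "__main__":
    for (bits, op), r in sorted(ratios.items()):
        print(f"RSA {bits} {op:6s} Botan/OpenSSL = {r:.3f}")
    print("Botan ahead on:", botan_wins)
```

The only entry where the ratio exceeds 1.0 is the 4096-bit verify, which
matches the observation above. OpenSSL's output has no 3072-bit row, so
that size is left out.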

On the plus side, I optimized 3DES quite a bit recently. Full numbers
below; keep in mind that OpenSSL reports in units of 1000 bytes per
second, while Botan uses MiB (2^20 bytes) per second. If I did the math
right, that means 3DES in Botan is at least 15% faster than OpenSSL's
implementation (at least on my personal desktop machine; YMMV and
probably will, and I haven't checked this result on any other machine).
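
The unit conversion can be checked with a few lines of Python. This is a
sketch under stated assumptions: the figures are copied from the
benchmark output in this message, the OpenSSL values are taken from the
8192-byte column, and OpenSSL's "des ede3" row is assumed to be CBC mode
and therefore comparable to Botan's TripleDES/CBC/PKCS7 line. The
function and variable names are illustrative.

```python
MIB = 2 ** 20   # Botan reports MiB/sec
KB = 1000       # OpenSSL reports 1000s of bytes per second

def mib_to_kb(mib_per_sec):
    """Convert MiB/sec into OpenSSL's '1000s of bytes per second' unit."""
    return mib_per_sec * MIB / KB

# Botan CBC figures (MiB/sec) and OpenSSL 8192-byte figures (k/sec).
botan_mib = {"DES": 46.07, "3DES": 18.54}
openssl_kb = {"DES": 42573.74, "3DES": 16419.74}

def speedup(name):
    """Botan throughput divided by OpenSSL throughput, in the same unit."""
    return mib_to_kb(botan_mib[name]) / openssl_kb[name]

if __name__ == "__main__":
    for name in ("DES", "3DES"):
        print(f"{name}: Botan/OpenSSL = {speedup(name):.3f}")
```

With these inputs the 3DES ratio comes out at roughly 1.18 and the DES
ratio at roughly 1.13, which is consistent with the "at least 15%"
estimate for 3DES.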

The numbers:

$ ./check --bench-algo=DES,TripleDES,DES/CBC/PKCS7,TripleDES/CBC/PKCS7
DES:                        48.43 MiB/sec
TripleDES:                  18.93 MiB/sec
DES/CBC/PKCS7:              46.07 MiB/sec
TripleDES/CBC/PKCS7:        18.54 MiB/sec

$ openssl speed des
Doing des cbc for 3s on 16 size blocks: 7658639 des cbc's in 2.97s
Doing des cbc for 3s on 64 size blocks: 1952863 des cbc's in 2.97s
Doing des cbc for 3s on 256 size blocks: 493279 des cbc's in 2.98s
Doing des cbc for 3s on 1024 size blocks: 124143 des cbc's in 2.98s
Doing des cbc for 3s on 8192 size blocks: 15539 des cbc's in 2.99s
Doing des ede3 for 3s on 16 size blocks: 2987825 des ede3's in 2.98s
Doing des ede3 for 3s on 64 size blocks: 756969 des ede3's in 2.99s
Doing des ede3 for 3s on 256 size blocks: 191166 des ede3's in 2.99s
Doing des ede3 for 3s on 1024 size blocks: 47916 des ede3's in 3.00s
Doing des ede3 for 3s on 8192 size blocks: 5973 des ede3's in 2.98s
OpenSSL 0.9.8h 28 May 2008
built on: Thu Oct  9 21:41:49 EDT 2008
options:bn(64,64) md2(int) rc4(1x,char) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(ptr2) 
compiler: x86_64-pc-linux-gnu-gcc -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -DL_ENDIAN -DTERMIO -Wall -DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -march=core2 -O2 -pipe -momit-leaf-frame-pointer -Wa,--noexecstack
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc          41258.66k    42081.90k    42375.65k    42658.53k    42573.74k
des ede3         16042.01k    16202.68k    16367.39k    16355.33k    16419.74k

About 10 years too late for that to be a deciding factor for anyone,
but still, I was pleased. I think it's mostly due to using a
table-based 64-bit IP/FP instead of the bit ops that are normally
used. I wrote it that way years ago thinking parallel memory access
would ultimately be faster than a long chain of dependent bit ops
(there is a lot of latency to the memory access, but if the cache and
memory can handle it, the CPU can dispatch the entire set of loads in
parallel, and do the xors in parallel as well). I tried replacing the
IP/FP with the bit op sequence used in Crypto++ - Botan immediately
became slower than OpenSSL for DES and 3DES.
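
To make the table-based idea concrete, here is a minimal sketch of a
table-driven DES initial permutation. It is not Botan's actual code: the
table layout (eight tables of 256 64-bit entries, one per input byte) and
all names are assumptions for illustration. The bit-at-a-time function is
only a reference for checking correctness; it is not the bit op sequence
from Crypto++ or OpenSSL. Python will not show the speed difference, only
the structure: the eight lookups are independent of each other, so a CPU
can issue them in parallel before combining them with xors.

```python
# Standard DES initial permutation (FIPS 46-3).
# Bits are numbered 1..64 starting from the most significant bit.
IP = [
    58, 50, 42, 34, 26, 18, 10, 2, 60, 52, 44, 36, 28, 20, 12, 4,
    62, 54, 46, 38, 30, 22, 14, 6, 64, 56, 48, 40, 32, 24, 16, 8,
    57, 49, 41, 33, 25, 17, 9, 1, 59, 51, 43, 35, 27, 19, 11, 3,
    61, 53, 45, 37, 29, 21, 13, 5, 63, 55, 47, 39, 31, 23, 15, 7,
]

def permute_bitwise(block, table):
    """Reference version: build the output one bit at a time."""
    out = 0
    for src in table:
        out = (out << 1) | ((block >> (64 - src)) & 1)
    return out

def build_byte_tables(table):
    """tables[k][v] is the permutation of a block whose k-th byte
    (0 = most significant) is v and whose other bytes are zero."""
    return [
        [permute_bitwise(v << (56 - 8 * k), table) for v in range(256)]
        for k in range(8)
    ]

IP_TABLES = build_byte_tables(IP)

def permute_tabled(block, tables):
    """Table version: eight independent lookups combined with xor.
    This works because a bit permutation maps each input byte to a
    disjoint set of output bits."""
    out = 0
    for k in range(8):
        out ^= tables[k][(block >> (56 - 8 * k)) & 0xFF]
    return out

if __name__ == "__main__":
    x = 0x0123456789ABCDEF
    assert permute_tabled(x, IP_TABLES) == permute_bitwise(x, IP)
    print(f"{permute_tabled(x, IP_TABLES):016X}")
```

The same construction applies to FP by building the tables from the
inverse permutation. The cost is memory: eight tables of 256 eight-byte
entries is 16 KiB per permutation, which is where the dependence on cache
and memory behavior comes from.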

This may only be an optimization on machines with very good memory
bandwidth, though (and a pessimization otherwise). However I tend to
optimize for the hardware I have (or that I want to get in the future).

-Jack


