[Botan-devel] Small file ciphering speed

Mr Diggilin mr.diggilin at gmail.com
Wed Sep 24 00:36:26 EDT 2008


On Wed, 2008-09-17 at 23:08 -0400, Jack Lloyd wrote:
> On Thu, Sep 18, 2008 at 08:43:27AM +0800, Mr Diggilin wrote:
> > ...
> 
> > b. What kind of speed can I expect?
> 
> Well expectations may vary. :)
> 
> Obviously the answer depends on your hardware (almost entirely the
> CPU, and to some extent the memory) and the compiler a bit. But for
> reasonable commodity hardware (say, anything that one might sanely run
> a database process on), I would say anything more than 1 ms is far too
> long, and on a decent machine I would expect an operation of this sort
> to take tens of microseconds or so, if reasonably optimized; I'm not
> talking about heroic measures here, just using the right algorithms, a
> bit of caching, and avoiding excessive dynamic memory allocation.

This could end up running on just about any old hardware, the database
in question is sqlite (not client-server) and the program is for your
average person.
So right here is the most important question to me. Given 15,000 rows in
an sqlite database each containing two fields that need decrypting...
1 ms per row means 30 second load time.
200 microseconds per row means a 6 second load time.
100 microseconds per row means a 3 second load time.
Given my competition running 15,000 rows at 5 seconds, the first is
enough to force surrender, the second is acceptable, and the third is
much preferable. This is on a core duo (not core 2 duo) 2.16 ghz. Would
you guess the latter two goals to be attainable on such a processor?

> > c. I'm using an S2K with 100 hash rounds for each operation, which isn't
> > much. Would there be any security concerns if I reused the same key
> > (with more rounds) over all of the thousands of entries?
> 
> No, as long as you use different random IVs each time, using the same
> key over multiple messages is fine, and is the normal way to approach
> this problem. I would say do the S2K on the passphrase once, with many
> iterations (100 is tolerable but 1000-10000 is much better - and the
> whole idea is you only do it once, when you read the passphrase in
> from the config file or arguments, then it has minimal cost overall to
> runtime).

That sounds good, and I thought would be the way of things. How would I
do this in practice though? Right now, in order to do an S2K I need a
salt that is different for every field, do I just not change the salt
for each cipher operation? Also where what do you recommend for a random
IV generation?
> 
> > auto_ptr<Pipe> Crypto::RunCipher(string Passphrase, wxInputStream * In,
> > SecureVector<byte>& Salt)
> > {
> >   Cipher_Dir Dir = (Salt == NULL) ? ENCRYPTION : DECRYPTION;
> >   KeyAndIV = 100 round s2k;
> >   SymmetricKey Key(KeyAndIV, KeySize); //256 key
> >   InitializationVector IV(KeyAndIV + KeySize, IVSize); //128 IV
> >   auto_ptr<Pipe> Out???(new Pipe(get_cipher("Twofish/EAX", Key, IV,
> > Dir)));
> >   Out->start_msg();
> >   *In >> *Out;
> >   Out->end_msg();
> >   return Out;
> > }
> > 
> 
> Comments/questions:
> 
> - What is the Salt? Does it change every time, or is it specific to
> a particular database row?

As mentioned above, salt is changed every time. It's stored at the
beginning of the field. The function I gave is a private function called
by the public functions which extract the salt and pass it for
decryption, or for encryption pass it as NULL (in which which case it
gets filled in the S2K creation).

> - I see you are generating a random IV here: how are you saving it to
> decrypt later?

I don't save the IV, but the salt that generates it. Here's the function
that is called where I put "100 round s2k":

SecureVector<byte> Crypto::GetKeyAndIV(string Passphrase, int Size,
SecureVector<byte>& Salt)
{
	auto_ptr<S2K> s2k (get_s2k(S2KStr));
	s2k->set_iterations(S2KIterations);
	if (Salt == NULL)
	{
		s2k->new_random_salt(SaltSize);
		Salt = s2k->current_salt();
	}
	else
	{
		s2k->change_salt(Salt, SaltSize);
	}
	return s2k->derive_key(Size, Passphrase).bits_of();
}

Salt is always passed by reference, so in the case of encryption, when
the RunCipher function completes, you can write it. Like so:

auto_ptr<Pipe> pipe(RunCipher(Passphrase, PlainStream, Salt));
CipheredStream->Write(reinterpret_cast<void*>(Salt.begin()), SaltSize);
*Ciphered << *pipe;

> 
> - For optimization, you want to make this two distinct operations:
>   1) Taking a passphrase and a salt and turning it into a key (PBKDF2)
>       - this is slow (to make password guessing attacks harder)
>   2) Encrypting lots of fields very quickly using said key, and a salt
>      (or, better, a persistent row id, if that is feasible)
>        - fast because it needs to be
> 
Yes, I thought as much. As you say later, this is worst case for the
small message. The functions I'm giving are not only for these small
operations, and weren't really made while taking them into account, they
are generic methods which I use for files mainly. I'm going to refactor
the crypto class with some other methods for fast encryption for the
database stuff as soon as I understand fully how.

> - Saving the Pipe (along with the cipher object, the precomputed key
>   schedule, etc) will save a LOT of time in this. You are pretty much
>   worst case scenario for the current style of Pipe per message, since
>   you only get to process ~50 bytes / Pipe instance. Also, for
>   instance each time you call get_cipher to get a new object, it also
>   reruns the Twofish key schedule, which is fairly complex itself
>   (roughly takes as long as encrypting 16-32 bytes - again, almost as
>   much as your whole starting message! Lotta overhead). And
>   dynamically allocating the Pipe and returning it in the auto_ptr
>   will probably not be fast either (that is an interesting style
>   though - is that so callers can capture the Pipe if desired, with it
>   being released otherwise?).

Yes. In encryption, as shown above, I need to capture the pipe for use
after I read in the salt. However, for decryption I can read it
directly.

>   So what you really want to do is when you first realize what the
>   passphrase is, run PBKDF2, set up the Pipe with Twofish, etc, and
>   just keep it around for each message you want to encrypt. This
>   amortizes the S2K, and the Pipe creation costs, over all messages,
>   which should easily give a speedup of an order of magnitude.
> 
Sounds good to me. I can get the row id for the db if needed. As I said
above, I'm not sure what to do about salt, since it's only used in key
generation. Should I put the row id to use in getting an IV?

Also, is there any cleanup I'll have to do on the pipe between messages?
Can I just write in and read out indefinitely?

>   Or if you need to support multiple simultanious passphrases you
>   could use a map<string, Pipe>
> 
> I hope this helps, I can't give perfectly clear answers to all points
> since I don't know the value/purpose of the salt and probably numerous
> other important bits of info, but hopefully this should get you
> started on optimizing.
> 
> -Jack

Thank you very much for taking your time to provide such a helpful
response.

-Diggilin




More information about the botan-devel mailing list