[botan-devel] Threaded Filters/Operation Parallelisation

Joel Low joel at joelsplace.sg
Thu Jan 3 17:51:18 EST 2013


Hmm, I haven't thought that far.

The main driving force behind this was a small little utility I wrote to 
calculate file hashes. Prior to this patch I used Botan::Fork, since it could 
calculate the file hash using a few different algorithms at once, and when 
expensive algorithms were run, how fast the program completed became solely 
dependent on how fast (or slow) the slowest algorithm specified was because of 
how Fork ran every filter in sequence.

So after writing this I just replaced all instances of Botan::Fork with 
Botan::Threaded_Fork (I'm not kidding... I designed it such that any code 
using Fork can be changed to Threaded_Fork, hopefully my implementation stayed 
true to that) and then re-compiled my program. I did some trial runs on my 
computer (Sandy Bridge-E, i7 3930K), hashing disk images (~815MB) from a 
ramdisk:

Threaded fork:
C:\Users\Joel>hash Adler32 BMW-512 Keccak-1600(224) Keccak-1600(256) 
Keccak-1600(384) Keccak-1600(512) MD5 RIPEMD-128 RIPEMD-160 SHA-1 SHA-224 
SHA-256 SHA-384 SHA-512 Tiger Skein-512 Whirlpool -f "I:\Image.ISO"
<hashes omitted>
Calculated in 15.2s

Fork:
D:\Development\Tools\bin\Release\x64>hash Adler32 BMW-512 Keccak-1600(224) 
Keccak-1600(256) Keccak-1600(384) Keccak-1600(512) MD5 RIPEMD-128 RIPEMD-160 
SHA-1 SHA-224 SHA-256 SHA-384 SHA-512 Tiger Skein-512 Whirlpool -f 
"I:\Image.ISO"
<hashes omitted>
Calculated in 68.4s

I've only tested using hash algorithms, so I've not seen how the performance 
difference will be like for other filter chains. I guess the main thing 
determining speed up is what fraction of a single thread one filter takes: if 
a thread is consumed by one filter, parallelising it may not make much of a 
difference (in this implementation, anyway) because other filters will be 
waiting on it. In my tests above I've omitted

CRC24 CRC32 GOST-34.11 HAS-160 MD2 MD4

Because those functions were relatively costly in the first place. 
Parallelising them won't create that much of a speed up. The same test above, 
including these functions:

Threaded fork:
C:\Users\Joel>hash Adler32 BMW-512 Keccak-1600(224) Keccak-1600(256) 
Keccak-1600(384) Keccak-1600(512) MD5 RIPEMD-128 RIPEMD-160 SHA-1 SHA-224 
SHA-256 SHA-384 SHA-512 Tiger Skein-512 Whirlpool CRC24 CRC32 GOST-34.11 
HAS-160 MD2 MD4 -f I:\Image.ISO
<Hashes omitted>
Calculated in 169s

Fork:
D:\Development\Tools\bin\Release\x64>hash Adler32 BMW-512 Keccak-1600(224) 
Keccak-1600(256) Keccak-1600(384) Keccak-1600(512) MD5 RIPEMD-128 RIPEMD-160 
SHA-1 SHA-224 SHA-256 SHA-384 SHA-512 Tiger Skein-512 Whirlpool CRC24 CRC32 
GOST-34.11 HAS-160 MD2 MD4 -f "I:\Image.ISO"
<Hashes omitted>
Calculated in 267s

Regards,
Joel

-----Original Message-----
From: botan-devel [mailto:botan-devel-bounces at randombit.net] On Behalf Of Jack 
Lloyd
Sent: Friday, 4 January 2013 1:00 AM
To: Botan development list
Subject: Re: [botan-devel] Threaded Filters/Operation Parallelisation


Thanks Joel!

Have you done any measurements on what kind of speedup can be obtained on a 
multicore processor? I would imagine that the overhead of thread interactions 
exceeds the gains for filters below some cost/byte. I'm not sure if it is 
cycles/byte though; cache line access/byte may be a more important question in 
many multicore cases.

Something else that would be very useful would be a demo application that 
demonstrates usage and can let users understand what kind of speedup they 
might see on their machines from using it, do you have some test code that you 
could adapt to an example?

Thanks,
  Jack

On Wed, Jan 02, 2013 at 11:01:12PM +0000, Joel Low wrote:
> Thanks for the review, Jack.
>
> I've included the updated patch. Let me know if that works.
>
> Regards,
> Joel
>
> -----Original Message-----
> From: botan-devel [mailto:botan-devel-bounces at randombit.net] On Behalf
> Of Jack Lloyd
> Sent: Wednesday, 2 January 2013 11:49 PM
> To: Botan development list
> Subject: Re: [botan-devel] Threaded Filters/Operation Parallelisation
>
> Hi Joel,
>
> I've only had time to very briefly review the patch but it looks good.
>
> A few minor comments:
>
> - Can you use an m_ prefix on all class members? I know a lot of the
>   code doesn't use a prefix or suffix at all, but I am trying to
>   ensure that all new or heavily touched code adopts m_ (rather than
>   suffix _ as you use in Semaphore or no prefix).
>
> - Semaphore seems generally useful, move it to
> src/util/semaphore.{cpp,h}
>
> - Add your name to the copyright headers of each modified file. (And
>   make sure all new sources have one)
>
> - For some reason, despite all your work on Windows support and other
>   patches, your name is not already included in the list of copyright
>   owners in the license, so please also update license.rst
>
> Can you make those changes and send the updated patch as an attachment
> to the list (so it can be archived by mailman)?
>
> Thanks,
>   Jack
>
> On Wed, Jan 02, 2013 at 06:51:10AM +0000, Joel Low wrote:
> > Hello all,
> >
> > So I've finally managed to sit down and write this patch. The design
> > seems quite different from what I had in mind, since this round I
> > wanted it to be a drop-in replacement for Fork.
> >
> > There's a bit of code taken from a blog post to emulate a semaphore:
> > The link is there, and if someone knows of a better class (or better
> > still, one within Botan itself), let me know and I'll use that class
> instead.
> >
> > Let me know what you think.
> >
> > Patch: http://pastebin.com/kgF0b7Yp (1 month visibility)
> >
> > Regards,
> > Joel
> >
> > -----Original Message-----
> > From: botan-devel-bounces at randombit.net
> > [mailto:botan-devel-bounces at randombit.net] On Behalf Of Joel Low
> > Sent: Tuesday, 24 April 2012 5:07 PM
> > To: botan-devel at randombit.net
> > Subject: [botan-devel] Threaded Filters/Operation Parallelisation
> >
> > Hello all,
> >
> > Recently I've been playing with the idea of having a threaded Fork
> > filter to be used together with Pipe: processing for each subsequent
> > filter downstream of the threaded Fork filter is done in a separate
> > thread of its own. This could potentially bring performance benefits
> > (in theory) especially with a move on to having an increasing number
> > of
> computing cores per CPU.
> >
> > So I've emailed Jack separate to the list to get some of his opinions.
> > The main points he raised were that:
> >
> >  - The approach used for Fork would be most promising in terms of
> > working with the current design and not forcing a full rewrite of
> > the
> filter system.
> >  - He proposed defining a new Filter subclass Threaded_Filter which
> > itself takes a Filter* as an argument which will spawn a thread and
> > uses two message queues for I/O with the filter it manages.
> >  - When write() is called on the Threaded_Filter, it pushes it to
> > the input queue, which the worker thread pulls off and write()s to
> > the underlying filter.
> >  - With this approach the application can control concurrency very
> > finely (perhaps too finely), since the application can specify which
> > filters run on the main thread, and what runs on a secondary thread.
> >
> > I was leaning to a slightly different approach: with the knowledge
> > that Fork operations parallelise the easiest, I should adapt the
> > Fork class to let each downstream filter run in a separate thread.
> > This shares much commonality in its design with Jack's idea
> > (especially in terms of thread sync and friends), provides a little
> > less control over what runs in a separate thread, but comes with a
> > slightly easier
> implementation.
> >
> > Jack stated that I probably should email the devel list and solicit
> > ideas, since everyone may have different expectations. I'll be
> > implementing this in my spare time so I'd like to accept as many
> > ideas and combine them before acting on it. I think that the
> > pipe/filter design Botan has is intuitive, and I'd like to keep that
> > as much as possible, without compromising on potential performance gains.
> > Intuitively, both seem to run contrary to each other, but I think we
> > can work something out here to an API that is both powerful yet
> > easily
> mastered.
> >
> > Any thoughts?
> >
> > Regards,
> > Joel
> >
>
>
>
> > _______________________________________________
> > botan-devel mailing list
> > botan-devel at randombit.net
> > http://lists.randombit.net/mailman/listinfo/botan-devel
>
> _______________________________________________
> botan-devel mailing list
> botan-devel at randombit.net
> http://lists.randombit.net/mailman/listinfo/botan-devel





> _______________________________________________
> botan-devel mailing list
> botan-devel at randombit.net
> http://lists.randombit.net/mailman/listinfo/botan-devel

_______________________________________________
botan-devel mailing list
botan-devel at randombit.net
http://lists.randombit.net/mailman/listinfo/botan-devel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 6053 bytes
Desc: not available
URL: <http://lists.randombit.net/pipermail/botan-devel/attachments/20130103/e17282d6/attachment.p7s>


More information about the botan-devel mailing list