2003-09-10 15:40:53

by Fruhwirth Clemens

Subject: [PATCH] AES i586-asm optimized

This patch[1] adds an i586 assembler optimized version of the Rijndael (AES)
cipher. Please have a look, try, and criticise.

Before starting the old "do we need assembler" thread again:
As tested by hvr[2] this implementation is significantly faster than the C
version. Guys, the Linux kernel doesn't even compile with icc (Intel C
compiler) so don't start arguing with "a decent compiler one could blahbla.."
These are the raw numbers. Assembler is faster.

And before we start to discuss a sophisticated framework for assembler
implementations, or automatic selection of implementations, or preferences by
an application for a particular implementation, and so on: This is the first
assembler implementation and most likely the last for a long time. So I think
with this perspective it's not worth delaying this feature, especially
because with this module disk encryption becomes reasonable.

Regards, Clemens

[1] http://clemens.endorphin.org/patches/aes-i586-asm-2.6.0-test5.diff
[2] http://www.kerneli.org/pipermail/cryptoapi-devel/2003-August/000607.html



2003-09-10 16:22:27

by Jeff Garzik

Subject: Re: [PATCH] AES i586-asm optimized

On Wed, Sep 10, 2003 at 05:38:59PM +0200, Fruhwirth Clemens wrote:
> This patch[1] adds an i586 assembler optimized version of the Rijndael (AES)
> cipher. Please have a look, try, and criticise.
>
> Before starting the old "do we need assembler" thread again:
> As tested by hvr[2] this implementation is significantly faster than the C
> version.

Tested on what processors? With what kernel config?

I would be surprised if a 586-optimized asm was useful on P4.


> Guys, the linux kernel doesn't even compile with icc (Intel C
> compiler)

Wrong. As Intel pointed out on linux-kernel less than 24 hours ago,
even.


> These are the raw numbers. Assembler is faster.

gcc generates assembler, so this is nonsensical ;-)


> And before we start to discuss a sophisticated framework for assembler
> implementations, or automatic selection of implementations, or preferences by
> an application for a particular implementation, and so on: This is the first
> assembler implementation and most likely the last for a long time.

Nope, S/390 folks beat ya to it.

And I'm working on something as well.


> So I think
> with this perspective it's not worth delaying this feature, especially
> because after this module disk encryption becomes reasonable.

In your opinion.

Jeff



2003-09-10 17:06:27

by Jari Ruusu

Subject: Re: [PATCH] AES i586-asm optimized

Jeff Garzik wrote:
> On Wed, Sep 10, 2003 at 05:38:59PM +0200, Fruhwirth Clemens wrote:
> > As tested by hvr[2] this implementation is significantly faster than the C
> > version.
>
> Tested on what processors? With what kernel config?
>
> I would be surprised if a 586-optimized asm was useful on P4.

It uses the classic Pentium instruction set and is speed-optimized for my
300 MHz Pentium-2 test box. The original Gladman version that I started with
was pretty fast, but I was able to improve performance by about 7% over it.

On the same 300 MHz P2 test box, the assembler implementation is about twice
as fast as the mainline kernel C implementation.

Regards,
Jari Ruusu <[email protected]>

2003-09-11 15:33:34

by Jeff Garzik

Subject: Re: [PATCH] AES i586-asm optimized

Jari Ruusu wrote:
> Jeff Garzik wrote:
>
>>On Wed, Sep 10, 2003 at 05:38:59PM +0200, Fruhwirth Clemens wrote:
>>
>>>As tested by hvr[2] this implementation is significantly faster than the C
>>>version.
>>
>>Tested on what processors? With what kernel config?
>>
>>I would be surprised if a 586-optimized asm was useful on P4.
>
>
> It uses classic Pentium instruction set. Speed optimized for my 300 MHz
> Pentium-2 test box. Original Gladman version that I started with was pretty
> fast but I was able to improve performance about 7% over original version.
>
> On my same 300 MHz P2 test box, assembler implementation is about twice as
> fast as the mainline kernel C implementation.


Neat. Consider me surprised, then ;-)

Don't take my message as objection to the merge. I dunno what DaveM or
JamesM thinks, but I definitely support merging patches like this. It
provides a great example, if nothing else.

Eventually I bet there will be issues about automatic algorithm
selection: like the RAID5 code, which benchmarks all available
algorithms, and selects the fastest one.
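
The benchmark-and-select idea mentioned above can be sketched roughly as
follows. This is only an illustration in userspace C, not the RAID5 code
and not part of the patch: struct cipher_impl, time_impl(), pick_fastest(),
select_impl() and MAX_IMPLS are all hypothetical names, and clock() stands
in for whatever timer a real kernel implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MAX_IMPLS 16  /* hypothetical upper bound on candidates */

/* Hypothetical descriptor for one implementation of the same cipher. */
struct cipher_impl {
    const char *name;
    void (*encrypt)(uint8_t *buf, size_t len);
};

/* Run `rounds` passes over a scratch buffer and return the elapsed
 * processor time in clock() ticks. */
clock_t time_impl(const struct cipher_impl *impl,
                  uint8_t *buf, size_t len, int rounds)
{
    clock_t start = clock();
    for (int i = 0; i < rounds; i++)
        impl->encrypt(buf, len);
    return clock() - start;
}

/* Return the index of the smallest elapsed time; the first entry
 * wins ties. */
size_t pick_fastest(const clock_t *elapsed, size_t n)
{
    size_t best = 0;
    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (elapsed[i] < elapsed[best])
            best = i;
    return best;
}

/* Benchmark every candidate once and return the fastest one. */
const struct cipher_impl *select_impl(const struct cipher_impl *impls,
                                      size_t n, uint8_t *buf,
                                      size_t len, int rounds)
{
    clock_t elapsed[MAX_IMPLS];
    assert(n > 0 && n <= MAX_IMPLS);
    for (size_t i = 0; i < n; i++)
        elapsed[i] = time_impl(&impls[i], buf, len, rounds);
    return &impls[pick_fastest(elapsed, n)];
}
```

Keeping the comparison (pick_fastest) separate from the timing makes the
selection rule easy to check with fixed numbers; a single short timing run
like this is noisy, so a real version would likely want longer or repeated
measurements.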

Jeff



2003-09-11 22:23:44

by Bill Davidsen

Subject: Re: [PATCH] AES i586-asm optimized

In article <[email protected]>,
Jeff Garzik <[email protected]> wrote:
| Jari Ruusu wrote:

| > It uses classic Pentium instruction set. Speed optimized for my 300 MHz
| > Pentium-2 test box. Original Gladman version that I started with was pretty
| > fast but I was able to improve performance about 7% over original version.
| >
| > On my same 300 MHz P2 test box, assembler implementation is about twice as
| > fast as the mainline kernel C implementation.
|
|
| Neat. Consider me surprised, then ;-)
|
| Don't take my message as objection to the merge. I dunno what DaveM or
| JamesM thinks, but I definitely support merging patches like this. It
| provides a great example, if nothing else.
|
| Eventually I bet there will be issues about automatic algorithm
| selection: like the RAID5 code, which benchmarks all available
| algorithms, and selects the fastest one.

Didn't we just have this discussion? ;-) RAID5 benchmarks all available
code and then uses SSE2 or whatever because it doesn't plunk the
registers or cache or something... sorry, the details escape me, or the
sorry details escape me, or whatever.

BTW: I do agree with your point here, I'm using cryptoloop for some
stuff I'm doing, and while I never do enough disk i/o to care, this is a
good thing for the future. I'm using a PII-350, but next month I have to
add a P55C SMP machine, and will be doing much more crypto.

Thanks for asking for clarification, the patch looks better for it.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.