2001-10-20 11:55:30

by Krzysztof Olędzki

Subject: bug in "raid5: measuring checksumming speed"

Hello,

It seems that there is something wrong with the checksumming speed measurement:
on my two P3 boxes Linux chooses pIII_sse, even though pII_mmx and p5_mmx are
reported as faster:

2.4.10-ac10/Pentium3 733:
raid5: measuring checksumming speed
8regs : 1266.400 MB/sec
32regs : 898.400 MB/sec
pIII_sse : 1508.000 MB/sec
pII_mmx : 1643.600 MB/sec
p5_mmx : 1730.400 MB/sec
raid5: using function: pIII_sse (1508.000 MB/sec)

Similarly on the second system:
2.4.10-ac10/Pentium3 933:
raid5: measuring checksumming speed
8regs : 1603.600 MB/sec
32regs : 1138.000 MB/sec
pIII_sse : 1910.000 MB/sec
pII_mmx : 2081.600 MB/sec
p5_mmx : 2189.600 MB/sec
raid5: using function: pIII_sse (1910.000 MB/sec)

Best regards,

Krzysztof Oledzki



2001-10-20 12:14:41

by jurriaan

Subject: Re: bug in "raid5: measuring checksumming speed"

On Sat, Oct 20, 2001 at 01:54:54PM +0200, Krzysztof Oledzki wrote:
> Hello,
>
> It seems that there is something wrong with measuring checksumming speed -
> on my two P3 boxes linux chooses pIII_sse but pII_mmx and p5_mmx are
> reported as faster instructions:
>
I read somewhere that pIII_sse has better cache behaviour. You could
check this by reading the source, of course.

Good luck,
Jurriaan

2001-10-20 17:43:03

by Mark Hahn

Subject: Re: bug in "raid5: measuring checksumming speed"

> on my two P3 boxes linux chooses pIII_sse but pII_mmx and p5_mmx are
> reported as faster instructions:

tail of linux/include/asm-i386/xor.h:

| /* We force the use of the SSE xor block because it can write around L2.
| We may also be able to load into the L1 only depending on how the cpu
| deals with a load to a line that is being prefetched. */
| #define XOR_SELECT_TEMPLATE(FASTEST) \
| (cpu_has_xmm ? &xor_block_pIII_sse : FASTEST)

this code should probably be generalized to test for K7 feature flags, as well.


2001-10-23 12:56:56

by Krzysztof Olędzki

Subject: Re: bug in "raid5: measuring checksumming speed"



On Sat, 20 Oct 2001, Mark Hahn wrote:

> I haven't looked at the code, but I'm guessing the sse code uses
> NTA instructions that avoid polluting the cache. that's far more
> of an advantage than a small difference in speed. especially since
> it's physically impossible that you'll be using more than say,
> 200 MB/s.
Ok :) I agree. But this behaviour looks strange at first glance.
I think the kernel should print a message like "Forced using pIII_sse".

Best regards,


Krzysztof Oledzki