2002-11-02 06:58:04

by Akira Tsukamoto

Subject: [PATCH] Athlon cache-line fix

This is a fix for Athlon cache-line.

For Athlon CPU, CONFIG_X86_MK7,
the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes, and this value is used
for L1 cache aligning.

But AMD’s documentation clearly states that the cache line for the
Athlon is 64 bytes.
When I set X86_L1_CACHE_SHIFT = 5, performance increased
significantly, by about 30%.

These are measurements from Taka’s simple socket benchmark program.
http://www.suna-asobi.com/~akira-t/linux/netio-bench/netio2.c

This is the result for X86_L1_CACHE_SHIFT = 6.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.117 seconds at 341.6 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 343.9 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 345.4 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.115 seconds at 348.7 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.114 seconds at 352.4 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_6.log)

This is the result for X86_L1_CACHE_SHIFT = 5.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.086 seconds at 462.4 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 458.5 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 461.8 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 453.9 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 456.7 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_5.log)

I have attached a patch to fix this, but I am a bit worried that somebody
might revert this change because the Athlon has 128 bytes of L1.
(Athlon L1: data 64 bytes + instruction 64 bytes = 128 bytes total)

(I found this problem by accident while I was writing a faster
user_to/from_copy function, inspired by Taka's faster_intel_copy,
which went into 2.5.45.)


--- linux-2.5.45/arch/i386/Kconfig Thu Oct 31 22:40:01 2002
+++ linux-2.5.45-tkcp-1/arch/i386/Kconfig Sat Nov 2 01:34:19 2002
@@ -190,9 +190,8 @@

config X86_L1_CACHE_SHIFT
int
- default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
+ default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
default "4" if MELAN || M486 || M386
- default "6" if MK7
default "7" if MPENTIUM4

config RWSEM_GENERIC_SPINLOCK

--
Akira Tsukamoto <[email protected], [email protected]>



2002-11-02 19:39:11

by steve roemen

Subject: RE: [PATCH] Athlon cache-line fix

it speeds mine up too.

-steve



-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Akira Tsukamoto
Sent: Saturday, November 02, 2002 12:04 AM
To: [email protected]
Cc: Hirokazu Takahashi; Andrew Morton
Subject: [PATCH] Athlon cache-line fix


This is a fix for Athlon cache-line.

For Athlon CPU, CONFIG_X86_MK7,
the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes, and this value is used
for L1 cache aligning.

But AMD’s documentation clearly states that the cache line for the
Athlon is 64 bytes.
When I set X86_L1_CACHE_SHIFT = 5, performance increased
significantly, by about 30%.

These are measurements from Taka’s simple socket benchmark program.
http://www.suna-asobi.com/~akira-t/linux/netio-bench/netio2.c

This is the result for X86_L1_CACHE_SHIFT = 6.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.117 seconds at 341.6 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 343.9 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 345.4 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.115 seconds at 348.7 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.114 seconds at 352.4 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_6.log)

This is the result for X86_L1_CACHE_SHIFT = 5.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.086 seconds at 462.4 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 458.5 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 461.8 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 453.9 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 456.7 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_5.log)

I have attached a patch to fix this, but I am a bit worried that somebody
might revert this change because the Athlon has 128 bytes of L1.
(Athlon L1: data 64 bytes + instruction 64 bytes = 128 bytes total)

(I found this problem by accident while I was writing a faster
user_to/from_copy function, inspired by Taka's faster_intel_copy,
which went into 2.5.45.)


--- linux-2.5.45/arch/i386/Kconfig Thu Oct 31 22:40:01 2002
+++ linux-2.5.45-tkcp-1/arch/i386/Kconfig Sat Nov 2 01:34:19 2002
@@ -190,9 +190,8 @@

config X86_L1_CACHE_SHIFT
int
- default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
+ default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
default "4" if MELAN || M486 || M386
- default "6" if MK7
default "7" if MPENTIUM4

config RWSEM_GENERIC_SPINLOCK

--
Akira Tsukamoto <[email protected], [email protected]>



Attachments:
netio_test_shift_6.txt (34.54 kB)
netio_test_shift_5.txt (34.54 kB)

2002-11-02 23:03:15

by Andrew Kanaber

Subject: Re: [PATCH] Athlon cache-line fix

Akira Tsukamoto wrote:
> For Athlon CPU, CONFIG_X86_MK7,
> the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes

Eh? L1_CACHE_BYTES is defined as (1 << L1_CACHE_SHIFT) in
include/asm-i386/cache.h, which makes for a cache line size of 64 bytes
which is right. Perhaps you were assuming the cache line size was
2 << L1_CACHE_SHIFT ?

> config X86_L1_CACHE_SHIFT
> int
> - default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
> + default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
> default "4" if MELAN || M486 || M386
> - default "6" if MK7
> default "7" if MPENTIUM4

Regardless of the above this patch can't be right: the PIII's cache line
size is 32 bytes and the P4's is 128 bytes. Interesting that it increases
performance (on at least one benchmark) though.

Andrew

2002-11-03 03:55:43

by Akira Tsukamoto

Subject: Re: [PATCH] Athlon cache-line fix

Thank you for trying it.

Akira

On Sat, 02 Nov 2002 13:40:39 -0600
steve roemen <[email protected]> mentioned:

> it speeds mine up too.
>
> -steve
>
> -----Original Message-----
> Subject: [PATCH] Athlon cache-line fix
>
> This is a fix for Athlon cache-line.


2002-11-03 04:20:18

by Akira Tsukamoto

Subject: Re: [PATCH] Athlon cache-line fix

On Sat, 2 Nov 2002 23:09:45 +0000
Andrew Kanaber <[email protected]> mentioned:
> Akira Tsukamoto wrote:
> > For Athlon CPU, CONFIG_X86_MK7,
> > the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes
>
> Eh? L1_CACHE_BYTES is defined as (1 << L1_CACHE_SHIFT) in
> include/asm-i386/cache.h, which makes for a cache line size of 64 bytes
> which is right. Perhaps you were assuming the cache line size was
> 2 << L1_CACHE_SHIFT ?

Yes, it is 32 bytes. :)
I think I had not slept properly.

> Interesting that it increases
> performance (on at least one benchmark) though.

I also tried it many times, and it does increase performance.

--
Akira Tsukamoto <[email protected], [email protected]>