This is a fix for Athlon cache-line.
For Athlon CPU, CONFIG_X86_MK7,
the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes, and this value is used
for L1 cache aligning.
But the AMD’s document clearly states that the cache-line for
Athlon is 64 Bytes.
When I set the X86_L1_CACHE_SHIFT = 5 the performance increased
significantly about 30%.
These are measurements from Taka’s simple socket benchmark program.
http://www.suna-asobi.com/~akira-t/linux/netio-bench/netio2.c
This is result for X86_L1_CACHE_SHIFT = 6.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.117 seconds at 341.6 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 343.9 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 345.4 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.115 seconds at 348.7 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.114 seconds at 352.4 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_6.log)
This is result for X86_L1_CACHE_SHIFT = 5
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.086 seconds at 462.4 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 458.5 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 461.8 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 453.9 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 456.7 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_5.log)
I attached the patch to fix this. But a bit worry that somebody might
reverse this changes because Athlon has 128bytes L1.
(Athlon-L1, data 64bytes + instruction 64bytes = total 128bytes)
(I found this problem by accident while I was making faster
user_to/from_copy function, inspired from taka's faster_intel_copy,
which went into 2.5.45)
--- linux-2.5.45/arch/i386/Kconfig Thu Oct 31 22:40:01 2002
+++ linux-2.5.45-tkcp-1/arch/i386/Kconfig Sat Nov 2 01:34:19 2002
@@ -190,9 +190,8 @@
config X86_L1_CACHE_SHIFT
int
- default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
+ default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
default "4" if MELAN || M486 || M386
- default "6" if MK7
default "7" if MPENTIUM4
config RWSEM_GENERIC_SPINLOCK
--
Akira Tsukamoto <[email protected], [email protected]>
it speeds mine up too.
-steve
-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Akira Tsukamoto
Sent: Saturday, November 02, 2002 12:04 AM
To: [email protected]
Cc: Hirokazu Takahashi; Andrew Morton
Subject: [PATCH] Athlon cache-line fix
This is a fix for Athlon cache-line.
For Athlon CPU, CONFIG_X86_MK7,
the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes, and this value is used
for L1 cache aligning.
But the AMD’s document clearly states that the cache-line for
Athlon is 64 Bytes.
When I set the X86_L1_CACHE_SHIFT = 5 the performance increased
significantly about 30%.
These are measurements from Taka’s simple socket benchmark program.
http://www.suna-asobi.com/~akira-t/linux/netio-bench/netio2.c
This is result for X86_L1_CACHE_SHIFT = 6.
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.117 seconds at 341.6 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 343.9 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.116 seconds at 345.4 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.115 seconds at 348.7 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.114 seconds at 352.4 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_6.lo
g)
This is result for X86_L1_CACHE_SHIFT = 5
(off:100, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.086 seconds at 462.4 Mbytes/sec
(off:104, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 458.5 Mbytes/sec
(off:108, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.087 seconds at 461.8 Mbytes/sec
(off:112, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 453.9 Mbytes/sec
(off:116, size:0x800000)
send/recv: copied 40.0 Mbytes in 0.088 seconds at 456.7 Mbytes/sec
(Entire log is here,
http://www.suna-asobi.com/~akira-t/linux/cache-align-fix/K7_cache_shift_5.lo
g)
I attached the patch to fix this. But a bit worry that somebody might
reverse this changes because Athlon has 128bytes L1.
(Athlon-L1, data 64bytes + instruction 64bytes = total 128bytes)
(I found this problem by accident while I was making faster
user_to/from_copy function, inspired from taka's faster_intel_copy,
which went into 2.5.45)
--- linux-2.5.45/arch/i386/Kconfig Thu Oct 31 22:40:01 2002
+++ linux-2.5.45-tkcp-1/arch/i386/Kconfig Sat Nov 2 01:34:19 2002
@@ -190,9 +190,8 @@
config X86_L1_CACHE_SHIFT
int
- default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE ||
MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
+ default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE ||
MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
default "4" if MELAN || M486 || M386
- default "6" if MK7
default "7" if MPENTIUM4
config RWSEM_GENERIC_SPINLOCK
--
Akira Tsukamoto <[email protected], [email protected]>
Akira Tsukamoto wrote:
> For Athlon CPU, CONFIG_X86_MK7,
> the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes
Eh? L1_CACHE_BYTES is defined as (1 << L1_CACHE_SHIFT) in
include/asm-i386/cache.h, which makes for a cache line size of 64 bytes
which is right. Perhaps you were assuming the cache line size was
2 << L1_CACHE_SHIFT ?
> config X86_L1_CACHE_SHIFT
> int
> - default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MPENTIUMIII || M686 || M586MMX || M586TSC || M586
> + default "5" if MWINCHIP3D || MWINCHIP2 || MWINCHIPC6 || MCRUSOE || MCYRIXIII || MK6 || MK7|| MPENTIUMIII || M686 || M586MMX || M586TSC || M586
> default "4" if MELAN || M486 || M386
> - default "6" if MK7
> default "7" if MPENTIUM4
Regardless of the above this patch can't be right: the PIII's cache line
size is 32 bytes and the P4's is 128 bytes. Interesting that it increases
performance (on at least one benchmark) though.
Andrew
Thank you for tring it.
Akira
On Sat, 02 Nov 2002 13:40:39 -0600
steve roemen <[email protected]> mentioned:
> it speeds mine up too.
>
> -steve
>
> -----Original Message-----
> Subject: [PATCH] Athlon cache-line fix
>
> This is a fix for Athlon cache-line.
On Sat, 2 Nov 2002 23:09:45 +0000
Andrew Kanaber <[email protected]> mentioned:
> Akira Tsukamoto wrote:
> > For Athlon CPU, CONFIG_X86_MK7,
> > the X86_L1_CACHE_SHIFT is set to 6, 128 Bytes
>
> Eh? L1_CACHE_BYTES is defined as (1 << L1_CACHE_SHIFT) in
> include/asm-i386/cache.h, which makes for a cache line size of 64 bytes
> which is right. Perhaps you were assuming the cache line size was
> 2 << L1_CACHE_SHIFT ?
Yes, it is 32bytes. :)
I think I was not sleeping right.
> Interesting that it increases
> performance (on at least one benchmark) though.
I also tried many times and it increases performace.
--
Akira Tsukamoto <[email protected], [email protected]>