Hi all...
This is a set of small patches that allow finer tuning of the i386 arch, and fix a
small bug:
- 20-x86-p4-prefetch: also enables prefetch for P4. This is a pending bug, IMHO.
- 21-x86-pII: splits Pentium-II out as a separate config option; yes, some of us
still have old machines and would like a slightly better-optimized kernel.
- 22-x86-check_gcc: use check_gcc for Intel CPUs too (as other CPUs already do)
to get better gcc flags.
- 23-x86-mb: implement memory barriers with dedicated instructions on P3 and P4
(credits go to Zwane Mwaikambo <[email protected]>)
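
To illustrate the prefetch item above, here is a minimal sketch of what a software prefetch hint looks like when walking a linked list. It is not the code from 20-x86-p4-prefetch; my_prefetch(), struct node and sum_list() are hypothetical names, and the sketch assumes GCC-style inline asm with a fallback to __builtin_prefetch when SSE is not enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: hint that *p will be read soon. */
static inline void my_prefetch(const void *p)
{
#if defined(__x86_64__) || defined(__SSE__)
    /* prefetchnta is available on SSE-capable CPUs (P3, P4 and later). */
    __asm__ __volatile__("prefetchnta (%0)" : : "r" (p));
#else
    /* Portable fallback; may compile to nothing on targets without a prefetch. */
    __builtin_prefetch(p, 0, 0);
#endif
}

/* Hypothetical list type, used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Sum a list, prefetching the next node while working on the current one. */
static int sum_list(const struct node *n)
{
    int sum = 0;

    while (n != NULL) {
        if (n->next != NULL)
            my_prefetch(n->next);
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```

The prefetch is only a hint: the result is the same with or without it, and whether it helps depends on the access pattern and the CPU.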
Could this ever get into mainline? Perhaps the only questionable piece is
the mb changes. How about the next -pre?
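
For context on the mb changes, here is a rough sketch of the idea: pick the fence instruction by CPU capability instead of always using a locked no-op. It is not the actual 23-x86-mb patch. The my_mb()/my_rmb()/my_wmb() names are hypothetical, and the compiler's __SSE__/__SSE2__ macros stand in for whatever config symbols the kernel would really test.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__SSE2__)
/* P4 class (SSE2): dedicated full, load and store fences. */
#define my_mb()  __asm__ __volatile__("mfence" : : : "memory")
#define my_rmb() __asm__ __volatile__("lfence" : : : "memory")
#define my_wmb() __asm__ __volatile__("sfence" : : : "memory")
#elif defined(__i386__)
/* Generic i386: a locked no-op add on the stack acts as a full barrier. */
#define my_mb()  __asm__ __volatile__("lock; addl $0,0(%%esp)" : : : "memory")
#define my_rmb() my_mb()
#if defined(__SSE__)
/* P3 class (SSE): only the store fence exists. */
#define my_wmb() __asm__ __volatile__("sfence" : : : "memory")
#else
/* Plain x86 stores are already ordered; a compiler barrier is enough. */
#define my_wmb() __asm__ __volatile__("" : : : "memory")
#endif
#else
/* Non-x86 fallback so the sketch still builds elsewhere. */
#define my_mb()  __sync_synchronize()
#define my_rmb() __sync_synchronize()
#define my_wmb() __sync_synchronize()
#endif

static volatile int data;
static volatile int ready;

/* Writer: make the payload visible before the flag. */
static void publish(int v)
{
    data = v;
    my_wmb();
    ready = 1;
}

/* Reader: check the flag, then read the payload behind a read barrier. */
static int consume(void)
{
    if (!ready)
        return -1;
    my_rmb();
    return data;
}

/* Single-threaded smoke test: only shows that the fences assemble and run. */
static int barrier_demo(void)
{
    publish(42);
    assert(ready == 1);
    return consume();
}
```

A single-threaded run like barrier_demo() cannot show any ordering effect; it only confirms that the selected instructions build and execute on the machine at hand.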
TIA
--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre4-jam1 (gcc 3.2.2 (Mandrake Linux 9.1 3.2.2-1mdk))
On 03.07 04:08, J.A. Magallon wrote:
> [...]
Ooops. Here they go...
Nice to see somebody is pushing this. I am getting fed up with
applying these to every 2.4.xx.
Note that with GCC >= 3.x (not sure about the "x", but definitely 3.2), the P4
compile option correctly generates add/sub instructions instead of the
"P4 killer" inc/dec.
Of course, in include/asm-i386 we still have hand-written incs/decs in
processor.h, atomic.h, rwsem.h, semaphore.h, uaccess.h, system.h,
string.h, spinlock.h and smplock.h.
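
As an illustration of the kind of change meant here, a sketch of an atomic increment/decrement written with add/sub rather than inc/dec. This is not the actual atomic.h code; my_atomic_t and the my_atomic_* names are hypothetical. The background is that inc/dec leave the carry flag untouched, which creates a partial-flags dependency that the P4 handles poorly, while add/sub write all the arithmetic flags.

```c
#include <assert.h>

/* Hypothetical counter type, modelled loosely on a kernel-style atomic. */
typedef struct {
    volatile int counter;
} my_atomic_t;

#if defined(__i386__) || defined(__x86_64__)
static inline void my_atomic_inc(my_atomic_t *v)
{
    /* "addl $1" instead of "incl": same result, all flags rewritten. */
    __asm__ __volatile__("lock; addl $1,%0"
                         : "+m" (v->counter)
                         :
                         : "memory", "cc");
}

static inline void my_atomic_dec(my_atomic_t *v)
{
    /* "subl $1" instead of "decl". */
    __asm__ __volatile__("lock; subl $1,%0"
                         : "+m" (v->counter)
                         :
                         : "memory", "cc");
}
#else
/* Non-x86 fallback so the sketch still builds elsewhere. */
static inline void my_atomic_inc(my_atomic_t *v)
{
    __sync_fetch_and_add(&v->counter, 1);
}

static inline void my_atomic_dec(my_atomic_t *v)
{
    __sync_fetch_and_sub(&v->counter, 1);
}
#endif

static inline int my_atomic_read(const my_atomic_t *v)
{
    return v->counter;
}
```

The compiler cannot make this substitution inside inline asm strings, which is why the headers have to be edited by hand even when gcc's P4 tuning already avoids inc/dec in generated code.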
With the patches applied and the includes changed, I get a 5-10% improvement
with a hefty DB app on a UP P4, which isn't to be sneezed at.
And perhaps somebody should take a look at the "copy_xxx_user" functions.
IMHO, these really should be inlined.
Try running a DB app which bounces semaphores at a terrific rate, and what
do you see at the top of readprofile? Yep, you guessed it.
Margit