2002-12-03 08:54:00

by Margit Schubert-While

[permalink] [raw]
Subject: 2.4.19/20, 2.5 missing P4 ifdef ?

> I think you are mistaken. The prefetch instructions came to
> Intel CPUs with SSE. There are no (afair) no SSE Pentium II's.

Correct.
And while we are at it, any kernel guru like to take a squint at
include/asm-i386 and arch/i386 ?
It seems to me that it should be possible to get a lot more out
of P3's and P4's.
Taking the example of the (misnamed) 3DNOW page_clear code,
for the P4(SSE2) it could be implemented as :

__asm__ __volatile__ (
" pxor %%xmm0, %%xmm0\n" : :
);

for(i=0;i<4096/128;i++)
{
__asm__ __volatile__ (
" movntdq %%xmm0, (%0)\n"
" movntdq %%xmm0, 16(%0)\n"
" movntdq %%xmm0, 32(%0)\n"
" movntdq %%xmm0, 48(%0)\n"
" movntdq %%xmm0, 64(%0)\n"
" movntdq %%xmm0, 80(%0)\n"
" movntdq %%xmm0, 96(%0)\n"
" movntdq %%xmm0, 112(%0)\n"
: : "r" (page) : "memory");
page+=128;
}
/* since movntdq is weakly-ordered, a "sfence" is needed to become
* ordered again.
*/
__asm__ __volatile__ (
" sfence \n" : :
);

I'm quite willing to be flamed on this :-)

Margit