The movntq instruction is supported by Geode CPUs, so use the
fast_clear_page/fast_copy_page versions that use it.
Signed-off-by: Marcelo Tosatti <[email protected]>
diff --git a/arch/i386/lib/mmx.c b/arch/i386/lib/mmx.c
index 28084d2..ddc1421 100644
--- a/arch/i386/lib/mmx.c
+++ b/arch/i386/lib/mmx.c
@@ -121,7 +121,7 @@ void *_mmx_memcpy(void *to, const void *
 	return p;
 }
-#ifdef CONFIG_MK7
+#if defined (CONFIG_MK7) || defined(CONFIG_MGEODE_LX)
 /*
  * The K7 has streaming cache bypass load/store. The Cyrix III, K6 and
On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
>
> movntq instruction is supported by Geode CPU's, so use
> fast_clear_page/fast_copy_page versions that have it.
it's supported, but is it a win ?
The same was also true of the VIA C3/C7's, but due to
poor memory bandwidth, it turned out to be slower in most cases.
Dave
--
http://www.codemonkey.org.uk
On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> >
> > movntq instruction is supported by Geode CPU's, so use
> > fast_clear_page/fast_copy_page versions that have it.
>
> it's supported, but is it a win ?
> The same was also true of the VIA C3/C7's, but due to
> poor memory bandwidth, it turned out to be slower in most cases.
Do you have the numbers for VIA C3/C7 around?
The Geode benefits from movntq instead of movq:
[marcelo@localhost ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : Geode by NSC
cpu family : 5
model : 5
model name : Geode(TM) Integrated Processor by National Semi
stepping : 2
cpu MHz : 364.898
cache size : 32 KB
...
[marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
...
[marcelo@localhost ~]$ ./athlon
Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run' took 9565 cycles per page
clear_page function '2.4 non MMX' took 3347 cycles per page
clear_page function '2.4 MMX fallback' took 3389 cycles per page
clear_page function '2.4 MMX version' took 2920 cycles per page
clear_page function 'faster_clear_page' took 2912 cycles per page
clear_page function 'even_faster_clear' took 2863 cycles per page
copy_page() tests
copy_page function 'warm up run' took 9409 cycles per page
copy_page function '2.4 non MMX' took 13161 cycles per page
copy_page function '2.4 MMX fallback' took 13033 cycles per page
copy_page function '2.4 MMX version' took 9288 cycles per page
copy_page function 'faster_copy' took 9806 cycles per page
copy_page function 'even_faster' took 8990 cycles per page
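
To put the cycle counts above in proportion, here is a minimal sketch that
turns two rows of the output into a percentage. The helper name
percent_saved is hypothetical, and the output itself does not say which
variants use movntq, so the rows are compared by label only.

```c
#include <assert.h>

/* Percentage of cycles saved by `candidate` relative to `baseline`.
 * A negative result means the candidate is slower. */
static double percent_saved(double baseline, double candidate)
{
    assert(baseline > 0);
    return 100.0 * (baseline - candidate) / baseline;
}
```

With the figures from this run, percent_saved(2920, 2863) is about 2.0
(clear_page, '2.4 MMX version' against 'even_faster_clear') and
percent_saved(9288, 8990) is about 3.2 (copy_page, '2.4 MMX version' against
'even_faster'). percent_saved(9288, 9806) is about -5.6, so 'faster_copy' is
slower than the '2.4 MMX version' on this machine.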
On Wed, Feb 14, 2007 at 06:17:36PM -0200, Marcelo Tosatti wrote:
> On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> > >
> > > movntq instruction is supported by Geode CPU's, so use
> > > fast_clear_page/fast_copy_page versions that have it.
> >
> > it's supported, but is it a win ?
> > The same was also true of the VIA C3/C7's, but due to
> > poor memory bandwidth, it turned out to be slower in most cases.
>
> Do you have the numbers for VIA C3/C7 around?
I don't, and my 3dnow capable C3s are unplugged right now.
The newer generation (including the C7) have SSE/SSE2 instead,
which seems to be faster. (Using a different benchmark app that uses SSE)
clear_page function 'normal clear_page()' took 9425 cycles per page (620.3 MB/s)
clear_page function 'new clear_page() ' took 3840 cycles per page (1522.7 MB/s)
copy_page function 'normal copy_page()' took 11453 cycles per page (510.5 MB/s)
copy_page function 'new copy_page() ' took 5024 cycles per page (1163.7 MB/s)
Dave
--
http://www.codemonkey.org.uk
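
The cycles-per-page and MB/s columns above are tied together by the page
size and the CPU clock. The sketch below shows that conversion. It assumes
4096-byte pages and that "MB" means 2^20 bytes; neither is stated in the
mail, and the function names are hypothetical.

```c
#include <assert.h>

#define PAGE_BYTES 4096.0               /* assumed page size */
#define MIB        (1024.0 * 1024.0)    /* assumed meaning of "MB" */

/* Throughput in MiB/s when one page takes `cycles_per_page` cycles. */
static double mib_per_sec(double cycles_per_page, double cpu_hz)
{
    assert(cycles_per_page > 0);
    return PAGE_BYTES * cpu_hz / cycles_per_page / MIB;
}

/* Clock frequency in Hz implied by a (cycles per page, MiB/s) pair. */
static double implied_hz(double cycles_per_page, double mib_s)
{
    assert(cycles_per_page > 0);
    return mib_s * MIB * cycles_per_page / PAGE_BYTES;
}
```

Under those assumptions all four rows above imply the same clock of roughly
1.497 GHz (for example implied_hz(9425, 620.3)), which suggests the
assumptions are at least self-consistent. Applied to the Geode run earlier
in the thread, mib_per_sec(8990, 364.898e6) comes to about 158.6 MiB/s.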
On Wed, 14 Feb 2007 17:08:39 -0200
Marcelo Tosatti <[email protected]> wrote:
>
> movntq instruction is supported by Geode CPU's, so use
> fast_clear_page/fast_copy_page versions that have it.
Is it actually faster for macro performance not just microbenchmarking ?
Alan
On Wed, 2007-02-14 at 18:17 -0200, Marcelo Tosatti wrote:
> On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> > >
> > > movntq instruction is supported by Geode CPU's, so use
> > > fast_clear_page/fast_copy_page versions that have it.
> >
> > it's supported, but is it a win ?
> > The same was also true of the VIA C3/C7's, but due to
> > poor memory bandwidth, it turned out to be slower in most cases.
>
> Do you have the numbers for VIA C3/C7 around?
>
> The Geode benefits from movntq instead of movq:
>
> [marcelo@localhost ~]$ cat /proc/cpuinfo
> processor : 0
> vendor_id : Geode by NSC
> cpu family : 5
> model : 5
> model name : Geode(TM) Integrated Processor by National Semi
> stepping : 2
> cpu MHz : 364.898
> cache size : 32 KB
> ...
>
> [marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
> ...
btw there is a caveat with this program: you don't see that this evicts
the data RIGHT AFTER THE COPY, so if you use it again you pay AGAIN the
memory bandwidth price...
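
One way to account for the caveat above is to time the copy together with
the first use of the destination, not the copy alone. The following is a
rough userspace sketch, not code from athlon.c or the kernel. It swaps in
the SSE2 movntdq store (via _mm_stream_si128) as a stand-in for the MMX
movntq, falls back to memcpy() where SSE2 is unavailable, and all names in
it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define DEMO_PAGE_SIZE 4096   /* assumed page size */

/* Copy one page; dst and src must be 16-byte aligned.
 * With SSE2 this issues cache-bypassing (non-temporal) stores,
 * otherwise it is an ordinary memcpy(). */
static void copy_page_nt(void *dst, const void *src)
{
    assert(((uintptr_t)dst % 16) == 0 && ((uintptr_t)src % 16) == 0);
#if defined(__SSE2__)
    __m128i *d = dst;
    const __m128i *s = src;
    for (size_t i = 0; i < DEMO_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    _mm_sfence();   /* order the streaming stores before later accesses */
#else
    memcpy(dst, src, DEMO_PAGE_SIZE);
#endif
}

/* Read every byte of the page back, as a consumer of the copy would. */
static unsigned long touch_page(const void *page)
{
    const unsigned char *p = page;
    unsigned long sum = 0;
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
        sum += p[i];
    return sum;
}
```

Timing copy_page_nt() followed by touch_page() on the destination, and
comparing that with a plain memcpy() followed by touch_page(), would include
the cost of fetching the freshly written page back from memory, which a
copy-only loop does not show.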
On Wed, Feb 14, 2007 at 09:16:46PM +0000, Alan wrote:
> On Wed, 14 Feb 2007 17:08:39 -0200
> Marcelo Tosatti <[email protected]> wrote:
>
> >
> > movntq instruction is supported by Geode CPU's, so use
> > fast_clear_page/fast_copy_page versions that have it.
>
> Is it actually faster for macro performance not just microbenchmarking ?
A COW intensive private mmap() benchmark shows the kernel spending
_more_ time inside mmx_copy_page() with movntq than with movq.
So it's not clear whether the patch is actually a win; please drop it.
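
The benchmark itself was not posted. For readers unfamiliar with the
workload, the following is a hypothetical sketch of what a COW-intensive
private mmap() test can look like. It is not the program used for the
measurement above, and the function name cow_round is made up for this
example.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map npages of private anonymous memory, dirty them, then fork a child
 * that writes one byte per page. Each child write faults and makes the
 * kernel copy the page (copy-on-write).
 * Returns 0 on success, -1 on failure. */
static int cow_round(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0 && npages > 0);
    size_t len = npages * (size_t)psz;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < len; i += (size_t)psz)
        buf[i] = 1;                 /* parent owns real pages before fork */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        for (size_t i = 0; i < len; i += (size_t)psz)
            buf[i] = 2;             /* write fault: kernel copies the page */
        _exit(0);
    }

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;

    /* The parent's pages must be untouched by the child's writes. */
    for (size_t i = 0; ok && i < len; i += (size_t)psz)
        ok = (buf[i] == 1);

    munmap(buf, len);
    return ok ? 0 : -1;
}
```

Calling cow_round() in a loop with a large npages and profiling the kernel
while it runs is one way to see how much time goes to the page-copy routine
under each variant.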