2007-02-14 19:32:36

by Marcelo Tosatti

Subject: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode


movntq instruction is supported by Geode CPUs, so use
fast_clear_page/fast_copy_page versions that have it.

Signed-off-by: Marcelo Tosatti <[email protected]>

diff --git a/arch/i386/lib/mmx.c b/arch/i386/lib/mmx.c
index 28084d2..ddc1421 100644
--- a/arch/i386/lib/mmx.c
+++ b/arch/i386/lib/mmx.c
@@ -121,7 +121,7 @@ void *_mmx_memcpy(void *to, const void *
 	return p;
 }
 
-#ifdef CONFIG_MK7
+#if defined (CONFIG_MK7) || defined(CONFIG_MGEODE_LX)
 
 /*
  *	The K7 has streaming cache bypass load/store. The Cyrix III, K6 and


2007-02-14 19:56:43

by Dave Jones

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
>
> movntq instruction is supported by Geode CPUs, so use
> fast_clear_page/fast_copy_page versions that have it.

It's supported, but is it a win?
The same was also true of the VIA C3/C7's, but due to
poor memory bandwidth, it turned out to be slower in most cases.

Dave

--
http://www.codemonkey.org.uk

2007-02-14 20:21:05

by Marcelo Tosatti

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> >
> > movntq instruction is supported by Geode CPUs, so use
> > fast_clear_page/fast_copy_page versions that have it.
>
> It's supported, but is it a win?
> The same was also true of the VIA C3/C7's, but due to
> poor memory bandwidth, it turned out to be slower in most cases.

Do you have the numbers for VIA C3/C7 around?

The Geode benefits from movntq instead of movq:

[marcelo@localhost ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : Geode by NSC
cpu family : 5
model : 5
model name : Geode(TM) Integrated Processor by National Semi
stepping : 2
cpu MHz : 364.898
cache size : 32 KB
...

[marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
...

[marcelo@localhost ~]$ ./athlon
Athlon test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
clear_page() tests
clear_page function 'warm up run' took 9565 cycles per page
clear_page function '2.4 non MMX' took 3347 cycles per page
clear_page function '2.4 MMX fallback' took 3389 cycles per page
clear_page function '2.4 MMX version' took 2920 cycles per page
clear_page function 'faster_clear_page' took 2912 cycles per page
clear_page function 'even_faster_clear' took 2863 cycles per page

copy_page() tests
copy_page function 'warm up run' took 9409 cycles per page
copy_page function '2.4 non MMX' took 13161 cycles per page
copy_page function '2.4 MMX fallback' took 13033 cycles per page
copy_page function '2.4 MMX version' took 9288 cycles per page
copy_page function 'faster_copy' took 9806 cycles per page
copy_page function 'even_faster' took 8990 cycles per page
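
To compare rows of the output above, the cycle counts can be turned into a relative saving. The helper below is a minimal sketch; the name percent_fewer_cycles is hypothetical and not part of athlon.c, and it makes no assumption about which rows use movntq and which use movq.

```c
#include <assert.h>

/*
 * Relative saving of `candidate` over `baseline`, in percent.
 * Both arguments are cycles per page as printed by the test program.
 * Hypothetical helper, not taken from athlon.c.
 */
static double percent_fewer_cycles(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (baseline - candidate) / baseline * 100.0;
}
```

For example, percent_fewer_cycles(13161, 8990) compares the '2.4 non MMX' copy_page row with 'even_faster' (roughly 32% fewer cycles), while percent_fewer_cycles(9288, 8990) compares '2.4 MMX version' with 'even_faster' (roughly 3%).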

2007-02-14 20:48:06

by Dave Jones

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, Feb 14, 2007 at 06:17:36PM -0200, Marcelo Tosatti wrote:
> On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> > >
> > > movntq instruction is supported by Geode CPUs, so use
> > > fast_clear_page/fast_copy_page versions that have it.
> >
> > It's supported, but is it a win?
> > The same was also true of the VIA C3/C7's, but due to
> > poor memory bandwidth, it turned out to be slower in most cases.
>
> Do you have the numbers for VIA C3/C7 around?

I don't, and my 3dnow capable C3s are unplugged right now.
The newer generation (including the C7) have SSE/SSE2 instead,
which seems to be faster. (Using a different benchmark app that uses SSE)

clear_page function 'normal clear_page()' took 9425 cycles per page (620.3 MB/s)
clear_page function 'new clear_page() ' took 3840 cycles per page (1522.7 MB/s)

copy_page function 'normal copy_page()' took 11453 cycles per page (510.5 MB/s)
copy_page function 'new copy_page() ' took 5024 cycles per page (1163.7 MB/s)
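
The MB/s column above follows from the cycle counts once a clock rate is fixed. The sketch below shows that arithmetic under three assumptions that are not stated in the mail: a 4096-byte page, MB meaning 2^20 bytes, and a clock of about 1.497 GHz, which is back-computed from the printed figures rather than reported anywhere. The function name is hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_BYTES 4096.0     /* assumed i386 page size */
#define SKETCH_MIB        1048576.0  /* assumed meaning of "MB" in the output */

/*
 * Convert cycles per page into throughput:
 *   pages per second = clock_hz / cycles_per_page
 *   MB/s             = pages per second * page size / 2^20
 */
static double cycles_to_mib_per_s(double cycles_per_page, double clock_hz)
{
    assert(cycles_per_page > 0.0 && clock_hz > 0.0);
    return clock_hz / cycles_per_page * SKETCH_PAGE_BYTES / SKETCH_MIB;
}
```

With the assumed clock, cycles_to_mib_per_s(9425, 1.4967e9) comes out near 620 MB/s and cycles_to_mib_per_s(5024, 1.4967e9) near 1164 MB/s, in line with the figures above.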


Dave

--
http://www.codemonkey.org.uk

2007-02-14 21:02:58

by Alan

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, 14 Feb 2007 17:08:39 -0200
Marcelo Tosatti <[email protected]> wrote:

>
> movntq instruction is supported by Geode CPUs, so use
> fast_clear_page/fast_copy_page versions that have it.

Is it actually faster for macro performance, not just in microbenchmarks?

Alan

2007-02-14 21:23:38

by Arjan van de Ven

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, 2007-02-14 at 18:17 -0200, Marcelo Tosatti wrote:
> On Wed, Feb 14, 2007 at 02:55:46PM -0500, Dave Jones wrote:
> > On Wed, Feb 14, 2007 at 05:08:39PM -0200, Marcelo Tosatti wrote:
> > >
> > > movntq instruction is supported by Geode CPUs, so use
> > > fast_clear_page/fast_copy_page versions that have it.
> >
> > It's supported, but is it a win?
> > The same was also true of the VIA C3/C7's, but due to
> > poor memory bandwidth, it turned out to be slower in most cases.
>
> Do you have the numbers for VIA C3/C7 around?
>
> The Geode benefits from movntq instead of movq:
>
> [marcelo@localhost ~]$ cat /proc/cpuinfo
> processor : 0
> vendor_id : Geode by NSC
> cpu family : 5
> model : 5
> model name : Geode(TM) Integrated Processor by National Semi
> stepping : 2
> cpu MHz : 364.898
> cache size : 32 KB
> ...
>
> [marcelo@localhost ~]$ wget http://www.fenrus.demon.nl/athlon.c
> ...


BTW, there is a caveat with this program: it does not show that the
streaming store evicts the data RIGHT AFTER THE COPY, so if you use the
data again you pay the memory bandwidth price AGAIN...
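
The caveat above concerns the difference between ordinary and non-temporal stores. The userspace sketch below only contrasts the two kinds of store; it does not measure the cache effect. It is an illustration under assumptions, not the kernel's code: it swaps in the SSE2 intrinsic _mm_stream_si128 (movntdq) for the MMX movntq used in mmx.c, falls back to memset() when SSE2 is unavailable, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si128, _mm_setzero_si128, _mm_sfence */
#endif

#define SKETCH_PAGE_SIZE 4096

/* Ordinary stores: the cleared page ends up in the cache. */
static void clear_page_cached(void *page)
{
    memset(page, 0, SKETCH_PAGE_SIZE);
}

/*
 * Non-temporal stores: data is written around the cache, so a later
 * read of the page has to fetch it from memory again.
 * `page` must be 16-byte aligned.
 */
static void clear_page_streaming(void *page)
{
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    size_t i;

    for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero);
    _mm_sfence();  /* make the streaming stores globally visible */
#else
    memset(page, 0, SKETCH_PAGE_SIZE);  /* fallback without SSE2 */
#endif
}

/* Returns 1 if both variants leave an all-zero page, 0 otherwise. */
static int clear_page_self_check(void)
{
    unsigned char *raw = malloc(SKETCH_PAGE_SIZE + 16);
    unsigned char *page;
    size_t i;
    int ok = 1;

    if (!raw)
        return 0;
    /* Round up to the next 16-byte boundary inside the allocation. */
    page = (unsigned char *)(((uintptr_t)raw + 15) & ~(uintptr_t)15);

    memset(page, 0xAA, SKETCH_PAGE_SIZE);
    clear_page_cached(page);
    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        ok &= (page[i] == 0);

    memset(page, 0xAA, SKETCH_PAGE_SIZE);
    clear_page_streaming(page);
    for (i = 0; i < SKETCH_PAGE_SIZE; i++)
        ok &= (page[i] == 0);

    free(raw);
    return ok;
}
```

Both variants produce the same page contents; they differ only in where the data sits afterwards, which is why a benchmark that never reads the page back cannot show the cost described above.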

2007-02-15 15:04:44

by Marcelo Tosatti

Subject: Re: [PATCH] use movntq version of fast_clear_page/fast_copy_page on Geode

On Wed, Feb 14, 2007 at 09:16:46PM +0000, Alan wrote:
> On Wed, 14 Feb 2007 17:08:39 -0200
> Marcelo Tosatti <[email protected]> wrote:
>
> >
> > movntq instruction is supported by Geode CPUs, so use
> > fast_clear_page/fast_copy_page versions that have it.
>
> Is it actually faster for macro performance, not just in microbenchmarks?

A COW intensive private mmap() benchmark shows the kernel spending
_more_ time inside mmx_copy_page() with movntq than with movq.

So it's not clear whether the patch is actually a win; please drop it.
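
The benchmark itself is not included in the mail. As a rough idea of what a COW-intensive private mmap() workload can look like, here is a minimal sketch under assumptions: it maps a temporary file with MAP_PRIVATE, reads each page, then writes to it so the kernel has to give the process a private copy. The function name and the whole setup are hypothetical and are not the program that was actually run.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Touch `npages` pages of a private file mapping so that each write
 * triggers a copy-on-write fault. Returns the number of pages written,
 * or -1 on error or if a private write leaked into the file.
 */
static long cow_touch_pages(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0;
    unsigned char *map;
    unsigned char byte = 0xff;
    long touched = 0;
    size_t len, i;
    int fd, file_unchanged;
    FILE *f;

    if (page <= 0 || npages == 0)
        return -1;
    len = npages * (size_t)page;

    f = tmpfile();
    if (!f)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {  /* zero-filled backing file */
        fclose(f);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    for (i = 0; i < npages; i++) {
        unsigned char *p = map + i * (size_t)page;

        sink ^= p[0];  /* read fault: map the shared file page */
        p[0] = 0x5a;   /* write fault: kernel makes a private copy */
        touched++;
    }

    /* Writes to a MAP_PRIVATE mapping must not reach the file. */
    file_unchanged = (pread(fd, &byte, 1, 0) == 1 && byte == 0);

    munmap(map, len);
    fclose(f);
    return file_unchanged ? touched : -1;
}
```

Calling cow_touch_pages() in a loop under a profiler would show how much time lands in the kernel's page-copy routine, which is the kind of comparison described above.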