2006-01-12 00:30:38

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 0 of 2] Much smaller MMIO copy patches

These MMIO copy patches are lean, mean, and apparently clean.

These define the generic __raw_memcpy_toio32 as a weak symbol, which
arches are free to override. We provide a specialised implementation
for x86_64.

We also introduce include/linux/io.h, which is tiny now, but a candidate
for later cleanups of all the per-arch asm-*/io.h files.

These patches should apply cleanly against current -git, and have been
tested on i386 and x86_64. The symbol shows up in the built vmlinux,
as one might hope.

The patch series is as follows:

raw_memcpy_io.patch
Introduce the generic MMIO 32-bit copy routine.

x86_64-raw_memcpy_io.patch
Add a faster __raw_memcpy_io32 routine to x86_64.

Signed-off-by: Bryan O'Sullivan <[email protected]>


2006-01-12 00:30:39

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 1 of 2] Introduce __raw_memcpy_toio32

This arch-independent routine copies data to a memory-mapped I/O region,
using 32-bit accesses. It does not guarantee access ordering, nor does
it perform a memory barrier afterwards. This style of access is required
by some devices.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r c90267e4a29b -r cd6d8a62dad5 lib/Makefile
--- a/lib/Makefile Wed Jan 11 13:31:24 2006 +0800
+++ b/lib/Makefile Wed Jan 11 16:25:30 2006 -0800
@@ -9,7 +9,7 @@

lib-y += kobject.o kref.o kobject_uevent.o klist.o

-obj-y += sort.o parser.o halfmd4.o
+obj-y += sort.o parser.o halfmd4.o raw_memcpy_io.o

ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
diff -r c90267e4a29b -r cd6d8a62dad5 include/linux/io.h
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/include/linux/io.h Wed Jan 11 16:25:30 2006 -0800
@@ -0,0 +1,25 @@
+/*
+ * Copyright 2006 PathScale, Inc. All Rights Reserved.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_IO_H
+#define _LINUX_IO_H
+
+#include <asm/io.h>
+
+void __raw_memcpy_toio32(void __iomem *to, const void *from, size_t count);
+
+#endif /* _LINUX_IO_H */
diff -r c90267e4a29b -r cd6d8a62dad5 lib/raw_memcpy_io.c
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/lib/raw_memcpy_io.c Wed Jan 11 16:25:30 2006 -0800
@@ -0,0 +1,41 @@
+/*
+ * Copyright 2006 PathScale, Inc. All Rights Reserved.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#include <linux/io.h>
+#include <linux/module.h>
+
+/**
+ * __raw_memcpy_toio32 - copy data to MMIO space, in 32-bit units
+ * @to: destination, in MMIO space (must be 32-bit aligned)
+ * @from: source (must be 32-bit aligned)
+ * @count: number of 32-bit quantities to copy
+ *
+ * Copy data from kernel space to MMIO space, in units of 32 bits at a
+ * time. Order of access is not guaranteed, nor is a memory barrier
+ * performed afterwards.
+ */
+void __attribute__((weak)) __raw_memcpy_toio32(void __iomem *to,
+ const void *from, size_t count)
+{
+ u32 __iomem *dst = to;
+ const u32 *src = from;
+ const u32 *end = src + count;
+
+ while (src < end)
+ __raw_writel(*src++, dst++);
+}
+EXPORT_SYMBOL_GPL(__raw_memcpy_toio32);

2006-01-12 00:31:01

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

Introduce an x86_64-specific __raw_memcpy_toio32 routine. This is
measurably faster than the generic version in lib/raw_memcpy_io.c.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r cd6d8a62dad5 -r f03a807a80b8 arch/x86_64/lib/Makefile
--- a/arch/x86_64/lib/Makefile Wed Jan 11 16:25:30 2006 -0800
+++ b/arch/x86_64/lib/Makefile Wed Jan 11 16:26:59 2006 -0800
@@ -4,7 +4,7 @@

CFLAGS_csum-partial.o := -funroll-loops

-obj-y := io.o
+obj-y := io.o raw_memcpy_io.o

lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
usercopy.o getuser.o putuser.o \
diff -r cd6d8a62dad5 -r f03a807a80b8 arch/x86_64/lib/raw_memcpy_io.S
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/arch/x86_64/lib/raw_memcpy_io.S Wed Jan 11 16:26:59 2006 -0800
@@ -0,0 +1,29 @@
+/*
+ * Copyright 2006 PathScale, Inc. All Rights Reserved.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+/*
+ * override generic version in lib/raw_memcpy_io.c
+ */
+ .globl __raw_memcpy_toio32
+__raw_memcpy_toio32:
+ movl %edx,%ecx
+ shrl $1,%ecx
+ andl $1,%edx
+ rep movsq
+ movl %edx,%ecx
+ rep movsd
+ ret

2006-01-12 00:56:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 01:29, Bryan O'Sullivan wrote:

> lib-y := csum-partial.o csum-copy.o csum-wrappers.o delay.o \
> usercopy.o getuser.o putuser.o \
> diff -r cd6d8a62dad5 -r f03a807a80b8 arch/x86_64/lib/raw_memcpy_io.S
> --- /dev/null Thu Jan 1 00:00:00 1970 +0000
> +++ b/arch/x86_64/lib/raw_memcpy_io.S Wed Jan 11 16:26:59 2006 -0800
> @@ -0,0 +1,29 @@
> +/*
> + * Copyright 2006 PathScale, Inc. All Rights Reserved.

At least some people have complained about the "All Rights reserved"
in the past. Best you drop it.


> +/*
> + * override generic version in lib/raw_memcpy_io.c
> + */
> + .globl __raw_memcpy_toio32

Usually one should use .p2align or ENTRY() at function beginning,
otherwise you might get some penalty on K8.

> +__raw_memcpy_toio32:
> + movl %edx,%ecx
> + shrl $1,%ecx

1? If it's called memcpy it should get a byte argument, no? If not
name it something else, otherwise everybody will be confused.

> + andl $1,%edx
> + rep movsq

movsq? I thought you wanted 32bit IO?

The movsd also looks weird.

-Andi

2006-01-12 01:21:42

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

Andi> At least some people have complained about the "All Rights
Andi> reserved" in the past. Best you drop it.

There are hundreds of files in the kernel with "all rights reserved"
as part of the copyright, including things merged as recently as
ocfs2. I don't see how this could possibly be an issue.

Andi> 1? If it's called memcpy it should get a byte argument, no?
Andi> If not name it something else, otherwise everybody will be
Andi> confused.

The kernel doc for the function says

+ * @count: number of 32-bit quantities to copy

but maybe that's not the clearest way to define such a function.

Andi> movsq? I thought you wanted 32bit IO?

The idea is to do I/O in at least 32-bit chunks to cope with hardware
that can't handle 8-bit or 16-bit accesses. 64-bit chunks are OK for
Pathscale hardware.

Andi> The movsd also looks weird.

I think it's OK. The code is doing:

> + movl %edx,%ecx
> + shrl $1,%ecx
> + andl $1,%edx
> + rep movsq
> + movl %edx,%ecx
> + rep movsd
> + ret

so it does the copy in 64-bit chunks, and then it does "rep movsd" to
copy either 0 or 1 more 32-bit words.

- R.

2006-01-12 01:27:36

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thu, 2006-01-12 at 01:56 +0100, Andi Kleen wrote:

> At least some people have complained about the "All Rights reserved"
> in the past. Best you drop it.

OK, thanks.

> Usually one should use .p2align or ENTRY() at function beginning,
> otherwise you might get some penalty on K8.

Will do.

> > +__raw_memcpy_toio32:
> > + movl %edx,%ecx
> > + shrl $1,%ecx
>
> 1? If it's called memcpy it should get a byte argument, no? If not
> name it something else, otherwise everybody will be confused.

It's called toio32 for a reason :-)

Also, the kernel doc clearly states its purpose.

> movsq? I thought you wanted 32bit IO?

The northbridge will split qword writes into pairs of dword writes.

> The movsd also looks weird.

Why?

<b
--
Bryan O'Sullivan <[email protected]>

2006-01-12 01:28:01

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 02:21, Roland Dreier wrote:

> Andi> movsq? I thought you wanted 32bit IO?
>
> The idea is to do I/O in at least 32-bit chunks to cope with hardware
> that can't handle 8-bit or 16-bit accesses. 64-bit chunks are OK for
> Pathscale hardware.

Well then they can just as well use normal memcpy as long as they don't
pass any sizes % 4 != 0. That should be ok as long as they add a comment
there.

-Andi

2006-01-12 01:32:08

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

Andi> Well then they can just as well use normal memcpy as long as
Andi> they don't pass any sizes % 4 != 0. That should be ok as
Andi> long as they add a comment there.

But then the driver will be doing memcpy() to I/O memory, which may
work by chance on x86_64 but will blow up on other archs.

- R.

2006-01-12 01:34:13

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 02:27, Bryan O'Sullivan wrote:

> > > +__raw_memcpy_toio32:
> > > + movl %edx,%ecx
> > > + shrl $1,%ecx
> >
> > 1? If it's called memcpy it should get a byte argument, no? If not
> > name it something else, otherwise everybody will be confused.
>
> It's called toio32 for a reason :-)
>
> Also, the kernel doc clearly states its purpose.

I think it's deeply wrong to reuse names of standard functions with different
arguments. Either pass bytes or give it some other name.

> > movsq? I thought you wanted 32bit IO?
>
> The northbridge will split qword writes into pairs of dword writes.

That sounds like a very chipset specific assumption. Is that safe
to make? If yes you would need to document it clearly. But most likely
it's not a good idea to do this. Even if it works right now it would
be another death trap for the next user.

-Andi

2006-01-12 01:40:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 02:32, Roland Dreier wrote:
> Andi> Well then they can just as well use normal memcpy as long as
> Andi> they don't pass any sizes % 4 != 0. That should be ok as
> Andi> long as they add a comment there.
>
> But then the driver will be doing memcpy() to I/O memory, which may
> work by chance on x86_64 but will blow up on other archs.

I meant aliasing it to memcpy on x86-64

But with the nasty qword splitup assumptions it would be wrong anyways,
so scratch that.

-Andi

2006-01-12 04:14:10

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thu, 2006-01-12 at 02:33 +0100, Andi Kleen wrote:

> I think it's deeply wrong to reuse names of standard functions with different
> arguments. Either pass bytes or give it some other name.

Someone (Matt Mackall?) suggested naming it __iowrite32_copy, by analogy
with the stuff in asm-generic/iomap.h, I presume. Would that suit you?

> That sounds like a very chipset specific assumption. Is that safe
> to make?

I can fix the doc so that it says "at least 32 bits", in that case.
This should make the assumption more clear for other bus types.

<b

2006-01-12 04:20:36

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 05:14, Bryan O'Sullivan wrote:

> > That sounds like a very chipset specific assumption. Is that safe
> > to make?
>
> I can fix the doc so that it says "at least 32 bits", in that case.
> This should make the assumption more clear for other bus types.

Well it seems quite wrong - an iowrite32 shouldn't write more than 32bits
at a time.

My feeling is more and more that this thing is so specialized for your setup
and so narrow purpose that you're best off dropping this whole patchkit and
just put the assembly into your driver. At least normal kernels wouldn't be
bloated with such unlikely to be useful for anything else functions then.

-Andi

2006-01-12 04:32:44

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thu, 2006-01-12 at 05:19 +0100, Andi Kleen wrote:

> My feeling is more and more that this thing is so specialized for your setup
> and so narrow purpose that you're best off dropping this whole patchkit and
> just put the assembly into your driver.

But this gave people fits when Roland first posted the driver for
review. There's also no clean, obvious way to make it work on other
64-bit architectures, in that case.

I don't mind switching the movsq to a movsd, if that's the main aspect
of what you're worried about now, so that it is absolutely a 32-bit
copy.

<b



2006-01-12 04:46:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 05:32, Bryan O'Sullivan wrote:
> On Thu, 2006-01-12 at 05:19 +0100, Andi Kleen wrote:
> > My feeling is more and more that this thing is so specialized for your
> > setup and so narrow purpose that you're best off dropping this whole
> > patchkit and just put the assembly into your driver.
>
> But this gave people fits when Roland first posted the driver for
> review.

Yah, but they clearly didn't see the whole picture back then
(you should probably have explained it better). All the ugly details
were only brought to light on close review.

> There's also no clean, obvious way to make it work on other
> 64-bit architectures, in that case.

for loop and writel() ?

-Andi

2006-01-12 05:04:24

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thu, 2006-01-12 at 05:45 +0100, Andi Kleen wrote:

> > There's also no clean, obvious way to make it work on other
> > 64-bit architectures, in that case.
>
> for loop and writel() ?

But that's what I have now, albeit with a name you don't like and use of
movsq where you'd apparently prefer movsd. If I resolve those, can it
just live as-is otherwise? In which case it is no longer doing anything
funny or odd.

<b

2006-01-12 15:54:23

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2 of 2] __raw_memcpy_toio32 for x86_64

On Thursday 12 January 2006 06:04, Bryan O'Sullivan wrote:
> On Thu, 2006-01-12 at 05:45 +0100, Andi Kleen wrote:
>
> > > There's also no clean, obvious way to make it work on other
> > > 64-bit architectures, in that case.
> >
> > for loop and writel() ?
>
> But that's what I have now, albeit with a name you don't like and use of
> movsq where you'd apparently prefer movsd. If I resolve those, can it
> just live as-is otherwise? In which case it is no longer doing anything
> funny or odd.

Yes probably.

-Andi