2005-12-23 01:36:11

by Bryan O'Sullivan

[permalink] [raw]
Subject: [RFC] [PATCH] Add memcpy32 function

In response to the comments that followed Roland Dreier posting our
InfiniPath driver for review last week, we've been making some cleanups
to our driver code.

As our chip requires 32-bit accesses, we need a copy function that
guarantees operating in such terms. It was suggested that we make this
generic, with arch-specific optimised versions.

This patch introduces the generic copy routine, memcpy32. At Andrew's
suggestion, I've put it in a new header file, include/linux/io.h, which
I've styled after include/linux/string.h.

Linus made some comments this morning about the manner in which the
functions in string.h are provided, but I wanted to float this patch for
review in any case, if only to see whether the alternative really is
preferred.

If desired, I can introduce instead an
include/asm-generic/memcpy32-algo.h header which defines the generic
memcpy32 routine, and edit all 20-odd asm-*/io.h files to use it. That
seems wasteful, though.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 0a6978881777 lib/Makefile
--- a/lib/Makefile Fri Dec 23 01:41:03 2005 +0800
+++ b/lib/Makefile Thu Dec 22 15:45:05 2005 -0800
@@ -5,7 +5,7 @@
lib-y := errno.o ctype.o string.o vsprintf.o cmdline.o \
bust_spinlocks.o rbtree.o radix-tree.o dump_stack.o \
idr.o div64.o int_sqrt.o bitmap.o extable.o prio_tree.o \
- sha1.o
+ sha1.o io.o

lib-y += kobject.o kref.o kobject_uevent.o klist.o

diff -r 0a6978881777 include/linux/io.h
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/include/linux/io.h Thu Dec 22 15:45:05 2005 -0800
@@ -0,0 +1,32 @@
+#ifndef _LINUX_IO_H_
+#define _LINUX_IO_H_
+
+#include <linux/types.h>
+
+#ifdef __KERNEL__
+
+/*
+ * Include machine specific inline routines.
+ */
+#include <asm/io.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifndef __HAVE_ARCH_MEMCPY32
+/*
+ * Copy routine that is guaranteed to operate in terms of 32-bit
+ * quantities. Source and dest must be 32-bit aligned. Count is
+ * number of 32-bit quantities (NOT bytes) to copy.
+ */
+void *memcpy32(void *, const void *, __kernel_size_t);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __KERNEL__ */
+
+#endif /* _LINUX_IO_H_ */
diff -r 0a6978881777 lib/io.c
--- /dev/null Thu Jan 1 00:00:00 1970 +0000
+++ b/lib/io.c Thu Dec 22 15:45:05 2005 -0800
@@ -0,0 +1,21 @@
+#include <linux/io.h>
+
+#ifndef __HAVE_ARCH_MEMCPY32
+/**
+ * memcpy32 - Copy one area of memory to another, as a series of
+ * aligned 32-bit values.
+ * @dest: Where to copy to (must be 32-bit aligned)
+ * @src: Where to copy from (must be 32-bit aligned)
+ * @count: The number of 32-bit values to copy
+ */
+void *memcpy32(void *dest, const void *src, size_t count)
+{
+ u32 *tmp = dest;
+ u32 *s = src;
+
+ while (count--)
+ *tmp++ = *s++;
+ return dest;
+}
+EXPORT_SYMBOL_GPL(memcpy32);
+#endif



2005-12-23 02:50:07

by Dave Jones

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Thu, Dec 22, 2005 at 05:35:59PM -0800, Bryan O'Sullivan wrote:
> In response to the comments that followed Roland Dreier posting our
> InfiniPath driver for review last week, we've been making some cleanups
> to our driver code.
>
> As our chip requires 32-bit accesses, we need a copy function that
> guarantees operating in such terms. It was suggested that we make this
> generic, with arch-specific optimised versions.
>
> This patch introduces the generic copy routine, memcpy32. At Andrew's
> suggestion, I've put it in a new header file, include/linux/io.h, which
> I've styled after include/linux/string.h.

io.h is a very generic sounding name for something that just houses
a memcpy variant. What's wrong with calling a spade a spade,
and using memcpy32.h ?

Dave

2005-12-23 17:18:25

by Matt Mackall

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Thu, Dec 22, 2005 at 09:49:43PM -0500, Dave Jones wrote:
> On Thu, Dec 22, 2005 at 05:35:59PM -0800, Bryan O'Sullivan wrote:
> > In response to the comments that followed Roland Dreier posting our
> > InfiniPath driver for review last week, we've been making some cleanups
> > to our driver code.
> >
> > As our chip requires 32-bit accesses, we need a copy function that
> > guarantees operating in such terms. It was suggested that we make this
> > generic, with arch-specific optimised versions.
> >
> > This patch introduces the generic copy routine, memcpy32. At Andrew's
> > suggestion, I've put it in a new header file, include/linux/io.h, which
> > I've styled after include/linux/string.h.
>
> io.h is a very generic sounding name for something that just houses
> a memcpy variant. What's wrong with calling a spade a spade,
> and using memcpy32.h ?

I think it belongs in string.h alongside memcpy, just for tradition's
sake. I don't think it belongs in a file named io.h, as it probably
has uses beyond I/O.

--
Mathematics is the supreme nostalgia of our time.

2005-12-23 17:42:34

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Fri, Dec 23, 2005 at 11:16:28AM -0600, Matt Mackall wrote:
> > io.h is a very generic sounding name for something that just houses
> > a memcpy variant. What's wrong with calling a spade a spade,
> > and using memcpy32.h ?
>
> I think it belongs in string.h alongside memcpy, just for tradition's
> sake. I don't think it belongs in a file named io.h, as it probably
> has uses beyond I/O.

Actually I think memcpy32 is not the thing pathscale wants. They want
memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
on I/O mapped memory. I'd say back to the drawingboard.

And to pathscale: please get your driver __iomem and endianess annotated
before sending out further core patches, I'm pretty sure getting those
things fixed will shed some light on the actual requirements.

2005-12-23 18:16:49

by Matt Mackall

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Fri, Dec 23, 2005 at 05:42:28PM +0000, Christoph Hellwig wrote:
> On Fri, Dec 23, 2005 at 11:16:28AM -0600, Matt Mackall wrote:
> > > io.h is a very generic sounding name for something that just houses
> > > a memcpy variant. What's wrong with calling a spade a spade,
> > > and using memcpy32.h ?
> >
> > I think it belongs in string.h alongside memcpy, just for tradition's
> > sake. I don't think it belongs in a file named io.h, as it probably
> > has uses beyond I/O.
>
> Actually I think memcpy32 is not the thing pathscale wants. They want
> memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
> on I/O mapped memory. I'd say back to the drawingboard.

Ahh, excellent point. We probably want something closer to
iowrite32_rep and have it live in iomap.h. Perhaps iowrite32_copy?

--
Mathematics is the supreme nostalgia of our time.

2005-12-23 23:50:38

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Fri, 2005-12-23 at 17:42 +0000, Christoph Hellwig wrote:

> Actually I think memcpy32 is not the thing pathscale wants. They want
> memcpy_{to,from}_io32, because memcpy32 wouldn't be allowed to operate
> on I/O mapped memory. I'd say back to the drawingboard.

Fair enough. I'll follow Matt's suggestion of iowrite32_copy and
ioread32_copy, in that case, and put them in asm-generic/iomap.h.

> And to pathscale: please get your driver __iomem and endianess annotated
> before sending out further core patches, I'm pretty sure getting those
> things fixed will shed some light on the actual requirements.

OK, will do.

<b

--
Bryan O'Sullivan <[email protected]>

2005-12-28 11:02:03

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

Bryan O'Sullivan <[email protected]> writes:

> In response to the comments that followed Roland Dreier posting our
> InfiniPath driver for review last week, we've been making some cleanups
> to our driver code.

What irritates me is that the original author said this copy
would happen from user space in ipath. In that case you would need
exception handling for all memory accesses to return EFAULT,
otherwise everybody can crash the kernel.

-Andi

2005-12-28 15:00:31

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Wed, 2005-12-28 at 12:01 +0100, Andi Kleen wrote:

> What irritates me is that the original author said this copy
> would happen from user space in ipath.

I'm afraid there may have been some miscommunication there.

All of our uses of memcpy_toio32 (which uses memcpy32 on x86_64) copy
from kernel virtual addresses to MMIO space. There's no direct copying
from userspace to MMIO space through the driver.

However, we do let userspace code directly access portions of our chip.
That code uses a routine that is exactly the same as memcpy32 to perform
MMIO writes. That's where I think the confusion arose on the part of
whoever responded to you.

> In that case you would need
> exception handling for all memory accesses to return EFAULT,
> otherwise everybody can crash the kernel.

Just to be clear: we use copy_{to,from}_user for all copies between
userspace and driver, and we return -EFAULT on short copies, so we're
already doing the right thing in that sort of case.

Sorry for the confusion. I hope this clears that business up.

<b

2005-12-28 17:50:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

Am Mi 28.12.2005 16:00 schrieb Bryan O'Sullivan <[email protected]>:

> All of our uses of memcpy_toio32 (which uses memcpy32 on x86_64) copy
> from kernel virtual addresses to MMIO space. There's no direct copying
> from userspace to MMIO space through the driver.
>
> However, we do let userspace code directly access portions of our
> chip.
> That code uses a routine that is exactly the same as memcpy32 to
> perform
> MMIO writes. That's where I think the confusion arose on the part of
> whoever responded to you.

Ok thanks. And do you have numbers that show that the assembly
function with rep ; movsl actually  improves performance over C?  I
guess the MMIO
is uncached or at best write combined and in both cases it usually
doesn't
matter if the CPU burns a few more cycles generating the writes or not
because everything is bus bound and every cycle you save
is just lost again on the next synchronization point with the hardware.

If the assembly is not really faster I would recommend you just use a
writel()
loop in the driver instead of adding this very special purpose function
everywhere.

-Andi




2005-12-28 18:11:49

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

On Wed, 2005-12-28 at 18:50 +0100, Andreas Kleen wrote:

> Ok thanks. And do you have numbers that show that the assembly
> function with rep ; movsl actually improves performance over C?

I'll see if I can ferret some numbers out. If not, I'll generate them,
but it will take me a day or so. I'm pretty sure it makes a difference
of tens to hundreds of nanoseconds, which in our case is very
significant (we measure some of our user-level performance in increments
of 10ns, very repeatably).

> If the assembly is not really faster I would recommend you just use a
> writel()
> loop in the driver instead of adding this very special purpose function
> everywhere.

Yeah, clearly if there's no difference, it's not worth the trouble.

<b

2005-12-28 18:22:39

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC] [PATCH] Add memcpy32 function

Am Mi 28.12.2005 19:11 schrieb Bryan O'Sullivan <[email protected]>:

> On Wed, 2005-12-28 at 18:50 +0100, Andreas Kleen wrote:
>
> > Ok thanks. And do you have numbers that show that the assembly
> > function with rep ; movsl actually improves performance over C?
>
> I'll see if I can ferret some numbers out. If not, I'll generate them,
> but it will take me a day or so. I'm pretty sure it makes a difference
> of tens to hundreds of nanoseconds, which in our case is very
> significant (we measure some of our user-level performance in
> increments
> of 10ns, very repeatably).

If you test the C version use

CFLAGS_memcpy32.o := -funroll-loops

BTW on x86-64 with CONFIG_UNORDERED_IO writel can actually expand to a
non temporal write which might break it.

-Andi