LinuxLists.cc - [PATCH] zerocopy NFS for 2.5.36

2002-09-18 08:14:31

Subject: [PATCH] zerocopy NFS for 2.5.36

Hello,

I ported the zerocopy NFS patches against linux-2.5.36.

I made va05-zerocopy-nfsdwrite-2.5.36.patch more generic,
so that it would be easy to merge with NFSv4. Each procedure can
chose whether it can accept splitted buffers or not.
And I fixed a probelem that nfsd couldn't handle NFS-symlink
requests which were very large.

1)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va10-hwchecksum-2.5.36.patch
This patch enables HW-checksum against outgoing packets including UDP frames.

2)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va11-udpsendfile-2.5.36.patch
This patch makes sendfile systemcall over UDP work. It also supports
UDP_CORK interface which is very similar to TCP_CORK. And you can call
sendmsg/senfile with MSG_MORE flags over UDP sockets.

3)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va-csumpartial-fix-2.5.36.patch
This patch fixes the problem of x86 csum_partilal() routines which
can't handle odd addressed buffers.

4)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va01-zerocopy-rpc-2.5.36.patch
This patch makes RPC can send some pieces of data and pages without copy.

5)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va02-zerocopy-nfsdread-2.5.36.patch
This patch makes NFSD send pages in pagecache directly when NFS clinets request
file-read.

6)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va03-zerocopy-nfsdreaddir-2.5.36.patch
nfsd_readdir can also send pages without copy.

7)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va04-zerocopy-shadowsock-2.5.36.patch
This patch makes per-cpu UDP sockets so that NFSD can send UDP frames on
each prosessor simultaneously.
Without the patch we can send only one UDP frame at the time as a UDP socket
have to be locked during sending some pages to serialize them.

8)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va05-zerocopy-nfsdwrite-2.5.36.patch
This patch enables NFS-write uses writev interface. NFSd can handle NFS
requests without reassembling IP fragments into one UDP frame.

9)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/taka-writev-2.5.36.patch
This patch makes writev for regular file work faster.
It also can be found at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.35/2.5.35-mm1/broken-out/

Caution:
XFS doesn't support writev interface yet. NFS write on XFS might
slow down with No.8 patch. I wish SGI guys will implement it.

10)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va07-nfsbigbuf-2.5.36.patch
This makes NFS buffer much bigger (60KB).
60KB buffer is the same to 32KB buffer for linux-kernel as both of them
require 64KB chunk.

11)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va09-zerocopy-tempsendto-2.5.36.patch
If you don't want to use sendfile over UDP yet, you can apply it instead of No.1 and No.2 patches.

Regards,
Hirokazu Takahashi

2002-09-21 11:56:42

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Hi!
>
> 1)
> ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va10-hwchecksum-2.5.36.patch
> This patch enables HW-checksum against outgoing packets including UDP frames.
>
> Can you explain the TCP parts? They look very wrong.
>
> It was discussed long ago that csum_and_copy_from_user() performs
> better than plain copy_from_user() on x86. I do not remember all
> details, but I do know that using copy_from_user() is not a real
> improvement at least on x86 architecture.

Well, if this is the case, we need to #define copy_from_user csum_and_copy_from_user :-).

Pavel
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-18 23:00:57

by David Miller

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

From: Hirokazu Takahashi <[email protected]>
Date: Wed, 18 Sep 2002 17:14:31 +0900 (JST)

1)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va10-hwchecksum-2.5.36.patch
This patch enables HW-checksum against outgoing packets including UDP frames.

Can you explain the TCP parts? They look very wrong.

It was discussed long ago that csum_and_copy_from_user() performs
better than plain copy_from_user() on x86. I do not remember all
details, but I do know that using copy_from_user() is not a real
improvement at least on x86 architecture.

The rest of the changes (ie. the getfrag() logic to set
skb->ip_summed) looks fine.

3)
ftp://ftp.valinux.co.jp/pub/people/taka/2.5.36/va-csumpartial-fix-2.5.36.patch
This patch fixes the problem of x86 csum_partilal() routines which
can't handle odd addressed buffers.

I've sent Linus this fix already.

2002-09-18 23:53:29

by Alan

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

On Thu, 2002-09-19 at 00:00, David S. Miller wrote:
> It was discussed long ago that csum_and_copy_from_user() performs
> better than plain copy_from_user() on x86. I do not remember all

The better was a freak of PPro/PII scheduling I think

> details, but I do know that using copy_from_user() is not a real
> improvement at least on x86 architecture.

The same as bit is easy to explain. Its totally memory bandwidth limited
on current x86-32 processors. (Although I'd welcome demonstrations to
the contrary on newer toys)

-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-19 00:16:43

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Alan Cox wrote:
>
> On Thu, 2002-09-19 at 00:00, David S. Miller wrote:
> > It was discussed long ago that csum_and_copy_from_user() performs
> > better than plain copy_from_user() on x86. I do not remember all
>
> The better was a freak of PPro/PII scheduling I think
>
> > details, but I do know that using copy_from_user() is not a real
> > improvement at least on x86 architecture.
>
> The same as bit is easy to explain. Its totally memory bandwidth limited
> on current x86-32 processors. (Although I'd welcome demonstrations to
> the contrary on newer toys)

Nope. There are distinct alignment problems with movsl-based
memcpy on PII and (at least) "Pentium III (Coppermine)", which is
tested here:

copy_32 uses movsl. copy_duff just uses a stream of "movl"s

Time uncached-to-uncached memcpy, source and dest are 8-byte-aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s
nbytes=10240 from_align=0, to_align=0
copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec

OK, movsl wins. But now give the source address 8+1 alignment:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 1
nbytes=10240 from_align=1, to_align=0
copy_32: copied 19.1 Mbytes in 0.158 seconds at 120.8 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.091 seconds at 210.3 Mbytes/sec

The "movl"-based copy wins. By miles.

Make the source 8+4 aligned:

akpm:/usr/src/cptimer> ./cptimer -d -s -f 4
nbytes=10240 from_align=4, to_align=0
copy_32: copied 19.1 Mbytes in 0.134 seconds at 142.1 Mbytes/sec
__copy_duff: copied 19.1 Mbytes in 0.089 seconds at 214.0 Mbytes/sec

So movl still beats movsl, by lots.

I have various scriptlets which generate the entire matrix.

I think I ended up deciding that we should use movsl _only_
when both src and dsc are 8-byte-aligned. And that when you
multiply the gain from that by the frequency*size with which
funny alignments are used by TCP the net gain was 2% or something.

It needs redoing. These differences are really big, and this
is the kernel's most expensive function.

A little project for someone.

The tools are at http://www.zip.com.au/~akpm/linux/cptimer.tar.gz

2002-09-19 02:13:33

by Aaron Lehmann

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

> akpm:/usr/src/cptimer> ./cptimer -d -s
> nbytes=10240 from_align=0, to_align=0
> copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
> __copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec

It's disappointing that this program doesn't seem to support
benchmarking of MMX copy loops (like the ones in arch/i386/lib/mmx.c).
Those seem to be the more interesting memcpy functions on modern
systems.

2002-09-19 03:31:00

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Aaron Lehmann wrote:
>
> > akpm:/usr/src/cptimer> ./cptimer -d -s
> > nbytes=10240 from_align=0, to_align=0
> > copy_32: copied 19.1 Mbytes in 0.078 seconds at 243.9 Mbytes/sec
> > __copy_duff: copied 19.1 Mbytes in 0.090 seconds at 211.1 Mbytes/sec
>
> It's disappointing that this program doesn't seem to support
> benchmarking of MMX copy loops (like the ones in arch/i386/lib/mmx.c).
> Those seem to be the more interesting memcpy functions on modern
> systems.

Well the source is there, and the licensing terms are most reasonable.

But then, the source was there eighteen months ago and nothing happened.
Sigh.

I think in-kernel MMX has fatal drawbacks anyway. Not sure what
they are - I prefer to pretend that x86 CPUs execute raw C.

-------------------------------------------------------
This SF.NET email is sponsored by: AMD - Your access to the experts
on Hammer Technology! Open Source & Linux Developers, register now
for the AMD Developer Symposium. Code: EX8664
http://www.developwithamd.com/developerlab
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-19 10:38:29

by Alan

[permalink] [raw]

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

On Thu, 2002-09-19 at 04:30, Andrew Morton wrote:
> > It's disappointing that this program doesn't seem to support
> > benchmarking of MMX copy loops (like the ones in arch/i386/lib/mmx.c).
> > Those seem to be the more interesting memcpy functions on modern
> > systems.
>
> Well the source is there, and the licensing terms are most reasonable.
>
> But then, the source was there eighteen months ago and nothing happened.
> Sigh.
>
> I think in-kernel MMX has fatal drawbacks anyway. Not sure what
> they are - I prefer to pretend that x86 CPUs execute raw C.

MMX isnt useful for anything smaller than about 512bytes-1K. Its not
useful in interrupt handlers. The list goes on.

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-19 13:15:13

Subject: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36

Subject: [PATCH] zerocopy NFS for 2.5.43

Attachments:

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: [NFS] Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.36

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Attachments:

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43

Subject: Re: Re: [PATCH] zerocopy NFS for 2.5.43