2002-09-09 07:13:58

by Hirokazu Takahashi

Subject: [PATCH] zerocopy NFS for 2.5.33

Hello,

I have updated the zerocopy NFS patches. They apply against
linux-2.5.33, and zerocopy NFS works well over both UDP and TCP.

1)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va10-hwchecksum-2.5.33.patch
This patch enables hardware checksumming for outgoing packets, including
UDP frames.

2)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va11-udpsendfile-2.5.33.patch
This patch makes the sendfile system call work over UDP. It also supports
a UDP_CORK interface, which is very similar to TCP_CORK, and you can call
sendmsg/sendfile with the MSG_MORE flag over UDP sockets too.

Using TSO code is commented out at this moment as TSO for UDP isn't
implemented yet. I'm waiting for it so that we would remove "#ifdef NotYet"
to send jumbo UDP frames without any fragmentation and any checksumming.
Then I hope we will get great performance.

3)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va-csumpartial-fix-2.5.33.patch
This patch fixes the x86 csum_partial() routines, which can't handle
buffers that start at odd addresses.

4)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va01-zerocopy-rpc-2.5.33.patch
This patch lets RPC send several pieces of data and pages without any
copying.

5)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va02-zerocopy-nfsdread-2.5.33.patch
This patch makes NFSD pass pagecache pages directly to the RPC layer
when NFS clients request a file read.

6)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va03-zerocopy-nfsdreaddir-2.5.33.patch
nfsd_readdir can also send pages without copying.

7)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va04-zerocopy-shadowsock-2.5.33.patch
This patch creates per-CPU UDP sockets so that NFSD can send UDP frames on
each processor simultaneously.
Without the patch we can send only one UDP frame at a time, because a UDP
socket has to be locked while its pages are being sent in order to
serialize them.

8)
ftp://ftp.valinux.co.jp/pub/people/taka/tune/2.5.33/va05-zerocopy-tempsendto-2.5.33.patch
If you don't want to use sendfile over UDP yet, you can apply this patch
instead of 1) and 2).
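
As background for 3), here is a minimal sketch of why odd alignment matters
to the Internet checksum arithmetic. It is not the code from the patch. The
names ones_complement_sum and combine are hypothetical, and the sketch works
on byte offsets within a packet rather than on memory addresses. The point
it illustrates is that a 16-bit ones-complement sum computed over a piece
that begins on an odd byte boundary has to be byte-swapped before it is
added to the running total.

```python
def fold(total: int) -> int:
    """Fold carries back in until the sum fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones-complement sum of data (big-endian words, not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)


def combine(total: int, part: int, offset: int) -> int:
    """Add the sum of a piece that starts at byte `offset` to `total`.

    A piece starting at an odd offset was summed with its bytes in the
    opposite halves of each word, so its sum is byte-swapped first.
    """
    if offset % 2:
        part = swap16(part)
    return fold(total + part)


if __name__ == "__main__":
    data = bytes(range(1, 40))
    split = 7  # odd split point, chosen only for illustration
    whole = ones_complement_sum(data)
    pieces = combine(ones_complement_sum(data[:split]),
                     ones_complement_sum(data[split:]), split)
    print(hex(whole), hex(pieces), whole == pieces)
```

With zerocopy the pieces of a frame are not guaranteed to start on an even
boundary, which is why this case has to be handled correctly.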


If you have any requests or comments, please let me know.


Thank you,
Hirokazu Takahashi.


2002-09-09 07:22:00

by David Miller

Subject: Re: [PATCH] zerocopy NFS for 2.5.33

From: Hirokazu Takahashi
Date: Mon, 09 Sep 2002 16:11:23 +0900 (JST)

> Using TSO code is commented out at this moment as TSO for UDP isn't
> implemented yet. I'm waiting for it so that we would remove "#ifdef NotYet"
> to send jumbo UDP frames without any fragmentation and any checksumming.
> Then I hope we will get great performance.

Actually, device interface for what could be used is there, see
NETIF_F_FRAGLIST. No devices set this and IP never makes use of it
yet though :-)

Acenic and Tigon3 will be able to do this, probably e1000 has this
feature as well.

But it does not work how you imagine. One passes already fragmented
list of packets to card, and it can checksum the packet if you tell it
which descriptor is first of fragmented frame and which is last.

It does not do the fragmentation of UDP frames for you, only
checksumming of UDP portion. No card does what you mention.
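
To make the division of labour concrete, here is a minimal sketch of the
host-side split described above, assuming plain IPv4 with a 20-byte header
and no options, an 8-byte UDP header, and a 1500-byte MTU. The function name
ip_fragments and the 8192-byte payload are assumptions chosen for
illustration; none of this is taken from the patches.

```python
def ip_fragments(udp_payload_len, mtu=1500, ip_header_len=20, udp_header_len=8):
    """Return (offset, length, more_fragments) for each IPv4 fragment.

    Offsets and lengths are in bytes and cover the UDP header plus payload,
    which is the part of the packet that IP fragmentation splits up.
    """
    total = udp_header_len + udp_payload_len
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the IPv4 fragment offset field counts 8-byte units.
    per_fragment = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        length = min(per_fragment, total - offset)
        more_fragments = offset + length < total
        fragments.append((offset, length, more_fragments))
        offset += length
    return fragments


if __name__ == "__main__":
    # Hypothetical 8192-byte payload carried in a single UDP datagram.
    for offset, length, more in ip_fragments(8192):
        print(f"offset={offset:5d} length={length:4d} MF={int(more)}")
```

Only the first fragment carries the UDP header, and the UDP checksum stored
there covers the data in every fragment. That is why the card has to be told
which descriptor starts the fragmented frame and which one ends it.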

Franks a lot,
David S. Miller

2002-09-09 09:00:57

by Hirokazu Takahashi

Subject: Re: [PATCH] zerocopy NFS for 2.5.33

Hi,

As far as I know e1000 has a feature that it can split a jumbo UDP frame
into some IP fragments.

> From: Hirokazu Takahashi
> Date: Mon, 09 Sep 2002 16:11:23 +0900 (JST)
>
> > Using TSO code is commented out at this moment as TSO for UDP isn't
> > implemented yet. I'm waiting for it so that we would remove "#ifdef NotYet"
> > to send jumbo UDP frames without any fragmentation and any checksumming.
> > Then I hope we will get great performance.
>
> Actually, device interface for what could be used is there, see
> NETIF_F_FRAGLIST. No devices set this and IP never makes use of it
> yet though :-)
>
> Acenic and Tigon3 will be able to do this, probably e1000 has this
> feature as well.
>
> But it does not work how you imagine. One passes already fragmented
> list of packets to card, and it can checksum the packet if you tell it
> which descriptor is first of fragmented frame and which is last.
>
> It does not do the fragmentation of UDP frames for you, only
> checksumming of UDP portion. No card does what you mention.

2002-09-09 09:06:59

by David Miller

Subject: Re: [PATCH] zerocopy NFS for 2.5.33

From: Hirokazu Takahashi
Date: Mon, 09 Sep 2002 17:58:21 +0900 (JST)

> As far as I know e1000 has a feature that it can split a jumbo UDP frame
> into some IP fragments.

I doubt this, because very rarely do vendors of commodity networking
cards implement things outside of Microsoft's NDIS (Network Driver
Interface Specification) and what I have described is what they define
for fragmentation offloading.

Maybe some new revision has the feature you suggest.

2002-09-10 01:53:22

by Feldman, Scott

Subject: RE: [PATCH] zerocopy NFS for 2.5.33

> As far as I know e1000 has a feature that it can split a
> jumbo UDP frame into some IP fragments.

UDP segmentation but not UDP fragmentation, sorry.

-scott

2002-09-10 03:16:16

by Hirokazu Takahashi

Subject: Re: [PATCH] zerocopy NFS for 2.5.33

Hello,

> > As far as I know e1000 has a feature that it can split a
> > jumbo UDP frame into some IP fragments.
>
> UDP segmentation but not UDP fragmentation, sorry.

Really?
That's too bad.