2008-02-05 23:08:50

by J. Bruce Fields

Subject: Re: [PATCH 4/5] NFSD: Remove NFSD_TCP kernel build option

On Wed, Feb 06, 2008 at 10:05:39AM +1100, Greg Banks wrote:
> Frank van Maarseveen wrote:
> > Last time I checked (around 2.6.22) writing large files on NFSv3 over
> > UDP was 20% faster compared to TCP (Gb LAN with one switch connecting
> > all machines).
> >
> Did all of your file arrive at the server, and in the same order it left
> the client? NFS on UDP relies on IP fragmentation, which is known to
> introduce silent data corruption at high data rates (google for "IPID aliasing").

The right query appears to be "IPID aliasing NFS", which (at least for
me) gets you a nice explanation from Olaf's 2006 OLS paper as the first
hit.

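For a rough feel of why the 16-bit IP ID becomes a problem at these speeds, here is a back-of-the-envelope sketch. The link rate, datagram size and reassembly timeout are illustrative assumptions (the timeout is what I believe is the Linux default for net.ipv4.ipfrag_time), not measurements from anyone's setup, and it ignores header overhead:

```python
# Back-of-the-envelope estimate of how quickly the 16-bit IP ID wraps.
# All workload numbers below are illustrative assumptions, not measurements.

IPID_VALUES = 2 ** 16          # the IPv4 Identification field is 16 bits wide
LINK_BYTES_PER_SEC = 1e9 / 8   # assumed: a fully saturated 1 Gb/s link
DATAGRAM_BYTES = 32 * 1024     # assumed: one 32 KiB NFS WRITE per UDP datagram
REASSEMBLY_TIMEOUT_SEC = 30    # assumed: Linux default net.ipv4.ipfrag_time


def ipid_wrap_seconds(link_bytes_per_sec, datagram_bytes):
    """Seconds until the sender reuses an IP ID, using one ID per datagram."""
    datagrams_per_sec = link_bytes_per_sec / datagram_bytes
    return IPID_VALUES / datagrams_per_sec


wrap = ipid_wrap_seconds(LINK_BYTES_PER_SEC, DATAGRAM_BYTES)
print(f"IP ID wraps after about {wrap:.1f} s")
print("wraps inside reassembly window:", wrap < REASSEMBLY_TIMEOUT_SEC)
```

Under those assumptions the ID space wraps in well under the reassembly timeout, so fragments left over from a datagram that lost a fragment can still be queued when a new datagram reuses the same ID, and the 16-bit UDP checksum is the only thing standing between that and a mis-assembled write.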

> Also, last time I checked, UDP support in the server uses a single socket
> for all traffic, and processes need to serialise on the svc_sock lock to send,
> so aggregate UDP throughput is strictly limited compared to TCP. As in, 145 MB/s
> for UDP compared to filling 12 1gige pipes for TCP. I have a patch to fix this,
> but given the inherent data corruption issues of UDP I haven't bothered posting
> the most recent version.
>
> > TCP and its timeout/retransmission behavior isn't always the best choice.
> >
> >
> The timeout & retrans that sunrpc implements on top of UDP is arguably worse,
> especially if you use the "soft" mount option.
> --
> Greg Banks, R&D Software Engineer, SGI Australian Software Group.
> The cake is *not* a lie.
> I don't speak for SGI.