2006-03-15 22:24:32

by Bret Towe

Subject: nfs udp 1000/100baseT issue

A while ago I noticed an issue when an NFS server has a gigabit
connection to a network and a client connects to that network via
100baseT instead: UDP connections from client to server fail. The
client gets a "server not responding" message when trying to access a
file; the interesting bit is that you can get a directory listing
without issue. The workaround I found is adding proto=tcp on the
client side, and then everything works without error.
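
For reference, the workaround is just a mount option. A minimal
sketch, assuming a hypothetical export vox.net:/export and mount
point /mnt/nfs (substitute your own):

    # force NFS over TCP on the client; the export path and mount
    # point below are made-up names
    mount -o proto=tcp vox.net:/export /mnt/nfs

    # or the equivalent /etc/fstab line:
    vox.net:/export  /mnt/nfs  nfs  proto=tcp  0 0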

I've seen this on kernels as far back as 2.6.13 on my own machines
(it was around that time that I actually got gigabit at home), and I
recently noticed that some thin clients I maintain that run 2.4
kernels on the client side are also affected, so perhaps it's a
server-side issue? All the servers I've seen this on are 2.6; I
haven't used 2.4 kernels in ages on my own machines, so I haven't
looked into whether 2.4 has the issue server side or not.

The error messages I see client side are as follows:
nfs: server vox.net not responding, still trying
nfs: server vox.net not responding, still trying
nfs: server vox.net not responding, still trying

The server side shows no errors at all.


I was able to cat a couple of files and narrow it down: it doesn't
like files over 28504 bytes. The kernels here are currently 2.6.15.4
on both client and server, but as I said earlier, the version doesn't
seem to matter much.

Ask if any further info is needed. I can also do more testing, but it
might take me a while before I can get around to it; it took me a
couple of months just to get around to reporting this :\


2006-03-16 20:42:04

by Jan Engelhardt

Subject: Re: nfs udp 1000/100baseT issue

>
>A while ago I noticed an issue when an NFS server has a gigabit
>connection to a network and a client connects to that network via
>100baseT instead: UDP connections from client to server fail. The
>client gets a "server not responding" message when trying to access a
>file; the interesting bit is that you can get a directory listing
>without issue. The workaround I found is adding proto=tcp on the
>client side, and then everything works without error.

UDP, by design, silently drops packets when the link is full. Try
tcpdump on both systems and compare which packets are sent and which
actually arrive. The error message is then probably the client
getting confused because some packets never reach it.
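
A minimal capture along those lines, assuming the interface is eth0
and NFS is on its usual port 2049 (adjust both for your setup):

    # run on both client and server; the second clause also catches
    # IP fragments, which carry no port numbers after the first one
    tcpdump -n -i eth0 -w nfs.pcap 'port 2049 or (ip[6:2] & 0x3fff != 0)'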

>The error messages I see client side are as follows:
>nfs: server vox.net not responding, still trying
>nfs: server vox.net not responding, still trying
>nfs: server vox.net not responding, still trying


Jan Engelhardt
--

2006-03-17 01:33:21

by Bret Towe

Subject: Re: nfs udp 1000/100baseT issue

On 3/16/06, Jan Engelhardt <[email protected]> wrote:
> >
> >A while ago I noticed an issue when an NFS server has a gigabit
> >connection to a network and a client connects to that network via
> >100baseT instead: UDP connections from client to server fail. The
> >client gets a "server not responding" message when trying to access
> >a file; the interesting bit is that you can get a directory listing
> >without issue. The workaround I found is adding proto=tcp on the
> >client side, and then everything works without error.
>
> UDP, by design, silently drops packets when the link is full. Try
> tcpdump on both systems and compare which packets are sent and which
> actually arrive. The error message is then probably the client
> getting confused because some packets never reach it.

After comparing a working and a non-working client, I found that the
packets containing offsets 19240, 20720, and 22200 are missing, and
that the 100baseT client had an extra offset of 32560; on the working
client the capture ends at 31080.

The missing ones are almost always missing; 22200 shows up every so
often on retransmission, and 23680 also disappears every so often.

I hope that isn't too confusing; I don't use tcpdump-type tools much.
(Well, I gave up on tcpdump and had to use Ethereal...)

> >The error messages I see client side are as follows:
> >nfs: server vox.net not responding, still trying
> >nfs: server vox.net not responding, still trying
> >nfs: server vox.net not responding, still trying
>
>
> Jan Engelhardt
> --
>

2006-03-17 02:22:18

by NeilBrown

Subject: Re: nfs udp 1000/100baseT issue

On Thursday March 16, [email protected] wrote:
> On 3/16/06, Jan Engelhardt <[email protected]> wrote:
> > >
> > >A while ago I noticed an issue when an NFS server has a gigabit
> > >connection to a network and a client connects to that network via
> > >100baseT instead: UDP connections from client to server fail. The
> > >client gets a "server not responding" message when trying to
> > >access a file; the interesting bit is that you can get a directory
> > >listing without issue. The workaround I found is adding proto=tcp
> > >on the client side, and then everything works without error.
> >
> > UDP, by design, silently drops packets when the link is full. Try
> > tcpdump on both systems and compare which packets are sent and
> > which actually arrive. The error message is then probably the
> > client getting confused because some packets never reach it.
>
> After comparing a working and a non-working client, I found that the
> packets containing offsets 19240, 20720, and 22200 are missing, and
> that the 100baseT client had an extra offset of 32560; on the
> working client the capture ends at 31080.
>
> The missing ones are almost always missing; 22200 shows up every so
> often on retransmission, and 23680 also disappears every so often.
>
> I hope that isn't too confusing; I don't use tcpdump-type tools
> much. (Well, I gave up on tcpdump and had to use Ethereal...)

This is all to be expected. I remember having this issue with a
server on 100M and clients on 10M...

There is no flow control in UDP. If anything gets lost, the client
has to resend the request, and the server then has to respond again.
If the response is large (e.g. a read) and gets fragmented (anything
over 1500 bytes), then there is a good chance that one or more
fragments of the reply will get lost in the switch stepping down from
1G to 100M. Every time.
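
As a quick sanity check (assuming the standard 1500-byte Ethernet
MTU, i.e. 1480 data bytes per fragment after the 20-byte IP header),
the offsets you captured line up exactly with IP fragment boundaries:

    1480 * 13 = 19240, 1480 * 14 = 20720, 1480 * 15 = 22200
      -> the missing offsets are consecutive fragments of one datagram
    ceil(32768 / 1480) = 23 fragments per 32k read reply (more with
      RPC headers)
      -> lose any one fragment and the whole reply is discarded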

Your options include:

 - use tcp
 - get a switch with a (much) bigger packet buffer
 - drop the server down to 100M
 - drop the nfs rsize down to 1024 so you don't get fragments (see
   the example mount line below)
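
For the last option, something like this on the client should do it
(a sketch; vox.net:/export and /mnt/nfs are placeholder names):

    # rsize caps READ replies at 1024 bytes, so each reply fits in a
    # single Ethernet frame and never fragments
    mount -o rsize=1024 vox.net:/export /mnt/nfs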

NeilBrown

2006-03-17 03:11:24

by Bret Towe

Subject: Re: nfs udp 1000/100baseT issue

On 3/16/06, Neil Brown <[email protected]> wrote:
> On Thursday March 16, [email protected] wrote:
> > On 3/16/06, Jan Engelhardt <[email protected]> wrote:
> > > >
> > > >A while ago I noticed an issue when an NFS server has a gigabit
> > > >connection to a network and a client connects to that network
> > > >via 100baseT instead: UDP connections from client to server
> > > >fail. The client gets a "server not responding" message when
> > > >trying to access a file; the interesting bit is that you can
> > > >get a directory listing without issue. The workaround I found
> > > >is adding proto=tcp on the client side, and then everything
> > > >works without error.
> > >
> > > UDP, by design, silently drops packets when the link is full.
> > > Try tcpdump on both systems and compare which packets are sent
> > > and which actually arrive. The error message is then probably
> > > the client getting confused because some packets never reach it.
> >
> > After comparing a working and a non-working client, I found that
> > the packets containing offsets 19240, 20720, and 22200 are
> > missing, and that the 100baseT client had an extra offset of
> > 32560; on the working client the capture ends at 31080.
> >
> > The missing ones are almost always missing; 22200 shows up every
> > so often on retransmission, and 23680 also disappears every so
> > often.
> >
> > I hope that isn't too confusing; I don't use tcpdump-type tools
> > much. (Well, I gave up on tcpdump and had to use Ethereal...)
>
> This is all to be expected. I remember having this issue with a
> server on 100M and clients on 10M...
>
> There is no flow control in UDP

Is this a Linux design flaw or just the nature of UDP?

>. If anything gets lost, the client
> has to resend the request, and the server then has to respond again.
> If the response is large (e.g. a read) and gets fragmented (anything
> over 1500 bytes), then there is a good chance that one or more
> fragments of the reply will get lost in the switch stepping down
> from 1G to 100M. Every time.
>
> Your options include:
>
> - use tcp

I'm wondering why this isn't the default to begin with.

> - get a switch with a (much) bigger packet buffer
> - drop the server down to 100M
> - drop the nfs rsize down to 1024 so you don't get fragments

These last two options sound rather painful speed-wise; the TCP
workaround is probably by far the easiest.

>
> NeilBrown
>

2006-03-17 03:43:08

by NeilBrown

Subject: Re: nfs udp 1000/100baseT issue

On Thursday March 16, [email protected] wrote:
> On 3/16/06, Neil Brown <[email protected]> wrote:
> >
> > There is no flow control in UDP
>
> Is this a Linux design flaw or just the nature of UDP?

Just the nature of UDP.

> >
> > - use tcp
>
> I'm wondering why this isn't the default to begin with.

Because it wasn't that many years ago that Linux NFS didn't support
TCP at all.
Some distributions modify 'mount' to get it to prefer TCP over UDP.

NeilBrown

2006-03-17 04:14:04

by Lee Revell

Subject: Re: nfs udp 1000/100baseT issue

On Fri, 2006-03-17 at 14:41 +1100, Neil Brown wrote:
>
> > >
> > > - use tcp
> >
> > I'm wondering why this isn't the default to begin with.
>
> Because it wasn't that many years ago that Linux NFS didn't support
> TCP at all.
> Some distributions modify 'mount' to get it to prefer TCP over UDP.

There are also historical reasons that predate Linux: the original
NFS implementations were UDP-only. TCP was not an option until NFSv3.
Lee

2006-03-17 09:13:26

by Helge Hafting

Subject: Re: nfs udp 1000/100baseT issue

Bret Towe wrote:

>On 3/16/06, Neil Brown <[email protected]> wrote:
>
>>There is no flow control in UDP
>
>Is this a Linux design flaw or just the nature of UDP?

That has nothing to do with Linux at all.

"No flow control in UDP" is a UDP design decision, and it is not a
flaw either; the rule is simple:

If you need flow control, use TCP.
If you don't need flow control, and don't want the
overhead of flow control, use UDP.

UDP is for those cases where flow control is considered a waste of time.

Now, the original decision to base early NFS on UDP was a design
mistake. Again, not a Linux problem but an NFS problem. Fortunately,
a solution exists today and is implemented in Linux: NFS over TCP.

>>. If anything gets lost, the client
>>has to resend the request, and the server then has to respond again.
>>If the response is large (e.g. a read) and gets fragmented (anything
>>over 1500 bytes), then there is a good chance that one or more
>>fragments of the reply will get lost in the switch stepping down
>>from 1G to 100M. Every time.
>>
>>Your options include:
>>
>> - use tcp
>
>I'm wondering why this isn't the default to begin with.

Hard to say. I guess someone thought they could get better
performance with UDP, since it has less overhead, and then didn't
bother testing that idea on a somewhat congested network?

Helge Hafting

2006-03-17 11:21:22

by Andrew Morton

Subject: Re: nfs udp 1000/100baseT issue

"Bret Towe" <[email protected]> wrote:
>
> I've seen this on kernels as far back as 2.6.13 on my own machines
> (it was around that time that I actually got gigabit at home), and
> I recently noticed that some thin clients I maintain that run 2.4
> kernels on the client side are also affected, so perhaps it's a
> server-side issue? All the servers I've seen this on are 2.6; I
> haven't used 2.4 kernels in ages on my own machines, so I haven't
> looked into whether 2.4 has the issue server side or not.

It would be interesting if you could do so. I do recall that
nfs-over-crappy-udp was much better behaved in 2.4...

2006-03-17 15:53:42

by Trond Myklebust

Subject: Re: nfs udp 1000/100baseT issue

On Fri, 2006-03-17 at 03:18 -0800, Andrew Morton wrote:
> "Bret Towe" <[email protected]> wrote:
> >
> > I've seen this on kernels as far back as 2.6.13 on my own
> > machines (it was around that time that I actually got gigabit at
> > home), and I recently noticed that some thin clients I maintain
> > that run 2.4 kernels on the client side are also affected, so
> > perhaps it's a server-side issue? All the servers I've seen this
> > on are 2.6; I haven't used 2.4 kernels in ages on my own machines,
> > so I haven't looked into whether 2.4 has the issue server side or
> > not.
>
> It would be interesting if you could do so. I do recall that
> nfs-over-crappy-udp was much better behaved in 2.4...

The 2.6 servers allow clients to use 32k block sizes for READ and WRITE
requests, and set that as the preferred size for both TCP and UDP. In
2.4, they only supported 8k blocks.
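
In fragment terms (assuming 1480 data bytes per IP fragment on a
1500-byte Ethernet MTU), that is roughly:

     8192 / 1480 ~=  6 fragments per UDP reply (2.4)
    32768 / 1480 ~= 23 fragments per UDP reply (2.6)

so each 2.6 reply has almost four times as many fragments that all
have to survive the 1G -> 100M step-down intact.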

Cheers,
Trond