2002-04-21 19:06:41

by Andrew Ryan

Subject: 3 bugs with UDP mounts

Running performance tests on NFS this past week, I've uncovered 3 apparent
bugs. The first is pretty unpleasant; the other two are fairly minor.

kernel: 2.4.17+NFS-ALL-2002-Jan-17, 2.4.19pre7+NFS-ALL. Both SMP on a 2-CPU
system.
nfs-utils-0.3.1-13.7.2.1, mount-2.11g-5
server: NetApp
network: 100baseTx-FD on client, gigE-FD on server

1. UDP read performance (v2 and v3) on large files is really poor. I
first noticed this problem when trying to get bonnie++ and tiobench
results and seeing runs that should take about 20 minutes take 24+
hours. To duplicate this, I created a 1600MB file filled with zeros,
which went quickly: while writing the file, the server showed ~500 NFS
ops/sec and about 4400 kB/s inbound on its network interface. When I
read the file back with 'cat /path/to/file > /dev/null', reads started
out fast at 7900 kB/s, then within a few seconds dropped to 100 kB/s
and ~100 NFS ops/sec.
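
The whole test boils down to a few commands; a rough sketch, with purely
illustrative file and mount point names:

  # create a 1600MB file of zeros on the NFS mount -- this part runs fast
  dd if=/dev/zero of=/shared/data/bigfile bs=1024k count=1600
  # remount so the read comes from the server rather than the page cache
  umount /shared/data && mount /shared/data
  # time the read back; throughput collapses a few seconds in
  time cat /shared/data/bigfile > /dev/null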

I can get excellent read performance as long as the files are
relatively small. A Solaris client (also 100baseTx-FD) mounting the
same server over UDP reads the same file at a consistently high speed,
so it definitely seems to be a Linux-specific problem.


2. I accidentally mounted an NFS filesystem with
"udp,nfsvers=3,rsize=32768,wsize=32768". For one, I don't exactly
understand why this was allowed, since it doesn't seem like these
options should be valid for NFS/UDP. Also, it creates a disagreement
between what the 'mount' command returns and what 'cat /proc/mounts'
shows.

The relevant line from the 'mount' command shows this:
fileserver:/vol/stage/data on /shared/data type nfs
(rw,udp,nfsvers=3,rsize=32768,wsize=32768,intr,hard,addr=192.168.100.240)

while the relevant line from /proc/mounts shows this:
fileserver:/vol/stage/data /shared/data nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=fileserver 0 0
This may be a problem with my version of mount or nfs-utils; has
anyone seen this before?
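
The two views are easy to compare side by side (mount point hypothetical):

  mount | grep /shared/data
  grep /shared/data /proc/mounts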


3. Running the cthon02 tests with UDP/v3, I get one failure. This
looks like a trivial problem to fix, if in fact it is a problem and not
just a disagreement about how the spec is interpreted.
Test #6 - Try to lock the MAXEOF byte.
Parent: 6.0 - F_TLOCK [7fffffff, 1] PASSED.
Child: 6.1 - F_TEST [7ffffffe, 1] PASSED.
Child: 6.2 - F_TEST [7ffffffe, 2] FAILED!
Child: **** Expected EACCES, returned EOVERFLOW...
Child: **** Probably implementation error.




2002-04-22 17:01:18

by Bruce Allan

Subject: Re: 3 bugs with UDP mounts


I also witnessed the same problem (problem #3 in Andrew's message) at
the Connectathon event this year, with NFSv3 over both TCP and UDP.
Trond was there and verified it is a bug in NetApp's ONTAP 6.1.1 having
to do with, IIRC, how they handle byte swapping for little-endian
clients. The same bug was also discovered at the 2001 Connectathon
event; NetApp fixed it in a later release of ONTAP (I think they said
6.1.2, but don't quote me on that).

---
Bruce Allan <[email protected]>
Software Engineer, Linux Technology Center
IBM Corporation, Beaverton OR
503-578-4187 IBM Tie-line 775-4187





2002-04-22 22:23:27

by Andrew Ryan

Subject: Re: 3 bugs with UDP mounts

On Monday 22 April 2002 10:00 am, Bruce Allan wrote:
> I also witnessed the same problem (problem #3 below) at the Connectathon
> event this year with NFSv3 on both TCP and UDP. Trond was there and he
> verified this was a bug in NetApp's ONTAP 6.1.1 having to do with, IIRC,
> how they handle byte swapping against little-endian clients. This bug was
> also discovered at the 2001 Connectathon event. NetApp fixed this in a
> later release of ONTAP (I think they said it was fixed in 6.1.2 but don't
> quote me on that).

Thanks for the info. We're running 6.1.2R1, so I would guess they've
not fixed it yet. Perhaps in 6.1.3. I wonder: is the problem basically
cosmetic, or does it manifest itself in other places as well?


andrew


2002-04-22 22:32:57

by Bruce Allan

Subject: Re: 3 bugs with UDP mounts


FYI, NetApp had another server at Connectathon 2002 running ONTAP 6.2
X37 which did not exhibit this problem.

I can't speak to the extent of the bug, as the cthon02 test suite is
the whole of my exposure to NetApp servers.

---
Bruce Allan <[email protected]>
Software Engineer, Linux Technology Center
IBM Corporation, Beaverton OR
503-578-4187 IBM Tie-line 775-4187





2002-04-23 01:23:33

by Lever, Charles

Subject: RE: 3 bugs with UDP mounts

hi all-

> kernel: 2.4.17+NFS-ALL-2002-Jan-17, 2.4.19pre7+NFS-ALL. Both
> SMP on a 2-CPU
> system.
> nfs-utils-0.3.1-13.7.2.1, mount-2.11g-5
> server: NetApp

for the whole list: when reporting issues against NetApp filers,
please mention the OnTap version and filer hardware if you can...
that would help us a lot! thanks!

> network: 100baseTx-FD on client, gigE-FD on server
>
> 1. UDP read performance (v2 and v3) on large files is really
> poor. [....]
> I can get excellent read performance, as long as the files
> are relatively small.

you are probably hitting the network speed step down between
the GbE filer and the 100Mb client. the first packet loss
will cause your read throughput to drop. i'll bet your
small file knee occurs right about at the size of your switch's
memory buffer.

have you tried this test with TCP?

> With a Solaris client, mounting the same server UDP (and also
> 100baseTx-FD), reading the same file is done at a
> consistently high speed,
> so it definitely seems to be a linux-specific problem.

Solaris may have workarounds (like a loose interpretation of
Van Jacobson) or bugs that allow this to work well. just a guess.

> 2. I accidentally mounted an NFS filesystem
> "udp,nfsvers=3,rsize=32768,wsize=32768". For one, I don't exactly
> understand why this was allowed, since it doesn't seem like I
> should be
> allowed to mount NFS/UDP with these options.

we have filers mounted with UDP and r/wsize=32K from an RH 7.2 system
running 2.4.19pre. NFS over UDP allows r/wsize up to 32K. the
limitation is the number of bytes that fit in a single fragmented IP
packet, which is 65535. a 32K read or write plus its RPC/UDP/IP header
overhead (a few hundred bytes) fits comfortably under that limit; 64K
of data alone already exceeds it, so a 64K r/wsize cannot fit in a
single IP packet.

> Also, this creates a
> disagreement between what is returned by the 'mount' command and 'cat
> /proc/mounts'.
>
> The relevant line from the 'mount' command shows this:
> fileserver:/vol/stage/data on /shared/data type nfs
> (rw,udp,nfsvers=3,rsize=32768,wsize=32768,intr,hard,addr=192.1
> 68.100.240)
>
> while the relevant line from /proc/mounts shows this:
> fileserver:/vol/stage/data /shared/data nfs
> rw,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=fileserver 0 0
> This may be a problem with my version of mount or nfs-utils,
> anyone seen
> this before?

i have never seen the mismatch between the requested size and the
reported mount options before; it looks like a bug to me. could your
version of mount be capping r/wsize at 8k because of old limitations
on the Linux server?


2002-04-23 07:36:58

by Andrew Ryan

Subject: RE: 3 bugs with UDP mounts

At 06:23 PM 4/22/02 -0700, Lever, Charles wrote:
>for the whole list: when reporting issues against NetApp filers,
>please mention the OnTap version and filer hardware if you can...
>that would help us a lot! thanks!

Sorry: F820, 6.1.2R3.


> > network: 100baseTx-FD on client, gigE-FD on server
> >
> > 1. UDP read performance (v2 and v3) on large files is really
> > poor. [....]
> > I can get excellent read performance, as long as the files
> > are relatively small.
>
>you are probably hitting the network speed step down between
>the GbE filer and the 100Mb client. the first packet loss
>will cause your read throughput to drop. i'll bet your
>small file knee occurs right about at the size of your switch's
>memory buffer.

I'm not seeing any packet loss on client or server according to the
interface statistics. How would I find this out? I am seeing a 2-second
delay between reads when I turn on NFS debugging. A thread ("2.4.18:
NFS_ALL patch greatly hurting UDP speed") mentioned this same delay, so
I'm probably running into the same problem.

BTW, I did more performance tests. If I set rsize and wsize to 1024 or
2048, I can get acceptable performance with NFS+UDP. If I go to 4096 or
higher, performance drops immediately to an unacceptable level.
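
(Something like the following loop will run that kind of sweep; the
mount point and file names are placeholders, not my exact commands.)

  for rs in 1024 2048 4096 8192 16384 32768; do
      umount /mnt/test
      mount -o nfsvers=3,udp,rsize=$rs,wsize=$rs,hard,intr \
            fileserver:/vol/stage/data /mnt/test
      echo "rsize=$rs:"
      time cat /mnt/test/bigfile > /dev/null
  done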


>have you tried this test with TCP?

Yeah, TCP performance is fine. It's the hangs under 2.4.17+NFS-ALL
(described in an earlier mail) that are killing me. The hangs seem to have
gone away in 2.4.19pre7+NFS-ALL, but I need more time to test all of
2.4.19pre7 in my environment to make sure it holds up OK.



>Solaris may have workarounds (like a loose interpretation of
>Van Jacobsen) or bugs that allow this to work well. just a guess.

Would it be possible -- via a kernel option or mount option -- to
control this behavior in the Linux client as well, so that a Linux NFS
client can achieve high performance with large block sizes over UDP?
Mixed 100/1000 networks are becoming more of a reality, and NFS/TCP,
while tremendously better than it was a year or so ago, is still a work
in progress; it's nice to have another acceptable option for mounting.

Solaris has clearly solved this problem somehow; it would be nice if
Linux could as well.


>we have filers mounted with UDP and r/wsize=32K from a R7.2 system
>running 2.4.19pre. NFS over UDP allows r/wsize up to 32K.

Right, my bad. I was confused by a day of too many mounts and umounts and
too little stuff working properly. 32k should have been fine.


>the mismatch between the requested size and the reported mount
>options i have never seen before, and looks like a bug to me.
>your version of mount may not allow more than 8k r/wsize because
>of old limitations on the Linux server?

I'm using mount 2.11g, as supplied with RH 7.2. Is there a newer mount
program I should be using? Can others duplicate this bug?


thanks,
andrew



2002-04-23 14:55:46

by Lever, Charles

Subject: RE: 3 bugs with UDP mounts

> > > 1. UDP read performance (v2 and v3) on large files is really
> > > poor. [....]
> > > I can get excellent read performance, as long as the files
> > > are relatively small.
> >
> >you are probably hitting the network speed step down between
> >the GbE filer and the 100Mb client. the first packet loss
> >will cause your read throughput to drop. i'll bet your
> >small file knee occurs right about at the size of your switch's
> >memory buffer.
>
> I'm not seeing any packet loss on client or server acc. to interface
> statistics. How would I find this out?

the interface statistics reflect errors right at the client's NIC.
the switch may also drop packets, for example, to match link speed,
because it has exhausted buffer space, because of QOS or packet
scheduling, and so on. IOW the client can get a packet to the switch
perfectly (so no Tx error occurs) but the switch may drop the packet
before getting it to the server. and also in the reverse direction.

the client may also lose replies because of lack of socket buffer space.
packets are received perfectly (thus no Rx error occurs) but the
driver/network layer fails to copy the data to the RPC layer because
there is not enough socket buffer space. the network layer drops
the packet.
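
one quick sanity check is to look at (and, if needed, raise) the
client's default socket receive buffer limits before mounting -- a
sketch only, with illustrative values, and note the rpc layer may size
its own socket buffers anyway:

  # current default and maximum socket receive buffer sizes
  sysctl net.core.rmem_default net.core.rmem_max
  # raise both to 256k, for example
  sysctl -w net.core.rmem_default=262144
  sysctl -w net.core.rmem_max=262144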

take a look at "nfsstat -c" -- there's a retransmission number
at the top that indicates how often the client was not able to
get a request to the server (and a reply from the server) the first
time. you can also look at "netstat -s" to see if there are
reassembly failures (see other thread: "nfs performance: read
only/gigE/nolock/1Tb per day").
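
for example:

  # calls vs. retransmissions at the top of the client rpc stats
  nfsstat -c | head
  # ip-level fragment reassembly failures
  netstat -s | grep -i reassembl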

> >have you tried this test with TCP?
>
> Yeah, TCP performance is fine. It's the hangs under 2.4.17+NFS-ALL
> (described in an earlier mail) that are killing me. The hangs
> seem to have
> gone away in 2.4.19pre7+NFS-ALL, but I need more time to test all of
> 2.4.19pre7 in my environment to make sure it holds up OK.

that's good news.

> Would it be possible -- via a kernel option or mount option
> -- to control
> the behavior in the Linux client as well so that high
> performance can be
> achieved on a Linux NFS client with high block sizes and UDP? Mixed
> 100/1000 networks are becoming more of a reality and NFS/TCP,
> while having
> gotten tremendously better in the last year or so, is still a work in
> progress, and it's nice to have another acceptable option for
> mounting.

my 2 cents:

IMHO the right answer is to get TCP fixed properly as soon as possible.
UDP is always subject to the quality of your network, and thus is
a greater risk to data integrity. also, the application and O/S on
the client should have some ability to control the flow of network
data, even on single speed networks -- UDP doesn't provide that.
though TCP has greater network and CPU overhead, it is a better choice
if NFS is to be taken seriously as a robust enterprise-quality file
system.


2002-04-23 15:20:14

by Ion Badulescu

Subject: Re: 3 bugs with UDP mounts

On Tue, 23 Apr 2002 00:37:15 -0700, Andrew Ryan <[email protected]> wrote:

> Would it be possible -- via a kernel option or mount option -- to control
> the behavior in the Linux client as well so that high performance can be
> achieved on a Linux NFS client with high block sizes and UDP? Mixed
> 100/1000 networks are becoming more of a reality and NFS/TCP, while having
> gotten tremendously better in the last year or so, is still a work in
> progress, and it's nice to have another acceptable option for mounting.
>
> Solaris has clearly solved this problem somehow -- it would be nice if
> linux could as well.

Are you absolutely sure that Solaris is mounting NFS/UDP? The reason I'm
asking is that Solaris' mount_nfs defaults to TCP unless you explicitly
force it to use UDP with -o proto=udp.

Just checking...

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.


2002-04-23 16:26:04

by Bogdan Costescu

Subject: RE: 3 bugs with UDP mounts

On Tue, 23 Apr 2002, Lever, Charles wrote:

> the client may also lose replies because of lack of socket buffer space.
> packets are received perfectly (thus no Rx error occurs) but the
> driver/network layer fails to copy the data to the RPC layer because
> there is not enough socket buffer space. the network layer drops
> the packet.

Ouch, I completely missed OOM situations in my earlier list. They are
quite tricky because:
- you never know when they will hit you
- you never know how long they will last
- some packets can get through and some cannot. Lots of drivers derived
from Donald Becker's skeleton use an "rx_copybreak" limit to decide
whether to hand the (maximum-sized) packet straight to the upper layers
or to allocate a properly sized skb, copy the packet into it, and hand
that up. While allocating the small skb in the second case might succeed
(rx_copybreak is usually set to 200), refilling the Rx ring in the first
case with a maximum-sized skb might fail.

A scenario that might happen is this: a client mounts some tree over
NFS which is then used to serve web pages with Apache plus modules.
There is no limit on the number of simultaneously running Apache child
processes. The machine works well most of the time; however, when it
receives a high number of HTTP requests, it spawns too many processes,
which exhausts memory... Several cases similar to this were reported to
me (by people using the 3c59x driver, as they thought the problem was
related to the driver).

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]





2002-04-23 16:28:15

by Andrew Ryan

Subject: Re: 3 bugs with UDP mounts

At 11:20 AM 4/23/02 -0400, Ion Badulescu wrote:
>On Tue, 23 Apr 2002 00:37:15 -0700, Andrew Ryan <[email protected]> wrote:
> > Solaris has clearly solved this problem somehow -- it would be nice if
> > linux could as well.
>
>Are you absolutely sure that Solaris is mounting NFS/UDP? The reason I'm
>asking is because Solaris' mount_nfs defaults to using TCP unless you
>explicitly force it to use UDP with -o proto=udp.


Yes, indeed, I did mount the server with the 'proto=udp' option on the
Solaris host.
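
Something along these lines, from memory (the paths and the vers
option here are illustrative):

  # Solaris client, forcing UDP explicitly
  mount -F nfs -o proto=udp,vers=3 fileserver:/vol/stage/data /shared/data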



thanks,
andrew



2002-04-23 16:29:31

by miah

Subject: Re: 3 bugs with UDP mounts

2.11q seems to be the current version, but sadly the HISTORY file is
lacking in decent descriptions of changes, e.g.:

* mount: minor fix (Andrey J. Melnikoff)

I don't see anything that looks like a major fix for NFS issues, though.

-miah

On Tue, Apr 23, 2002 at 12:37:15AM -0700, Andrew Ryan wrote:
> I'm using mount 2.11g, as supplied with RH 7.2. Is there a newer mount
> program I should be using? Can others duplicate this bug?
>
>
> thanks,
> andrew
