2003-06-11 17:27:53

by gb

Subject: Linux / tcp-nfs / NetApp problems (summary)


I'd like to thank Charles Lever and Trond for not only
helping narrow down where the problem was occurring,
but for providing a fix (patches) for our linux
clients that enable them to once again play nicely
with NetApps using tcp-nfs when under heavy network
load.

All of this was done out of the kindness of their
hearts, and the linux NFS community (especially me) is
in their debt.

* Environment:

A large (sample size 30) base of linux clients
(2.4.18) using tcp-nfs (iozone generated) to a NetApp
760 filer running 6.2.2D21.

server/directory/mount options:

vger:/vol/vol2/admin-test2 /u/admin-test2 nfs rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=vger 0 0

admin-test2 (vger:vol2) has 13 disks + 1 parity
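
For anyone comparing mount settings across many clients, here is a minimal sketch that splits a mount entry like the one above into its fields and options. The function name parse_mount_entry is made up for illustration, and it assumes the entry sits on a single line with the usual six whitespace-separated fields.

```python
def parse_mount_entry(entry):
    """Split a six-field mount entry into a dict (hypothetical helper)."""
    device, mount_point, fs_type, options, dump, passno = entry.split()
    opts = {}
    for item in options.split(","):
        # "rsize=32768" becomes {"rsize": "32768"}; bare flags such as
        # "tcp" become {"tcp": True}.
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": opts,
        "dump": int(dump),
        "passno": int(passno),
    }


ENTRY = ("vger:/vol/vol2/admin-test2 /u/admin-test2 nfs "
         "rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=vger 0 0")

if __name__ == "__main__":
    parsed = parse_mount_entry(ENTRY)
    print(parsed["options"]["rsize"], parsed["options"]["wsize"])
```

Option values are kept as strings, so convert rsize/wsize with int() before comparing them numerically.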

* Problem Summary:

When the linux clients (2.4.18) begin accessing the
same partition on the filer (reads/writes), the process
performing the access on each client "hangs". Syslog
output contains these messages:

can't get request slot
nfs server not responding

The process itself (and any other request to the mount
point) hangs in the "disk sleep" (D) state.

[root@valk101 log]# cat /proc/2123/status
Name: bash
State: D (disk sleep)
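
To spot every stuck process at once instead of checking one pid at a time, something like the following rough sketch may help. It reads the same Name and State fields shown above from each /proc/<pid>/status file. The helper names parse_status and d_state_processes are made up for illustration, and the sketch assumes a Linux /proc layout.

```python
import os


def parse_status(text):
    """Return (name, state_letter) from the text of a /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # State looks like "D (disk sleep)"; keep only the leading letter.
    return fields.get("Name", ""), fields.get("State", "")[:1]


def d_state_processes(proc_root="/proc"):
    """Yield (pid, name) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, state = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield int(entry), name


if __name__ == "__main__":
    for pid, name in d_state_processes():
        print(pid, name)
```

A process that is briefly in D state during normal I/O will also show up, so only entries that persist across several runs are interesting.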

After looking at multiple tcpdump and rpc debug logs,
Charles hit the nail on the head with this suggestion:

"the filer closes the TCP window on sockets that
are creating a heavy load. this is meant as a
form of back pressure on clients to cause them
to slow down... it looks like the Linux clients are
not recognizing this subtle hint, and continue to
generate more requests even though the TCP window is
small or zero."
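
As an illustration of what to look for in a capture, here is a small sketch that picks out zero-window advertisements from tcpdump text output. It assumes each packet line contains the advertised window as "win N"; the sample lines are invented for the example and are not taken from our traces.

```python
import re

# Assumption: the tcpdump text output prints the advertised window as "win N".
WIN_RE = re.compile(r"\bwin (\d+)\b")


def zero_window_lines(lines):
    """Return the lines whose advertised TCP window is 0."""
    hits = []
    for line in lines:
        m = WIN_RE.search(line)
        if m and int(m.group(1)) == 0:
            hits.append(line)
    return hits


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "17:00:01.000001 vger.nfs > valk101.800: . ack 4097 win 8760",
    "17:00:01.000150 vger.nfs > valk101.800: . ack 8193 win 0",
    "17:00:01.200300 valk101.800 > vger.nfs: . 8193:8194(1) ack 1 win 5840",
]

if __name__ == "__main__":
    for line in zero_window_lines(SAMPLE):
        print(line)
```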

Trond jumped on this, and produced a series of patches
over time and testing that fixed the linux client-side
issues using tcp-nfs. For what they do, I suggest
looking here:

http://www.fys.uio.no/%7Etrondmy/src/2.4.21-rc6/

linux-2.4.21-14-xprt_fixes.dif
linux-2.4.21-15-fix_tcprace.dif
linux-2.4.21-16-fix_tcprace2.dif
linux-2.4.21-17-fix_tcprace3.dif

Using the base 2.4.21-rc7 kernel + above patches, we
get the expected (linux client) behavior...

1. linux clients generate huge load on filer
2. filer advertises window 0
3. linux clients probe filer, waiting for window size
to increase

...time passes... <count to 300 slowly>

4. filer closes tcp connection
5. linux clients re-open connection and continue
merrily along their way
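
The sequence above can be written down as a toy model. This is only a sketch of the behavior we observe from the outside, not the kernel's RPC transport code, and the event encoding (an integer window size, or the string "close") is an assumption made up for the example.

```python
def client_actions(server_events):
    """Map each server event to the action a well-behaved client takes.

    An event is either an advertised window size in bytes (int) or the
    string "close" when the server drops the TCP connection.
    """
    actions = []
    for event in server_events:
        if event == "close":
            actions.append("reconnect")  # step 5: re-open and carry on
        elif event == 0:
            actions.append("probe")      # step 3: wait, do not send requests
        else:
            actions.append("send")       # window open: requests may flow
    return actions


if __name__ == "__main__":
    print(client_actions([32768, 0, 0, "close", 32768]))
```

The unpatched clients behaved as if the "probe" branch did not exist: they kept generating requests while the window was zero.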

I'm working with NetApp to answer the question, "Why
is the filer advertising a receive window of 0 and
eventually closing a tcp connection, instead of
re-opening its receive window as resources become
available?"

In the meantime, linux 2.4.21-rc7 + Trond's
patches make our linux + NetApp environment happy!

Thanks everyone,

--Greg

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-06-11 17:44:45

by Stuckless, Colin

Subject: RE: Linux / tcp-nfs / NetApp problems (summary)


> -----Original Message-----
> From: gb [mailto:[email protected]]
> Sent: Wednesday, June 11, 2003 2:58 PM
> To: [email protected]
> Subject: [NFS] Linux / tcp-nfs / NetApp problems (summary)
>
>
>
> I'd like to thank Charles Lever and Trond for not only
> helping narrow down where the problem was occurring,
> but for providing a fix (patches) for our linux
> clients that enable them to once again play nicely
> with NetApps using tcp-nfs when under heavy network
> load.
<...>
> server/directory/mount options:
>
> vger:/vol/vol2/admin-test2 /u/admin-test2 nfs
> rw,v3,rsize=32768,wsize=32768,hard,intr,tcp,lock,addr=vger
> 0 0
<...>
> Problem Summary:
>
> When the linux clients (2.4.18) begin accessing the
> same partition on the filer (reads/writes), the linux
> clients process performing the access "hangs". Syslog
> output contains these messages:
>
> can't get request slot
> nfs server not responding


Greg,

Can you tell us what the client performance is like with
the fix in place?

I (and many others) have experienced the same problem
with RedHat 7.3 against Solaris file servers when using
UDP as the transport. I had to cut back from a 32K rsize/wsize
to 8K, and even then the Linux clients were still able to
consume all the Solaris file server's resources, causing
other clients to suffer the "can not get request slot"
problem.

It sounds like the fix Trond has supplied would work for
my situation as well. At the moment I'm running kernel
2.4.18-27.7.xsmp from RedHat which has also proven to
provide a workaround, although I don't know if their
solution is the same as Trond's or not.

Colin Stuckless




2003-06-11 18:00:58

by Stuckless, Colin

Subject: RE: Linux / tcp-nfs / NetApp problems (summary)




>Greg,
>Can you tell us what the client performance is like with
>the fix in place?


Apologies for following up on my own message, but I
quickly realised the flaw in my logic. Trond's fix for
Greg addresses TCP window sizes, and since I am using
UDP as the transport it does not apply to my situation.

Back to your regularly scheduled programming...


Colin

