2005-11-02 20:06:24

by Steven Timm

Subject: rpc-srv/tcp: nfsd: sent only -107 bytes (fwd)



I am seeing repeated errors of rpc-srv/tcp: nfsd: sent only -107 bytes
in the /var/log/messages of my machine. Full configuration
info is below. The only suggestion I have seen thus far is to
increase the number of nfsd threads that are running. We have done
this, raising from 8 to 64, but the problem persists. Are there
any other suggestions that could help with this problem?

Thanks

Steve Timm

--------------------------------------------------------------------
Node "fnpcsrv1" is the new NFS server.
[root@fnpcsrv1 lsi_home]# uname -a
Linux fnpcsrv1.fnal.gov 2.4.21-32.0.1.EL.XFSsmp #1 SMP Wed Jun 8 18:35:19 CEST
2005 i686 i686 i386 GNU/Linux
This is a 4-way Dell PowerEdge 6850 with 16 GB RAM, running in 32-bit mode,
hyperthreading on.
All NFS-exported file systems are XFS file systems and have quotas.
Currently running 64 NFS daemons.
Disk hardware includes Dell Megaraid controller and LSI e2400 RAID controller.

Kernel 2.4.21-32.0.1.EL.XFSsmp is the kernel recompiled from the same
.src.rpm as Red Hat Enterprise Linux 3 update 5, with XFS patches applied.

Two NFS clients running 2.4.21-37.ELsmp, namely fnpcg and fngp-osg.
These are grid nodes and heavy consumers of NFS.

150-some worker node NFS clients running 2.4.21-27.0.2.ELsmp kernel.
(From Red Hat Enterprise Linux 3 update 4).
All linux-to-linux NFS access is version 3 over TCP.


Problem #1: "Failed to mount xxxxxxx"
Seen in /var/log/messages of normal worker nodes and grid nodes.
All home and staging areas are mounted on worker nodes via the
automounter. We see on average 10-12 failures a day over a sample
of 150 worker nodes. Each failure can cause a job to fail.
There is no corresponding error message on the server when this happens.
These failures happen more often when the server is at high load.

Problem #2:
Sep 13 04:40:24 fngp-osg kernel: RPC: tcp_data_ready socket info not found!
Sep 13 04:40:26 fngp-osg kernel: nfs_safe_remove: 335e33085de705df938049bc260619/lock busy, d_count=2
These two messages are seen only on the client nodes that are running
the 2.4.21-32.0.1 kernel. Sometimes they occur together; more usually
the nfs_safe_remove occurs alone.

If we investigate the file "lock" that the process is trying to remove, we
find the following:

An ls of the directory it is in, on the client, does not show the file.
"ls lock" does show the file.
"rm lock" claims the file does not exist.
A portion of the strace of the process in question follows:

link("/home/cdf/.globus/.gass_cache/global/md5/76/f3/51/365016f6aebcebeb671288a2ff/data",
"/home/cdf/.globus/.gass_cache/global/md5/76/f3/51/365016f6aebcebeb671288a2ff/lock")
= -1 EEXIST (File exists)
stat64("/home/cdf/.globus/.gass_cache/global/md5/76/f3/51/365016f6aebcebeb671288a2ff/lock",
{st_mode=S_IFREG|0755, st_size=458, ...}) = 0
time(NULL) = 1130864802
unlink("/home/cdf/.globus/.gass_cache/global/md5/76/f3/51/365016f6aebcebeb671288a2ff/lock")
= -1 ENOENT (No such file or directory)


On the server the file does not show up at all.

A umount and remount of the file system on the client clears the problem,
but that's not an operation that can always be done. The undeletable
file causes a large number of client processes to get very confused
and caught in a tight loop of statting, unlinking, and linking the
same file over and over again. The only recourse is to kill the processes
in question.

2a) We see some stale NFS file handles on these systems as well, also causing
hung processes.

3) /var/log/messages on the server node.
Sep 20 05:03:31 fnpcsrv1 kernel: rpc-srv/tcp: nfsd: sent only -107 bytes of 132 - shutting down socket
Sep 20 05:03:32 fnpcsrv1 kernel: rpc-srv/tcp: nfsd: sent only -107 bytes of 140 - shutting down socket


We believe these are associated with the two nodes running
the 2.4.21-37 kernel, because when we shut them down these errors stop.
The NFS mailing list archives suggested increasing the number of nfsd
threads; we have increased to 64, but these errors continue. The NFS
mailing list says the root cause is that nfsd tries to do a sendto and
gets an ENOTCONN error from the client.
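
For reference, -107 matches -ENOTCONN ("Transport endpoint is not
connected"), which is consistent with the sendto explanation above. A
quick way to confirm the numeric value on a given box (the header path
below is the usual one, but it may vary by distribution):

  grep 'define ENOTCONN' /usr/include/asm/errno.h
  # typically prints: #define ENOTCONN 107 /* Transport endpoint is not connected */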

------------------------------------------------------------------
Things we have done so far:

a) Increased TCP buffering on the server and the two busiest clients with the
following sysctl settings (a quick check that they took effect follows
item b below):


# increase Linux TCP buffer limits
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608

# increase Linux autotuning TCP buffer limits
net.ipv4.tcp_rmem = 32768 87380 8388608
net.ipv4.tcp_wmem = 16384 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608

b) Upgraded the two busiest clients to kernel 2.4.21-37.
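
A quick check that the values from (a) are actually in effect
(parameter names exactly as listed above):

  sysctl net.core.rmem_max net.core.wmem_max
  cat /proc/sys/net/ipv4/tcp_rmem /proc/sys/net/ipv4/tcp_wmem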

To date:


1) The "Failed to mount" errors have continued when the nfs server is at high
load.

2) There have been no repeats of the tcp_data_ready sequence.
There have been some instances of the nfs_safe_remove error, but only one such
error for any given file, and it seems the file does eventually get
removed on the retry.
Some processes still hang, as described above.

2a) No stale nfs file handles as yet, but we keep looking.

3) rpc-srv/tcp messages continue at previous rate.

---------------------------------------------------------------

Contents of /etc/exports



[root@fnpcsrv1 nfs]# cat /etc/exports
/export/lsi_home
131.225.167.0/255.255.254.0(rw,insecure,insecure_locks,no_subtree_check,sync)
/export/lsi_stage
131.225.167.0/255.255.254.0(rw,insecure,insecure_locks,no_subtree_check,sync)
/export/products
131.225.167.0/255.255.254.0(rw,insecure,insecure_locks,no_subtree_check,sync)
/export/stage
131.225.167.0/255.255.254.0(rw,insecure,insecure_locks,no_subtree_check,sync)

Sample entry from /var/lib/nfs/xtab

/export/lsi_stage
fnpc131.fnal.gov(rw,sync,wdelay,hide,nocrossmnt,insecure,root_squash,no_all_squash,no_subtree_check,insecure_locks,acl,mapping=identity,anonuid=-2,anongid=-2)

Mount options on client:

/home /etc/auto.home -rw,bg,hard,tcp,timeo=15,retrans=8,intr,rsize=8192,wsize=8192 0 0

-------------------------------------------------------------------
All default nfs and nfsd proc entries.


[root@fnpcsrv1 nfs]# cat nfs3_acl_max_entries
1024
[root@fnpcsrv1 nfs]# cat nlm_grace_period
0
[root@fnpcsrv1 nfs]# cat nlm_tcpport
0
[root@fnpcsrv1 nfs]# cat nlm_timeout
10
[root@fnpcsrv1 nfs]# cat nlm_udpport
0
[root@fnpcsrv1 nfs]# pwd
/proc/sys/fs/nfs
[root@fnpcsrv1 nfs]#
[root@fnpcsrv1 nfsd]# cat nfsd3_acl_max_entries
1024
[root@fnpcsrv1 nfsd]# pwd
/proc/sys/fs/nfsd
[root@fnpcsrv1 nfsd]#

-----------------------------
Things we are in the process of doing that can be done live, without a reboot:

1) Change /etc/sysctl.conf again. Add:
net.core.wmem_default = 524288
net.core.rmem_default = 524288
net.ipv4.tcp_window_scaling = 1
This can be done live by changing /etc/sysctl.conf and then running
sysctl -p

2) Load the 2.4.21-37.EL.XFS kernel, ready for a reboot if one comes before the
downtime. First on the test node fnpt124, then staged and ready on fnpcsrv1.

3) Change mount options on a few clients, fngp-osg and fnpcg first,
then a few workers.
Hold a couple of nodes back to try auto.master changes.
Question: what mount options should we change to on the clients?
The suggestion was: remove bg; add noac, grpid, rsize=32768, and wsize=32768
(see the sample entry after this list).
4) Change /etc/exports from sync to async?
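
For concreteness, the auto.master entry shown earlier with those
suggestions applied would look something like the line below; this is
only a sketch of the proposed change, not something we have tested yet:

  /home /etc/auto.home -rw,hard,tcp,timeo=15,retrans=8,intr,noac,grpid,rsize=32768,wsize=32768 0 0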


Things we'll do at the next reboot:

1) Add command tag queueing to the qla2300 driver on fnpcsrv1:
options qla2300 ql2xmaxqdepth=256
2) Add nfs/nfsd options to modules.conf:
a) to all NFS clients, add
options nfs nfs3_acl_max_entries=256
b) to NFS servers, also add
options nfsd nfsd3_acl_max_entries=256
3) Reboot the server into the new kernel.

4) Add the same sysctl.conf options to the NFS clients.
5) Add the good client mount options to the NFS clients.




--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525 [email protected]
http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team


2005-11-07 15:40:28

by Roger Heflin

Subject: RE: rpc-srv/tcp: nfsd: sent only -107 bytes (fwd)



> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steven Timm
> Sent: Wednesday, November 02, 2005 2:06 PM
> To: [email protected]
> Subject: rpc-srv/tcp: nfsd: sent only -107 bytes (fwd)
>
>
>
> I am seeing repeated errors of rpc-srv/tcp: nfsd: sent only
> -107 bytes in the /var/log/messages of my machine. Full
> configuration info is below. The only suggestion I have seen
> thus far is to increase the number of nfsd threads that are
> running. We have done this, raising from 8 to 64, but the
> problem persists. Are there any other suggestions that could
> help with this problem?
>
> Thanks
>

I have only seen this problem with large numbers of NFS clients; given
your address, I suspect that would be the issue.

What kernel are you running? I saw this issue on the 2.4 series; the
large 2.6 systems we have built avoided using TCP because we had
seen this issue.

There is a calculation someplace in the NFS code that determines how many of
these connections can exist; it is some base plus so many per nfsd thread. If
you search for the message you are seeing, there are some posts on the NFS
lists about it that I made last year.

On the client side, each separate mount against a server counts as
one connection, so if each client is mounting /opt, /home, and /data and
you have 100 machines, you need at least 300.
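
In other words, counting server-side sockets rather than nfsd threads,
roughly:

  # one nfsd TCP connection per active client mount (as described above)
  echo $(( 100 * 3 ))    # = 300 concurrent connections the server must carry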

The solution that I came to was to use UDP mounts, as this limit
is not there for UDP. In the situation I had, we would have had to raise
the number of nfsd threads to 256, and even that was going to be close,
and the 256 caused some other failures. To avoid the issue you
will need to use UDP mounts everywhere: if you have enough TCP mounts
to cause the error, it will affect the UDP mounts in a similarly bad way.

We also could have changed the thread-to-resource count, but we had
some other process-starvation issue with TCP that did not seem to be
reproducible with UDP.

Roger
Atipa Technologies

2005-11-07 15:59:34

by Steven Timm

Subject: RE: rpc-srv/tcp: nfsd: sent only -107 bytes (fwd)



On Mon, 7 Nov 2005, Roger Heflin wrote:
>>
>> I am seeing repeated errors of rpc-srv/tcp: nfsd: sent only
>> -107 bytes in the /var/log/messages of my machine. Full
>> configuration info is below. The only suggestion I have seen
>> thus far is to increase the number of nfsd threads that are
>> running. We have done this, raising from 8 to 64, but the
>> problem persists. Are there any other suggestions that could
>> help with this problem?
>>
>> Thanks
>>
>
> I have only seen this problem with large numbers of NFS clients; given
> your address, I suspect that would be the issue.

wc -l on /var/lib/nfs/rmtab gives 744 lines. There are 203 client
machines, each mounting an average of 3 mounts, but obviously
some mount a few more. Most of the mounts are done via the automounter,
which, as you can guess, is timing out on an average of 10-12 attempts in
the course of the day.
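
In case it is useful, a rough per-client breakdown can be pulled out of
rmtab like this (the client host is the first colon-separated field;
the exact field layout may vary):

  wc -l < /var/lib/nfs/rmtab                                    # 744 entries total
  cut -d: -f1 /var/lib/nfs/rmtab | sort | uniq -c | sort -rn | head   # mounts per client
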
>
> What kernel are you running? I saw this issue on the 2.4 series; the
> large 2.6 systems we have built avoided using TCP because we had
> seen this issue.

Currently running 2.4.21-32.0.1 as put out by Red Hat, with XFS file
system patches. We will upgrade to 2.4.21-37 later this week.
>
> There is a calculation someplace in the NFS code that determines how many of
> these connections can exist; it is some base plus so many per nfsd thread. If
> you search for the message you are seeing, there are some posts on the NFS
> lists about it that I made last year.
>
> On the client side, each separate mount against a server counts as
> one connection, so if each client is mounting /opt, /home, and /data and
> you have 100 machines, you need at least 300.
>
Are you suggesting that in such a scenario we would need 300 nfsd threads?

> The solution that I came to was to use UDP mounts, as this limit
> is not there for UDP. In the situation I had, we would have had to raise
> the number of nfsd threads to 256, and even that was going to be close,
> and the 256 caused some other failures. To avoid the issue you
> will need to use UDP mounts everywhere: if you have enough TCP mounts
> to cause the error, it will affect the UDP mounts in a similarly bad way.
>
> We also could have changed the thread-to-resource count, but we had
> some other process-starvation issue with TCP that did not seem to be
> reproducible with UDP.

We had changed from UDP to TCP due to our network's propensity to drop
UDP packets. The same mount options we are using now against a Linux
server were working fine when our NFS server was a much-less-powerful SGI
IRIX server. No doubt, as you say, that switching back to UDP would
make this rpc-srv/tcp message go away.

Steve Timm



>
> Roger
> Atipa Technologies
>
>

--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525 [email protected] http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team

2005-11-07 16:34:44

by Roger Heflin

Subject: RE: rpc-srv/tcp: nfsd: sent only -107 bytes (fwd)



> > What kernel are you running? I saw this issue on the 2.4 series; the
> > large 2.6 systems we have built avoided using TCP because we had
> > seen this issue.
>
> Currently running 2.4.21-32.0.1 as put out by Red Hat, with XFS
> file system patches. We will upgrade to 2.4.21-37 later this week.

I had the problem with a RHEL3 kernel. Given what the base issue is,
I don't believe any later updates will change things much, as it is not
a bug so much as a basic design problem.

If you have a few more nodes/mounts/activity, things get uglier. The
server does expire "unused" connections when new ones are needed, but the
code seems to thrash quite badly when too many need to be used. With a few
more nodes the server will go into a timeout mode where it won't respond
to requests (TCP or UDP) for long periods of time (minutes), and then will
finally recover. It is pretty ugly.

The code is in net/sunrpc/svcsock.c and looks something like this:
(sv_nrthreads+3)*10
Given that you need 700, you would need at least 70 threads, but things
still get funny below that. You could try changing the *10 to a larger
number; I don't know whether the later versions of RH still use *10, they
may use something smaller. You could also raise the number of threads a
bit more, but at some point the number of threads seems to cause resource
issues as well. 256 threads on the machines we were testing caused some
other module loads to fail that did not fail with nfsd set smaller. I have
not tried changing the *10 to something larger; it may or may not work.
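
To put rough numbers on that formula as quoted (treating the *10
multiplier as an assumption, since it may differ between kernel builds):

  # connection limit ~ (nfsd threads + 3) * 10
  echo $(( (64 + 3) * 10 ))    # 670 -- with the current 64 threads
  echo $(( (70 + 3) * 10 ))    # 730 -- with the ~70 threads suggested above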

From what I can tell, basically the reuse code does not work the
best (or maybe at all) with large groups of machines being heavily
active.

> >
> Are you suggesting that in such a scenario we would need 300 nfsd threads?

Not 300. The situation I had was more machines, and more mounts
per machine.

You can watch the current count with netstat by counting the number of
NFS connections; when things hit the limit you will start getting
the message, and things will start getting messy. Things get worse
if you have executables open across NFS on the machines, as these have
a higher utilization of the resource and cause more thrashing.
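
For example, something along these lines, assuming NFS is on the
standard port 2049:

  # count established nfsd TCP connections on the server
  netstat -tn | grep ':2049 ' | grep -c ESTABLISHED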

I could duplicate the error by doing a "fornodes ls -l ..." listing
the top level of each NFS mount.

>
> > The solution that I came to was to use UDP mounts, as this limit
> > is not there for UDP. In the situation I had, we would have had to raise
> > the number of nfsd threads to 256, and even that was going to be close,
> > and the 256 caused some other failures. To avoid the issue you
> > will need to use UDP mounts everywhere: if you have enough TCP mounts
> > to cause the error, it will affect the UDP mounts in a similarly bad way.
> >
> > We also could have changed the thread-to-resource count, but we had
> > some other process-starvation issue with TCP that did not seem to be
> > reproducible with UDP.
>
> We had changed from UDP to TCP due to our network's
> propensity to drop UDP packets. The same mount options we
> are using now against a Linux server were working fine when
> our NFS server was a much-less-powerful SGI IRIX server. No
> doubt, as you say, that switching back to UDP would make this
> rpc-srv/tcp message go away.
>
> Steve Timm


The SGI implements NFS differently. Were you using TCP against
the SGI?

Is there no way to fix the network to stop it dropping UDP packets?


Roger