-----Original Message-----
From: [email protected]
[mailto:[email protected]]
Sent: Wednesday, March 19, 2003 2:02 PM
To: [email protected]
Subject: NFS digest, Vol 1 #1365 - 4 msgs
Today's Topics:
1. NFS and lock problems (Glover George)
2. Mortgage Rate Alert (take action now) (Sal Parrish)
3. NFSD Flow Control Using the TCP Transport (Steve Dickson)
4. NFS problems (kernel locks up) (Kresimir Kukulj)
--__--__--
Message: 1
From: Glover George <[email protected]>
To: [email protected]
Date: 18 Mar 2003 16:00:37 -0600
Subject: [NFS] NFS and lock problems
Hello all, please bear with me, as I figure this has probably been asked
a million times, but I can't find anything like what I'm looking for.
I have multiple clients and a single server. The server runs Red Hat 8.0
with the NFS versions that came with it. I have a fairly
simple setup. On the client machines, iptables is set to drop
everything, except that it allows all outgoing requests and only allows
incoming ssh. The server is the same, except that I am allowing in
a range of ports for NFS:
$IPTABLES -A INPUT -p tcp -s 131.95.190.0/24 --dport 32765:32768 -j ACCEPT
$IPTABLES -A INPUT -p udp -s 131.95.190.0/24 --dport 32765:32768 -j ACCEPT
... as per the NFS HOWTO.
I start the following in the startup scripts:
daemon rpc.mountd -p 32767 $RPCMOUNTDOPTS
daemon rpc.statd -p 32765 -o 32766
Also, in /etc/modules.conf I have the following for lockd:
options lockd nlm_udpport=32768 nlm_tcpport=32768
If I am missing anything else, I'll gladly supply it. Let me get
to the problem and the questions. I seem to be having problems with
locking. When users try to log in with the GNOME desktop they get error
messages complaining that nfslockd is possibly not running on the server.
However, it is. Everything else as far as NFS is concerned seems to be fine.
Just to be more verbose, here is the output of rpcinfo -p on the server and
client respectively.
#server
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    391002    2   tcp  32768  sgi_fam
    100011    1   udp    744  rquotad
    100011    2   udp    744  rquotad
    100011    1   tcp    747  rquotad
    100011    2   tcp    747  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100005    1   udp  32767  mountd
    100005    1   tcp  32767  mountd
    100005    2   udp  32767  mountd
    100005    2   tcp  32767  mountd
    100005    3   udp  32767  mountd
    100005    3   tcp  32767  mountd
    100024    1   udp  32765  status
    100024    1   tcp  32765  status
#client
[root@black root]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    391002    2   tcp  32769  sgi_fam
    100021    1   udp  32775  nlockmgr
    100021    3   udp  32775  nlockmgr
    100021    4   udp  32775  nlockmgr
    100024    1   udp  32778  status
    100024    1   tcp  33409  status
    100011    1   udp   1022  rquotad
    100011    2   udp   1022  rquotad
    100011    1   tcp    601  rquotad
    100011    2   tcp    601  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100005    1   udp  32779  mountd
    100005    1   tcp  33410  mountd
    100005    2   udp  32779  mountd
    100005    2   tcp  33410  mountd
    100005    3   udp  32779  mountd
    100005    3   tcp  33410  mountd
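A listing like the ones above can be checked for port clashes mechanically. The following is only a sketch: it groups registrations by (protocol, port) and reports any pair claimed by more than one service. The function names and the SERVER_LISTING constant are made up for illustration, and the sample input is an excerpt of the server listing, not the full output.

```python
from collections import defaultdict

# Excerpt of the server listing quoted above, used here as sample input.
SERVER_LISTING = """\
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
391002 2 tcp 32768 sgi_fam
100021 1 udp 32768 nlockmgr
100021 3 udp 32768 nlockmgr
100021 4 udp 32768 nlockmgr
100005 3 tcp 32767 mountd
100024 1 udp 32765 status
100024 1 tcp 32765 status
"""


def services_by_endpoint(listing):
    """Map (proto, port) to the set of service names seen in `rpcinfo -p` text."""
    endpoints = defaultdict(set)
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row and any line that is not a five-column entry
        # (this sketch ignores registrations that have no service name).
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _program, _version, proto, port, name = fields
        endpoints[(proto, int(port))].add(name)
    return dict(endpoints)


def shared_endpoints(listing):
    """Return only the (proto, port) pairs claimed by more than one service."""
    return {
        endpoint: names
        for endpoint, names in services_by_endpoint(listing).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    clashes = shared_endpoints(SERVER_LISTING)
    print(clashes or "no (proto, port) pair is shared by two services")
```

In the excerpt, sgi_fam is registered on tcp/32768 and nlockmgr on udp/32768, so nothing is flagged: TCP and UDP port numbers are separate number spaces.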
Does this look OK for locking? Is there some way I can verify that file
locking is working (I only assume that it's not because of that
message; I have no idea how I would test this for sure)? Do I have to
bind the daemons to a specific port on the client as well as the
server? Do I have to allow connections initiated from the server to any
of these daemons on the client? I mean, turning off iptables
completely on the clients doesn't help anyway. Still the same errors.
Are my options in /etc/modules.conf correct? I noticed on the server side
that sgi_fam and nlockmgr are both running on the same port number. Is this
OK? If not, how do I tell sgi_fam to move to a different port?
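One possible way to test locking directly, independent of the desktop's error message, is to request a POSIX lock on a file that lives on the NFS mount. This is a minimal sketch, not a full diagnostic: try_lock is a made-up helper name, and the path in the usage comment is hypothetical. Note that on a hard mount a lock request may hang rather than fail if the lock manager cannot be reached.

```python
import fcntl
import os
import sys


def try_lock(path):
    """Take and release an exclusive, non-blocking POSIX lock on `path`.

    Returns True if the lock was granted, False if the request failed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # lockf() uses fcntl(2) byte-range locks under the hood.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as err:
            print(f"lock failed on {path}: {err}", file=sys.stderr)
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


# Usage (hypothetical path on the NFS-mounted home directory):
#   print(try_lock("/home/someuser/locktest.tmp"))
```

Running the same call against a file on local disk gives a baseline: if the local lock succeeds and the NFS one fails or hangs, the problem is likely in the lock path between client and server rather than in the application.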
One last thing: I haven't implemented this yet, but in case someone wants to
chime in, on the server side I want to run rpc.rquotad on a specific
port. How do I do this with Red Hat's packages, as I can't use a -p
option?
I know I'm probably bugging you a lot, but I have no idea what's going on
here. Everything has always worked for me when I simply use KDE, but
this problem is plaguing me. I'd like to know whether I've done
something wrong or just have the wrong idea altogether.
Much thanks in advance.
--
Glover George
Systems Administrator
High Performance Visualization Lab
University of Southern Mississippi
[email protected]
(601) 266-5634
--__--__--
Message: 3
Date: Wed, 19 Mar 2003 10:05:15 -0500
From: Steve Dickson <[email protected]>
To: [email protected]
Subject: [NFS] NFSD Flow Control Using the TCP Transport
Hello,
There seem to be some issues (probably known) with flow control
over TCP connections to NFSD (on an SMP machine). Unfortunately,
the fstress benchmark brings these issues out fairly nicely :-(
This is occurring in a 2.4.20 kernel.
When fstress starts its stress tests, svc_tcp_sendto() immediately
starts failing with -EAGAIN. Initially, this caused an oops because
svc_delete_socket() was being called twice for the same socket [which was
easily fixed by checking for the SK_DEAD bit in svsk->sk_flags], but now
the tests just fail.
The problem seems to stem from the fact that the queued memory in
the TCP send buffer (i.e. sk->wmem_queued) is not being released (i.e.
tcp_wspace(sk) becomes negative and never recovers).
Here is what is (or appears to be) happening:
Fstress opens one TCP connection and then starts sending
multiple NFS ops with different fhandles. The problems start when
an NFS op with a large response (like a read) gets 'stuck' in the NFS code
for a few microseconds and, in the meantime, other NFS ops with smaller
responses are being processed. With every smaller response, the
sk->wmem_queued value is incremented. Now when the 'stuck' NFS read
tries to send its response, the send buffer is full (i.e. tcp_memory_free(sk)
in tcp_sendmsg() fails), and after a 30 second sleep (in tcp_sendmsg())
-EAGAIN is returned and the show is over.....
I _guess_ what is supposed to happen is that the queued memory will be
freed (or reclaimed) when a socket buffer is freed (via kfree_skb()),
which in turn causes the threads waiting for memory (i.e. sleeping
in tcp_sendmsg()) to be woken up via a call to sk->write_space().
But this does not seem to be happening even when the smaller
replies are processed....
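The accounting described above can be illustrated with a toy model. This is not kernel code and makes no claim about how the 2.4.20 TCP stack is implemented: the SendBuffer class, the buffer size, and the reply sizes are all invented for illustration. It only shows why a large reply stalls when the bytes queued by earlier small replies are never released, and why it goes through once they are.

```python
class SendBuffer:
    """Toy stand-in for a socket send buffer (hypothetical, not the kernel's)."""

    def __init__(self, sndbuf):
        self.sndbuf = sndbuf      # capacity in bytes (made-up value)
        self.wmem_queued = 0      # bytes queued and not yet released

    def wspace(self):
        """Free space, loosely analogous to tcp_wspace(sk) in the text."""
        return self.sndbuf - self.wmem_queued

    def queue(self, nbytes):
        """Queue a reply; return False if it does not fit (sender would sleep)."""
        if nbytes > self.wspace():
            return False
        self.wmem_queued += nbytes
        return True

    def release(self, nbytes):
        """Model freeing sent data, which should make room for waiters."""
        self.wmem_queued = max(0, self.wmem_queued - nbytes)


def simulate(release_small_replies):
    """Queue 15 small replies, then try one large reply.

    Returns True if the large reply fits, False if it would stall.
    """
    buf = SendBuffer(sndbuf=64 * 1024)
    small, large = 4 * 1024, 32 * 1024
    queued_small = sum(1 for _ in range(15) if buf.queue(small))
    if release_small_replies:
        buf.release(queued_small * small)
    return buf.queue(large)


if __name__ == "__main__":
    print("large reply fits without release:", simulate(False))
    print("large reply fits after release:  ", simulate(True))
```

In this model the stall depends only on whether release() is ever called, which mirrors the suspicion above that the queued memory is not being reclaimed.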
Can anyone shed some light on what the heck is going on here,
and on whether there are any patches, solutions, or ideas addressing this
problem?
TIA,
SteveD.
--__--__--
Message: 4
Date: Wed, 19 Mar 2003 19:22:41 +0100
From: Kresimir Kukulj <[email protected]>
To: [email protected]
Subject: [NFS] NFS problems (kernel locks up)
Hi
We are trying to assess whether Linux could perform as an NFS server for Linux
client(s). In our test we moved part of the mailboxes of a freemail service
(after some initial testing) to NFS storage (a Linux NFS server). It worked
OK and used very few resources. But during the nightly backup, the NFS
server crashed. The symptoms were:
1. The client detected that the NFS server was not responding.
2. The NFS server responded to ping, but you could not log in to it. Every
attempt to log in stopped after the TCP connection was established: the
daemon did not respond (I presume that at that particular moment the
TCP/IP stack was still working).
3. After about 10 minutes, it locked up (not pingable).
4. I have a serial console attached to the server, and the kernel did not
respond to SysRq.
5. After turning the power off and back on, the server booted and
resumed its function.
This happened three times, every time during the backup (Networker),
sometimes only 5 minutes after the backup started, sometimes after 1.5 hours.
This was all with a 2.4.20 kernel (no extra patches), using NFSv3, udp,
async.
The NFS client was using: rw,hard,intr,udp,rsize=8192,wsize=8192,nodev,nosuid
The NFS server used: rw,no_root_squash (default is async).
Then I installed 2.4.21-pre5 because it contained some NFS fixes. After
that, the server survived three days (2 incrementals and one full backup
completed successfully). Then it crashed during the day for no apparent
reason (we have the server monitored with 'cricket', and there was no
unusual activity...).
I changed to NFSv2,sync,udp and it crashed during the backup that
night, and then again during the day. This resulted in filesystem corruption
(replaying the ext3 journal caused fsck to be invoked; a couple of hours were
wasted on checking).
Now I have reverted to NFSv3,udp, but kept 'sync'. I will see tonight
whether it survives.
The filesystem is a 99 GB ext3 partition with a 1024-byte block size and an
internal journal. It is 50% full and contains around 290000 files (13.7%
fragmentation). Files range from a few kilobytes up to 10 MB.
Normal filesystem usage is ~200 KB read and ~300 KB written per second with
< 5% disk utilization. When the backup runs, reading reaches ~5 MB/sec with
disk utilization of ~100%.
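For scale, the figures above can be turned into two rough numbers: the average file size and the time a full pass over the data would take at the quoted backup read rate. This is back-of-the-envelope arithmetic only. It assumes the quoted sizes are gigabytes and megabytes in decimal units, and that a full backup reads all of the data in use; neither assumption is stated in the message.

```python
# Figures quoted in the message; decimal units (1 GB = 10**9 bytes) are an assumption.
PARTITION_BYTES = 99 * 10**9
FRACTION_USED = 0.50
FILE_COUNT = 290_000
BACKUP_READ_BYTES_PER_SEC = 5 * 10**6

# Data currently stored on the partition.
used_bytes = PARTITION_BYTES * FRACTION_USED

# Mean file size in kilobytes.
avg_file_kb = used_bytes / FILE_COUNT / 1000

# Time to read everything once at the backup read rate, in hours.
full_read_hours = used_bytes / BACKUP_READ_BYTES_PER_SEC / 3600

print(f"data in use:       {used_bytes / 10**9:.1f} GB")
print(f"average file size: {avg_file_kb:.0f} KB")
print(f"full read at 5 MB/s: {full_read_hours:.2f} hours")
```

Under these assumptions the average file is on the order of 170 KB, and one full read of the data takes a little under three hours.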
Client and server are connected to the same switch, with no dropped packets.
We are satisfied with performance (while the server works).
Can anybody offer a suggestion? I have tried everything I can think of.
We would like to use Linux as an NFS server, but if this does not work, we
will be forced to consider alternatives like Solaris x86.
Can anyone here suggest a good alternative NFS server OS (for x86) with
good support for SCSI HW RAID controllers? ICP Vortex unfortunately is
not supported under Solaris x86, but what other controllers (let's say for
Solaris x86) do you recommend?
Also, I am concerned about the filesystem. Will ext3 be able to handle, let's
say, 10 million files? If not, will Solaris x86 UFS be any better?
[ For us, reiser proved to be difficult at times, and we had a couple of
fs-related crashes, so we are trying to find alternatives. A filesystem check
on that number of files is measured in days. ]
Some info about the hardware:
Dell PowerApp 200 with 2 x Pentium III (Coppermine), 1 GHz each.
1 GB memory, with CONFIG_HIGHMEM4G=y.
eepro100 ethernet
ServerWorks chipset, but nothing except the CD-ROM is connected to it.
ICP Vortex Hardware RAID model GDT8523RZ
The driver for this (SCSI) controller is from the 2.4.20 kernel (it's pretty new).
5 FUJITSU MAJ3364MC 34 GB drives in RAID5 (4+hotfix).
Filesystem is ext3 with journal=ordered.
Kernel is vanilla 2.4.20, and 2.4.21-pre5.
I can provide 'dmesg' and '.config' for that kernel.
Distribution is Debian stable 3.0.
These packages are installed:
ii  nfs-common         1.0-2  NFS support files common to client and server
ii  nfs-kernel-server  1.0-2  Kernel NFS server support
The NFS server and client use fixed ports as described in the NFS-HOWTO:
Kernel command line: root=/dev/sda2 lockd.udpport=32768 \
lockd.tcpport=32768 console=tty0 console=ttyS0,9600
statd and mountd are fixed as well, and iptables is configured to pass
fragmented packets. By default, the NFS server runs with 8 kernel threads
(knfsd). According to /proc/net/rpc/nfsd there is no need for more kernel
threads.
The services that run on the NFS client are POP3 and SMTP daemons and a
web-based frontend that uses them. Both daemons are configured to use their
own version of dot locking (as recommended).
Thanks.
--
Kresimir Kukulj
Iskon Internet d.d.
ISS
Savska 41/X.
10000 Zagreb
--__--__--
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
End of NFS Digest