2004-06-24 09:54:04

by Jaco Kroon

[permalink] [raw]
Subject: NFS lockups+empty directories

Hello there

I'm having 2 problems with nfs atm, I'll first describe the problem of
the mysterious deadlocks.

On the client that has deadlocked I see these (tcpdump host servername -n)
11:37:31.272353 IP 137.215.40.16.3750658894 > 137.215.98.25.2049: 140
read [|nfs]
11:37:32.671933 IP 137.215.40.16.3717104462 > 137.215.98.25.2049: 140
read [|nfs]
11:37:33.371732 IP 137.215.40.16.3733881678 > 137.215.98.25.2049: 140
read [|nfs]

On the server I get lots and lots of these (tcpdump host clientname -n)
11:39:23.239369 IP 137.215.40.16.3733881678 > 137.215.98.25.2049: 140
read [|nfs]
11:37:03.280950 IP 137.215.98.25.2049 > 137.215.40.16.3717104462: reply
ok 1472 read [|nfs]
11:37:03.280965 IP 137.215.98.25 > 137.215.40.16: udp
11:37:03.280975 IP 137.215.98.25 > 137.215.40.16: udp
11:37:03.280986 IP 137.215.98.25 > 137.215.40.16: udp
11:37:03.280997 IP 137.215.98.25 > 137.215.40.16: udp
11:37:03.281008 IP 137.215.98.25 > 137.215.40.16: udp

The port numbers are *obviously* wrong, but I suspect this is a tcpdump
problem. What worries me is that it doesn't look like the packets from
the server reaches the client. Now unless there is someone playing
around on the routers (which might well be the case) or the routers is
just messed up, then this means that there is a problem in the
networking code - unlikely since all other services are still
functioning, or I'm doing something wrong. My fstab line looks like:

phugeet.cs.up.ac.za:/home/ /home/phugeet nfs defaults,sync 0 0

Whilst my exports file contains:

/home 137.215.40.19(no_root_squash,rw,sync)

My question then is this - why can't tcpdump get the port numbers on
those latter packets? Then also, shouldn't the source port numbers for
these also be 2049? Once this starts happening I can't fix anything
without rebooting - or I haven't figured out how to yet. Also, I cannot
umount the mount, not even with -f (k, correction, at least doing a
umount -f causes all "deadlocked" processes to be shot down, after which
a second umount -f actually umounts the mount).

Right, now for the disappearing directory entries:

(client):
oasis home # mount
/dev/rd/host0/target0/part1 on / type ext3 (rw,noatime)
none on /dev type devfs (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw)
/dev/rd/host0/target0/part3 on /usr type reiserfs (rw,noatime)
/dev/rd/host0/target0/part5 on /tmp type reiserfs (rw,noatime)
/dev/rd/host0/target0/part6 on /var type reiserfs (rw,noatime)
/dev/rd/host0/target0/part7 on /home type reiserfs (rw,noatime)
none on /dev/shm type tmpfs (rw)
phugeet.cs.up.ac.za:/home/ on /home/phugeet type nfs
(rw,sync,addr=137.215.98.25)
oasis home # ll /home/phugeet/courses
total 0

(server):
phugeet home # mount
/dev/hda2 on / type ext3 (rw,noatime)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw)
none on /dev/shm type tmpfs (rw)
none on /proc/bus/usb type usbfs (rw)
phugeet home # ls /home/courses
ACE101 CBD780 COS151 COS226 COS332 DCP780
HONS2003 KVM780 RNW780 Studymaterial index.html
ACM CEN101 COS160 COS283 COS333 EPE121
HONS2004 MIT853 RNW781 UPUV lib
AIP780 COS110 COS213 COS284 COS341 EPE210
INY272 MScPhD SEC780 VRS780 post
ALGON COS130 COS214 COS301 COS343 GPG780
INY326 PGT780 SEC781 WCJ results.php
Announcements COS130-EPE111 COS221 COS314 COS344 GRF780
IRA310 PIN780 SWA780-ERA780 ZZZ111 temp
BTI720 COS130SS COS222-ERB210 COS324 COS444 HONS2002
KMI780 PIN781 SWC780-ERD732 change_filenames.txt visitors.php

The options are the same as above.

Any help, or any other pointers will be much appreciated thanks.

Jaco

--
#ifdef STUPIDLY_TRUST_BROKEN_PCMD_ENA_BIT
2.4.0-test2 /usr/src/linux/drivers/ide/cmd640.c
===========================================
This message and attachments are subject to a disclaimer. Please refer to http://www.it.up.ac.za/documentation/governance/disclaimer/ for full details.
Hierdie boodskap en aanhangsels is aan 'n vrywaringsklousule onderhewig. Volledige besonderhede is by http://www.it.up.ac.za/documentation/governance/disclaimer/ beskikbaar.
===========================================


Attachments:
smime.p7s (3.10 kB)
S/MIME Cryptographic Signature

2004-06-24 20:01:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS lockups+empty directories

P? to , 24/06/2004 klokka 05:53, skreiv Jaco Kroon:
> Hello there
>
> I'm having 2 problems with nfs atm, I'll first describe the problem of
> the mysterious deadlocks.
>
> On the client that has deadlocked I see these (tcpdump host servername -n)
> 11:37:31.272353 IP 137.215.40.16.3750658894 > 137.215.98.25.2049: 140
> read [|nfs]
> 11:37:32.671933 IP 137.215.40.16.3717104462 > 137.215.98.25.2049: 140
> read [|nfs]
> 11:37:33.371732 IP 137.215.40.16.3733881678 > 137.215.98.25.2049: 140
> read [|nfs]
>
> On the server I get lots and lots of these (tcpdump host clientname -n)
> 11:39:23.239369 IP 137.215.40.16.3733881678 > 137.215.98.25.2049: 140
> read [|nfs]
> 11:37:03.280950 IP 137.215.98.25.2049 > 137.215.40.16.3717104462: reply
> ok 1472 read [|nfs]
> 11:37:03.280965 IP 137.215.98.25 > 137.215.40.16: udp
> 11:37:03.280975 IP 137.215.98.25 > 137.215.40.16: udp
> 11:37:03.280986 IP 137.215.98.25 > 137.215.40.16: udp
> 11:37:03.280997 IP 137.215.98.25 > 137.215.40.16: udp
> 11:37:03.281008 IP 137.215.98.25 > 137.215.40.16: udp
>
> The port numbers are *obviously* wrong, but I suspect this is a tcpdump
> problem. What worries me is that it doesn't look like the packets from
> the server reaches the client.

You are using UDP: There are never any guarantees there that the packets
from the server will actually reach the client. Now please read the
sections in the NFS FAQ and HOWTOs about this sort of problem:

http://nfs.sourceforge.net/

Cheers,
Trond