2008-08-22 21:57:01

by Trond Myklebust

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
> I can ssh to the server fine. The same server also serves my NFS home
> directory to the box I'm writing this from and I've not seen any trouble
> with this box at all, it's a 2.6.18-xen box.

OK... Are you able to reproduce the problem reliably?

If so, can you provide me with a binary tcpdump or wireshark dump? If
using tcpdump, then please use something like

tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049

Please also try to provide a netstat dump of the current TCP connections
as soon as the hang occurs:

netstat -t

Cheers
Trond



2008-08-22 22:41:51

by Ian Campbell

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Fri, 2008-08-22 at 14:56 -0700, Trond Myklebust wrote:
> On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
> > I can ssh to the server fine. The same server also serves my NFS home
> > directory to the box I'm writing this from and I've not seen any trouble
> > with this box at all, it's a 2.6.18-xen box.
>
> OK... Are you able to reproduce the problem reliably?

It usually happens in around a day, but I can't make it happen at will
so that I can arrange to be present at the time. It has usually locked
up overnight in the past.

> If so, can you provide me with a binary tcpdump or wireshark dump? If
> using tcpdump, then please use something like
>
> tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049

I'll try leaving this going overnight but using -C and -W to limit the
size to the disk space available.
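A size-limited capture along these lines might look like the sketch below. It keeps the filter and snap length from the suggested command and adds tcpdump's -C (rotate the output file after roughly that many million bytes) and -W (cap the number of files, overwriting the oldest). The values of FILE_MB and FILE_COUNT and the helper name build_capture_cmd are illustrative assumptions, not the settings actually used in this thread.

```shell
#!/bin/sh
# Sketch: build a size-limited, rotating tcpdump command line.
# SERVER, FILE_MB and FILE_COUNT are illustrative values (assumptions).
SERVER="myserver.foo.bar"
FILE_MB=1000      # -C: start a new file after roughly this many million bytes
FILE_COUNT=2      # -W: keep at most this many files, overwriting the oldest

build_capture_cmd() {
    # Print the command instead of running it, so it can be reviewed first;
    # the capture itself normally needs root privileges.
    printf 'tcpdump -w /tmp/dump.out -s 90000 -C %s -W %s host %s and port 2049\n' \
        "$FILE_MB" "$FILE_COUNT" "$SERVER"
}

build_capture_cmd
```

With -W in effect tcpdump appends a number to the file name given to -w, so the product of the two values should be sized to fit the free disk space.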

> Please also try to provide a netstat dump of the current TCP connections
> as soon as the hang occurs:
>
> netstat -t

Will do it ASAP after it happens.

Ian.
--
Ian Campbell

revision 1.17.2.7
date: 2001/05/31 21:32:44; author: branden; state: Exp; lines: +1 -1
ARRRRGH!! GOT THE G** D*** SENSE OF A F******* TEST BACKWARDS!



2008-08-24 18:53:01

by Ian Campbell

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Fri, 2008-08-22 at 14:56 -0700, Trond Myklebust wrote:
> On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
> > I can ssh to the server fine. The same server also serves my NFS home
> > directory to the box I'm writing this from and I've not seen any trouble
> > with this box at all, it's a 2.6.18-xen box.
>
> OK... Are you able to reproduce the problem reliably?
>
> If so, can you provide me with a binary tcpdump or wireshark dump? If
> using tcpdump, then please use something like
>
> tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049
>
> Please also try to provide a netstat dump of the current TCP connections
> as soon as the hang occurs:
>
> netstat -t

Aug 24 18:08:59 iranon kernel: [168839.556017] nfs: server hopkins not responding, still trying
but I wasn't around until 19:38 to spot it.

netstat when I got to it was:

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost.localdo:50891 localhost.localdom:6543 ESTABLISHED
tcp        1      0 iranon.hellion.org.:ssh azathoth.hellion.:52682 CLOSE_WAIT
tcp        0      0 localhost.localdom:6543 localhost.localdo:50893 ESTABLISHED
tcp        0      0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
tcp        0      0 localhost.localdom:6543 localhost.localdo:41831 ESTABLISHED
tcp        0      0 localhost.localdo:13666 localhost.localdo:59482 ESTABLISHED
tcp        0      0 localhost.localdo:34288 localhost.localdom:6545 ESTABLISHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
tcp        0      0 iranon.hellion.org.:ssh azathoth.hellion.:52683 ESTABLISHED
tcp        0      0 localhost.localdom:6545 localhost.localdo:34288 ESTABLISHED
tcp        0      0 localhost.localdom:6543 localhost.localdo:50891 ESTABLISHED
tcp        0      0 localhost.localdo:50893 localhost.localdom:6543 ESTABLISHED
tcp        0      0 localhost.localdo:41831 localhost.localdom:6543 ESTABLISHED
tcp        0     87 localhost.localdo:59482 localhost.localdo:13666 ESTABLISHED
tcp        1      0 localhost.localdom:6543 localhost.localdo:41830 CLOSE_WAIT

(iranon is the problematic host .4, azathoth is my desktop machine .5, hopkins is the NFS server .6)
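In a listing like this the line of interest is the one whose foreign address is the NFS port. As a rough sketch, a small awk filter can pick out connections to that port that are not in the ESTABLISHED state. The function name nfs_not_established is hypothetical, and the filter assumes the six-column `netstat -t` layout shown above, with the port printed either as "nfs" or as 2049.

```shell
#!/bin/sh
# Sketch: report TCP connections to the NFS port that are not ESTABLISHED.
# Assumes `netstat -t` style input on stdin with the columns:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
nfs_not_established() {
    awk '$1 == "tcp" && $5 ~ /:(nfs|2049)$/ && $6 != "ESTABLISHED" { print $4, $5, $6 }'
}

# Two sample lines copied from the netstat output in this message.
nfs_not_established <<'EOF'
tcp 0 0 iranon.hellion.org.:ssh azathoth.hellion.:48977 ESTABLISHED
tcp 0 0 iranon.hellion.org.:837 hopkins.hellion.org:nfs FIN_WAIT2
EOF
```

On a live system the same function could be fed directly, for example `netstat -t | nfs_not_established`.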

tcpdumps are pretty big. I've attached the last 100 packets captured. If
you need more I can put the full file up somewhere.

-rw-r--r-- 1 root root 1.3G Aug 24 17:57 dump.out0
-rw-r--r-- 1 root root 536M Aug 24 19:38 dump.out1

Ian.

--
Ian Campbell

Prizes are for children.
-- Charles Ives, upon being given, but refusing, the
Pulitzer prize


Attachments:
last100.dump.bz2 (105.23 kB)

2008-08-25 20:24:07

by Grant Coady

Subject: Re: NFS regression? Odd delays and lockups accessing an NFS export.

On Fri, 22 Aug 2008 14:56:53 -0700, Trond Myklebust wrote:

>On Fri, 2008-08-22 at 22:37 +0100, Ian Campbell wrote:
>> I can ssh to the server fine. The same server also serves my NFS home
>> directory to the box I'm writing this from and I've not seen any trouble
>> with this box at all, it's a 2.6.18-xen box.
>
>OK... Are you able to reproduce the problem reliably?
>
>If so, can you provide me with a binary tcpdump or wireshark dump? If
>using tcpdump, then please use something like
>
> tcpdump -w /tmp/dump.out -s 90000 host myserver.foo.bar and port 2049
                           ^^^^^^^^--> typo?

man tcpdump:
-s Snarf snaplen bytes of data from each packet rather than the default of
68 (with SunOS's NIT, the minimum is actually 96). 68 bytes is adequate


I've reverted the NFS server to 2.6.24.7 -- inconclusive results for me
'cos NFS stalls seem so random.

Grant.