2013-09-05 17:18:53

by Emmanuel Florac

Subject: Hard to debug NFS loss of connectivity


Hi list, I have a serious problem I've never met before. Here is the
setup:

The NFS server is running Debian 6 amd64, but with a plain vanilla
3.2.50 kernel. It shares a large 81 TB volume (XFS over LVM on hardware
RAID6) through NFS without any particular options. Here is a glimpse
of /etc/exports:

/mnt/raid 10.1.1.0/255.255.255.128(fsid=1,rw,no_root_squash,async,no_subtree_check)
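
The options actually in effect can be double-checked on the server with:

exportfs -v

which prints each export together with its active option list.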

On the other side, a VMware ESX VM running Ubuntu 12.04 LTS (kernel
3.2.0-52 Ubuntu amd64) mounts the share. From the fstab:

10.1.1.99:/mnt/raid /server nfs rw,hard,intr 0 0
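
For testing outside of fstab, the equivalent manual mount would be
something like:

mount -t nfs -o rw,hard,intr 10.1.1.99:/mnt/raid /server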

The problem is as follows: stat'ing files from the VM makes the
NFS connection drop. For instance:

find /server -type f -ls

It works for a while, then stops responding. The NFS mount is frozen.
The network link is OK; I can still ssh from the server to the VM
and back, wget from the VM to the server, ping the server
from the VM, etc. Only NFS is affected.

Restarting NFS on the server does nothing to unfreeze the mount.
Using NFSv4 instead of NFSv3 changes nothing. The only remedy is to
reboot the VM. There are no errors in dmesg, /var/log/syslog or
/var/log/messages on either the VM or the server.

I've tried rebooting the server into a 3.9.7 kernel. Same thing.
Of course there isn't any data corruption of any sort.
Running "find /mnt/raid -type f -ls" directly on the server works
perfectly and lists about 25000 files without the slightest trouble.

It works equally well if I mount the NFS share on the server itself.


Now it gets crazier: when I run the find command described above,
it always freezes on the same file, for instance:

/server/folder1/folder2/folder3/folder4/.svn/somefile

However, if after a fresh reboot I do

stat /server/folder1/folder2/folder3/folder4/.svn/somefile

no problem. Even doing this:

cd /server/folder1/folder2/folder3/folder4/ && find . -type f -ls

works. However, this

cd /server/folder1/folder2/folder3/ && find . -type f -ls

doesn't fly: it freezes at exactly the same point.
In the first test (running directly from /server) it
freezes after successfully listing 10000 files; in the last
test it freezes after only 25 files.
So apparently it's not about the number of files.


Now I'm stuck. Short of going through tcpdump captures, I don't have
the faintest idea what's going on, except that I tend to
suspect some Ubuntu kernel bug.

Any hint, idea, etc would be extremely welcome. Even some
debugging method less painful than digging through huge
tcpdumps would be nice :)

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------


2013-09-06 16:55:14

by Emmanuel Florac

Subject: Re: Hard to debug NFS loss of connectivity

On Fri, 6 Sep 2013 12:07:35 -0400,
"J. Bruce Fields" <[email protected]> wrote:

> Weird. Things look normal up through frame 14, which is a READDIRPLUS
> reply. Then the server resends the reply after 0.2s, and the
> client resends its call shortly thereafter (but without ACKing the
> latest reply). And then the rest of the trace is resends of the
> reply.
>
> So it looks like the client stopped ACKing the server's replies?
>
> You may also have filtered out some TCP ACKs, which makes this harder
> to work out.

Ah yes, my bad, one TCP ACK was filtered out. This time I've kept all
traffic between the two machines except ssh.

I was capturing on the server; maybe I should try capturing on
the client side?

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------


Attachments:
(No filename) (1.02 kB)
nfserror2.pcap (2.23 kB)

2013-09-11 15:11:56

by Emmanuel Florac

Subject: Re: Hard to debug NFS loss of connectivity: problem solved

On Thu, 5 Sep 2013 19:18:00 +0200,
Emmanuel Florac <[email protected]> wrote:

> Any hint, idea, etc would be extremely welcome. Even some
> debugging method less painful than digging through huge
> tcpdumps would be nice :)

I post this answer in the faint hope that it may avoid a long and
painful week of testing to someone else :)

The problem comes from the virtual network adapter "Intel PRO/1000"
using the e1000e Linux driver. The Intel hardware supports a maximum
MTU of 4078. On a physical machine, setting a greater value fails
with something like:

# ifconfig eth0 mtu 4079
SIOCSIFMTU: Invalid argument
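
The same limit shows up with the newer ip tool, failing with something
along the lines of:

# ip link set dev eth0 mtu 4079
RTNETLINK answers: Invalid argument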

However, the virtual e1000 from VMware ESX silently accepts any value;
it then mostly works, but fails in mysterious ways, and only with
certain protocols (such as NFS).

In the case of VMware virtual machines, using a vmxnet3 virtual network
adapter works fine with an MTU of 9000 under NFS. Therefore the problem
is solved.

For the sake of completeness and curiosity, I'll try and see what
happens with a KVM e1000 virtual device.

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------

2013-09-11 20:14:40

by J. Bruce Fields

Subject: Re: Hard to debug NFS loss of connectivity: problem solved

On Wed, Sep 11, 2013 at 05:11:45PM +0200, Emmanuel Florac wrote:
> On Thu, 5 Sep 2013 19:18:00 +0200,
> Emmanuel Florac <[email protected]> wrote:
>
> > Any hint, idea, etc would be extremely welcome. Even some
> > debugging method less painful than digging through huge
> > tcpdumps would be nice :)
>
> I post this answer in the faint hope that it may avoid a long and
> painful week of testing to someone else :)
>
> The problem comes from the virtual network adapter "Intel PRO/1000"
> using the e1000e Linux driver. The Intel hardware supports a maximum
> MTU of 4078. On a physical machine, setting a greater value fails
> with something like:
>
> # ifconfig eth0 mtu 4079
> SIOCSIFMTU: Invalid argument
>
> However, the virtual e1000 from VMware ESX silently accepts any value;
> it then mostly works, but fails in mysterious ways, and only with
> certain protocols (such as NFS).

Ah-hah!

Thanks for the followup.

--b.

>
> In the case of VMware virtual machines, using a vmxnet3 virtual network
> adapter works fine with an MTU of 9000 under NFS. Therefore the problem
> is solved.
>
> For the sake of completeness and curiosity, I'll try and see what
> happens with a KVM e1000 virtual device.
>
> --
> ------------------------------------------------------------------------
> Emmanuel Florac | Direction technique
> | Intellique
> | <[email protected]>
> | +33 1 78 94 84 02
> ------------------------------------------------------------------------

2013-09-06 16:07:42

by J. Bruce Fields

Subject: Re: Hard to debug NFS loss of connectivity

On Fri, Sep 06, 2013 at 05:57:21PM +0200, Emmanuel Florac wrote:
> On Thu, 5 Sep 2013 17:40:02 -0400,
> "J. Bruce Fields" <[email protected]> wrote:
>
> > I was asking about the on-the-wire errors and getattr replies here,
> > not the application system calls.
> >
>
> OK, I've done the dump; I've kept just the last few working calls, then
> the failure calls, and filtered for NFS traffic (there are SSH and
> LACP frames interspersed here). I have absolutely no idea about what's
> going on there :) Any light from the network savvy?

Weird. Things look normal up through frame 14, which is a READDIRPLUS
reply. Then the server resends the reply after 0.2s, and the client
resends its call shortly thereafter (but without ACKing the latest
reply). And then the rest of the trace is resends of the reply.

So it looks like the client stopped ACKing the server's replies?

You may also have filtered out some TCP ACKs, which makes this harder to
work out.

Looks like a networking problem, but I don't know.

--b.

2013-09-06 17:15:28

by Jim Rees

Subject: Re: Hard to debug NFS loss of connectivity

Emmanuel Florac wrote:

On Thu, 5 Sep 2013 17:40:02 -0400,
"J. Bruce Fields" <[email protected]> wrote:

> I was asking about the on-the-wire errors and getattr replies here,
> not the application system calls.
>

OK, I've done the dump; I've kept just the last few working calls, then
the failure calls, and filtered for NFS traffic (there are SSH and
LACP frames interspersed here). I have absolutely no idea about what's
going on there :) Any light from the network savvy?

I suggest you filter on port=2049 instead of proto=nfs.
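
For example, something like:

tcpdump -s0 -w nfs.pcap -i eth0 port 2049

(with whichever interface applies) will also keep the bare TCP ACKs on
the NFS connection, which an NFS-only filter throws away.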

2013-09-06 15:57:27

by Emmanuel Florac

Subject: Re: Hard to debug NFS loss of connectivity

On Thu, 5 Sep 2013 17:40:02 -0400,
"J. Bruce Fields" <[email protected]> wrote:

> I was asking about the on-the-wire errors and getattr replies here,
> not the application system calls.
>

OK, I've done the dump; I've kept just the last few working calls, then
the failure calls, and filtered for NFS traffic (there are SSH and
LACP frames interspersed here). I have absolutely no idea about what's
going on there :) Any light from the network savvy?

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------


Attachments:
(No filename) (777.00 B)
nfserror.pcap (2.39 kB)

2013-09-05 20:45:38

by J. Bruce Fields

Subject: Re: Hard to debug NFS loss of connectivity

On Thu, Sep 05, 2013 at 07:18:00PM +0200, Emmanuel Florac wrote:
>
> Hi list, I have a serious problem I've never met before. Here is the
> setup:
>
> The NFS server is running Debian 6 amd64, but with a plain vanilla
> 3.2.50 kernel. It shares a large 81 TB volume (XFS over LVM on hardware
> RAID6) through NFS without any particular options. Here is a glimpse
> of /etc/exports:
>
> /mnt/raid 10.1.1.0/255.255.255.128(fsid=1,rw,no_root_squash,async,no_subtree_check)
>
> On the other side, a VMware ESX VM running Ubuntu 12.04 LTS (kernel
> 3.2.0-52 Ubuntu amd64) mounts the share. From the fstab:
>
> 10.1.1.99:/mnt/raid /server nfs rw,hard,intr 0 0
>
> The problem is as follows: stat'ing files from the VM makes the
> NFS connection drop. For instance:
>
> find /server -type f -ls
>
> It works for a while, then stops responding. The NFS mount is frozen.
> The network link is OK; I can still ssh from the server to the VM
> and back, wget from the VM to the server, ping the server
> from the VM, etc. Only NFS is affected.
>
> Restarting NFS on the server does nothing to unfreeze the mount.
> Using NFSv4 instead of NFSv3 changes nothing. The only remedy is to
> reboot the VM. There are no errors in dmesg, /var/log/syslog or
> /var/log/messages on either the VM or the server.
>
> I've tried rebooting the server into a 3.9.7 kernel. Same thing.
> Of course there isn't any data corruption of any sort.
> Running "find /mnt/raid -type f -ls" directly on the server works
> perfectly and lists about 25000 files without the slightest trouble.
>
> It works equally well if I mount the NFS share on the server itself.
>
>
> Now it gets crazier: when I run the find command described above,
> it always freezes on the same file, for instance:
>
> /server/folder1/folder2/folder3/folder4/.svn/somefile
>
> However, if after a fresh reboot I do
>
> stat /server/folder1/folder2/folder3/folder4/.svn/somefile
>
> no problem. Even doing this:
>
> cd /server/folder1/folder2/folder3/folder4/ && find . -type f -ls
>
> works. However, this
>
> cd /server/folder1/folder2/folder3/ && find . -type f -ls
>
> doesn't fly: it freezes at exactly the same point.
> In the first test (running directly from /server) it
> freezes after successfully listing 10000 files; in the last
> test it freezes after only 25 files.
> So apparently it's not about the number of files.
>
>
> Now I'm stuck. Short of going through tcpdump captures, I don't have
> the faintest idea what's going on, except that I tend to
> suspect some Ubuntu kernel bug.
>
> Any hint, idea, etc would be extremely welcome. Even some
> debugging method less painful than digging through huge
> tcpdumps would be nice :)

Well, it sounds like you have a reproducer that shouldn't be *too* huge
(the test where it freezes after stat'ing 25 files).

What do you see on the network in that case?

Are you literally using just tcpdump? Wireshark will give more
(and easier to read) information.

Does the server stop responding at some point, or reply with an error?
Or does the getattr reply on the problem file look odd in any way?

--b.

2013-09-05 21:35:16

by Emmanuel Florac

Subject: Re: Hard to debug NFS loss of connectivity

On Thu, 5 Sep 2013 16:45:36 -0400, you wrote:

> Well, it sounds like you have a reproducer that shouldn't be *too*
> huge (the test where it freezes after stat'ing 25 files).
>
> What do you see on the network in that case?

I haven't yet looked at what's actually happening, apart from the drop
in network throughput.

> Are you literally using just tcpdump? Wireshark will give more
> (and easier to read) information.

I haven't installed tcpdump yet, but isn't Wireshark GUI-based? I can't
run anything with a GUI; this is a remote site behind several ssh
gateways. So I should run tshark, then pull the results over to
analyze them with Wireshark on my PC. I'll try that.

> Does the server stop responding at some point, or reply with an error?

No response. I don't know if the server actually stops responding or if
something's wrong on the client side. Unfortunately I don't have
access to any other NFS client on this network apart from this VM and
the server itself. After a reboot the VM reconnects to the server all
by itself, so the server is most probably OK at all times.

> Or does the getattr reply on the problem file look odd in any way?

Not odd at all; it just stops after a particular file, though this file
and the following files can be accessed OK before I run the failing
test.

regards,
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------

2013-09-10 13:28:42

by J. Bruce Fields

Subject: Re: Hard to debug NFS loss of connectivity

On Fri, Sep 06, 2013 at 06:55:08PM +0200, Emmanuel Florac wrote:
> On Fri, 6 Sep 2013 12:07:35 -0400,
> "J. Bruce Fields" <[email protected]> wrote:
>
> > Weird. Things look normal up through frame 14, which is a READDIRPLUS
> > reply. Then the server resends the reply after 0.2s, and the
> > client resends its call shortly thereafter (but without ACKing the
> > latest reply). And then the rest of the trace is resends of the
> > reply.
> >
> > So it looks like the client stopped ACKing the server's replies?
> >
> > You may also have filtered out some TCP ACKs, which makes this harder
> > to work out.
>
> Ah yes, my bad, one TCP ACK was filtered out. This time I've kept all
> traffic between the two machines except ssh.

Huh, no idea. You can see the server retransmitting the READDIRPLUS
reply, and still no ACKs from the client.

> I was capturing on the server; maybe I should try capturing on
> the client side?

Sure, maybe.

It would honestly look like a network problem, if it weren't failing on
the same filesystem operation each time.

Hm, it may just be the first packet of a certain size. In fact it's the
first frame > 1500 bytes in that trace. Is there some problem with
jumbo frame configuration on your network?
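
One quick check, assuming a 9000 MTU on both ends, is a don't-fragment
ping sized to fill the MTU (8972 bytes of payload + 28 bytes of IP and
ICMP headers = 9000):

ping -M do -s 8972 10.1.1.99

If that hangs or errors while a plain ping works, something along the
path isn't really passing jumbo frames.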

--b.

2013-09-05 21:40:03

by J. Bruce Fields

Subject: Re: Hard to debug NFS loss of connectivity

On Thu, Sep 05, 2013 at 11:34:49PM +0200, Emmanuel Florac wrote:
> On Thu, 5 Sep 2013 16:45:36 -0400, you wrote:
>
> > Well, it sounds like you have a reproducer that shouldn't be *too*
> > huge (the test where it freezes after stat'ing 25 files).
> >
> > What do you see on the network in that case?
>
> I haven't yet looked at what's actually happening, apart from the drop
> in network throughput.
>
> > Are you literally using just tcpdump? Wireshark will give more
> > (and easier to read) information.
>
> I haven't installed tcpdump yet, but isn't Wireshark GUI-based? I can't
> run anything with a GUI; this is a remote site behind several ssh
> gateways. So I should run tshark, then pull the results over to
> analyze them with Wireshark on my PC. I'll try that.

Right, I just do "tcpdump -s0 -wtmp.pcap -i<interface>" and then run
wireshark on tmp.pcap.
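
(The tshark you mention should do the same job; the equivalent capture
would be something like "tshark -i <interface> -w tmp.pcap".)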

> > Does the server stop responding at some point, or reply with an error?
>
> No response. I don't know if the server actually stops responding or if
> something's wrong on the client side. Unfortunately I don't have
> access to any other NFS client on this network apart from this VM and
> the server itself. After a reboot the VM reconnects to the server all
> by itself, so the server is most probably OK at all times.
>
> > Or does the getattr reply on the problem file look odd in any way?
>
> Not odd at all; it just stops after a particular file, though this file
> and the following files can be accessed OK before I run the failing
> test.

I was asking about the on-the-wire errors and getattr replies here, not
the application system calls.

--b.

2013-09-10 13:34:17

by Emmanuel Florac

Subject: Re: Hard to debug NFS loss of connectivity

On Tue, 10 Sep 2013 09:28:41 -0400,
"J. Bruce Fields" <[email protected]> wrote:

> Sure, maybe.
>
> It would honestly look like a network problem, if it weren't failing
> on the same filesystem operation each time.
>
> Hm, it may just be the first packet of a certain size. In fact it's
> the first frame > 1500 bytes in that trace. Is there some problem
> with jumbo frame configuration on your network?
>

Interesting idea. I could try forcing both interfaces to a 1500 MTU and
see what happens. Oh, maybe the VMware host interface /isn't/ set
to 9000? I'll check that. Thank you.
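
On the VM side, something like:

ip link show eth0

will show the MTU the interface is actually set to; on the ESX host, I
believe "esxcfg-vswitch -l" lists each vSwitch with its configured MTU,
though that's worth double-checking against the VMware docs.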

--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------