2013-06-18 13:58:01

by Michael Richardson

Subject: big send queues on NFS server



2013-06-18 17:25:39

by J. Bruce Fields

Subject: Re: big send queues on NFS server

On Tue, Jun 18, 2013 at 09:48:31AM -0400, Michael Richardson wrote:
>
> Hi, I have been an NFS user and enthusiast for 20+ years.
> My home systems still have the numerical uid that doe.carleton.ca
> assigned me back in 1989... cause of NFS... Recently, I turned off
> a NetBSD 5 machine that was my NFS server, and everything is on a
> Linux/Ubuntu server, LVM+raid setup.
>
> I have a slightly interesting setup at my home. A VM (cassidy) with a public
> IP address runs a custom web server on port 81 to stream mp3/ogg to
> whatever device needs it. My music skips/pauses. Some of this was traced
> down to bufferbloat issues when I was listening from work. But, it's
> happening at my home desk, connected by Gb/E. An issue with an IPv6 RA
> server was ruled out.
>
> To be clear:
> desktop(obiwan)---IPv4:81---->server(cassidy)---NFSv4-IPv6-->herring
>
> I am running a tmux ("screen") session on the NFS server, with one pane being:
> watch 'ss -tan | grep 2049'
>
> And in the other, initially, I was running:
> sudo tcpdump -i eth0 -n -p ether host ETHERNETOFCASSIDY
>
> As that was very busy, I ran this instead:
> sudo tcpdump -i eth0 -n -p ether host 00:16:3e:11:22:e4 and \
> '(tcp[13] & 2 != 0 or ip6[53] & 2 != 0)'
>
> and each time the music stops I see huge xmit queues on the NFS server:
>
> ESTAB 0 789156 2607:dead:f:2::231:2049 2607:dead:f:2:216:3eff:fe11:22e4:868
>
> *usually* that then results in a TCP restart:
>
> 09:40:12.701402 IP6 2607:dead:f:2:216:3eff:fe11:22e4.868 >
> 2607:dead:f:2::231.2049: Flags [S], seq 2570499549, win 5712, options [mss
> 1440,sackOK,TS val 2994659072 ecr 1552097470,nop,wscale 2], length 0
>
> 09:40:12.701456 IP6 2607:dead:f:2::231.2049 >
> 2607:dead:f:2:216:3eff:fe11:22e4.868: Flags [S.], seq 707413120, ack
> 2570499550, win 14280, options [mss 1440,sackOK,TS val 1552097470 ecr
> 2994659072,nop,wscale 7], length 0
>
> I notice that it always seems to use the same source port number.
> I didn't think that this was allowed until after 2*RTT.
>
> What seems to me to be occurring is some kind of head-of-queue problem in the
> TCP stream. I would be happy to install experimental kernels, instrument
> stuff, whatever..., particularly on the NFS client, as it's not a critical
> machine. If I need to do something on the NFS server, it will be possible.
> I will shortly update the kernel on the client to the one from Debian backports.
>
> I watch and I regularly see large (1M+) send queues on the server:
>
> ESTAB 0 1434080 2607:dead:f:2::231:2049 2607:dead:f:2:216:3eff:fe11:22e4:868
>
> If they decline in time, there is no interruption, otherwise, the web server
> gets an underrun, and the music stops.
>
> I could also capture the entire NFS stream, or just do TCP window analysis on
> this stream, but I would suspect that it's a problem on the client.

Could be, though it sounds like all you changed here was replacing the
NetBSD server by a Linux server?

Of course, that's a rather complicated change in itself (default NFS
version, transport (tcp vs udp), etc. may have changed as well).

Might be worth fooling with those parameters using mount options. The
defaults should be best, but it might help narrow down the problem.
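
Before changing anything, it may help to see what the client actually negotiated. A minimal Python sketch, assuming the client exposes its mounts in the usual /proc/mounts format; the sample line below (export path, option values) is made up for illustration and is not taken from the reporter's system:

```python
def nfs_mount_options(mounts_text):
    """Return {mountpoint: {option: value}} for nfs/nfs4 entries.

    mounts_text is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip short lines and anything that is not an NFS client mount.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            # Flags such as "hard" have no value; store them with "".
            key, _, value = item.partition("=")
            opts[key] = value
        result[fields[1]] = opts
    return result


# Hypothetical sample line, for illustration only.
SAMPLE = (
    "herring:/home /home nfs4 "
    "rw,nosuid,nodev,relatime,vers=4,rsize=32768,wsize=32768,"
    "hard,proto=tcp6,timeo=600 0 0\n"
)

for mountpoint, opts in nfs_mount_options(SAMPLE).items():
    print(mountpoint, {k: opts.get(k) for k in ("vers", "proto", "rsize", "wsize")})
```

On a live client the same function could be fed the contents of /proc/mounts, and the printed values compared before and after a remount with different options.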

--b.

>
> NFS server:
> herring-[~] mcr 1001 %uname -a
> Linux herring 3.2.0-39-generic #62-Ubuntu SMP Thu Feb 28 00:28:53 UTC 2013
> x86_64 x86_64 x86_64 GNU/Linux
>
> NFS client:
> cassidy-[~] mcr 1010 %uname -a
> Linux cassidy.sandelman.ca 2.6.32-5-xen-686 #1 SMP Wed May 18 09:43:15 UTC
> 2011 i686 GNU/Linux



2013-06-18 19:33:11

by Michael Richardson

Subject: Re: big send queues on NFS server


On Tue, Jun 18, 2013 at 09:48:31AM -0400, Michael Richardson wrote:
>> Hi, I have been an NFS user and enthusiast for 20+ years. My home
>> systems still have the numerical uid that doe.carleton.ca assigned me
>> back in 1989... cause of NFS... Recently, I turned off a NetBSD 5

...

>> If they decline in time, there is no interruption, otherwise, the web
>> server gets an underrun, and the music stops.
>>
>> I could also capture the entire NFS stream, or just do TCP window
>> analysis on this stream, but I would suspect that it's a problem on
>> the client.

J. Bruce Fields wrote:
jb> Could be, though it sounds like all you changed here was replacing
jb> the NetBSD server by a Linux server?

Well, I mentioned NetBSD to indicate the length of time I have used various
NFS systems, not because I felt that it was a specific interop issue.

jb> Of course, that's a rather complicated change in itself (default NFS
jb> version, transport (tcp vs udp), etc. may have changed as well).

jb> Might be worth fooling with those parameters using mount options.
jb> The defaults should be best, but it might help narrow down the
jb> problem.

I am using mostly default options: nosuid, nodev, hard.
Generally, I have solved problems in the past by going back to NFSv3 on UDP
mounts, and then doing the classic nfsd worker tuning dance, and the
rsize=/wsize= game.

I am posting to see whether someone says, "oh, yes, you found issue 34534,
and it's a client side problem, and it's fixed in 3.7.2..."
or: "that is weird. What does /proc/nfs/magic_client_side_tunnable say?"
or: "I have that too"
or: "can you send a pcap?"

I would love: "that's a client problem" vs "that's a server problem",
and I'd go investigate deeper there :-)
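
To make the send-queue observation easier to log and share, the `watch 'ss -tan | grep 2049'` check from the original report could be scripted. A minimal Python sketch, assuming the `ss -tan` column order shown in that report (State, Recv-Q, Send-Q, local, peer); the 512 KiB threshold and the function name are arbitrary choices for illustration:

```python
def large_send_queues(ss_output, port=2049, threshold=512 * 1024):
    """Return (state, send_q, local, peer) for sockets on `port`
    whose Send-Q is at or above `threshold` bytes."""
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything without a numeric Send-Q.
        if len(fields) < 5 or not fields[2].isdigit():
            continue
        state, _recv_q, send_q, local, peer = fields[:5]
        # The port is whatever follows the last ':' of the local address.
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        if int(send_q) >= threshold:
            hits.append((state, int(send_q), local, peer))
    return hits


# The two ESTAB lines are the ones quoted in the report; the header and
# the LISTEN line are added here for illustration.
SAMPLE = """\
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 64 :::2049 :::*
ESTAB 0 789156 2607:dead:f:2::231:2049 2607:dead:f:2:216:3eff:fe11:22e4:868
ESTAB 0 1434080 2607:dead:f:2::231:2049 2607:dead:f:2:216:3eff:fe11:22e4:868
"""

for state, send_q, local, peer in large_send_queues(SAMPLE):
    print(f"{state} send_q={send_q} {local} -> {peer}")
```

Run periodically on the server with timestamps, the output could be lined up against the moments the music stops and against a pcap taken on the client.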

--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] [email protected] http://www.sandelman.ca/ | ruby on rails [