2015-10-28 11:06:07

by Richard Kojedzinszky

[permalink] [raw]
Subject: nfs client hangs / repeating nfs requests

Dear all,

I am again having some trouble with nfs client. Actually I am running
minidlna over nfs over gre over ipsec configuration, but I suppose the
latter two should not count.

Minidlna scans the entire directory, reading all files (or just parts),
and while doing it, sometimes the process hangs.

I could emulate this with reading the first 16k off all files from nfs,
and fortunately the effect is the same.

After a hang/recovery, no userspace is accessing the nfs right now, but
tcpdump shows repeating requests/replies (attached). Meanwhile the rpc
debug also emits logs.

# uname -a
Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23) x86_64
GNU/Linux
# mount | grep nfs
10.0.0.2:/home/kh on /remote/kh type nfs
(rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)


Any suggestions?

Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.


Attachments:
nfs.repeating.cap (16.42 kB)
rpc.debug.log (2.78 kB)
Download all attachments

2015-10-28 13:57:42

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
> Dear all,
>
> I am again having some trouble with nfs client. Actually I am
> running minidlna over nfs over gre over ipsec configuration, but I
> suppose the latter two should not count.
>
> Minidlna scans the entire directory, reading all files (or just
> parts), and while doing it, sometimes the process hangs.
>
> I could emulate this with reading the first 16k off all files from
> nfs, and fortunately the effect is the same.
>
> After a hang/recovery, no userspace is accessing the nfs right now,
> but tcpdump shows repeating requests/replies (attached). Meanwhile
> the rpc debug also emits logs.
>
> # uname -a
> Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
> x86_64 GNU/Linux
> # mount | grep nfs
> 10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)

rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
recommend. Is there a reason for not sticking with defaults there?

But I don't have any explanation for the hang.

--b.

> Any suggestions?
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatika Zrt.


> Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
> Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
> Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC: cong 256, cwnd was 256, now 256
> Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
> Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
> Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC: cong 256, cwnd was 256, now 256
> Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
> Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
> Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC: cong 256, cwnd was 256, now 256
> Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
> Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
> Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC: cong 256, cwnd was 256, now 256
> Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
> Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
> Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC: cong 256, cwnd was 256, now 256
> Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539 xprt_prepare_transmit
> Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
> Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539 xprt_transmit(120)
> Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete


2015-10-28 15:19:54

by Richard Kojedzinszky

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

Dear Bruce,

That was an n-th experiment. Actually, with default settings the hang
still happens, but I've just noticed the repeating packets with these
settings. I've chosen 1024 sizes to fit the requests in one packets,
to avoid any ip/udp fragmentation.

The server is a freebsd nfs. I am struggling with this setup for a few
months now, and I dont know, what is the cause of the hang. I could hardly
believe that the nfs server is buggy, as many people around the world is
using freebsd as an nfs server. The same for linux, I hardly can believe
that the nfs client is faulty, as again many people is using it.

But still I have the issue, hanging nfs mounts.

I will report on any advance.

Regards,

Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.

On Wed, 28 Oct 2015, J. Bruce Fields wrote:

> Date: Wed, 28 Oct 2015 09:57:42 -0400
> From: J. Bruce Fields <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: nfs client hangs / repeating nfs requests
>
> On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
>> Dear all,
>>
>> I am again having some trouble with nfs client. Actually I am
>> running minidlna over nfs over gre over ipsec configuration, but I
>> suppose the latter two should not count.
>>
>> Minidlna scans the entire directory, reading all files (or just
>> parts), and while doing it, sometimes the process hangs.
>>
>> I could emulate this with reading the first 16k off all files from
>> nfs, and fortunately the effect is the same.
>>
>> After a hang/recovery, no userspace is accessing the nfs right now,
>> but tcpdump shows repeating requests/replies (attached). Meanwhile
>> the rpc debug also emits logs.
>>
>> # uname -a
>> Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
>> x86_64 GNU/Linux
>> # mount | grep nfs
>> 10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
>
> rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
> recommend. Is there a reason for not sticking with defaults there?
>
> But I don't have any explanation for the hang.
>
> --b.
>
>> Any suggestions?
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatika Zrt.
>
>
>> Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
>> Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
>> Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC: cong 256, cwnd was 256, now 256
>> Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
>> Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
>> Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC: cong 256, cwnd was 256, now 256
>> Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
>> Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
>> Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC: cong 256, cwnd was 256, now 256
>> Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
>> Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
>> Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC: cong 256, cwnd was 256, now 256
>> Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
>> Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
>> Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC: cong 256, cwnd was 256, now 256
>> Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539 xprt_prepare_transmit
>> Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539 xprt_cwnd_limited cong = 0 cwnd = 256
>> Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539 xprt_transmit(120)
>> Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-10-28 23:17:31

by Richard Kojedzinszky

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

Dear Bruce,

I am trying to do more dbugging regarding my issue. Now I am at a stage
where the nfs mount is unmounted, and still the kernel log shows periodic
nfs messages, and also packets transmitted. The nfs mount point is
unmounted.

Something strange is going here.

The nfs was mounted over gre over ipsec over a WAN connection, which may
have random packet loss or high RTT. Can this cause the nfs client to
behave this way? I suppose it should no.

Please tell me what info should I collect for more debugging.

Unfortunately I could not reproduce this in a virtualised network
environment even with netem qdisc to emulate the WAN characteristics. But
I am trying to do it so.

Thanks in advance,

Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.

On Wed, 28 Oct 2015, [email protected] wrote:

> Date: Wed, 28 Oct 2015 16:19:50 +0100 (CET)
> From: [email protected]
> To: J. Bruce Fields <[email protected]>
> Cc: [email protected]
> Subject: Re: nfs client hangs / repeating nfs requests
>
> Dear Bruce,
>
> That was an n-th experiment. Actually, with default settings the hang still
> happens, but I've just noticed the repeating packets with these settings.
> I've chosen 1024 sizes to fit the requests in one packets, to avoid any
> ip/udp fragmentation.
>
> The server is a freebsd nfs. I am struggling with this setup for a few months
> now, and I dont know, what is the cause of the hang. I could hardly believe
> that the nfs server is buggy, as many people around the world is using
> freebsd as an nfs server. The same for linux, I hardly can believe that the
> nfs client is faulty, as again many people is using it.
>
> But still I have the issue, hanging nfs mounts.
>
> I will report on any advance.
>
> Regards,
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatika Zrt.
>
> On Wed, 28 Oct 2015, J. Bruce Fields wrote:
>
>> Date: Wed, 28 Oct 2015 09:57:42 -0400
>> From: J. Bruce Fields <[email protected]>
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: nfs client hangs / repeating nfs requests
>>
>> On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
>>> Dear all,
>>>
>>> I am again having some trouble with nfs client. Actually I am
>>> running minidlna over nfs over gre over ipsec configuration, but I
>>> suppose the latter two should not count.
>>>
>>> Minidlna scans the entire directory, reading all files (or just
>>> parts), and while doing it, sometimes the process hangs.
>>>
>>> I could emulate this with reading the first 16k off all files from
>>> nfs, and fortunately the effect is the same.
>>>
>>> After a hang/recovery, no userspace is accessing the nfs right now,
>>> but tcpdump shows repeating requests/replies (attached). Meanwhile
>>> the rpc debug also emits logs.
>>>
>>> # uname -a
>>> Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
>>> x86_64 GNU/Linux
>>> # mount | grep nfs
>>> 10.0.0.2:/home/kh on /remote/kh type nfs
>>> (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
>>
>> rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
>> recommend. Is there a reason for not sticking with defaults there?
>>
>> But I don't have any explanation for the hang.
>>
>> --b.
>>
>>> Any suggestions?
>>>
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatika Zrt.
>>
>>
>>> Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC: cong 256, cwnd
>>> was 256, now 256
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC: cong 256, cwnd
>>> was 256, now 256
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC: cong 256, cwnd
>>> was 256, now 256
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC: cong 256, cwnd
>>> was 256, now 256
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC: cong 256, cwnd
>>> was 256, now 256
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539
>>> xprt_prepare_transmit
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539
>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539
>>> xprt_transmit(120)
>>> Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-10-29 13:22:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

On Thu, Oct 29, 2015 at 12:17:27AM +0100, [email protected] wrote:
> I am trying to do more dbugging regarding my issue. Now I am at a
> stage where the nfs mount is unmounted, and still the kernel log
> shows periodic nfs messages, and also packets transmitted. The nfs
> mount point is unmounted.
>
> Something strange is going here.
>
> The nfs was mounted over gre over ipsec over a WAN connection, which
> may have random packet loss or high RTT. Can this cause the nfs
> client to behave this way? I suppose it should no.

I'd expect poor performance, but not an indefinite hang.

One possible exception is that if the protocols depend on userspace
helpers of some sort (e.g. for crypto setup), then there's a potential
deadlock where the helper needs to allocate memory to make progress, but
memory allocation is waiting on nfs to write out dirty pages. Since
you're not talking about a write-heavy workload that's unlikely in your
case.

> Please tell me what info should I collect for more debugging.

Possible things to try:

- ps to identify which process are hanging.
- look at network traffic with wireshark or tcpdump--are client
and server still communicating, and if so what are they doing?
- echo 0 >/proc/sys/sunrpc/rpc_debug will log information about
rpc tasks.
- echo t >/proc/sysrq-trigger will dump information about tasks.

And also check the server to see if it's logging anything or if nfsd is
hung (but I'm not familiar with freebsd debugging).

--b.

>
> Unfortunately I could not reproduce this in a virtualised network
> environment even with netem qdisc to emulate the WAN
> characteristics. But I am trying to do it so.
>
> Thanks in advance,
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatika Zrt.
>
> On Wed, 28 Oct 2015, [email protected] wrote:
>
> >Date: Wed, 28 Oct 2015 16:19:50 +0100 (CET)
> >From: [email protected]
> >To: J. Bruce Fields <[email protected]>
> >Cc: [email protected]
> >Subject: Re: nfs client hangs / repeating nfs requests
> >
> >Dear Bruce,
> >
> >That was an n-th experiment. Actually, with default settings the
> >hang still happens, but I've just noticed the repeating packets
> >with these settings. I've chosen 1024 sizes to fit the requests in
> >one packets, to avoid any ip/udp fragmentation.
> >
> >The server is a freebsd nfs. I am struggling with this setup for a
> >few months now, and I dont know, what is the cause of the hang. I
> >could hardly believe that the nfs server is buggy, as many people
> >around the world is using freebsd as an nfs server. The same for
> >linux, I hardly can believe that the nfs client is faulty, as
> >again many people is using it.
> >
> >But still I have the issue, hanging nfs mounts.
> >
> >I will report on any advance.
> >
> >Regards,
> >
> >Kojedzinszky Richard
> >Euronet Magyarorszag Informatika Zrt.
> >
> >On Wed, 28 Oct 2015, J. Bruce Fields wrote:
> >
> >>Date: Wed, 28 Oct 2015 09:57:42 -0400
> >>From: J. Bruce Fields <[email protected]>
> >>To: [email protected]
> >>Cc: [email protected]
> >>Subject: Re: nfs client hangs / repeating nfs requests
> >>
> >>On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
> >>>Dear all,
> >>>
> >>>I am again having some trouble with nfs client. Actually I am
> >>>running minidlna over nfs over gre over ipsec configuration, but I
> >>>suppose the latter two should not count.
> >>>
> >>>Minidlna scans the entire directory, reading all files (or just
> >>>parts), and while doing it, sometimes the process hangs.
> >>>
> >>>I could emulate this with reading the first 16k off all files from
> >>>nfs, and fortunately the effect is the same.
> >>>
> >>>After a hang/recovery, no userspace is accessing the nfs right now,
> >>>but tcpdump shows repeating requests/replies (attached). Meanwhile
> >>>the rpc debug also emits logs.
> >>>
> >>># uname -a
> >>>Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
> >>>x86_64 GNU/Linux
> >>># mount | grep nfs
> >>>10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
> >>
> >>rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
> >>recommend. Is there a reason for not sticking with defaults there?
> >>
> >>But I don't have any explanation for the hang.
> >>
> >>--b.
> >>
> >>>Any suggestions?
> >>>
> >>>Kojedzinszky Richard
> >>>Euronet Magyarorszag Informatika Zrt.
> >>
> >>
> >>>Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC:
> >>>cong 256, cwnd was 256, now 256
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC:
> >>>cong 256, cwnd was 256, now 256
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC:
> >>>cong 256, cwnd was 256, now 256
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC:
> >>>cong 256, cwnd was 256, now 256
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC:
> >>>cong 256, cwnd was 256, now 256
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539
> >>>xprt_prepare_transmit
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539
> >>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539
> >>>xprt_transmit(120)
> >>>Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
> >>
> >>--
> >>To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>the body of a message to [email protected]
> >>More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >the body of a message to [email protected]
> >More majordomo info at http://vger.kernel.org/majordomo-info.html
> >

2015-10-29 14:27:29

by Richard Kojedzinszky

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests


And do you have idea of what could be the repeating nfs client requests
after unmounting the nfs share? Could there be any cases when a reply
arrives in, but the kernel nfs code does not or can not process it, thus
not changing that request's state?


Kojedzinszky Richard
Euronet Magyarorszag Informatika Zrt.

On Thu, 29 Oct 2015, J. Bruce Fields wrote:

> Date: Thu, 29 Oct 2015 09:22:47 -0400
> From: J. Bruce Fields <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: nfs client hangs / repeating nfs requests
>
> On Thu, Oct 29, 2015 at 12:17:27AM +0100, [email protected] wrote:
>> I am trying to do more dbugging regarding my issue. Now I am at a
>> stage where the nfs mount is unmounted, and still the kernel log
>> shows periodic nfs messages, and also packets transmitted. The nfs
>> mount point is unmounted.
>>
>> Something strange is going here.
>>
>> The nfs was mounted over gre over ipsec over a WAN connection, which
>> may have random packet loss or high RTT. Can this cause the nfs
>> client to behave this way? I suppose it should no.
>
> I'd expect poor performance, but not an indefinite hang.
>
> One possible exception is that if the protocols depend on userspace
> helpers of some sort (e.g. for crypto setup), then there's a potential
> deadlock where the helper needs to allocate memory to make progress, but
> memory allocation is waiting on nfs to write out dirty pages. Since
> you're not talking about a write-heavy workload that's unlikely in your
> case.
>
>> Please tell me what info should I collect for more debugging.
>
> Possible things to try:
>
> - ps to identify which process are hanging.
> - look at network traffic with wireshark or tcpdump--are client
> and server still communicating, and if so what are they doing?
> - echo 0 >/proc/sys/sunrpc/rpc_debug will log information about
> rpc tasks.
> - echo t >/proc/sysrq-trigger will dump information about tasks.
>
> And also check the server to see if it's logging anything or if nfsd is
> hung (but I'm not familiar with freebsd debugging).
>
> --b.
>
>>
>> Unfortunately I could not reproduce this in a virtualised network
>> environment even with netem qdisc to emulate the WAN
>> characteristics. But I am trying to do it so.
>>
>> Thanks in advance,
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatika Zrt.
>>
>> On Wed, 28 Oct 2015, [email protected] wrote:
>>
>>> Date: Wed, 28 Oct 2015 16:19:50 +0100 (CET)
>>> From: [email protected]
>>> To: J. Bruce Fields <[email protected]>
>>> Cc: [email protected]
>>> Subject: Re: nfs client hangs / repeating nfs requests
>>>
>>> Dear Bruce,
>>>
>>> That was an n-th experiment. Actually, with default settings the
>>> hang still happens, but I've just noticed the repeating packets
>>> with these settings. I've chosen 1024 sizes to fit the requests in
>>> one packets, to avoid any ip/udp fragmentation.
>>>
>>> The server is a freebsd nfs. I am struggling with this setup for a
>>> few months now, and I dont know, what is the cause of the hang. I
>>> could hardly believe that the nfs server is buggy, as many people
>>> around the world is using freebsd as an nfs server. The same for
>>> linux, I hardly can believe that the nfs client is faulty, as
>>> again many people is using it.
>>>
>>> But still I have the issue, hanging nfs mounts.
>>>
>>> I will report on any advance.
>>>
>>> Regards,
>>>
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatika Zrt.
>>>
>>> On Wed, 28 Oct 2015, J. Bruce Fields wrote:
>>>
>>>> Date: Wed, 28 Oct 2015 09:57:42 -0400
>>>> From: J. Bruce Fields <[email protected]>
>>>> To: [email protected]
>>>> Cc: [email protected]
>>>> Subject: Re: nfs client hangs / repeating nfs requests
>>>>
>>>> On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
>>>>> Dear all,
>>>>>
>>>>> I am again having some trouble with nfs client. Actually I am
>>>>> running minidlna over nfs over gre over ipsec configuration, but I
>>>>> suppose the latter two should not count.
>>>>>
>>>>> Minidlna scans the entire directory, reading all files (or just
>>>>> parts), and while doing it, sometimes the process hangs.
>>>>>
>>>>> I could emulate this with reading the first 16k off all files from
>>>>> nfs, and fortunately the effect is the same.
>>>>>
>>>>> After a hang/recovery, no userspace is accessing the nfs right now,
>>>>> but tcpdump shows repeating requests/replies (attached). Meanwhile
>>>>> the rpc debug also emits logs.
>>>>>
>>>>> # uname -a
>>>>> Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
>>>>> x86_64 GNU/Linux
>>>>> # mount | grep nfs
>>>>> 10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
>>>>
>>>> rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
>>>> recommend. Is there a reason for not sticking with defaults there?
>>>>
>>>> But I don't have any explanation for the hang.
>>>>
>>>> --b.
>>>>
>>>>> Any suggestions?
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatika Zrt.
>>>>
>>>>
>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC:
>>>>> cong 256, cwnd was 256, now 256
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC:
>>>>> cong 256, cwnd was 256, now 256
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC:
>>>>> cong 256, cwnd was 256, now 256
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC:
>>>>> cong 256, cwnd was 256, now 256
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC:
>>>>> cong 256, cwnd was 256, now 256
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539
>>>>> xprt_prepare_transmit
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539
>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539
>>>>> xprt_transmit(120)
>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2015-10-30 17:48:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

On Thu, Oct 29, 2015 at 03:27:25PM +0100, [email protected] wrote:
> And do you have idea of what could be the repeating nfs client
> requests after unmounting the nfs share? Could there be any cases
> when a reply arrives in, but the kernel nfs code does not or can not
> process it, thus not changing that request's state?

I missed your description of what was happening with those requests,
sorry.

Also, I'm a little confused by your description of the hangs. Do they
last until you give up and power off the machine, or do they resolve
themselves eventually? If the latter then for all we know that might
just be what happens over a network with some random packet loss.

--b.

>
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatika Zrt.
>
> On Thu, 29 Oct 2015, J. Bruce Fields wrote:
>
> >Date: Thu, 29 Oct 2015 09:22:47 -0400
> >From: J. Bruce Fields <[email protected]>
> >To: [email protected]
> >Cc: [email protected]
> >Subject: Re: nfs client hangs / repeating nfs requests
> >
> >On Thu, Oct 29, 2015 at 12:17:27AM +0100, [email protected] wrote:
> >>I am trying to do more dbugging regarding my issue. Now I am at a
> >>stage where the nfs mount is unmounted, and still the kernel log
> >>shows periodic nfs messages, and also packets transmitted. The nfs
> >>mount point is unmounted.
> >>
> >>Something strange is going here.
> >>
> >>The nfs was mounted over gre over ipsec over a WAN connection, which
> >>may have random packet loss or high RTT. Can this cause the nfs
> >>client to behave this way? I suppose it should no.
> >
> >I'd expect poor performance, but not an indefinite hang.
> >
> >One possible exception is that if the protocols depend on userspace
> >helpers of some sort (e.g. for crypto setup), then there's a potential
> >deadlock where the helper needs to allocate memory to make progress, but
> >memory allocation is waiting on nfs to write out dirty pages. Since
> >you're not talking about a write-heavy workload that's unlikely in your
> >case.
> >
> >>Please tell me what info should I collect for more debugging.
> >
> >Possible things to try:
> >
> > - ps to identify which process are hanging.
> > - look at network traffic with wireshark or tcpdump--are client
> > and server still communicating, and if so what are they doing?
> > - echo 0 >/proc/sys/sunrpc/rpc_debug will log information about
> > rpc tasks.
> > - echo t >/proc/sysrq-trigger will dump information about tasks.
> >
> >And also check the server to see if it's logging anything or if nfsd is
> >hung (but I'm not familiar with freebsd debugging).
> >
> >--b.
> >
> >>
> >>Unfortunately I could not reproduce this in a virtualised network
> >>environment even with netem qdisc to emulate the WAN
> >>characteristics. But I am trying to do it so.
> >>
> >>Thanks in advance,
> >>
> >>Kojedzinszky Richard
> >>Euronet Magyarorszag Informatika Zrt.
> >>
> >>On Wed, 28 Oct 2015, [email protected] wrote:
> >>
> >>>Date: Wed, 28 Oct 2015 16:19:50 +0100 (CET)
> >>>From: [email protected]
> >>>To: J. Bruce Fields <[email protected]>
> >>>Cc: [email protected]
> >>>Subject: Re: nfs client hangs / repeating nfs requests
> >>>
> >>>Dear Bruce,
> >>>
> >>>That was an n-th experiment. Actually, with default settings the
> >>>hang still happens, but I've just noticed the repeating packets
> >>>with these settings. I've chosen 1024 sizes to fit the requests in
> >>>one packets, to avoid any ip/udp fragmentation.
> >>>
> >>>The server is a freebsd nfs. I am struggling with this setup for a
> >>>few months now, and I dont know, what is the cause of the hang. I
> >>>could hardly believe that the nfs server is buggy, as many people
> >>>around the world is using freebsd as an nfs server. The same for
> >>>linux, I hardly can believe that the nfs client is faulty, as
> >>>again many people is using it.
> >>>
> >>>But still I have the issue, hanging nfs mounts.
> >>>
> >>>I will report on any advance.
> >>>
> >>>Regards,
> >>>
> >>>Kojedzinszky Richard
> >>>Euronet Magyarorszag Informatika Zrt.
> >>>
> >>>On Wed, 28 Oct 2015, J. Bruce Fields wrote:
> >>>
> >>>>Date: Wed, 28 Oct 2015 09:57:42 -0400
> >>>>From: J. Bruce Fields <[email protected]>
> >>>>To: [email protected]
> >>>>Cc: [email protected]
> >>>>Subject: Re: nfs client hangs / repeating nfs requests
> >>>>
> >>>>On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
> >>>>>Dear all,
> >>>>>
> >>>>>I am again having some trouble with nfs client. Actually I am
> >>>>>running minidlna over nfs over gre over ipsec configuration, but I
> >>>>>suppose the latter two should not count.
> >>>>>
> >>>>>Minidlna scans the entire directory, reading all files (or just
> >>>>>parts), and while doing it, sometimes the process hangs.
> >>>>>
> >>>>>I could emulate this with reading the first 16k off all files from
> >>>>>nfs, and fortunately the effect is the same.
> >>>>>
> >>>>>After a hang/recovery, no userspace is accessing the nfs right now,
> >>>>>but tcpdump shows repeating requests/replies (attached). Meanwhile
> >>>>>the rpc debug also emits logs.
> >>>>>
> >>>>># uname -a
> >>>>>Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
> >>>>>x86_64 GNU/Linux
> >>>>># mount | grep nfs
> >>>>>10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
> >>>>
> >>>>rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
> >>>>recommend. Is there a reason for not sticking with defaults there?
> >>>>
> >>>>But I don't have any explanation for the hang.
> >>>>
> >>>>--b.
> >>>>
> >>>>>Any suggestions?
> >>>>>
> >>>>>Kojedzinszky Richard
> >>>>>Euronet Magyarorszag Informatika Zrt.
> >>>>
> >>>>
> >>>>>Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC:
> >>>>>cong 256, cwnd was 256, now 256
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC:
> >>>>>cong 256, cwnd was 256, now 256
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC:
> >>>>>cong 256, cwnd was 256, now 256
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC:
> >>>>>cong 256, cwnd was 256, now 256
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC:
> >>>>>cong 256, cwnd was 256, now 256
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539
> >>>>>xprt_prepare_transmit
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539
> >>>>>xprt_cwnd_limited cong = 0 cwnd = 256
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539
> >>>>>xprt_transmit(120)
> >>>>>Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
> >>>>
> >>>>--
> >>>>To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>>>the body of a message to [email protected]
> >>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>--
> >>>To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>>the body of a message to [email protected]
> >>>More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >the body of a message to [email protected]
> >More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-02-02 14:00:44

by Richard Kojedzinszky

[permalink] [raw]
Subject: Re: nfs client hangs / repeating nfs requests

Dear all,

Unfortunately my issue is still there, just I had no time to play with it.
So for now, I am attaching a tcpdump at the server and at the client side.
What I see is some kind of tcp retransmissions/anomalities, which causes
the whole nfs mount to block. The very strange thing is that I am
reproducing this with the attached script, and it always stops at the very
same file, after IMGP0731.JPG.

I hope someone could give some ideas, what can be wrong. Unfortunately I
wont give my files for testing, but I can reproduce it anytime.

I have arount ~20 msec rtt to the nfs server. There are no packet losses,
you will see in both capture files, but for some reason nfs hangs at the
very same file. And after some time (few minutes) it passes through.

I've packed the test script and the two captures here:

http://krichy.web.nmdps.net/nfs/files.tar.gz

I am using debian jessie's backport kernel (4.3.3-7) right now on the
client, and the server is a freebsd 10.2.

Many thanks in advance,
Kojedzinszky Rich?rd
Euronet Magyarorszag Informatika Zrt.

On Fri, 30 Oct 2015, J. Bruce Fields wrote:

> Date: Fri, 30 Oct 2015 13:48:23 -0400
> From: J. Bruce Fields <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: nfs client hangs / repeating nfs requests
>
> On Thu, Oct 29, 2015 at 03:27:25PM +0100, [email protected] wrote:
>> And do you have idea of what could be the repeating nfs client
>> requests after unmounting the nfs share? Could there be any cases
>> when a reply arrives in, but the kernel nfs code does not or can not
>> process it, thus not changing that request's state?
>
> I missed your description of what was happening with those requests,
> sorry.
>
> Also, I'm a little confused by your description of the hangs. Do they
> last until you give up and power off the machine, or do they resolve
> themselves eventually? If the latter then for all we know that might
> just be what happens over a network with some random packet loss.
>
> --b.
>
>>
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatika Zrt.
>>
>> On Thu, 29 Oct 2015, J. Bruce Fields wrote:
>>
>>> Date: Thu, 29 Oct 2015 09:22:47 -0400
>>> From: J. Bruce Fields <[email protected]>
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: Re: nfs client hangs / repeating nfs requests
>>>
>>> On Thu, Oct 29, 2015 at 12:17:27AM +0100, [email protected] wrote:
>>>> I am trying to do more dbugging regarding my issue. Now I am at a
>>>> stage where the nfs mount is unmounted, and still the kernel log
>>>> shows periodic nfs messages, and also packets transmitted. The nfs
>>>> mount point is unmounted.
>>>>
>>>> Something strange is going here.
>>>>
>>>> The nfs was mounted over gre over ipsec over a WAN connection, which
>>>> may have random packet loss or high RTT. Can this cause the nfs
>>>> client to behave this way? I suppose it should no.
>>>
>>> I'd expect poor performance, but not an indefinite hang.
>>>
>>> One possible exception is that if the protocols depend on userspace
>>> helpers of some sort (e.g. for crypto setup), then there's a potential
>>> deadlock where the helper needs to allocate memory to make progress, but
>>> memory allocation is waiting on nfs to write out dirty pages. Since
>>> you're not talking about a write-heavy workload that's unlikely in your
>>> case.
>>>
>>>> Please tell me what info should I collect for more debugging.
>>>
>>> Possible things to try:
>>>
>>> - ps to identify which process are hanging.
>>> - look at network traffic with wireshark or tcpdump--are client
>>> and server still communicating, and if so what are they doing?
>>> - echo 0 >/proc/sys/sunrpc/rpc_debug will log information about
>>> rpc tasks.
>>> - echo t >/proc/sysrq-trigger will dump information about tasks.
>>>
>>> And also check the server to see if it's logging anything or if nfsd is
>>> hung (but I'm not familiar with freebsd debugging).
>>>
>>> --b.
>>>
>>>>
>>>> Unfortunately I could not reproduce this in a virtualised network
>>>> environment even with netem qdisc to emulate the WAN
>>>> characteristics. But I am trying to do it so.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatika Zrt.
>>>>
>>>> On Wed, 28 Oct 2015, [email protected] wrote:
>>>>
>>>>> Date: Wed, 28 Oct 2015 16:19:50 +0100 (CET)
>>>>> From: [email protected]
>>>>> To: J. Bruce Fields <[email protected]>
>>>>> Cc: [email protected]
>>>>> Subject: Re: nfs client hangs / repeating nfs requests
>>>>>
>>>>> Dear Bruce,
>>>>>
>>>>> That was an n-th experiment. Actually, with default settings the
>>>>> hang still happens, but I've just noticed the repeating packets
>>>>> with these settings. I've chosen 1024 sizes to fit the requests in
>>>>> one packets, to avoid any ip/udp fragmentation.
>>>>>
>>>>> The server is a freebsd nfs. I am struggling with this setup for a
>>>>> few months now, and I dont know, what is the cause of the hang. I
>>>>> could hardly believe that the nfs server is buggy, as many people
>>>>> around the world is using freebsd as an nfs server. The same for
>>>>> linux, I hardly can believe that the nfs client is faulty, as
>>>>> again many people is using it.
>>>>>
>>>>> But still I have the issue, hanging nfs mounts.
>>>>>
>>>>> I will report on any advance.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatika Zrt.
>>>>>
>>>>> On Wed, 28 Oct 2015, J. Bruce Fields wrote:
>>>>>
>>>>>> Date: Wed, 28 Oct 2015 09:57:42 -0400
>>>>>> From: J. Bruce Fields <[email protected]>
>>>>>> To: [email protected]
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: nfs client hangs / repeating nfs requests
>>>>>>
>>>>>> On Wed, Oct 28, 2015 at 12:06:02PM +0100, [email protected] wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I am again having some trouble with nfs client. Actually I am
>>>>>>> running minidlna over nfs over gre over ipsec configuration, but I
>>>>>>> suppose the latter two should not count.
>>>>>>>
>>>>>>> Minidlna scans the entire directory, reading all files (or just
>>>>>>> parts), and while doing it, sometimes the process hangs.
>>>>>>>
>>>>>>> I could emulate this with reading the first 16k off all files from
>>>>>>> nfs, and fortunately the effect is the same.
>>>>>>>
>>>>>>> After a hang/recovery, no userspace is accessing the nfs right now,
>>>>>>> but tcpdump shows repeating requests/replies (attached). Meanwhile
>>>>>>> the rpc debug also emits logs.
>>>>>>>
>>>>>>> # uname -a
>>>>>>> Linux k-desktop 4.1.0-2-amd64 #1 SMP Debian 4.1.6-1 (2015-08-23)
>>>>>>> x86_64 GNU/Linux
>>>>>>> # mount | grep nfs
>>>>>>> 10.0.0.2:/home/kh on /remote/kh type nfs (rw,relatime,vers=3,rsize=1024,wsize=1024,namlen=255,hard,nolock,proto=udp,timeo=11,retrans=3,sec=sys,mountaddr=10.0.0.2,mountvers=3,mountport=644,mountproto=udp,local_lock=all,addr=10.0.0.2)
>>>>>>
>>>>>> rsize=1024,wsize=1024,proto=udp is unusual and not normally what we'd
>>>>>> recommend. Is there a reason for not sticking with defaults there?
>>>>>>
>>>>>> But I don't have any explanation for the hang.
>>>>>>
>>>>>> --b.
>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatika Zrt.
>>>>>>
>>>>>>
>>>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516020] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516022] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516027] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:18 k-desktop kernel: [12237.516054] RPC: 54539 xmit complete
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932011] RPC: 54539 xprt_timer
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932016] RPC:
>>>>>>> cong 256, cwnd was 256, now 256
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932020] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932022] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932027] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:22 k-desktop kernel: [12241.932054] RPC: 54539 xmit complete
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748013] RPC: 54539 xprt_timer
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748018] RPC:
>>>>>>> cong 256, cwnd was 256, now 256
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748023] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748025] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748029] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:31 k-desktop kernel: [12250.748058] RPC: 54539 xmit complete
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852014] RPC: 54539 xprt_timer
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852018] RPC:
>>>>>>> cong 256, cwnd was 256, now 256
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852021] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852023] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852027] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:32 k-desktop kernel: [12251.852048] RPC: 54539 xmit complete
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060011] RPC: 54539 xprt_timer
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060015] RPC:
>>>>>>> cong 256, cwnd was 256, now 256
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060018] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060020] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060023] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:35 k-desktop kernel: [12254.060041] RPC: 54539 xmit complete
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476012] RPC: 54539 xprt_timer
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476016] RPC:
>>>>>>> cong 256, cwnd was 256, now 256
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476020] RPC: 54539
>>>>>>> xprt_prepare_transmit
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476022] RPC: 54539
>>>>>>> xprt_cwnd_limited cong = 0 cwnd = 256
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476027] RPC: 54539
>>>>>>> xprt_transmit(120)
>>>>>>> Oct 28 12:02:39 k-desktop kernel: [12258.476052] RPC: 54539 xmit complete
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>> the body of a message to [email protected]
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>