2021-04-25 02:29:42

by Rick Macklem

Subject: weird Linux client behaviour when there are multiple concurrent CB_RECALLs

Hi,

I have been running a simple test using two clients (one FreeBSD
and the other Linux, Fedora Core 30, 5.2 kernel) with delegations
enabled in the server.

The test consists of running the connectathon general tests
alternately on each client, using the same directory on the
server.
--> As such, each one results in CB_RECALLs of delegations
    from the other client.
Everything seems fine until the server does multiple concurrent
CB_RECALLs for different files/delegations using different
callback session slots.
--> Then the Linux client decides it must create a new connection,
    which breaks the back channel.
    After 0.1sec, the FreeBSD server notices the broken back
    channel and starts setting SEQ4_STATUS_CB_PATH_DOWN.
    --> 15sec after that, the Linux client does a BindConnectionToSession
        and things start working again.
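
For anyone reading the capture, SEQ4_STATUS_CB_PATH_DOWN is one bit in the
sr_status_flags word of the SEQUENCE reply. Below is a minimal Python sketch
of picking the backchannel-related bits out of that word. It assumes the bit
values given for NFSv4.1 in RFC 5661. The names BACKCHANNEL_FLAGS and
decode_status_flags are made up for illustration and are not taken from the
FreeBSD or Linux sources.

```python
# Backchannel-related bits of sr_status_flags in a SEQUENCE reply,
# with the values defined for NFSv4.1 in RFC 5661.
BACKCHANNEL_FLAGS = {
    0x00000001: "SEQ4_STATUS_CB_PATH_DOWN",
    0x00000200: "SEQ4_STATUS_CB_PATH_DOWN_SESSION",
    0x00000400: "SEQ4_STATUS_BACKCHANNEL_FAULT",
}


def decode_status_flags(flags):
    """Return the names of the backchannel bits set in flags.

    Hypothetical helper for illustration; other bits are ignored.
    """
    return [name for bit, name in BACKCHANNEL_FLAGS.items() if flags & bit]


# A reply carrying only the bit described above:
print(decode_status_flags(0x00000001))  # ['SEQ4_STATUS_CB_PATH_DOWN']
```

A flags word of 0 decodes to an empty list, which is what one would expect
to see again once the back channel has been re-bound.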

The mystery to me is why the client decides to create a new TCP
connection, forcing this 15sec hiccup each time it happens.

If you are interested in looking at a packet capture, you can
% fetch https://people.freebsd.org/~rmacklem/twoclientdeleg.pcap
There are multiple examples in it. One is at:
packet# 3518, 3520, 3521 - CB_RECALL requests for 3 different delegations
                           at time 137.5
--> This is followed by a close and open of a new TCP connection...
packet# 3582 - first one with SEQ4_STATUS_CB_PATH_DOWN at time 137.6
packet# 3604 - client does a BindConnectionToSession at time 152.7
Then things start to happen again...

192.168.1.5 - FreeBSD server
192.168.1.6 - Linux client
192.168.1.13 - FreeBSD client
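
The 0.1sec and 15sec figures can be checked against that example with a
little arithmetic. This is a minimal Python sketch that uses only the packet
numbers and times listed above. The events list and the gaps helper are
illustrative names, not part of any capture tool.

```python
# (time in seconds, description) for the example in the capture
events = [
    (137.5, "CB_RECALL requests (packets 3518, 3520, 3521)"),
    (137.6, "first SEQ4_STATUS_CB_PATH_DOWN (packet 3582)"),
    (152.7, "client BindConnectionToSession (packet 3604)"),
]


def gaps(evts):
    """Return the delay between consecutive events, rounded to 0.1 sec."""
    return [round(later[0] - earlier[0], 1)
            for earlier, later in zip(evts, evts[1:])]


for delay, (_, what) in zip(gaps(events), events[1:]):
    print(f"+{delay:.1f}s  {what}")
```

The first gap is the server noticing the broken back channel, and the second
is the stall before the client re-binds the connection.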

If this is a known issue that you think is fixed in a more recent
Linux kernel, then sorry about the noise.

rick


2021-04-25 17:27:48

by Trond Myklebust

Subject: Re: weird Linux client behaviour when there are multiple concurrent CB_RECALLs

On Sun, 2021-04-25 at 02:29 +0000, Rick Macklem wrote:
> Hi,
>
> I have been running a simple test using two clients (one FreeBSD
> and the other Linux, Fedora Core 30, 5.2 kernel) with delegations
> enabled in the server.
>
> The test consists of running the connectathon general tests
> alternately on each client, using the same directory on the
> server.
> --> As such, each one results in CB_RECALLs of delegations
>       from the other client.
> Everything seems fine until the server does multiple concurrent
> CB_RECALLs for different files/delegations using different
> callback session slots.
> --> Then the Linux client decides it must create a new connection,
>        which breaks the back channel.
>        After 0.1sec, the FreeBSD server notices the broken back
>        channel and starts setting SEQ4_STATUS_CB_PATH_DOWN.
>        --> 15sec after that, the Linux client does a
>            BindConnectionToSession and things start working again.
>
> The mystery to me is why the client decides to create a new TCP
> connection, forcing this 15sec hiccup each time it happens.
>
> If you are interested in looking at a packet capture, you can
> % fetch https://people.freebsd.org/~rmacklem/twoclientdeleg.pcap
> There are multiple examples in it. One is at:
> packet# 3518, 3520, 3521 - CB_RECALL requests for 3 different
>                            delegations at time 137.5
> --> This is followed by a close and open of a new TCP connection...
> packet# 3582 - first one with SEQ4_STATUS_CB_PATH_DOWN at time 137.6
> packet# 3604 - client does a BindConnectionToSession at time 152.7
> Then things start to happen again...
> 192.168.1.5 - FreeBSD server
> 192.168.1.6 - Linux client
> 192.168.1.13 - FreeBSD client
>
> If this is a known issue that you think is fixed in a more recent
> Linux kernel, then sorry about the noise.
>

Should have been fixed in Linux 5.3 by commit 7402a4fedc2b ("SUNRPC:
Fix up backchannel slot table accounting") AFAICT.
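
As a rough way to tell whether a given client is old enough to be affected,
the kernel release string can be compared against 5.3. This is a minimal
Python sketch, not a definitive check: the helper name kernel_at_least is
made up for illustration, and a distribution kernel may carry the fix as a
backport under an older version number, so a False result only means
"possibly affected".

```python
import platform
import re

FIXED_IN = (5, 3)  # release named above as containing the fix


def kernel_at_least(release, minimum=FIXED_IN):
    """Return True if a release string such as '5.2.18-200.fc30.x86_64'
    is at or above minimum, comparing (major, minor) only.

    Hypothetical helper for illustration.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string.
    print(platform.release(), kernel_at_least(platform.release()))
```

Comparing integer tuples rather than strings matters here, since a plain
string comparison would sort "5.10" before "5.3".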

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-04-25 21:46:25

by Rick Macklem

Subject: Re: weird Linux client behaviour when there are multiple concurrent CB_RECALLs

Thanks Trond. I'll upgrade my kernel one of these days
and test it.

rick
