LinuxLists.cc - question about handling of an unresponsive server during lease renewal

2020-07-13 18:01:25

Subject: question about handling of an unresponsive server during lease renewal

Hi Trond,

To the best of your knowledge, does the client implement the part of
the spec that deals with the server not responding while the lease is
timing out?

RFC5661 section 8.3 talks about:

Transport retransmission delays might become so large as to
approach or exceed the length of the lease period. This may be
particularly likely when the server is unresponsive due to a
restart; see Section 8.4.2.1. If the client implementation is not
careful, transport retransmission delays can result in the client
failing to detect a server restart before the grace period ends.
The scenario is that the client is using a transport with
exponential backoff, such that the maximum retransmission timeout
exceeds both the grace period and the lease_time attribute. A
network partition causes the client's connection's retransmission
interval to back off, and even after the partition heals, the next
transport-level retransmission is sent after the server has
restarted and its grace period ends.

The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
errors or it MUST ensure that, despite transport-level
retransmission intervals that exceed the lease_time, a SEQUENCE
operation is sent that renews the lease before expiration. The
client can achieve this by associating a new connection with the
session, and sending a SEQUENCE operation on it. However, if the
attempt to establish a new connection is delayed for some reason
(e.g., exponential backoff of the connection establishment
packets), the client will have to abort the connection
establishment attempt before the lease expires, and attempt to
reconnect.

A SEQUENCE op is sent and the server reboots; it's coming up (but not
responding). At the TCP layer, TCP is backing off exponentially before
retrying. At some point the retransmission timeout exceeds 100s, which
means that by the time the client resends, the server is up and out of
grace.
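
[Editor's sketch] The backoff scenario above can be modelled in a few
lines of C. This is a simplified illustration, not the kernel's TCP
code: the 1 s initial interval and the 120 s cap are assumptions chosen
to resemble common Linux defaults, the function name is hypothetical,
and a real TCP stack derives its retransmission timeout from measured
round-trip times.

```c
#include <assert.h>

/*
 * Simplified model of exponential backoff: the retransmission interval
 * starts at initial_s, doubles after every retransmission, and is capped
 * at max_s. Returns how many retransmissions are sent before the interval
 * first exceeds limit_s, and stores the time elapsed up to that point in
 * *elapsed_s. Returns -1 if the capped interval can never exceed limit_s.
 */
static int retransmits_until_gap_exceeds(unsigned int initial_s,
					 unsigned int max_s,
					 unsigned int limit_s,
					 unsigned int *elapsed_s)
{
	unsigned int rto = initial_s;
	int sent = 0;

	assert(initial_s > 0 && initial_s <= max_s);
	*elapsed_s = 0;
	while (rto <= limit_s) {
		if (rto == max_s)
			return -1;	/* capped at or below the limit */
		*elapsed_s += rto;	/* wait one interval, then retransmit */
		sent++;
		rto = (rto * 2 > max_s) ? max_s : rto * 2;
	}
	return sent;
}
```

Under these assumptions and a 90 s lease, the intervals run 1, 2, 4, 8,
16, 32, 64 s, so the seventh retransmission goes out 127 s after the
first send and the next wait is the full 120 s cap, longer than the
lease.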

Does the client have any control over not letting TCP wait for
longer than the lease period, so that it instead aborts the
connection and starts a new one? I find the 2nd paragraph to be in
contradiction with the rule that the client must never give up
waiting for a reply from the server. But maybe this is a special
case where the client is supposed to know its lease hasn't been
renewed and it's OK to give up?

2020-07-13 18:17:50

by Trond Myklebust


Subject: Re: question about handling of an unresponsive server during lease renewal

Hi Olga

On Mon, 2020-07-13 at 13:59 -0400, Olga Kornievskaia wrote:
> Hi Trond,
>
> To the best of your knowledge, does the client implement the part of
> the spec that deals with the server not responding while the lease is
> timing out?
>
> RFC5661 section 8.3 talks about:
>
> Transport retransmission delays might become so large as to
> approach or exceed the length of the lease period. This may be
> particularly likely when the server is unresponsive due to a
> restart; see Section 8.4.2.1. If the client implementation is not
> careful, transport retransmission delays can result in the client
> failing to detect a server restart before the grace period ends.
> The scenario is that the client is using a transport with
> exponential backoff, such that the maximum retransmission timeout
> exceeds both the grace period and the lease_time attribute. A
> network partition causes the client's connection's retransmission
> interval to back off, and even after the partition heals, the next
> transport-level retransmission is sent after the server has
> restarted and its grace period ends.
>
> The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
> errors or it MUST ensure that, despite transport-level
> retransmission intervals that exceed the lease_time, a SEQUENCE
> operation is sent that renews the lease before expiration. The
> client can achieve this by associating a new connection with the
> session, and sending a SEQUENCE operation on it. However, if the
> attempt to establish a new connection is delayed for some reason
> (e.g., exponential backoff of the connection establishment
> packets), the client will have to abort the connection
> establishment attempt before the lease expires, and attempt to
> reconnect.
>
> A SEQUENCE op is sent and the server reboots; it's coming up (but not
> responding). At the TCP layer, TCP is backing off exponentially before
> retrying. At some point the retransmission timeout exceeds 100s, which
> means that by the time the client resends, the server is up and out of
> grace.
>
> Does the client have any control over not letting TCP wait for
> longer than the lease period, so that it instead aborts the
> connection and starts a new one? I find the 2nd paragraph to be in
> contradiction with the rule that the client must never give up
> waiting for a reply from the server. But maybe this is a special
> case where the client is supposed to know its lease hasn't been
> renewed and it's OK to give up?

That is what this code is supposed to ensure:

/**
 * nfs4_set_lease_period - Sets the lease period on a nfs_client
 *
 * @clp: pointer to nfs_client
 * @lease: new value for lease period
 */
void nfs4_set_lease_period(struct nfs_client *clp,
		unsigned long lease)
{
	spin_lock(&clp->cl_lock);
	clp->cl_lease_time = lease;
	spin_unlock(&clp->cl_lock);

	/* Cap maximum reconnect timeout at 1/2 lease period */
	rpc_set_connect_timeout(clp->cl_rpcclient, lease, lease >> 1);
}

The call to rpc_set_connect_timeout() iterates through all of the
transports associated with that server, and calls
xprt->ops->set_connect_timeout() with the appropriate connect and
reconnect timeouts.
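
[Editor's sketch] The iterate-and-dispatch pattern described above can
be illustrated with a toy model. Every type and function name below
(toy_xprt, toy_xprt_ops, toy_set_lease_timeouts, and so on) is
hypothetical and stands in for the real SUNRPC structures, which carry
far more state and locking; only the "lease, lease >> 1" arguments come
from the code quoted in this message.

```c
#include <assert.h>
#include <stddef.h>

struct toy_xprt;

/* Hypothetical per-transport operations table. */
struct toy_xprt_ops {
	void (*set_connect_timeout)(struct toy_xprt *xprt,
				    unsigned long connect_timeout,
				    unsigned long reconnect_timeout);
};

/* Hypothetical transport: just the two timeouts of interest. */
struct toy_xprt {
	const struct toy_xprt_ops *ops;
	unsigned long connect_timeout;
	unsigned long max_reconnect_timeout;
};

/* Toy handler that simply records the requested timeouts. */
static void toy_set_connect_timeout(struct toy_xprt *xprt,
				    unsigned long connect_timeout,
				    unsigned long reconnect_timeout)
{
	xprt->connect_timeout = connect_timeout;
	xprt->max_reconnect_timeout = reconnect_timeout;
}

static const struct toy_xprt_ops toy_ops = {
	.set_connect_timeout = toy_set_connect_timeout,
};

/*
 * Walk every transport and, where the operation is implemented, pass the
 * lease as the connect timeout and half the lease as the reconnect cap.
 */
static void toy_set_lease_timeouts(struct toy_xprt *xprts, size_t n,
				   unsigned long lease)
{
	assert(xprts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct toy_xprt *xprt = &xprts[i];

		if (xprt->ops && xprt->ops->set_connect_timeout)
			xprt->ops->set_connect_timeout(xprt, lease, lease >> 1);
	}
}
```

Transports without a set_connect_timeout operation are skipped in this
sketch, which is one plausible way to treat transports that have no
connection timeout to tune.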

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace

2020-08-26 16:25:37

by Dai Ngo


Subject: Re: question about handling of an unresponsive server during lease renewal

Hi Olga and Trond,

On 7/13/20 11:15 AM, Trond Myklebust wrote:
> Hi Olga
>
> On Mon, 2020-07-13 at 13:59 -0400, Olga Kornievskaia wrote:
>> Hi Trond,
>>
>> To the best of your knowledge, does the client implement the part of
>> the spec that deals with the server not responding while the lease is
>> timing out?
>>
>> RFC5661 section 8.3 talks about:
>>
>> Transport retransmission delays might become so large as to
>> approach or exceed the length of the lease period. This may be
>> particularly likely when the server is unresponsive due to a
>> restart; see Section 8.4.2.1. If the client implementation is not
>> careful, transport retransmission delays can result in the client
>> failing to detect a server restart before the grace period ends.
>> The scenario is that the client is using a transport with
>> exponential backoff, such that the maximum retransmission timeout
>> exceeds both the grace period and the lease_time attribute. A
>> network partition causes the client's connection's retransmission
>> interval to back off, and even after the partition heals, the next
>> transport-level retransmission is sent after the server has
>> restarted and its grace period ends.
>>
>> The client MUST either recover from the ensuing NFS4ERR_NO_GRACE
>> errors or it MUST ensure that, despite transport-level
>> retransmission intervals that exceed the lease_time, a SEQUENCE
>> operation is sent that renews the lease before expiration. The
>> client can achieve this by associating a new connection with the
>> session, and sending a SEQUENCE operation on it. However, if the
>> attempt to establish a new connection is delayed for some reason
>> (e.g., exponential backoff of the connection establishment
>> packets), the client will have to abort the connection
>> establishment attempt before the lease expires, and attempt to
>> reconnect.
>>
>> A SEQUENCE op is sent and the server reboots; it's coming up (but not
>> responding). At the TCP layer, TCP is backing off exponentially before
>> retrying. At some point the retransmission timeout exceeds 100s, which
>> means that by the time the client resends, the server is up and out of
>> grace.
>>
>> Does the client have any control over not letting TCP wait for
>> longer than the lease period, so that it instead aborts the
>> connection and starts a new one? I find the 2nd paragraph to be in
>> contradiction with the rule that the client must never give up
>> waiting for a reply from the server. But maybe this is a special
>> case where the client is supposed to know its lease hasn't been
>> renewed and it's OK to give up?
> That is what this code is supposed to ensure:
>
> /**
>  * nfs4_set_lease_period - Sets the lease period on a nfs_client
>  *
>  * @clp: pointer to nfs_client
>  * @lease: new value for lease period
>  */
> void nfs4_set_lease_period(struct nfs_client *clp,
> 		unsigned long lease)
> {
> 	spin_lock(&clp->cl_lock);
> 	clp->cl_lease_time = lease;
> 	spin_unlock(&clp->cl_lock);
>
> 	/* Cap maximum reconnect timeout at 1/2 lease period */
> 	rpc_set_connect_timeout(clp->cl_rpcclient, lease, lease >> 1);
> }
>
> The call to rpc_set_connect_timeout() iterates through all of the
> transports associated with that server, and calls
> xprt->ops->set_connect_timeout() with the appropriate connect and
> reconnect timeouts.

xs_tcp_set_connect_timeout is called to set up the rpc_timeout structure
in sock_xprt based on lease and lease >> 1. With the v4 lease period
of 90 secs, to_initval and to_maxval are both set to 30000ms and
to_retries is set to 2 (the default).
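
[Editor's sketch] The numbers above are consistent with dividing the
connect timeout evenly across the initial attempt plus its retries:
90000 ms / (2 + 1) = 30000 ms. The snippet below reproduces only that
arithmetic. The struct and function names are hypothetical, the
round-up division is an assumption, and any lower bound the kernel may
apply to the result is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the rpc_timeout fields named above (ms). */
struct toy_rpc_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned int to_retries;
};

/*
 * Split the connect timeout evenly across the initial attempt plus
 * 'retries' retransmissions, rounding up, and use the result for both
 * the initial and the maximum per-attempt timeout.
 */
static struct toy_rpc_timeout toy_timeout_from_lease(unsigned long lease_ms,
						     unsigned int retries)
{
	struct toy_rpc_timeout to;
	unsigned long attempts = (unsigned long)retries + 1;
	unsigned long initval;

	assert(lease_ms > 0);
	initval = (lease_ms + attempts - 1) / attempts;	/* round up */
	to.to_initval = initval;
	to.to_maxval = initval;
	to.to_retries = retries;
	return to;
}
```

Calling toy_timeout_from_lease(90000, 2) yields to_initval and
to_maxval of 30000 ms with to_retries of 2, matching the values quoted
above.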

xs_tcp_set_socket_timeouts uses the rpc_timeout in sock_xprt to set up
the TCP keep-alive timer and the TCP_USER_TIMEOUT option for the socket.
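
[Editor's sketch] One way the socket settings could follow from the
rpc_timeout values is shown below. The mapping is an assumption made
for illustration (keep-alive idle time equal to to_initval, probe count
equal to to_retries + 1, user timeout equal to their product), and the
struct and function names are hypothetical. For to_initval = 30000 ms
and to_retries = 2 it gives a user timeout of 90,000 ms, the figure
quoted in this message.

```c
#include <assert.h>

/* Hypothetical bundle of the per-socket values derived from rpc_timeout. */
struct toy_sock_timeouts {
	unsigned int keepidle_s;	/* idle time before keep-alive probes */
	unsigned int keepcnt;		/* number of keep-alive probes */
	unsigned int user_timeout_ms;	/* value for TCP_USER_TIMEOUT */
};

/*
 * Derive keep-alive and user-timeout settings from the per-attempt
 * timeout (ms) and the retry count, under the assumptions stated in the
 * lead-in above.
 */
static struct toy_sock_timeouts toy_socket_timeouts(unsigned int initval_ms,
						    unsigned int retries)
{
	struct toy_sock_timeouts st;

	assert(initval_ms > 0);
	st.keepidle_s = (initval_ms + 999) / 1000;	/* round up to seconds */
	st.keepcnt = retries + 1;
	st.user_timeout_ms = initval_ms * (retries + 1);
	return st;
}
```

With these inputs the user timeout is tied to the full connect timeout,
so shortening it would require either a smaller connect timeout or a
different formula.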

Currently, with the v4 lease of 90 secs, TCP_USER_TIMEOUT is set to
90,000ms, which is the same as the lease period. Since the two values
are the same, there will be cases where the client does not have
enough time to reclaim its locks. Should the TCP_USER_TIMEOUT value be
less than the lease period, perhaps the same as the lease renewal
period, which is 60 secs?
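
[Editor's sketch] For readers who want to experiment with the option
outside the kernel, TCP_USER_TIMEOUT can be set on an ordinary Linux
TCP socket from user space with setsockopt(). The helper names below
are hypothetical, and the 60,000 ms value in the usage note is simply
the figure proposed above, not a recommendation.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set TCP_USER_TIMEOUT (Linux-specific, in milliseconds) on a TCP socket.
 * Returns 0 on success and -1 on failure, like setsockopt() itself.
 */
static int set_tcp_user_timeout(int fd, unsigned int timeout_ms)
{
	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  &timeout_ms, sizeof(timeout_ms));
}

/* Read the current TCP_USER_TIMEOUT back from the socket. */
static int get_tcp_user_timeout(int fd, unsigned int *timeout_ms)
{
	socklen_t len = sizeof(*timeout_ms);

	assert(fd >= 0);
	return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
			  timeout_ms, &len);
}
```

A caller would create a socket with socket(AF_INET, SOCK_STREAM, 0),
call set_tcp_user_timeout(fd, 60000), and could confirm the setting
with get_tcp_user_timeout() before connecting.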

Thanks,
-Dai