2018-10-04 01:20:59

by Olga Kornievskaia

[permalink] [raw]
Subject: NFS/TCP timeouts

Hi folks,

Is it true that the NFS mount option "timeo" has nothing to do with
the socket's user-specified timeout setting, TCP_USER_TIMEOUT?
Instead, when creating a TCP socket, NFS uses either a default,
hard-coded value of 60s for v3 or, for v4.x, a lease-based value. Is
there no value in having an adjustable TCP timeout?

Thank you.


2018-10-04 01:35:29

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

On Wed, 2018-10-03 at 14:31 -0400, Olga Kornievskaia wrote:
> Hi folks,
>
> Is it true that NFS mount option "timeo" has nothing to do with the
> socket's setting of the user-specified timeout TCP_USER_TIMEOUT.
> Instead, when creating a TCP socket NFS uses either default/hard
> coded
> value of 60s for v3 or for v4.x it's lease based. Is there no value
> is
> having an adjustable TCP timeout value?
>

It is adjusted. Please see the calculation in
xs_tcp_set_socket_timeouts().

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2018-10-04 01:55:53

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

On Wed, Oct 3, 2018 at 2:45 PM Trond Myklebust <[email protected]> wrote:
>
> On Wed, 2018-10-03 at 14:31 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > Is it true that NFS mount option "timeo" has nothing to do with the
> > socket's setting of the user-specified timeout TCP_USER_TIMEOUT.
> > Instead, when creating a TCP socket NFS uses either default/hard
> > coded
> > value of 60s for v3 or for v4.x it's lease based. Is there no value
> > is
> > having an adjustable TCP timeout value?
> >
>
> It is adjusted. Please see the calculation in
> xs_tcp_set_socket_timeouts().

But it's not user-configurable, is it? I don't see a way to modify
v3's default 60s TCP timeout. Also, in v4, the timeouts are set
from xs_tcp_set_connect_timeout() for the lease period, but again they
are not user-configurable, as far as I can tell.


2019-12-11 20:36:45

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

Hi Trond,

I'd like to raise this once again. Is it true that the TCP timeout
limit (TCP_USER_TIMEOUT) is not user-configurable? I'm pretty sure it
is not, so my real question is: why shouldn't it be tied to the
"timeo" mount option? Right now, only the session/lease manager thread
sets it, via rpc_set_connect_timeout(), to a value related to the
lease period.

Is it that we don't want to allow users to control TCP settings via
mount options? Folks nonetheless expect to be able to set a low
"timeo" value and have a dead connection considered dead sooner than
the rather long timeout period that applies now.

Thanks.


2019-12-12 16:48:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

Hi Olga,

On Wed, 2019-12-11 at 15:36 -0500, Olga Kornievskaia wrote:
> Hi Trond,
>
> I'd like to raise this once again. Is this true that setting a
> timeout
> limit (TCP_USER_TIMEOUT) is not user configurable (rather I'm pretty
> sure it is not) but my question is why shouldn't it be tied to the
> "timeo" mount option? Right now, only the sesson/lease manager thread
> sets it via rpc_set_connect_timeout() to be lease period related.
>
> Is it the fact that we don't want to allow user to control TCP
> settings via the mount options? But somehow folks are expecting to be
> able to set low "timeo" value and have the (dead) connection to be
> considered dead earlier than for a rather long timeout period which
> is
> happening now.

In my mind, the two are correlated, but are not equivalent.

The 'timeo' value is basically a timeout for how long it takes for the
whole process of "send RPC call", "have it processed by the server" and
"receive reply".
IOW: 'timeo' is about how long it takes for an RPC call to execute end-
to-end.

TCP_USER_TIMEOUT is essentially a timeout for how long it takes
the server to ACK receipt of the RPC call once we've placed it in the
TCP socket.
IOW: it is a timeout for the networking part of an RPC call
transmission.
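
For reference, TCP_USER_TIMEOUT is an ordinary per-socket option on
Linux, with a value in milliseconds. A minimal userspace sketch,
assuming a Linux host; the helper names and the 30000 ms value are
illustrative and are not taken from the NFS client:

```python
import socket

# TCP_USER_TIMEOUT is Linux-specific. Python exposes the constant on Linux;
# 18 is its value in <linux/tcp.h>, used here as a fallback (assumption:
# this sketch only runs on Linux).
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def tcp_socket_with_user_timeout(timeout_ms):
    """Create an unconnected TCP socket with TCP_USER_TIMEOUT set.

    timeout_ms is how long transmitted data may stay unacknowledged
    before the kernel closes the connection (0 means the system default).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock


def get_user_timeout_ms(sock):
    """Read back the TCP_USER_TIMEOUT currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)
```

The option can be set before the socket is connected; per tcp(7), it
is effective only while the connection is in a synchronized state.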

So, as I said, the two are correlated: if the server is down, then your
timeout is dominated by the fact that the network transmission never
completes. However if the server is up and congested, then the
"processing by the server" is likely to dominate.

The other thing to note is that if the TCP connection is unresponsive,
we may want to fail that much faster in order to give ourselves a
chance to close the connection, open a new one and retransmit the
requests from the old connection before the 'timeo' is triggered (since
in the case of a soft timeout, that could be a fatal error).

Does that make sense?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-12-12 18:14:31

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

On Thu, Dec 12, 2019 at 11:47 AM Trond Myklebust
<[email protected]> wrote:
>
> Hi Olga,
>
> On Wed, 2019-12-11 at 15:36 -0500, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > I'd like to raise this once again. Is this true that setting a
> > timeout
> > limit (TCP_USER_TIMEOUT) is not user configurable (rather I'm pretty
> > sure it is not) but my question is why shouldn't it be tied to the
> > "timeo" mount option? Right now, only the sesson/lease manager thread
> > sets it via rpc_set_connect_timeout() to be lease period related.
> >
> > Is it the fact that we don't want to allow user to control TCP
> > settings via the mount options? But somehow folks are expecting to be
> > able to set low "timeo" value and have the (dead) connection to be
> > considered dead earlier than for a rather long timeout period which
> > is
> > happening now.
>
> In my mind, the two are correlated, but are not equivalent.
>
> The 'timeo' value is basically a timeout for how long it takes for the
> whole process of "send RPC call", "have it processed by the server" and
> "receive reply".
> IOW: 'timeo' is about how long it takes for an RPC call to execute end-
> to-end.

OK, but no action (connection-wise) is taken when this timeout goes
off, and that's a problem for detecting bad connections.

> The TCP_USER_TIMEOUT, is essentially a timeout for how long it takes
> the server to ACK receipt of the RPC call once we've placed it in the
> TCP socket.
> IOW: it is a timeout for the networking part of an RPC call
> transmission.

But why is the TCP timeout (1) not user-configurable and/or (2) not
tied to "timeo"?

> So, as I said, the two are correlated: if the server is down, then your
> timeout is dominated by the fact that the network transmission never
> completes. However if the server is up and congested, then the
> "processing by the server" is likely to dominate.
>
> The other thing to note is that if the TCP connection is unresponsive,
> we may want to fail that much faster in order to give ourselves a
> chance to close the connection, open a new one and retransmit the
> requests from the old connection before the 'timeo' is triggered (since
> in the case of a soft timeout, that could be a fatal error).

"we may want to fail" doesn't happen and that's exactly what I would
like to happen. Also, the TCP timeout is set to the lease time (take a
Linux server, which sets a 90s lease), and that's larger than the
default "timeo", which is 60s. That goes against your intention to
recover in time.

> Does that make sense?

It's the last case I'm interested in. The issue I'm having is that
after a "timeout" (which should be a lease period), the client doesn't
send a SYN to try to establish a new connection.

Here's the current problem. In a cloud environment, a server node goes
down. It's spun up again in a different VM (but with the same IP), and
the server is ready to receive requests and continue with the I/O. The
problem is that the client doesn't try to send a new SYN until the old
connection times out. This timeout is 3 minutes for v3 and can't be
shortened because TCP_USER_TIMEOUT isn't user-configurable or tied to
"timeo". But users expect connections to time out after 60s (the
default timeo) or whatever timeo value was specified at mount time.
The current Linux client doesn't do that.
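
The 3-minute figure is consistent with a user timeout computed as the
initial timeout multiplied by the number of attempts. A rough sketch of
that arithmetic, assuming the calculation in
xs_tcp_set_socket_timeouts() has this form and that the transport
defaults are a 60s initial timeout with 2 retries (both are assumptions
here, and the function and parameter names are illustrative):

```python
def tcp_user_timeout_ms(initial_timeout_s, retries):
    """Assumed form: initial timeout (in ms) times (retries + 1) attempts."""
    return initial_timeout_s * 1000 * (retries + 1)


# Assumed defaults: 60s initial timeout, 2 retries.
v3_default_ms = tcp_user_timeout_ms(60, 2)
print(v3_default_ms / 1000 / 60, "minutes")
```

Under those assumptions the result is 180000 ms, i.e. the 3 minutes
observed for v3.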

Even in v4, in my testing, the client doesn't send a new SYN after
the lease period (but I believe that's a bug). The only time it does
so is if I change rpc_set_connect_timeout() to something low so that
the default of 18000 is set.

(1) I could be wrong, but I think there is a bug where the connection
is not re-established (unless some low value is set).
(2) I think there should be the ability (at least for v3) to set the
timeout lower than 3 minutes. Perhaps we can add a new mount option:
either a totally separate TCP timeout value, or something like
"sync_nfstcp_timeouts" that makes timeo govern both the NFS and TCP
timeouts.



2019-12-12 19:31:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS/TCP timeouts

On Thu, 2019-12-12 at 13:13 -0500, Olga Kornievskaia wrote:
> On Thu, Dec 12, 2019 at 11:47 AM Trond Myklebust
> <[email protected]> wrote:
> > Hi Olga,
> >
> > On Wed, 2019-12-11 at 15:36 -0500, Olga Kornievskaia wrote:
> > > Hi Trond,
> > >
> > > I'd like to raise this once again. Is this true that setting a
> > > timeout
> > > limit (TCP_USER_TIMEOUT) is not user configurable (rather I'm
> > > pretty
> > > sure it is not) but my question is why shouldn't it be tied to
> > > the
> > > "timeo" mount option? Right now, only the sesson/lease manager
> > > thread
> > > sets it via rpc_set_connect_timeout() to be lease period related.
> > >
> > > Is it the fact that we don't want to allow user to control TCP
> > > settings via the mount options? But somehow folks are expecting
> > > to be
> > > able to set low "timeo" value and have the (dead) connection to
> > > be
> > > considered dead earlier than for a rather long timeout period
> > > which
> > > is
> > > happening now.
> >
> > In my mind, the two are correlated, but are not equivalent.
> >
> > The 'timeo' value is basically a timeout for how long it takes for
> > the
> > whole process of "send RPC call", "have it processed by the server"
> > and
> > "receive reply".
> > IOW: 'timeo' is about how long it takes for an RPC call to execute
> > end-
> > to-end.
>
> Ok, but what happens is there are no actions (connection wise) are
> taken when this timeout goes off and that' a problem for detecting
> bad
> connections.

I'm not sure I understand what you mean. The point of TCP_USER_TIMEOUT
is that the TCP layer is told when to time out and break the
connection. Furthermore, the other side (i.e. the server) is told about
the existence of this timeout, and hence knows what to expect.

IOW: there are no actions at the RPC layer because this is a TCP layer
thing.

>
> > The TCP_USER_TIMEOUT, is essentially a timeout for how long it
> > takes
> > the server to ACK receipt of the RPC call once we've placed it in
> > the
> > TCP socket.
> > IOW: it is a timeout for the networking part of an RPC call
> > transmission.
>
> But why isn't TCP time out (1) not user configurable and/or (2) not
> tied to the "timeo" ?
>
> > So, as I said, the two are correlated: if the server is down, then
> > your
> > timeout is dominated by the fact that the network transmission
> > never
> > completes. However if the server is up and congested, then the
> > "processing by the server" is likely to dominate.
> >
> > The other thing to note is that if the TCP connection is
> > unresponsive,
> > we may want to fail that much faster in order to give ourselves a
> > chance to close the connection, open a new one and retransmit the
> > requests from the old connection before the 'timeo' is triggered
> > (since
> > in the case of a soft timeout, that could be a fatal error).
>
> "we may want to fail" doesn't happen and that's exactly what I would
> like to happen. Also, TCP timeout is set to the a lease time (let's
> take linux server which sets 90s timeout) and that's larger than the
> default "timeo" which is 60s. That goes against your intention to
> recover in time.
>
> > Does that make sense?
>
> It's the last case I'm interested in. The issue I'm having is that
> after a "timeout" (which should be a lease period), the client
> doesn't
> sent a SYN trying to establish a new connection.

TCP_USER_TIMEOUT should not affect the handshake part of the TCP
connection (see 'man 7 tcp'). It can't solve a problem with the SYN
states.
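
For scale on the handshake side: tcp(7) documents the default
tcp_syn_retries of 6 as corresponding to roughly 127 seconds of
retrying. A sketch of that arithmetic, assuming a 1s initial
retransmission timeout that doubles after each unanswered SYN and
ignoring any upper cap on the interval (the function name is
illustrative):

```python
def syn_give_up_seconds(syn_retries, initial_rto_s=1):
    """Total wait before connect() gives up on an unanswered SYN.

    The original SYN plus syn_retries retransmissions are each followed
    by a wait that starts at initial_rto_s and doubles every time.
    """
    return sum(initial_rto_s * 2 ** attempt for attempt in range(syn_retries + 1))


print(syn_give_up_seconds(6))  # 1 + 2 + 4 + 8 + 16 + 32 + 64
```

A per-socket retry count can be set with the TCP_SYNCNT option
described in the same man page.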

> -
> Here's a current problem. In the cloud environment, a server node
> goes
> down. It's spun up again in a different VM (but with the same IP) and
> server is ready to be receiving requests and continue with the IO.
> The
> problem is the client doesn't try to send a new SYN until the old
> connection timeout. This timeout is 3mins for v3 and can't be shorted
> because TCP_USER_TIMEOUT isn't user configurable or tied into the
> timeo. But user expects that connections times out after 60s (as
> default timeo) (or whatever value timeo is specified during mount).
> Current linux client doesn't do that.
>
> Even in v4, in my testing ,the client doesn't send the new SYN after
> the lease period (but I believe that's a bug). The only time it does
> do it if I change rpc_set_connect_time() to something low so that
> default of 18000 is set.
>
> (1) I could be wrong but I think there is a bug that doesn't
> re-establish connection (unless some low value is set).
> (2) I think there should be ability (at least for v3) to set the
> timeout for lower than 3mins. Perhaps we can add a new mount option,
> either have a totally separate tcp timeout value or something like
> "sync_nfstcp_timeouts" and use timeo to govern both NFS and TCP
> timeout.

This needs to be resolved using something different. I'm not sure what
to use for timing the handshake out more quickly.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]