LinuxLists.cc - question: re-try of operations in PNFS

2018-05-17 20:43:36

Subject: question: re-try of operations in PNFS

Hi Trond,

Is there a reason why an RPC connection to the DS is set to time out
requests instead of waiting for the reply from the server? Requests
to the DS time out in 10 sec and are resent to the MDS.

Thank you.

2018-05-22 20:44:22

by Mkrtchyan, Tigran

Subject: Re: question: re-try of operations in PNFS

Hi Olga,

We saw similar issues with early versions of the RHEL6 kernels, but this was fixed in later versions,
and it's now possible to set the timeout with

dataserver_timeo and dataserver_retrans

bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1175413

With which kernel do you observe it?

Regards,
Tigran.

2018-05-22 20:46:00

by Olga Kornievskaia

Subject: Re: question: re-try of operations in PNFS

On Tue, May 22, 2018 at 4:34 PM, Mkrtchyan, Tigran
<[email protected]> wrote:
> Hi Olga,
>
> we saw similar issues with early version of RHEL6 kernels, but this was fixed in the later version.
> and it's possible now to set timeout with
>
> dataserver_timeo and dataserver_retrans
>
> bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1175413
>
> Which which kernel do you observe it?

Upstream kernel. But I'm arguing that there shouldn't be a need to
specify a dataserver_timeo, because requests to the DS shouldn't time out
at all, just like MDS operations.

Also, curiously, "man nfs" doesn't list a "dataserver_timeo" option, and
when I try to use it on a RHEL7.4 machine it says incorrect option.
A grep through the upstream kernel code for "dataserver_timeo" comes up
empty too.

2018-05-22 21:01:10

by Mkrtchyan, Tigran

Subject: Re: question: re-try of operations in PNFS

Agreed, there shouldn't be an extra option for that, and newer kernels have probably dropped
it. If I recall correctly, the option was there to protect the client from pNFS bugs, e.g.
to fall back to the MDS if it thinks that something went wrong, so that the client stays happy.

I haven't seen that problem for quite some time. However, on modern kernels (and RHEL 7) we prefer
to use the flexfiles pNFS layout. One of the reasons is the option to avoid I/O through the MDS.

I will try to reproduce it.

Regards,
Tigran.

2018-05-23 00:26:28

by Rick Macklem

Subject: Re: question: re-try of operations in PNFS

Olga Kornievskaia wrote:
[good stuff snipped]
>Upstream kernel. But I'm arguing that there shouldn't be a need to
>specify a dataserver_timeo because it shouldn't timeout at all just
>like MDS operations.
If/when the server is providing mirrored DSs, I've found this timeout useful
in the FreeBSD client since it allows the client to detect a DS failure.
It can then report the failure to the MDS via LayoutReturn (or another one
on NFSv4.2 which I can't remember the name of since I haven't done 4.2;-).

For non-mirrored DSs, the only thing I can think of (I've never seen this) would
be some sort of network partitioning such that the client can't reach the DS but
can reach the MDS.

I have no idea if this is relevant to Linux, but thought I'd mention it, just in case.
[more stuff snipped]
rick
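
The mirrored-DS case described above can be illustrated with a small sketch. It is not taken from the FreeBSD or Linux client; the names `read_from_mirrors`, `send`, and `DsUnreachable` are hypothetical, and the report to the MDS (for example in a LayoutReturn) is not modeled. It only shows why a per-DS timeout lets a client move on to another mirror and remember which DSs failed.

```python
class DsUnreachable(Exception):
    """Raised by the hypothetical transport when a DS does not answer in time."""


def read_from_mirrors(mirrors, send, timeout_s):
    """Try each mirrored DS in order.

    `send(ds, timeout_s)` is an assumed callable that returns data or
    raises DsUnreachable. Returns (data, failed), where `failed` lists
    the DSs that timed out and could later be reported to the MDS.
    """
    failed = []
    for ds in mirrors:
        try:
            data = send(ds, timeout_s)
        except DsUnreachable:
            # Without a timeout the client would wait here indefinitely
            # and never learn that this DS is down.
            failed.append(ds)
            continue
        return data, failed
    # Every mirror timed out; the caller has to decide what to do next.
    return None, failed
```

With no timeout at all, the first unreachable DS would block the loop, which is the trade-off being discussed in this thread.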

2018-05-23 13:25:36

by Olga Kornievskaia

Subject: Re: question: re-try of operations in PNFS

On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <[email protected]> wrote:
> Olga Kornievskaia wrote:
> [good stuff snipped]
>>Upstream kernel. But I'm arguing that there shouldn't be a need to
>>specify a dataserver_timeo because it shouldn't timeout at all just
>>like MDS operations.
> If/when the server is providing mirrored DSs, I've found this timeout useful
> in the FreeBSD client since it allows the client to detect a DS failure.
> It can then report the failure to the MDS via LayoutReturn (or another one
> on NFSv4.2 which I can't remember the name of since I haven't done 4.2;-).
>
> For non-mirrored DSs, the only thing I can think of (I've never seen this) would
> be some sort of network partitioning such that the client can't reach the DS but
> can reach the MDS.
>
> I have no idea if this is relevant to Linux, but thought I'd mention it, just in case.
> [more stuff snipped]

Doesn't retrying make the implementation not spec compliant?

2018-05-23 13:42:11

by Trond Myklebust

Subject: Re: question: re-try of operations in PNFS

On Wed, 2018-05-23 at 09:25 -0400, Olga Kornievskaia wrote:
> On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <[email protected]>
> wrote:
> > Olga Kornievskaia wrote:
> > [good stuff snipped]
> > > Upstream kernel. But I'm arguing that there shouldn't be a need
> > > to
> > > specify a dataserver_timeo because it shouldn't timeout at all
> > > just
> > > like MDS operations.
> >
> > If/when the server is providing mirrored DSs, I've found this
> > timeout useful
> > in the FreeBSD client since it allows the client to detect a DS
> > failure.
> > It can then report the failure to the MDS via LayoutReturn (or
> > another one
> > on NFSv4.2 which I can't remember the name of since I haven't done
> > 4.2;-).
> >
> > For non-mirrored DSs, the only thing I can think of (I've never
> > seen this) would
> > be some sort of network partitioning such that the client can't
> > reach the DS but
> > can reach the MDS.
> >
> > I have no idea if this is relevant to Linux, but thought I'd
> > mention it, just in case.
> > [more stuff snipped]
>
> Isn't retrying makes the implementation not spec compliant?

Replaying a request would not be spec compliant. Playing new requests
is perfectly fine (e.g. after picking up a new layout or redirecting
the I/O to the MDS).

Historically, I seem to remember that at one point we introduced a 15s
timeout on I/O requests to the DS in order to allow fast failover of
the pNFS client when the DS was down or unresponsive. I'm not sure
whether or not that mechanism still exists and whether it is what you
are seeing here.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2018-05-23 15:16:08

by Olga Kornievskaia

Subject: Re: question: re-try of operations in PNFS

On Wed, May 23, 2018 at 9:42 AM, Trond Myklebust
<[email protected]> wrote:
> On Wed, 2018-05-23 at 09:25 -0400, Olga Kornievskaia wrote:
>> On Tue, May 22, 2018 at 8:26 PM, Rick Macklem <[email protected]>
>> wrote:
>> > Olga Kornievskaia wrote:
>> > [good stuff snipped]
>> > > Upstream kernel. But I'm arguing that there shouldn't be a need
>> > > to
>> > > specify a dataserver_timeo because it shouldn't timeout at all
>> > > just
>> > > like MDS operations.
>> >
>> > If/when the server is providing mirrored DSs, I've found this
>> > timeout useful
>> > in the FreeBSD client since it allows the client to detect a DS
>> > failure.
>> > It can then report the failure to the MDS via LayoutReturn (or
>> > another one
>> > on NFSv4.2 which I can't remember the name of since I haven't done
>> > 4.2;-).
>> >
>> > For non-mirrored DSs, the only thing I can think of (I've never
>> > seen this) would
>> > be some sort of network partitioning such that the client can't
>> > reach the DS but
>> > can reach the MDS.
>> >
>> > I have no idea if this is relevant to Linux, but thought I'd
>> > mention it, just in case.
>> > [more stuff snipped]
>>
>> Isn't retrying makes the implementation not spec compliant?
>
> Replaying a request would not be spec compliant. Playing new requests
> is perfectly fine (e.g. after picking up a new layout or redirecting
> the I/O to the MDS).

I see, you are right. The request to the MDS is a "new request", as it
uses a different filehandle.

> Historically, I seem to remember that at one point we introduced a 15s
> timeout on I/O requests to the DS in order to allow fast failover of
> the pNFS client when the DS was down or unresponsive. I'm not sure
> whether or not that mechanism still exists and whether it is what you
> are seeing here.

Then I'd guess it probably is that and the timeout now is 10s.
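
The distinction discussed above, between replaying a request and issuing a new one, can be made concrete with a toy model. This is only a sketch and not the Linux client code: `Request`, `DsTimeout`, `read_with_fallback`, and the `send` callable are all hypothetical names introduced for illustration. The point it shows is that after the DS timeout, the I/O sent to the MDS is a fresh request with its own transaction id and a different filehandle.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Toy I/O request; every field name here is illustrative."""
    xid: int         # transaction id, unique per request
    target: str      # "DS" or "MDS"
    filehandle: str  # filehandle valid on that target
    offset: int
    length: int


class DsTimeout(Exception):
    """Raised by the hypothetical transport when the DS does not reply in time."""


def read_with_fallback(send, ds_fh, mds_fh, offset, length, next_xid):
    """Try the DS first; on timeout, build a NEW request for the MDS."""
    ds_req = Request(next_xid(), "DS", ds_fh, offset, length)
    try:
        return send(ds_req)
    except DsTimeout:
        # Not a replay: new xid, different target, different filehandle.
        mds_req = Request(next_xid(), "MDS", mds_fh, offset, length)
        return send(mds_req)


def demo():
    """Simulate a DS that always times out and an MDS that answers."""
    sent = []
    counter = itertools.count(1)

    def fake_send(req):
        sent.append(req)
        if req.target == "DS":
            raise DsTimeout()
        return b"x" * req.length

    data = read_with_fallback(fake_send, "fh-ds", "fh-mds", 0, 4,
                              lambda: next(counter))
    return data, sent
```

Calling `demo()` returns the data served by the simulated MDS together with the two requests that were sent, which can be compared field by field.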

2018-05-30 10:47:37

by Suresh Jayaraman

Subject: Re: question: re-try of operations in PNFS

On 05/23/2018 02:15 AM, Olga Kornievskaia wrote:
>> we saw similar issues with early version of RHEL6 kernels, but this was fixed in the later version.
>> and it's possible now to set timeout with
>>
>> dataserver_timeo and dataserver_retrans
>>
>> bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1175413
>>
>> Which which kernel do you observe it?
>
> Upstream kernel. But I'm arguing that there shouldn't be a need to
> specify a dataserver_timeo because it shouldn't timeout at all just
> like MDS operations.
>
> Also curiously, "man nfs" doesn't list "datasever_timeo" option and
> when I try to use it on a RHEL7.4 machine it says incorrect option.
> Also grep thru the upstream kernel code for "dataserver_timeo" is
> empty too.
>

I still see these options (as module parameters to the
nfs_layout_nfsv41_files module) in the mainline kernel (4.17-rc7).

We are facing a problem with I/O being routed through the MDS when a DS is
momentarily unavailable (e.g. during a DS restart or DS failover). I am
wondering whether anyone has found this timeout helpful in the case where the
network connection goes down as part of a DS failover, for instance. In the
past, we observed I/O being routed through the MDS immediately
after a DS restart, with the MDS not in a position to complete the I/O.

Regards,
Suresh
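
For anyone who wants to check whether the module parameters mentioned above exist on a given machine, here is a minimal sketch. It assumes the usual sysfs convention that a loaded module exposes its parameters under /sys/module/<module>/parameters/, and that these two parameters are readable there; whether they are depends on the kernel build and on the module being loaded. The function name `read_ds_params` and the `sysfs_root` argument are made up for this example.

```python
from pathlib import Path

MODULE = "nfs_layout_nfsv41_files"
PARAMS = ("dataserver_timeo", "dataserver_retrans")


def read_ds_params(sysfs_root="/sys/module"):
    """Return {parameter name: raw string value, or None if unreadable}.

    A value of None means the module is not loaded, the parameter does
    not exist in this kernel, or it is not exported as readable.
    """
    base = Path(sysfs_root) / MODULE / "parameters"
    result = {}
    for name in PARAMS:
        try:
            result[name] = (base / name).read_text().strip()
        except OSError:
            result[name] = None
    return result
```

The values are returned as raw strings on purpose; the sketch makes no claim about their units or defaults.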