2018-05-22 18:40:51

by Olga Kornievskaia

[permalink] [raw]
Subject: [PATCH 1/1] [SUNRPC] make sure to clone timeout values

From: Olga Kornievskaia <[email protected]>

For pNFS, the operations to DS currently timeout in 10s. According
to the spec, the client must not be re-trying an NFSv4.1 operation
unless the connection was broken.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
net/sunrpc/clnt.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6e432ec..97517eb 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -668,6 +668,7 @@ struct rpc_clnt *
.prognumber = clnt->cl_prog,
.version = clnt->cl_vers,
.authflavor = flavor,
+ .timeout = clnt->cl_timeout,
};
return __rpc_clone_client(&args, clnt);
}
--
1.8.3.1



2018-05-22 19:03:03

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/1] [SUNRPC] make sure to clone timeout values

T24gVHVlLCAyMDE4LTA1LTIyIGF0IDE0OjQwIC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gRnJvbTogT2xnYSBLb3JuaWV2c2thaWEgPG9sZ2Eua29ybmlldnNrYWlhQGdtYWlsLmNv
bT4NCj4gDQo+IEZvciBwTkZTLCB0aGUgb3BlcmF0aW9ucyB0byBEUyBjdXJyZW50bHkgdGltZW91
dCBpbiAxMHMuIEFjY29yZGluZw0KPiB0byB0aGUgc3BlYywgdGhlIGNsaWVudCBtdXN0IG5vdCBi
ZSByZS10cnlpbmcgYW4gTkZTdjQuMSBvcGVyYXRpb24NCj4gdW5sZXNzIHRoZSBjb25uZWN0aW9u
IHdhcyBicm9rZW4uDQo+IA0KPiBTaWduZWQtb2ZmLWJ5OiBPbGdhIEtvcm5pZXZza2FpYSA8a29s
Z2FAbmV0YXBwLmNvbT4NCj4gLS0tDQo+ICBuZXQvc3VucnBjL2NsbnQuYyB8IDEgKw0KPiAgMSBm
aWxlIGNoYW5nZWQsIDEgaW5zZXJ0aW9uKCspDQo+IA0KPiBkaWZmIC0tZ2l0IGEvbmV0L3N1bnJw
Yy9jbG50LmMgYi9uZXQvc3VucnBjL2NsbnQuYw0KPiBpbmRleCA2ZTQzMmVjLi45NzUxN2ViIDEw
MDY0NA0KPiAtLS0gYS9uZXQvc3VucnBjL2NsbnQuYw0KPiArKysgYi9uZXQvc3VucnBjL2NsbnQu
Yw0KPiBAQCAtNjY4LDYgKzY2OCw3IEBAIHN0cnVjdCBycGNfY2xudCAqDQo+ICAJCS5wcm9nbnVt
YmVyCT0gY2xudC0+Y2xfcHJvZywNCj4gIAkJLnZlcnNpb24JPSBjbG50LT5jbF92ZXJzLA0KPiAg
CQkuYXV0aGZsYXZvcgk9IGZsYXZvciwNCj4gKwkJLnRpbWVvdXQJPSBjbG50LT5jbF90aW1lb3V0
LA0KPiAgCX07DQo+ICAJcmV0dXJuIF9fcnBjX2Nsb25lX2NsaWVudCgmYXJncywgY2xudCk7DQo+
ICB9DQoNCldoYXQgZG9lcyB0aGlzIHBhdGNoIGhhdmUgdG8gZG8gd2l0aCBwTkZTPyBUaGF0J3Mg
dGhlIGdlbmVyaWMgUlBDDQpjbGllbnQgY2xvbmluZyBBUEkgeW91IGFyZSBjaGFuZ2luZy4NCg0K
VGhlIHBORlMvZmlsZXMgdGltZW91dHMgYXJlIGludGVuZGVkIHRvIGJlIHNldCB1c2luZyB0aGUN
CmRhdGFzZXJ2ZXJfcmV0cmFucyBhbmQgZGF0YXNlcnZlcl90aW1lbyBtb2R1bGUgcGFyYW1ldGVy
cyBkZXNjcmliZWQgYXQNCnRoZSBib3R0b20gb2YgZnMvbmZzL2ZpbGVsYXlvdXQvZmlsZWxheW91
dGRldi5jDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu
ZXIsIEhhbW1lcnNwYWNlDQp0cm9uZC5teWtsZWJ1c3RAaGFtbWVyc3BhY2UuY29tDQoNCg==

2018-05-22 19:28:54

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH 1/1] [SUNRPC] make sure to clone timeout values

On Tue, May 22, 2018 at 3:03 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, 2018-05-22 at 14:40 -0400, Olga Kornievskaia wrote:
>> From: Olga Kornievskaia <[email protected]>
>>
>> For pNFS, the operations to DS currently timeout in 10s. According
>> to the spec, the client must not be re-trying an NFSv4.1 operation
>> unless the connection was broken.
>>
>> Signed-off-by: Olga Kornievskaia <[email protected]>
>> ---
>> net/sunrpc/clnt.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 6e432ec..97517eb 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -668,6 +668,7 @@ struct rpc_clnt *
>> .prognumber = clnt->cl_prog,
>> .version = clnt->cl_vers,
>> .authflavor = flavor,
>> + .timeout = clnt->cl_timeout,
>> };
>> return __rpc_clone_client(&args, clnt);
>> }
>
> What does this patch have to do with pNFS? That's the generic RPC
> client cloning API you are changing.
>
> The pNFS/files timeouts are intended to be set using the
> dataserver_retrans and dataserver_timeo module parameters described at
> the bottom of fs/nfs/filelayout/filelayoutdev.c

Ok so perhaps the code needs to re-written so that it allows for the
DS to get an rpc client with its timeouts set. Which currently doesn't
happen.

2018-05-22 19:56:26

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/1] [SUNRPC] make sure to clone timeout values

T24gVHVlLCAyMDE4LTA1LTIyIGF0IDE1OjI4IC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gT24gVHVlLCBNYXkgMjIsIDIwMTggYXQgMzowMyBQTSwgVHJvbmQgTXlrbGVidXN0DQo+
IDx0cm9uZG15QGhhbW1lcnNwYWNlLmNvbT4gd3JvdGU6DQo+ID4gT24gVHVlLCAyMDE4LTA1LTIy
IGF0IDE0OjQwIC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4gPiA+IEZyb206IE9s
Z2EgS29ybmlldnNrYWlhIDxvbGdhLmtvcm5pZXZza2FpYUBnbWFpbC5jb20+DQo+ID4gPiANCj4g
PiA+IEZvciBwTkZTLCB0aGUgb3BlcmF0aW9ucyB0byBEUyBjdXJyZW50bHkgdGltZW91dCBpbiAx
MHMuDQo+ID4gPiBBY2NvcmRpbmcNCj4gPiA+IHRvIHRoZSBzcGVjLCB0aGUgY2xpZW50IG11c3Qg
bm90IGJlIHJlLXRyeWluZyBhbiBORlN2NC4xDQo+ID4gPiBvcGVyYXRpb24NCj4gPiA+IHVubGVz
cyB0aGUgY29ubmVjdGlvbiB3YXMgYnJva2VuLg0KPiA+ID4gDQo+ID4gPiBTaWduZWQtb2ZmLWJ5
OiBPbGdhIEtvcm5pZXZza2FpYSA8a29sZ2FAbmV0YXBwLmNvbT4NCj4gPiA+IC0tLQ0KPiA+ID4g
IG5ldC9zdW5ycGMvY2xudC5jIHwgMSArDQo+ID4gPiAgMSBmaWxlIGNoYW5nZWQsIDEgaW5zZXJ0
aW9uKCspDQo+ID4gPiANCj4gPiA+IGRpZmYgLS1naXQgYS9uZXQvc3VucnBjL2NsbnQuYyBiL25l
dC9zdW5ycGMvY2xudC5jDQo+ID4gPiBpbmRleCA2ZTQzMmVjLi45NzUxN2ViIDEwMDY0NA0KPiA+
ID4gLS0tIGEvbmV0L3N1bnJwYy9jbG50LmMNCj4gPiA+ICsrKyBiL25ldC9zdW5ycGMvY2xudC5j
DQo+ID4gPiBAQCAtNjY4LDYgKzY2OCw3IEBAIHN0cnVjdCBycGNfY2xudCAqDQo+ID4gPiAgICAg
ICAgICAgICAgIC5wcm9nbnVtYmVyICAgICA9IGNsbnQtPmNsX3Byb2csDQo+ID4gPiAgICAgICAg
ICAgICAgIC52ZXJzaW9uICAgICAgICA9IGNsbnQtPmNsX3ZlcnMsDQo+ID4gPiAgICAgICAgICAg
ICAgIC5hdXRoZmxhdm9yICAgICA9IGZsYXZvciwNCj4gPiA+ICsgICAgICAgICAgICAgLnRpbWVv
dXQgICAgICAgID0gY2xudC0+Y2xfdGltZW91dCwNCj4gPiA+ICAgICAgIH07DQo+ID4gPiAgICAg
ICByZXR1cm4gX19ycGNfY2xvbmVfY2xpZW50KCZhcmdzLCBjbG50KTsNCj4gPiA+ICB9DQo+ID4g
DQo+ID4gV2hhdCBkb2VzIHRoaXMgcGF0Y2ggaGF2ZSB0byBkbyB3aXRoIHBORlM/IFRoYXQncyB0
aGUgZ2VuZXJpYyBSUEMNCj4gPiBjbGllbnQgY2xvbmluZyBBUEkgeW91IGFyZSBjaGFuZ2luZy4N
Cj4gPiANCj4gPiBUaGUgcE5GUy9maWxlcyB0aW1lb3V0cyBhcmUgaW50ZW5kZWQgdG8gYmUgc2V0
IHVzaW5nIHRoZQ0KPiA+IGRhdGFzZXJ2ZXJfcmV0cmFucyBhbmQgZGF0YXNlcnZlcl90aW1lbyBt
b2R1bGUgcGFyYW1ldGVycyBkZXNjcmliZWQNCj4gPiBhdA0KPiA+IHRoZSBib3R0b20gb2YgZnMv
bmZzL2ZpbGVsYXlvdXQvZmlsZWxheW91dGRldi5jDQo+IA0KPiBPayBzbyBwZXJoYXBzIHRoZSBj
b2RlIG5lZWRzIHRvIHJlLXdyaXR0ZW4gc28gdGhhdCBpdCBhbGxvd3MgZm9yIHRoZQ0KPiBEUyB0
byBnZXQgYW4gcnBjIGNsaWVudCB3aXRoIGl0cyB0aW1lb3V0cyBzZXQuIFdoaWNoIGN1cnJlbnRs
eQ0KPiBkb2Vzbid0DQo+IGhhcHBlbi4NCj4gDQo+IEZyb20gd2hhdCBJIGNvdWxkIHRlbGwgdGhl
IERTIGNvZGUgdHJpZXMgdG8gc2V0IHRoZSB0aW1lb3V0IHZhbHVlcyBpbg0KPiBuZnM0X3NldF9k
c19jbGllbnQoKSBidXQgdGhhdCBoYXMgbm8gZWZmZWN0Lg0KPiANCj4gbmZzNF9maW5kX29yX2Ny
ZWF0ZV9kc19jbGllbnQoKSBjYWxscyBycGNfY2xvbmVfY2xpZW50X3NldF9hdXRoKCkNCj4gd2hp
Y2ggY3JlYXRlcyBhbiBycGMgY2xpZW50IGJ1dCB0aGUgdGltZW91dCB0aGF0IHdlcmUgc2V0IGFy
ZSBpZ25vcmVkDQo+IGFuZCBpbnN0ZWFkIHRoZSBycGMgY2xpZW50IGlzIGdldHRpbmcgY3JlYXRl
ZCB3aXRoIHRoaXMgMTBzIHRpbWVvdXQuDQo+IA0KPiAoYnV0IEkgdGhvdWdodCB0aGF0IGluIGdl
bmVyYWwgaXQgbWFkZSBzZW5zZSB0aGF0IGEgY2xvbmUgYWxzbyBjb3BpZXMNCj4gdGhlIHRpbWVv
dXQgdmFsdWVzKQ0KPiANCg0KSXQgZG9lcyBub3QgbWFrZSBzZW5zZSB3aGVuIHlvdSBjb25zaWRl
ciB0aGF0IHRoZSB0aW1lb3V0IGlzIGEgcGVyLQ0KdHJhbnNwb3J0IGF0dHJpYnV0ZS4NCg0KRldJ
VywgSSd2ZSBubyBpZGVhIHdoZXJlIHRoaXMgMTBzIHRpbWVvdXQgeW91IGFyZSBzZWVpbmcgaXMg
Y29taW5nDQpmcm9tLiBQZXJoYXBzIGl0IGlzIHdvcnRod2hpbGUgZmlndXJpbmcgdGhhdCBvdXQg
Zmlyc3Q/DQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu
ZXIsIEhhbW1lcnNwYWNlDQp0cm9uZC5teWtsZWJ1c3RAaGFtbWVyc3BhY2UuY29tDQoNCg==

2018-05-22 20:51:47

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH 1/1] [SUNRPC] make sure to clone timeout values

On Tue, May 22, 2018 at 3:56 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, 2018-05-22 at 15:28 -0400, Olga Kornievskaia wrote:
>> On Tue, May 22, 2018 at 3:03 PM, Trond Myklebust
>> <[email protected]> wrote:
>> > On Tue, 2018-05-22 at 14:40 -0400, Olga Kornievskaia wrote:
>> > > From: Olga Kornievskaia <[email protected]>
>> > >
>> > > For pNFS, the operations to DS currently timeout in 10s.
>> > > According
>> > > to the spec, the client must not be re-trying an NFSv4.1
>> > > operation
>> > > unless the connection was broken.
>> > >
>> > > Signed-off-by: Olga Kornievskaia <[email protected]>
>> > > ---
>> > > net/sunrpc/clnt.c | 1 +
>> > > 1 file changed, 1 insertion(+)
>> > >
>> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> > > index 6e432ec..97517eb 100644
>> > > --- a/net/sunrpc/clnt.c
>> > > +++ b/net/sunrpc/clnt.c
>> > > @@ -668,6 +668,7 @@ struct rpc_clnt *
>> > > .prognumber = clnt->cl_prog,
>> > > .version = clnt->cl_vers,
>> > > .authflavor = flavor,
>> > > + .timeout = clnt->cl_timeout,
>> > > };
>> > > return __rpc_clone_client(&args, clnt);
>> > > }
>> >
>> > What does this patch have to do with pNFS? That's the generic RPC
>> > client cloning API you are changing.
>> >
>> > The pNFS/files timeouts are intended to be set using the
>> > dataserver_retrans and dataserver_timeo module parameters described
>> > at
>> > the bottom of fs/nfs/filelayout/filelayoutdev.c
>>
>> Ok so perhaps the code needs to re-written so that it allows for the
>> DS to get an rpc client with its timeouts set. Which currently
>> doesn't
>> happen.
>>
>> From what I could tell the DS code tries to set the timeout values in
>> nfs4_set_ds_client() but that has no effect.
>>
>> nfs4_find_or_create_ds_client() calls rpc_clone_client_set_auth()
>> which creates an rpc client but the timeout that were set are ignored
>> and instead the rpc client is getting created with this 10s timeout.
>>
>> (but I thought that in general it made sense that a clone also copies
>> the timeout values)
>>
>
> It does not make sense when you consider that the timeout is a per-
> transport attribute.
>
> FWIW, I've no idea where this 10s timeout you are seeing is coming
> from. Perhaps it is worthwhile figuring that out first?

Besides the value of the 10s (which I also have been having a really
hard time figuring out) it's also the max timeout and the fact that,
after the 10s are up it's giving up and failing the operation which
is then re-tried against the MDS. This shouldn't happen. So I felt
like even if that value was 60s, it shouldn't have timed out after 60s
and re-tried (without the fix that I'm proposing).

I'll give it a bit more to figure out where 10s is coming from.