2016-10-21 00:01:23

by NeilBrown

[permalink] [raw]
Subject: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.


Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
'ds', then ds->ds_clp will also be non-NULL.

This is not necessasrily true in the case when the process received a
fatal signal while nfs4_pnfs_ds_connect is waiting in
nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
devid may not recently have been marked unavailable.

So add a test for ->ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/filelayout/filelayoutdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index 4946ef40ba87..85ef38f9765f 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
s->nfs_client->cl_rpcclient->cl_auth->au_flavor);

out_test_devid:
- if (filelayout_test_devid_unavailable(devid))
+ if (ret->ds_clp == NULL ||
+ filelayout_test_devid_unavailable(devid))
ret = NULL;
out:
return ret;
--
2.10.1


Attachments:
signature.asc (800.00 B)

2016-11-18 05:01:17

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.


Ping?
No sign of this in linux-next, and no replies....

Thanks,
NeilBrown


On Fri, Oct 21 2016, NeilBrown wrote:

>
> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
> 'ds', then ds->ds_clp will also be non-NULL.
>
> This is not necessasrily true in the case when the process received a
> fatal signal while nfs4_pnfs_ds_connect is waiting in
> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
> devid may not recently have been marked unavailable.
>
> So add a test for ->ds_clp == NULL and return NULL in that case.
>
> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
> index 4946ef40ba87..85ef38f9765f 100644
> --- a/fs/nfs/filelayout/filelayoutdev.c
> +++ b/fs/nfs/filelayout/filelayoutdev.c
> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
> s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
>
> out_test_devid:
> - if (filelayout_test_devid_unavailable(devid))
> + if (ret->ds_clp == NULL ||
> + filelayout_test_devid_unavailable(devid))
> ret = NULL;
> out:
> return ret;
> --
> 2.10.1


Attachments:
signature.asc (800.00 B)

2016-11-18 14:32:42

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

DQo+IE9uIE5vdiAxOCwgMjAxNiwgYXQgMDA6MDEsIE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+
IHdyb3RlOg0KPiANCj4gDQo+IFBpbmc/DQo+IE5vIHNpZ24gb2YgdGhpcyBpbiBsaW51eC1uZXh0
LCBhbmQgbm8gcmVwbGllc+KApi4NCg0KSeKAmWQgbGlrZSBzb21lIGNvbmZpcm1hdGlvbiBmcm9t
IHRoZSBOZXRBcHAgZm9sa3Mgb24gdGhpcy4gVGhleSBhcmUgdGhlIG1haW4gc3Rha2Vob2xkZXJz
IGZvciB0aGUgcE5GUyBmaWxlcyBpbXBsZW1lbnRhdGlvbi4gSSBkb27igJl0IHRoaW5rIHdlIG5l
ZWQgYW4gZXF1aXZhbGVudCBmb3IgZmxleGZpbGVzLg0KDQo+IA0KPiBUaGFua3MsDQo+IE5laWxC
cm93bg0KPiANCj4gDQo+IE9uIEZyaSwgT2N0IDIxIDIwMTYsIE5laWxCcm93biB3cm90ZToNCj4g
DQo+PiANCj4+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxfcHJlcGFyZV9k
cygpIHR1cm5zIGEgbm9uLU5VTEwNCj4+ICdkcycsIHRoZW4gZHMtPmRzX2NscCB3aWxsIGFsc28g
YmUgbm9uLU5VTEwuDQo+PiANCj4+IFRoaXMgaXMgbm90IG5lY2Vzc2FzcmlseSB0cnVlIGluIHRo
ZSBjYXNlIHdoZW4gdGhlIHByb2Nlc3MgcmVjZWl2ZWQgYQ0KPj4gZmF0YWwgc2lnbmFsIHdoaWxl
IG5mczRfcG5mc19kc19jb25uZWN0IGlzIHdhaXRpbmcgaW4NCj4+IG5mczRfd2FpdF9kc19jb25u
ZWN0KCkuICBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQsIGFuZCB0aGUNCj4+
IGRldmlkIG1heSBub3QgcmVjZW50bHkgaGF2ZSBiZWVuIG1hcmtlZCB1bmF2YWlsYWJsZS4NCj4+
IA0KPj4gU28gYWRkIGEgdGVzdCBmb3IgLT5kc19jbHAgPT0gTlVMTCBhbmQgcmV0dXJuIE5VTEwg
aW4gdGhhdCBjYXNlLg0KPj4gDQo+PiBGaXhlczogYzIzMjY2ZDUzMmI0ICgiTkZTNC4xIEZpeCBk
YXRhIHNlcnZlciBjb25uZWN0aW9uIHJhY2UiKQ0KPj4gU2lnbmVkLW9mZi1ieTogTmVpbEJyb3du
IDxuZWlsYkBzdXNlLmNvbT4NCj4+IC0tLQ0KPj4gZnMvbmZzL2ZpbGVsYXlvdXQvZmlsZWxheW91
dGRldi5jIHwgMyArKy0NCj4+IDEgZmlsZSBjaGFuZ2VkLCAyIGluc2VydGlvbnMoKyksIDEgZGVs
ZXRpb24oLSkNCj4+IA0KPj4gZGlmZiAtLWdpdCBhL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlv
dXRkZXYuYyBiL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRkZXYuYw0KPj4gaW5kZXggNDk0
NmVmNDBiYTg3Li44NWVmMzhmOTc2NWYgMTAwNjQ0DQo+PiAtLS0gYS9mcy9uZnMvZmlsZWxheW91
dC9maWxlbGF5b3V0ZGV2LmMNCj4+ICsrKyBiL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRk
ZXYuYw0KPj4gQEAgLTI4Myw3ICsyODMsOCBAQCBuZnM0X2ZsX3ByZXBhcmVfZHMoc3RydWN0IHBu
ZnNfbGF5b3V0X3NlZ21lbnQgKmxzZWcsIHUzMiBkc19pZHgpDQo+PiAJCQkgICAgIHMtPm5mc19j
bGllbnQtPmNsX3JwY2NsaWVudC0+Y2xfYXV0aC0+YXVfZmxhdm9yKTsNCj4+IA0KPj4gb3V0X3Rl
c3RfZGV2aWQ6DQo+PiAtCWlmIChmaWxlbGF5b3V0X3Rlc3RfZGV2aWRfdW5hdmFpbGFibGUoZGV2
aWQpKQ0KPj4gKwlpZiAocmV0LT5kc19jbHAgPT0gTlVMTCB8fA0KPj4gKwkgICAgZmlsZWxheW91
dF90ZXN0X2RldmlkX3VuYXZhaWxhYmxlKGRldmlkKSkNCj4+IAkJcmV0ID0gTlVMTDsNCj4+IG91
dDoNCj4+IAlyZXR1cm4gcmV0Ow0KPj4gLS0gDQo+PiAyLjEwLjENCg0K


2016-11-18 17:34:40

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust
<[email protected]> wrote:
>
>> On Nov 18, 2016, at 00:01, NeilBrown <[email protected]> wrote:
>>
>>
>> Ping?
>> No sign of this in linux-next, and no replies=E2=80=A6.
>
> I=E2=80=99d like some confirmation from the NetApp folks on this. They ar=
e the main stakeholders for the pNFS files implementation. I don=E2=80=99t =
think we need an equivalent for flex files.

Just to see if I understand this patch. If "ds" isn't NULL, then the
client has an RPC client with the ds. But it doesn't have a valid NFS
client? Looking at the code I really don't see how that can happen.
Maybe there is a bug elsewhere? If we return here, is there a chance
we are going to have a zombie RPC client with the ds? If that's not a
problem then I don't think there an issue to assume that if there is
no valid NFS client then we would want to return a NULL from
nfs4_fl_prepare_ds().


>
>>
>> Thanks,
>> NeilBrown
>>
>>
>> On Fri, Oct 21 2016, NeilBrown wrote:
>>
>>>
>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
>>> 'ds', then ds->ds_clp will also be non-NULL.
>>>
>>> This is not necessasrily true in the case when the process received a
>>> fatal signal while nfs4_pnfs_ds_connect is waiting in
>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
>>> devid may not recently have been marked unavailable.
>>>
>>> So add a test for ->ds_clp =3D=3D NULL and return NULL in that case.
>>>
>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
>>> Signed-off-by: NeilBrown <[email protected]>
>>> ---
>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/file=
layoutdev.c
>>> index 4946ef40ba87..85ef38f9765f 100644
>>> --- a/fs/nfs/filelayout/filelayoutdev.c
>>> +++ b/fs/nfs/filelayout/filelayoutdev.c
>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg=
, u32 ds_idx)
>>> s->nfs_client->cl_rpcclient->cl_auth->au_flav=
or);
>>>
>>> out_test_devid:
>>> - if (filelayout_test_devid_unavailable(devid))
>>> + if (ret->ds_clp =3D=3D NULL ||
>>> + filelayout_test_devid_unavailable(devid))
>>> ret =3D NULL;
>>> out:
>>> return ret;
>>> --
>>> 2.10.1
>

2016-11-19 06:33:23

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

On Sat, Nov 19 2016, Olga Kornievskaia wrote:

> On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust
> <[email protected]> wrote:
>>
>>> On Nov 18, 2016, at 00:01, NeilBrown <[email protected]> wrote:
>>>
>>>
>>> Ping?
>>> No sign of this in linux-next, and no replies….
>>
>> I’d like some confirmation from the NetApp folks on this. They are the main stakeholders for the pNFS files implementation. I don’t think we need an equivalent for flex files.
>
> Just to see if I understand this patch. If "ds" isn't NULL, then the
> client has an RPC client with the ds. But it doesn't have a valid NFS
> client? Looking at the code I really don't see how that can happen.

I thought I explained that, but I was rather brief.

nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the
NFS client. The first time it is called it will set NFS4DS_CONNECTING
and then call _nfs4_pnfs_v?_ds_connect() which will establish the
connection and set ds->ds_clp. If the server is unresponsive, this an
block indefinitely.

If a second thread then tries the same time, it will enter
nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is
already set, so it will call nfs4_wait_ds_connect().

As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the
process is kill by an uncaught signal (such as SIGKILL).
In this case it will return even though NFS4DS_CONNECTING is still set,
and ds->ds_clp is still NULL. i.e. the first thread hasn't established
the connection yet.

As the process has been killed, the 'ds' that it holds that doesn't have
an NFS client handle will never be used. But we need to be sure that
the process will exit cleanly without trying to de-reference that NULL
ds_clp.

That is what the patch ensures.


> Maybe there is a bug elsewhere? If we return here, is there a chance
> we are going to have a zombie RPC client with the ds? If that's not a
> problem then I don't think there an issue to assume that if there is
> no valid NFS client then we would want to return a NULL from
> nfs4_fl_prepare_ds().

It isn't a "zombie RPC client", but rather an unborn RPC client, which
still has NFS4DS_CONNECTING set.

Thanks,
NeilBrown


>
>
>>
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>>
>>> On Fri, Oct 21 2016, NeilBrown wrote:
>>>
>>>>
>>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
>>>> 'ds', then ds->ds_clp will also be non-NULL.
>>>>
>>>> This is not necessasrily true in the case when the process received a
>>>> fatal signal while nfs4_pnfs_ds_connect is waiting in
>>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and the
>>>> devid may not recently have been marked unavailable.
>>>>
>>>> So add a test for ->ds_clp == NULL and return NULL in that case.
>>>>
>>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
>>>> Signed-off-by: NeilBrown <[email protected]>
>>>> ---
>>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
>>>> index 4946ef40ba87..85ef38f9765f 100644
>>>> --- a/fs/nfs/filelayout/filelayoutdev.c
>>>> +++ b/fs/nfs/filelayout/filelayoutdev.c
>>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
>>>> s->nfs_client->cl_rpcclient->cl_auth->au_flavor);
>>>>
>>>> out_test_devid:
>>>> - if (filelayout_test_devid_unavailable(devid))
>>>> + if (ret->ds_clp == NULL ||
>>>> + filelayout_test_devid_unavailable(devid))
>>>> ret = NULL;
>>>> out:
>>>> return ret;
>>>> --
>>>> 2.10.1
>>


Attachments:
signature.asc (800.00 B)

2016-11-22 20:32:31

by Adamson, Andy

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

VGhlIHBhdGNoIGxvb2tzIGdvb2QgdG8gbWUuIFdlIGRvIG5lZWQgdG8gZW5zdXJlIHRoYXQgZHNf
Y2xwIGlzIHNldCBiZWZvcmUgcmV0dXJuaW5nIGEgbm9uLW51bGwgbmZzNF9wbmZzX2RzIGZyb20g
bmZzNF9mbF9wcmVwYXJlX2RzLg0KQXMgVHJvbmQgcG9pbnRlZCBvdXQsIG5mczRfZmZfbGF5b3V0
X3ByZXBhcmVfZHMgYWxyZWFkeSBkb2VzIHRoaXMuDQoNCuKGkkFuZHkNCg0KT24gMTEvMTkvMTYs
IDE6MzMgQU0sICJOZWlsQnJvd24iIDxuZWlsYkBzdXNlLmNvbT4gd3JvdGU6DQoNCk9uIFNhdCwg
Tm92IDE5IDIwMTYsIE9sZ2EgS29ybmlldnNrYWlhIHdyb3RlOg0KDQo+IE9uIEZyaSwgTm92IDE4
LCAyMDE2IGF0IDk6MzIgQU0sIFRyb25kIE15a2xlYnVzdA0KPiA8dHJvbmRteUBwcmltYXJ5ZGF0
YS5jb20+IHdyb3RlOg0KPj4NCj4+PiBPbiBOb3YgMTgsIDIwMTYsIGF0IDAwOjAxLCBOZWlsQnJv
d24gPG5laWxiQHN1c2UuY29tPiB3cm90ZToNCj4+Pg0KPj4+DQo+Pj4gUGluZz8NCj4+PiBObyBz
aWduIG9mIHRoaXMgaW4gbGludXgtbmV4dCwgYW5kIG5vIHJlcGxpZXPigKYuDQo+Pg0KPj4gSeKA
mWQgbGlrZSBzb21lIGNvbmZpcm1hdGlvbiBmcm9tIHRoZSBOZXRBcHAgZm9sa3Mgb24gdGhpcy4g
VGhleSBhcmUgdGhlIG1haW4gc3Rha2Vob2xkZXJzIGZvciB0aGUgcE5GUyBmaWxlcyBpbXBsZW1l
bnRhdGlvbi4gSSBkb27igJl0IHRoaW5rIHdlIG5lZWQgYW4gZXF1aXZhbGVudCBmb3IgZmxleCBm
aWxlcy4NCj4NCj4gSnVzdCB0byBzZWUgaWYgSSB1bmRlcnN0YW5kIHRoaXMgcGF0Y2guIElmICJk
cyIgaXNuJ3QgTlVMTCwgdGhlbiB0aGUNCj4gY2xpZW50IGhhcyBhbiBSUEMgY2xpZW50IHdpdGgg
dGhlIGRzLiBCdXQgaXQgZG9lc24ndCBoYXZlIGEgdmFsaWQgTkZTDQo+IGNsaWVudD8gTG9va2lu
ZyBhdCB0aGUgY29kZSBJIHJlYWxseSBkb24ndCBzZWUgaG93IHRoYXQgY2FuIGhhcHBlbi4NCg0K
SSB0aG91Z2h0IEkgZXhwbGFpbmVkIHRoYXQsIGJ1dCBJIHdhcyByYXRoZXIgYnJpZWYuDQoNCm5m
czRfZmxfcHJlcGFyZV9kcygpIGNhbGxzIG5mczRfcG5mc19kc19jb25uZWN0KCkgaW4gb3JkZXIg
dG8gY3JlYXRlIHRoZQ0KTkZTIGNsaWVudC4gIFRoZSBmaXJzdCB0aW1lIGl0IGlzIGNhbGxlZCBp
dCB3aWxsIHNldCBORlM0RFNfQ09OTkVDVElORw0KYW5kIHRoZW4gY2FsbCBfbmZzNF9wbmZzX3Y/
X2RzX2Nvbm5lY3QoKSB3aGljaCB3aWxsIGVzdGFibGlzaCB0aGUNCmNvbm5lY3Rpb24gYW5kIHNl
dCBkcy0+ZHNfY2xwLiAgSWYgdGhlIHNlcnZlciBpcyB1bnJlc3BvbnNpdmUsIHRoaXMgYW4NCmJs
b2NrIGluZGVmaW5pdGVseS4NCg0KSWYgYSBzZWNvbmQgdGhyZWFkIHRoZW4gdHJpZXMgdGhlIHNh
bWUgdGltZSwgaXQgd2lsbCBlbnRlcg0KbmZzNF9wbmZzX2RzX2Nvbm5lY3QoKSBhbmQgd2lsbCBk
aXNjb3ZlcnkgdGhhdCBORlM0RFNfQ09OTkVDVElORyBpcw0KYWxyZWFkeSBzZXQsIHNvIGl0IHdp
bGwgY2FsbCBuZnM0X3dhaXRfZHNfY29ubmVjdCgpLg0KDQpBcyBuZnM0X3dhaXRfZHNfY29ubmVj
dCgpIHdhaXRzIHdpdGggVEFTS19LSUxMQUJMRSwgaXQgd2lsbCBhYm9ydCBpZiB0aGUNCnByb2Nl
c3MgaXMga2lsbCBieSBhbiB1bmNhdWdodCBzaWduYWwgKHN1Y2ggYXMgU0lHS0lMTCkuDQpJbiB0
aGlzIGNhc2UgaXQgd2lsbCByZXR1cm4gZXZlbiB0aG91Z2ggTkZTNERTX0NPTk5FQ1RJTkcgaXMg
c3RpbGwgc2V0LA0KYW5kIGRzLT5kc19jbHAgaXMgc3RpbGwgTlVMTC4gIGkuZS4gdGhlIGZpcnN0
IHRocmVhZCBoYXNuJ3QgZXN0YWJsaXNoZWQNCnRoZSBjb25uZWN0aW9uIHlldC4NCg0KQXMgdGhl
IHByb2Nlc3MgaGFzIGJlZW4ga2lsbGVkLCB0aGUgJ2RzJyB0aGF0IGl0IGhvbGRzIHRoYXQgZG9l
c24ndCBoYXZlDQphbiBORlMgY2xpZW50IGhhbmRsZSB3aWxsIG5ldmVyIGJlIHVzZWQuICBCdXQg
d2UgbmVlZCB0byBiZSBzdXJlIHRoYXQNCnRoZSBwcm9jZXNzIHdpbGwgZXhpdCBjbGVhbmx5IHdp
dGhvdXQgdHJ5aW5nIHRvIGRlLXJlZmVyZW5jZSB0aGF0IE5VTEwNCmRzX2NscC4NCg0KVGhhdCBp
cyB3aGF0IHRoZSBwYXRjaCBlbnN1cmVzLg0KDQoNCj4gTWF5YmUgdGhlcmUgaXMgYSBidWcgZWxz
ZXdoZXJlPyBJZiB3ZSByZXR1cm4gaGVyZSwgaXMgdGhlcmUgYSBjaGFuY2UNCj4gd2UgYXJlIGdv
aW5nIHRvIGhhdmUgYSB6b21iaWUgUlBDIGNsaWVudCB3aXRoIHRoZSBkcz8gSWYgdGhhdCdzIG5v
dCBhDQo+IHByb2JsZW0gdGhlbiBJIGRvbid0IHRoaW5rIHRoZXJlIGFuIGlzc3VlIHRvIGFzc3Vt
ZSB0aGF0IGlmIHRoZXJlIGlzDQo+IG5vIHZhbGlkIE5GUyBjbGllbnQgdGhlbiB3ZSB3b3VsZCB3
YW50IHRvIHJldHVybiBhIE5VTEwgZnJvbQ0KPiBuZnM0X2ZsX3ByZXBhcmVfZHMoKS4NCg0KSXQg
aXNuJ3QgYSAiem9tYmllIFJQQyBjbGllbnQiLCBidXQgcmF0aGVyIGFuIHVuYm9ybiBSUEMgY2xp
ZW50LCB3aGljaA0Kc3RpbGwgaGFzIE5GUzREU19DT05ORUNUSU5HIHNldC4NCg0KVGhhbmtzLA0K
TmVpbEJyb3duDQoNCg0KPg0KPg0KPj4NCj4+Pg0KPj4+IFRoYW5rcywNCj4+PiBOZWlsQnJvd24N
Cj4+Pg0KPj4+DQo+Pj4gT24gRnJpLCBPY3QgMjEgMjAxNiwgTmVpbEJyb3duIHdyb3RlOg0KPj4+
DQo+Pj4+DQo+Pj4+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxfcHJlcGFy
ZV9kcygpIHR1cm5zIGEgbm9uLU5VTEwNCj4+Pj4gJ2RzJywgdGhlbiBkcy0+ZHNfY2xwIHdpbGwg
YWxzbyBiZSBub24tTlVMTC4NCj4+Pj4NCj4+Pj4gVGhpcyBpcyBub3QgbmVjZXNzYXNyaWx5IHRy
dWUgaW4gdGhlIGNhc2Ugd2hlbiB0aGUgcHJvY2VzcyByZWNlaXZlZCBhDQo+Pj4+IGZhdGFsIHNp
Z25hbCB3aGlsZSBuZnM0X3BuZnNfZHNfY29ubmVjdCBpcyB3YWl0aW5nIGluDQo+Pj4+IG5mczRf
d2FpdF9kc19jb25uZWN0KCkuICBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQs
IGFuZCB0aGUNCj4+Pj4gZGV2aWQgbWF5IG5vdCByZWNlbnRseSBoYXZlIGJlZW4gbWFya2VkIHVu
YXZhaWxhYmxlLg0KPj4+Pg0KPj4+PiBTbyBhZGQgYSB0ZXN0IGZvciAtPmRzX2NscCA9PSBOVUxM
IGFuZCByZXR1cm4gTlVMTCBpbiB0aGF0IGNhc2UuDQo+Pj4+DQo+Pj4+IEZpeGVzOiBjMjMyNjZk
NTMyYjQgKCJORlM0LjEgRml4IGRhdGEgc2VydmVyIGNvbm5lY3Rpb24gcmFjZSIpDQo+Pj4+IFNp
Z25lZC1vZmYtYnk6IE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+DQo+Pj4+IC0tLQ0KPj4+PiBm
cy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMgfCAzICsrLQ0KPj4+PiAxIGZpbGUgY2hh
bmdlZCwgMiBpbnNlcnRpb25zKCspLCAxIGRlbGV0aW9uKC0pDQo+Pj4+DQo+Pj4+IGRpZmYgLS1n
aXQgYS9mcy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMgYi9mcy9uZnMvZmlsZWxheW91
dC9maWxlbGF5b3V0ZGV2LmMNCj4+Pj4gaW5kZXggNDk0NmVmNDBiYTg3Li44NWVmMzhmOTc2NWYg
MTAwNjQ0DQo+Pj4+IC0tLSBhL2ZzL25mcy9maWxlbGF5b3V0L2ZpbGVsYXlvdXRkZXYuYw0KPj4+
PiArKysgYi9mcy9uZnMvZmlsZWxheW91dC9maWxlbGF5b3V0ZGV2LmMNCj4+Pj4gQEAgLTI4Myw3
ICsyODMsOCBAQCBuZnM0X2ZsX3ByZXBhcmVfZHMoc3RydWN0IHBuZnNfbGF5b3V0X3NlZ21lbnQg
KmxzZWcsIHUzMiBkc19pZHgpDQo+Pj4+ICAgICAgICAgICAgICAgICAgICAgICAgICAgcy0+bmZz
X2NsaWVudC0+Y2xfcnBjY2xpZW50LT5jbF9hdXRoLT5hdV9mbGF2b3IpOw0KPj4+Pg0KPj4+PiBv
dXRfdGVzdF9kZXZpZDoNCj4+Pj4gLSAgICBpZiAoZmlsZWxheW91dF90ZXN0X2RldmlkX3VuYXZh
aWxhYmxlKGRldmlkKSkNCj4+Pj4gKyAgICBpZiAocmV0LT5kc19jbHAgPT0gTlVMTCB8fA0KPj4+
PiArICAgICAgICBmaWxlbGF5b3V0X3Rlc3RfZGV2aWRfdW5hdmFpbGFibGUoZGV2aWQpKQ0KPj4+
PiAgICAgICAgICAgICAgcmV0ID0gTlVMTDsNCj4+Pj4gb3V0Og0KPj4+PiAgICAgIHJldHVybiBy
ZXQ7DQo+Pj4+IC0tDQo+Pj4+IDIuMTAuMQ0KPj4NCg0KDQo=

2016-11-23 17:38:02

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

On Sat, Nov 19, 2016 at 1:33 AM, NeilBrown <[email protected]> wrote:
> On Sat, Nov 19 2016, Olga Kornievskaia wrote:
>
>> On Fri, Nov 18, 2016 at 9:32 AM, Trond Myklebust
>> <[email protected]> wrote:
>>>
>>>> On Nov 18, 2016, at 00:01, NeilBrown <[email protected]> wrote:
>>>>
>>>>
>>>> Ping?
>>>> No sign of this in linux-next, and no replies=E2=80=A6.
>>>
>>> I=E2=80=99d like some confirmation from the NetApp folks on this. They =
are the main stakeholders for the pNFS files implementation. I don=E2=80=99=
t think we need an equivalent for flex files.
>>
>> Just to see if I understand this patch. If "ds" isn't NULL, then the
>> client has an RPC client with the ds. But it doesn't have a valid NFS
>> client? Looking at the code I really don't see how that can happen.
>
> I thought I explained that, but I was rather brief.
>
> nfs4_fl_prepare_ds() calls nfs4_pnfs_ds_connect() in order to create the
> NFS client. The first time it is called it will set NFS4DS_CONNECTING
> and then call _nfs4_pnfs_v?_ds_connect() which will establish the
> connection and set ds->ds_clp. If the server is unresponsive, this an
> block indefinitely.
>
> If a second thread then tries the same time, it will enter
> nfs4_pnfs_ds_connect() and will discovery that NFS4DS_CONNECTING is
> already set, so it will call nfs4_wait_ds_connect().
>
> As nfs4_wait_ds_connect() waits with TASK_KILLABLE, it will abort if the
> process is kill by an uncaught signal (such as SIGKILL).
> In this case it will return even though NFS4DS_CONNECTING is still set,
> and ds->ds_clp is still NULL. i.e. the first thread hasn't established
> the connection yet.
>
> As the process has been killed, the 'ds' that it holds that doesn't have
> an NFS client handle will never be used. But we need to be sure that
> the process will exit cleanly without trying to de-reference that NULL
> ds_clp.
>
> That is what the patch ensures.

Thanks for the explanation. Makes sense.

>> Maybe there is a bug elsewhere? If we return here, is there a chance
>> we are going to have a zombie RPC client with the ds? If that's not a
>> problem then I don't think there an issue to assume that if there is
>> no valid NFS client then we would want to return a NULL from
>> nfs4_fl_prepare_ds().
>
> It isn't a "zombie RPC client", but rather an unborn RPC client, which
> still has NFS4DS_CONNECTING set.
>
> Thanks,
> NeilBrown
>
>
>>
>>
>>>
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>>
>>>> On Fri, Oct 21 2016, NeilBrown wrote:
>>>>
>>>>>
>>>>> Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL
>>>>> 'ds', then ds->ds_clp will also be non-NULL.
>>>>>
>>>>> This is not necessasrily true in the case when the process received a
>>>>> fatal signal while nfs4_pnfs_ds_connect is waiting in
>>>>> nfs4_wait_ds_connect(). In that case ->ds_clp may not be set, and th=
e
>>>>> devid may not recently have been marked unavailable.
>>>>>
>>>>> So add a test for ->ds_clp =3D=3D NULL and return NULL in that case.
>>>>>
>>>>> Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
>>>>> Signed-off-by: NeilBrown <[email protected]>
>>>>> ---
>>>>> fs/nfs/filelayout/filelayoutdev.c | 3 ++-
>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/fi=
lelayoutdev.c
>>>>> index 4946ef40ba87..85ef38f9765f 100644
>>>>> --- a/fs/nfs/filelayout/filelayoutdev.c
>>>>> +++ b/fs/nfs/filelayout/filelayoutdev.c
>>>>> @@ -283,7 +283,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *ls=
eg, u32 ds_idx)
>>>>> s->nfs_client->cl_rpcclient->cl_auth->au_fl=
avor);
>>>>>
>>>>> out_test_devid:
>>>>> - if (filelayout_test_devid_unavailable(devid))
>>>>> + if (ret->ds_clp =3D=3D NULL ||
>>>>> + filelayout_test_devid_unavailable(devid))
>>>>> ret =3D NULL;
>>>>> out:
>>>>> return ret;
>>>>> --
>>>>> 2.10.1
>>>

2016-12-19 00:19:42

by NeilBrown

[permalink] [raw]
Subject: [PATCH resend] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.


Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <[email protected]>
Acked-by: Olga Kornievskaia <[email protected]>
Acked-by: Adamson, Andy <[email protected]>
---

Hi Trond,
I just noticed that this wasn't in your 4.10 pull request. So I've
added Acked-bys from Andy and Olga and am resending.

Thanks,
NeilBrown


fs/nfs/filelayout/filelayoutdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/filelayout/filelayoutdev.c b/fs/nfs/filelayout/filelayoutdev.c
index a5589b791439..f956ca20a8a3 100644
--- a/fs/nfs/filelayout/filelayoutdev.c
+++ b/fs/nfs/filelayout/filelayoutdev.c
@@ -282,7 +282,8 @@ nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
s->nfs_client->cl_minorversion);

out_test_devid:
- if (filelayout_test_devid_unavailable(devid))
+ if (ret->ds_clp == NULL ||
+ filelayout_test_devid_unavailable(devid))
ret = NULL;
out:
return ret;
--
2.11.0


Attachments:
signature.asc (832.00 B)

2016-12-19 00:28:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH resend] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

DQo+IE9uIERlYyAxOCwgMjAxNiwgYXQgMTk6MTksIE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+
IHdyb3RlOg0KPiANCj4gDQo+IFZhcmlvdXMgcGxhY2VzIGFzc3VtZSB0aGF0IGlmIG5mczRfZmxf
cHJlcGFyZV9kcygpIHR1cm5zIGEgbm9uLU5VTEwgJ2RzJywNCj4gdGhlbiBkcy0+ZHNfY2xwIHdp
bGwgYWxzbyBiZSBub24tTlVMTC4NCj4gDQo+IFRoaXMgaXMgbm90IG5lY2Vzc2FzcmlseSB0cnVl
IGluIHRoZSBjYXNlIHdoZW4gdGhlIHByb2Nlc3MgcmVjZWl2ZWQgYSBmYXRhbCBzaWduYWwNCj4g
d2hpbGUgbmZzNF9wbmZzX2RzX2Nvbm5lY3QgaXMgd2FpdGluZyBpbiBuZnM0X3dhaXRfZHNfY29u
bmVjdCgpLg0KPiBJbiB0aGF0IGNhc2UgLT5kc19jbHAgbWF5IG5vdCBiZSBzZXQsIGFuZCB0aGUg
ZGV2aWQgbWF5IG5vdCByZWNlbnRseSBoYXZlIGJlZW4gbWFya2VkDQo+IHVuYXZhaWxhYmxlLg0K
PiANCj4gU28gYWRkIGEgdGVzdCBmb3IgZHNfY2xwID09IE5VTEwgYW5kIHJldHVybiBOVUxMIGlu
IHRoYXQgY2FzZS4NCj4gDQo+IEZpeGVzOiBjMjMyNjZkNTMyYjQgKCJORlM0LjEgRml4IGRhdGEg
c2VydmVyIGNvbm5lY3Rpb24gcmFjZSIpDQo+IFNpZ25lZC1vZmYtYnk6IE5laWxCcm93biA8bmVp
bGJAc3VzZS5jb20+DQo+IEFja2VkLWJ5OiBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5l
ZHU+DQo+IEFja2VkLWJ5OiBBZGFtc29uLCBBbmR5IDxXaWxsaWFtLkFkYW1zb25AbmV0YXBwLmNv
bT4NCj4gLS0tDQo+IA0KPiBIaSBUcm9uZCwNCj4gSSBqdXN0IG5vdGljZWQgdGhhdCB0aGlzIHdh
c24ndCBpbiB5b3VyIDQuMTAgcHVsbCByZXF1ZXN0LiAgU28gSSd2ZQ0KPiBhZGRlZCBBY2tlZC1i
eXMgZnJvbSBBbmR5IGFuZCBPbGdhIGFuZCBhbSByZXNlbmRpbmcuDQo+IA0KDQpTb3JyeSBmb3Ig
aGF2aW5nIG1pc3NlZCB0aGVpciBBY2tlZC1ieXMgZWFybGllcuKApiBJ4oCZbGwgc2VuZCB0aGlz
IHRvZ2V0aGVyIHdpdGggdGhlIG90aGVyIDEwIHBhdGNoZXMgdG9tb3Jyb3cuDQoNCkNoZWVycw0K
ICBUcm9uZA0KDQo=