nfs4_proc_exchange_id() can return -EINVAL if the server
reported NFS4INVAL (which I have seen in a packet trace),
or nfs4_check_cl_exchange_flags() exchange flags detects
a problem.
Each of these mean that NFSv4.1 later cannot be use, but
they should not prevent fallback to NFSv4.0.
Currently this EINVAL error is returned by nfs4_proc_exchange_id() to
nfs41_discover_server_trunking(), and thence to
nfs4_discover_server_trunking().
nfs4_discover_server_trunking() doesn't understand EINVAL, so converts
it to EIO which causes mount.nfs to think something is horribly wrong
and to give up.
It would be a more graceful failure if nfs4_discover_server_trunking()
mapped -EINVAL to -EPROTONOSUPPORT - a failure to negotiate a client
ID clearly shows that NFSv4.1 cannot be supported, but isn't as
general a failure as EIO.
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4state.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 91a4d4eeb235..7e237bb9c699 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2219,6 +2219,8 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
clnt = clp->cl_rpcclient;
goto again;
+ case -NFS4INVAL:
+ /* Server confused - assume this minor isn't supported */
case -NFS4ERR_MINOR_VERS_MISMATCH:
status = -EPROTONOSUPPORT;
break;
--
2.14.0.rc0.dirty
nfs4_proc_exchange_id() can return -EINVAL if the server
reported NFS4INVAL (which I have seen in a packet trace),
or nfs4_check_cl_exchange_flags() exchange flags detects
a problem.
Each of these mean that NFSv4.1 later cannot be use, but
they should not prevent fallback to NFSv4.0.
Currently this EINVAL error is returned by nfs4_proc_exchange_id() to
nfs41_discover_server_trunking(), and thence to
nfs4_discover_server_trunking().
nfs4_discover_server_trunking() doesn't understand EINVAL, so converts
it to EIO which causes mount.nfs to think something is horribly wrong
and to give up.
It would be a more graceful failure if nfs4_discover_server_trunking()
mapped -EINVAL to -EPROTONOSUPPORT - a failure to negotiate a client
ID clearly shows that NFSv4.1 cannot be supported, but isn't as
general a failure as EIO.
Signed-off-by: NeilBrown <[email protected]>
---
Sorry - a bit too trigger-happy with that first version of the patch.
NeilBrown
fs/nfs/nfs4state.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 91a4d4eeb235..b988e460553d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2219,6 +2219,8 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
clnt = clp->cl_rpcclient;
goto again;
+ case -NFS4ERR_INVAL:
+ /* Server confused - assume this minor isn't supported */
case -NFS4ERR_MINOR_VERS_MISMATCH:
status = -EPROTONOSUPPORT;
break;
--
2.14.0.rc0.dirty
Hi Neil,
according to rfc5661, NFS4ERR_INVAL is returned by the server if it
thinks that client sends an invalid request (e.g. points to a client bug)
or server misinterpret it (broken server).
With your change instead of failing the mount, client will silently go for
v4.0, even v4.1 mount was requested and produce undesirable behavior, e.g.
proxy-io instead of pnfs. I fill prefer fail-fast instead of long debug
sessions.
On the other hand, I understand, that it's not always possible to fix server
or clients in production environment and time-to-time workarounds are necessary.
Tigran.
----- Original Message -----
> From: "NeilBrown" <[email protected]>
> To: "Trond Myklebust" <[email protected]>, "Anna Schumaker" <[email protected]>
> Cc: "linux-nfs" <[email protected]>
> Sent: Friday, March 16, 2018 12:44:23 AM
> Subject: [PATCH - v2] NFSv4: handle EINVAL from EXCHANGE_ID better.
> nfs4_proc_exchange_id() can return -EINVAL if the server
> reported NFS4INVAL (which I have seen in a packet trace),
> or nfs4_check_cl_exchange_flags() exchange flags detects
> a problem.
> Each of these mean that NFSv4.1 later cannot be use, but
> they should not prevent fallback to NFSv4.0.
>
> Currently this EINVAL error is returned by nfs4_proc_exchange_id() to
> nfs41_discover_server_trunking(), and thence to
> nfs4_discover_server_trunking().
> nfs4_discover_server_trunking() doesn't understand EINVAL, so converts
> it to EIO which causes mount.nfs to think something is horribly wrong
> and to give up.
>
> It would be a more graceful failure if nfs4_discover_server_trunking()
> mapped -EINVAL to -EPROTONOSUPPORT - a failure to negotiate a client
> ID clearly shows that NFSv4.1 cannot be supported, but isn't as
> general a failure as EIO.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
>
> Sorry - a bit too trigger-happy with that first version of the patch.
>
> NeilBrown
>
> fs/nfs/nfs4state.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 91a4d4eeb235..b988e460553d 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2219,6 +2219,8 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
> clnt = clp->cl_rpcclient;
> goto again;
>
> + case -NFS4ERR_INVAL:
> + /* Server confused - assume this minor isn't supported */
> case -NFS4ERR_MINOR_VERS_MISMATCH:
> status = -EPROTONOSUPPORT;
> break;
> --
> 2.14.0.rc0.dirty
T24gRnJpLCAyMDE4LTAzLTE2IGF0IDEwOjMxICswMTAwLCBNa3J0Y2h5YW4sIFRpZ3JhbiB3cm90
ZToNCj4gSGkgTmVpbCwNCj4gDQo+IGFjY29yZGluZyB0byByZmM1NjYxLCBORlM0RVJSX0lOVkFM
IGlzIHJldHVybmVkIGJ5IHRoZSBzZXJ2ZXIgaWYgaXQNCj4gdGhpbmtzIHRoYXQgY2xpZW50IHNl
bmRzIGFuIGludmFsaWQgcmVxdWVzdCAoZS5nLiBwb2ludHMgdG8gYSBjbGllbnQNCj4gYnVnKQ0K
PiBvciBzZXJ2ZXIgbWlzaW50ZXJwcmV0IGl0IChicm9rZW4gc2VydmVyKS4NCj4gDQo+IFdpdGgg
eW91ciBjaGFuZ2UgaW5zdGVhZCBvZiBmYWlsaW5nIHRoZSBtb3VudCwgY2xpZW50IHdpbGwgc2ls
ZW50bHkNCj4gZ28gZm9yDQo+IHY0LjAsIGV2ZW4gdjQuMSBtb3VudCB3YXMgcmVxdWVzdGVkIGFu
ZCBwcm9kdWNlIHVuZGVzaXJhYmxlIGJlaGF2aW9yLA0KPiBlLmcuDQo+IHByb3h5LWlvIGluc3Rl
YWQgb2YgcG5mcy4gSSBmaWxsIHByZWZlciBmYWlsLWZhc3QgaW5zdGVhZCBvZiBsb25nDQo+IGRl
YnVnDQo+IHNlc3Npb25zLg0KPiANCj4gT24gdGhlIG90aGVyIGhhbmQsIEkgdW5kZXJzdGFuZCwg
dGhhdCBpdCdzIG5vdCBhbHdheXMgcG9zc2libGUgdG8gZml4DQo+IHNlcnZlcg0KPiBvciBjbGll
bnRzIGluIHByb2R1Y3Rpb24gZW52aXJvbm1lbnQgYW5kIHRpbWUtdG8tdGltZSB3b3JrYXJvdW5k
cyBhcmUNCj4gbmVjZXNzYXJ5Lg0KPiANCj4gDQoNCkknZCB0ZW5kIHRvIGFncmVlIHdpdGggVGln
cmFuLiBIaWRpbmcgc2VydmVyIGJ1Z3MsIHNob3VsZCBub3QgYmUgYQ0KcHJpb3JpdHkgYW5kIHBh
cnRpY3VsYXJseSBub3QgaW4gdGhpcyBjYXNlLCB3aGVyZSB0aGUgd29ya2Fyb3VuZCBpcw0Kc2lt
cGxlOiBlaXRoZXIgdHVybiBvZmYgdmVyc2lvbiBuZWdvdGlhdGlvbiBhbHRvZ2V0aGVyLCBvciBl
ZGl0DQovZXRjL25mc21vdW50LmNvbmYgdG8gbmVnb3RpYXRlIGEgZGlmZmVyZW50IHNldCBvZiB2
ZXJzaW9ucy4NCg0KV2hhdCB3ZSBtaWdodCB3YW50IHRvIGRvLCBpcyBtYWtlIGl0IGVhc2llciB0
byBhbGxvdyB0aGUgdXNlciB0byBkZXRlY3QNCnRoYXQgdGhpcyBpcyBpbmRlZWQgYSBzZXJ2ZXIg
YnVnIGFuZCBpcyBub3QgYSBwcm9ibGVtIHdpdGggdGhlDQphcmd1bWVudHMgc3VwcGxpZWQgdG8g
dGhlICJtb3VudCIgdXRpbGl0eS4gUGVyaGFwcyB3ZSBtaWdodCBoYXZlIHRoZQ0Ka2VybmVsIGxv
ZyBzb21ldGhpbmcgaW4gdGhlIHN5c2xvZ3M/DQoNCkNoZWVycw0KICBUcm9uZA0KDQo+IFRpZ3Jh
bi4NCj4gDQo+IA0KPiAtLS0tLSBPcmlnaW5hbCBNZXNzYWdlIC0tLS0tDQo+ID4gRnJvbTogIk5l
aWxCcm93biIgPG5laWxiQHN1c2UuY29tPg0KPiA+IFRvOiAiVHJvbmQgTXlrbGVidXN0IiA8dHJv
bmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbT4sICJBbm5hDQo+ID4gU2NodW1ha2VyIiA8YW5u
YS5zY2h1bWFrZXJAbmV0YXBwLmNvbT4NCj4gPiBDYzogImxpbnV4LW5mcyIgPGxpbnV4LW5mc0B2
Z2VyLmtlcm5lbC5vcmc+DQo+ID4gU2VudDogRnJpZGF5LCBNYXJjaCAxNiwgMjAxOCAxMjo0NDoy
MyBBTQ0KPiA+IFN1YmplY3Q6IFtQQVRDSCAtIHYyXSBORlN2NDogaGFuZGxlIEVJTlZBTCBmcm9t
IEVYQ0hBTkdFX0lEIGJldHRlci4NCj4gPiBuZnM0X3Byb2NfZXhjaGFuZ2VfaWQoKSBjYW4gcmV0
dXJuIC1FSU5WQUwgaWYgdGhlIHNlcnZlcg0KPiA+IHJlcG9ydGVkIE5GUzRJTlZBTCAod2hpY2gg
SSBoYXZlIHNlZW4gaW4gYSBwYWNrZXQgdHJhY2UpLA0KPiA+IG9yIG5mczRfY2hlY2tfY2xfZXhj
aGFuZ2VfZmxhZ3MoKSBleGNoYW5nZSBmbGFncyBkZXRlY3RzDQo+ID4gYSBwcm9ibGVtLg0KPiA+
IEVhY2ggb2YgdGhlc2UgbWVhbiB0aGF0IE5GU3Y0LjEgbGF0ZXIgY2Fubm90IGJlIHVzZSwgYnV0
DQo+ID4gdGhleSBzaG91bGQgbm90IHByZXZlbnQgZmFsbGJhY2sgdG8gTkZTdjQuMC4NCj4gPiAN
Cj4gPiBDdXJyZW50bHkgdGhpcyBFSU5WQUwgZXJyb3IgaXMgcmV0dXJuZWQgYnkgbmZzNF9wcm9j
X2V4Y2hhbmdlX2lkKCkNCj4gPiB0bw0KPiA+IG5mczQxX2Rpc2NvdmVyX3NlcnZlcl90cnVua2lu
ZygpLCBhbmQgdGhlbmNlIHRvDQo+ID4gbmZzNF9kaXNjb3Zlcl9zZXJ2ZXJfdHJ1bmtpbmcoKS4N
Cj4gPiBuZnM0X2Rpc2NvdmVyX3NlcnZlcl90cnVua2luZygpIGRvZXNuJ3QgdW5kZXJzdGFuZCBF
SU5WQUwsIHNvDQo+ID4gY29udmVydHMNCj4gPiBpdCB0byBFSU8gd2hpY2ggY2F1c2VzIG1vdW50
Lm5mcyB0byB0aGluayBzb21ldGhpbmcgaXMgaG9ycmlibHkNCj4gPiB3cm9uZw0KPiA+IGFuZCB0
byBnaXZlIHVwLg0KPiA+IA0KPiA+IEl0IHdvdWxkIGJlIGEgbW9yZSBncmFjZWZ1bCBmYWlsdXJl
IGlmDQo+ID4gbmZzNF9kaXNjb3Zlcl9zZXJ2ZXJfdHJ1bmtpbmcoKQ0KPiA+IG1hcHBlZCAtRUlO
VkFMIHRvIC1FUFJPVE9OT1NVUFBPUlQgLSBhIGZhaWx1cmUgdG8gbmVnb3RpYXRlIGENCj4gPiBj
bGllbnQNCj4gPiBJRCBjbGVhcmx5IHNob3dzIHRoYXQgTkZTdjQuMSBjYW5ub3QgYmUgc3VwcG9y
dGVkLCBidXQgaXNuJ3QgYXMNCj4gPiBnZW5lcmFsIGEgZmFpbHVyZSBhcyBFSU8uDQo+ID4gDQo+
ID4gU2lnbmVkLW9mZi1ieTogTmVpbEJyb3duIDxuZWlsYkBzdXNlLmNvbT4NCj4gPiAtLS0NCj4g
PiANCj4gPiBTb3JyeSAtIGEgYml0IHRvbyB0cmlnZ2VyLWhhcHB5IHdpdGggdGhhdCBmaXJzdCB2
ZXJzaW9uIG9mIHRoZQ0KPiA+IHBhdGNoLg0KPiA+IA0KPiA+IE5laWxCcm93bg0KPiA+IA0KPiA+
IGZzL25mcy9uZnM0c3RhdGUuYyB8IDIgKysNCj4gPiAxIGZpbGUgY2hhbmdlZCwgMiBpbnNlcnRp
b25zKCspDQo+ID4gDQo+ID4gZGlmZiAtLWdpdCBhL2ZzL25mcy9uZnM0c3RhdGUuYyBiL2ZzL25m
cy9uZnM0c3RhdGUuYw0KPiA+IGluZGV4IDkxYTRkNGVlYjIzNS4uYjk4OGU0NjA1NTNkIDEwMDY0
NA0KPiA+IC0tLSBhL2ZzL25mcy9uZnM0c3RhdGUuYw0KPiA+ICsrKyBiL2ZzL25mcy9uZnM0c3Rh
dGUuYw0KPiA+IEBAIC0yMjE5LDYgKzIyMTksOCBAQCBpbnQgbmZzNF9kaXNjb3Zlcl9zZXJ2ZXJf
dHJ1bmtpbmcoc3RydWN0DQo+ID4gbmZzX2NsaWVudCAqY2xwLA0KPiA+IAkJY2xudCA9IGNscC0+
Y2xfcnBjY2xpZW50Ow0KPiA+IAkJZ290byBhZ2FpbjsNCj4gPiANCj4gPiArCWNhc2UgLU5GUzRF
UlJfSU5WQUw6DQo+ID4gKwkJLyogU2VydmVyIGNvbmZ1c2VkIC0gYXNzdW1lIHRoaXMgbWlub3Ig
aXNuJ3QNCj4gPiBzdXBwb3J0ZWQgKi8NCj4gPiAJY2FzZSAtTkZTNEVSUl9NSU5PUl9WRVJTX01J
U01BVENIOg0KPiA+IAkJc3RhdHVzID0gLUVQUk9UT05PU1VQUE9SVDsNCj4gPiAJCWJyZWFrOw0K
PiA+IC0tDQo+ID4gMi4xNC4wLnJjMC5kaXJ0eQ0KPiANCj4gLS0NCj4gVG8gdW5zdWJzY3JpYmUg
ZnJvbSB0aGlzIGxpc3Q6IHNlbmQgdGhlIGxpbmUgInVuc3Vic2NyaWJlIGxpbnV4LW5mcyINCj4g
aW4NCj4gdGhlIGJvZHkgb2YgYSBtZXNzYWdlIHRvIG1ham9yZG9tb0B2Z2VyLmtlcm5lbC5vcmcN
Cj4gTW9yZSBtYWpvcmRvbW8gaW5mbyBhdCAgaHR0cDovL3ZnZXIua2VybmVsLm9yZy9tYWpvcmRv
bW8taW5mby5odG1sDQo+IA0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQg
bWFpbnRhaW5lciwgUHJpbWFyeURhdGENCnRyb25kLm15a2xlYnVzdEBwcmltYXJ5ZGF0YS5jb20N
Cg==
On Fri, Mar 16 2018, Mkrtchyan, Tigran wrote:
> Hi Neil,
>
> according to rfc5661, NFS4ERR_INVAL is returned by the server if it
> thinks that client sends an invalid request (e.g. points to a client bug)
> or server misinterpret it (broken server).
Agreed.
>
> With your change instead of failing the mount, client will silently go for
> v4.0, even v4.1 mount was requested and produce undesirable behavior, e.g.
> proxy-io instead of pnfs. I fill prefer fail-fast instead of long debug
> sessions.
I don't quite agree. With my change, the client will behave exactly the
same way as if the server explicitly didn't support v4.1
So if you explicitly ask for v4.1, you will get a fail-fast.
If you ask for "mount with whatever protocol seems to work" (the
default), then you will get a protocol that works - not 4.1 in this
case.
>
> On the other hand, I understand, that it's not always possible to fix server
> or clients in production environment and time-to-time workarounds are
> necessary.
Yes. We are not overly eager to work-around broken implementations, but
my memory is that we do it when it brings measurable benefits without
unreasonable compromises.
In this can I am suggesting that change that results in a user-visible
error of EPROTONOSUPPORT instead of EIO - in a case where the server
doesn't respond to out v4.1 in the way we think it should. I don't
think there is any compromise there.
Thanks,
NeilBrown
>
>
> Tigran.
>
>
> ----- Original Message -----
>> From: "NeilBrown" <[email protected]>
>> To: "Trond Myklebust" <[email protected]>, "Anna Schumaker" <[email protected]>
>> Cc: "linux-nfs" <[email protected]>
>> Sent: Friday, March 16, 2018 12:44:23 AM
>> Subject: [PATCH - v2] NFSv4: handle EINVAL from EXCHANGE_ID better.
>
>> nfs4_proc_exchange_id() can return -EINVAL if the server
>> reported NFS4INVAL (which I have seen in a packet trace),
>> or nfs4_check_cl_exchange_flags() exchange flags detects
>> a problem.
>> Each of these mean that NFSv4.1 later cannot be use, but
>> they should not prevent fallback to NFSv4.0.
>>
>> Currently this EINVAL error is returned by nfs4_proc_exchange_id() to
>> nfs41_discover_server_trunking(), and thence to
>> nfs4_discover_server_trunking().
>> nfs4_discover_server_trunking() doesn't understand EINVAL, so converts
>> it to EIO which causes mount.nfs to think something is horribly wrong
>> and to give up.
>>
>> It would be a more graceful failure if nfs4_discover_server_trunking()
>> mapped -EINVAL to -EPROTONOSUPPORT - a failure to negotiate a client
>> ID clearly shows that NFSv4.1 cannot be supported, but isn't as
>> general a failure as EIO.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>>
>> Sorry - a bit too trigger-happy with that first version of the patch.
>>
>> NeilBrown
>>
>> fs/nfs/nfs4state.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 91a4d4eeb235..b988e460553d 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -2219,6 +2219,8 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
>> clnt = clp->cl_rpcclient;
>> goto again;
>>
>> + case -NFS4ERR_INVAL:
>> + /* Server confused - assume this minor isn't supported */
>> case -NFS4ERR_MINOR_VERS_MISMATCH:
>> status = -EPROTONOSUPPORT;
>> break;
>> --
>> 2.14.0.rc0.dirty
On Fri, Mar 16 2018, Trond Myklebust wrote:
> On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
>> Hi Neil,
>>
>> according to rfc5661, NFS4ERR_INVAL is returned by the server if it
>> thinks that client sends an invalid request (e.g. points to a client
>> bug)
>> or server misinterpret it (broken server).
>>
>> With your change instead of failing the mount, client will silently
>> go for
>> v4.0, even v4.1 mount was requested and produce undesirable behavior,
>> e.g.
>> proxy-io instead of pnfs. I fill prefer fail-fast instead of long
>> debug
>> sessions.
>>
>> On the other hand, I understand, that it's not always possible to fix
>> server
>> or clients in production environment and time-to-time workarounds are
>> necessary.
>>
>>
>
> I'd tend to agree with Tigran. Hiding server bugs, should not be a
> priority and particularly not in this case, where the workaround is
> simple: either turn off version negotiation altogether, or edit
> /etc/nfsmount.conf to negotiate a different set of versions.
Yes, it could be worked-around in nfsmount.conf, but manual
configuration should be seen as an optimization or a last resort. If
we can make things work without configuration, that provides the best
experience.
In this case, the kernel has strong evidence that the server
isn't responding as expected, but it gives an unhelpful error message.
At the very least, nfs4_discover_server_trunking() should not treat
-NFS4ERR_INVAL as unexpect (because there is code in
nfs4_check_cl_exchange_flags which explicitly generates it).
If it just let this error through, instead of translating it to EIO,
then the problem would go away.
>
> What we might want to do, is make it easier to allow the user to detect
> that this is indeed a server bug and is not a problem with the
> arguments supplied to the "mount" utility. Perhaps we might have the
> kernel log something in the syslogs?
Yes, logging a message might be useful. Most of the messages logged
about bad servers are currently going through dprintk(), so they won't
often be seen. Is that what we want?? Don't know...
Anyway, you point that it "is not a problem with the arguments" is
stop-on. If the client gets EINVAL from the server, then it shouldn't
blindly report that back to the user as EINVAL means "Invalid
argument" and the argements given to the server are probably not the
argument given by the user.
Following that line of reasoning, I think nfs4_check_cl_exchange_flag()
should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id()
shouldn't pass NFS4ERR_INVAL through unchanged.
So I propose the following version.
Thanks,
NeilBrown
------------------------8<---------------------------
From: NeilBrown <[email protected]>
Date: Tue, 20 Mar 2018 11:31:33 +1100
Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better.
nfs4_proc_exchange_id() can return -EINVAL if the server
reported NFS4INVAL (which I have seen in a packet trace),
or nfs4_check_cl_exchange_flags() exchange flags detects
a problem.
Each of these mean that NFSv4.1 and later cannot be used,
but they should not prevent fallback to NFSv4.0. Currently
they do.
Currently this EINVAL error is returned by
nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
and thence to nfs4_discover_server_trunking().
nfs4_discover_server_trunking() doesn't understand EINVAL,
so converts it to EIO which causes mount.nfs to think
something is horribly wrong and to give up.
EINVAL is never a sensible error code here. It means "Invalid
argument", but is being used when the problem is "Invalid response
from the server". If we change these two circumstances to report
EPROTONOSUPPORT to the caller (which seems a reasonable assessment
when the server gives confusing responses), and if we enhance
nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an
expected error to pass through, then the error reported to user-space
will be more representative of the actual fault.
A failure to negotiate a client ID clearly shows that NFSv4.1 cannot
be supported, but isn't as general a failure as EIO.
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4proc.c | 18 +++++++++++++++---
fs/nfs/nfs4state.c | 1 +
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 47f3c273245e..97757f646f13 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32 flags)
goto out_inval;
return NFS_OK;
out_inval:
- return -NFS4ERR_INVAL;
+ dprintk("NFS: server returns invalid flags for EXCHANGE_ID\n");
+ return -EPROTONOSUPPORT;
}
static bool
@@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred,
int status;
task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL);
- if (IS_ERR(task))
- return PTR_ERR(task);
+ if (IS_ERR(task)) {
+ status = PTR_ERR(task);
+ if (status == -NFS4ERR_INVAL) {
+ /* If the server think we did something invalid, it is certainly
+ * not the fault of our caller, so it would wrong to report
+ * this error back up. So in that case simply acknowledge that
+ * we don't seem able to support this protocol.
+ */
+ dprintk("NFS: server return NFS4ERR_INVAL to EXCHANGE_ID\n");
+ status = -EPROTONOSUPPORT;
+ }
+ return status;
+ }
argp = task->tk_msg.rpc_argp;
resp = task->tk_msg.rpc_resp;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 91a4d4eeb235..273c032089c4 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct nfs_client *clp,
clnt = clp->cl_rpcclient;
goto again;
+ case -EPROTONOSUPPORT:
case -NFS4ERR_MINOR_VERS_MISMATCH:
status = -EPROTONOSUPPORT;
break;
--
2.14.0.rc0.dirty
On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
> On Fri, Mar 16 2018, Trond Myklebust wrote:
>
> > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
> > > Hi Neil,
> > >
> > > according to rfc5661, NFS4ERR_INVAL is returned by the server if
> > > it
> > > thinks that client sends an invalid request (e.g. points to a
> > > client
> > > bug)
> > > or server misinterpret it (broken server).
> > >
> > > With your change instead of failing the mount, client will
> > > silently
> > > go for
> > > v4.0, even v4.1 mount was requested and produce undesirable
> > > behavior,
> > > e.g.
> > > proxy-io instead of pnfs. I fill prefer fail-fast instead of long
> > > debug
> > > sessions.
> > >
> > > On the other hand, I understand, that it's not always possible to
> > > fix
> > > server
> > > or clients in production environment and time-to-time workarounds
> > > are
> > > necessary.
> > >
> > >
> >
> > I'd tend to agree with Tigran. Hiding server bugs, should not be a
> > priority and particularly not in this case, where the workaround is
> > simple: either turn off version negotiation altogether, or edit
> > /etc/nfsmount.conf to negotiate a different set of versions.
>
> Yes, it could be worked-around in nfsmount.conf, but manual
> configuration should be seen as an optimization or a last
> resort. If
> we can make things work without configuration, that provides the best
> experience.
> In this case, the kernel has strong evidence that the server
> isn't responding as expected, but it gives an unhelpful error
> message.
Server do not spontaneously break their ability to process NFSv4
operations and so this is not an issue that we need to worry about in
ordinary operation. It should only ever be an issue when
1) An insufficiently tested and broken upgrade is applied to an
existing server, in which case the main workaround should be to revert
the upgrade until it can be fixed.
2) A completely new broken server is introduced to the system.
> At the very least, nfs4_discover_server_trunking() should not treat
> -NFS4ERR_INVAL as unexpect (because there is code in
> nfs4_check_cl_exchange_flags which explicitly generates it).
> If it just let this error through, instead of translating it to EIO,
> then the problem would go away.
The code in nfs4_check_cl_exchange_flags is there to check for
explicit NFSv4.1 protocol violations. Is it broken?
> >
> > What we might want to do, is make it easier to allow the user to
> > detect
> > that this is indeed a server bug and is not a problem with the
> > arguments supplied to the "mount" utility. Perhaps we might have
> > the
> > kernel log something in the syslogs?
>
> Yes, logging a message might be useful. Most of the messages logged
> about bad servers are currently going through dprintk(), so they
> won't
> often be seen. Is that what we want?? Don't know...
>
> Anyway, you point that it "is not a problem with the arguments" is
> stop-on. If the client gets EINVAL from the server, then it
> shouldn't
> blindly report that back to the user as EINVAL means "Invalid
> argument" and the argements given to the server are probably not the
> argument given by the user.
>
> Following that line of reasoning, I think
> nfs4_check_cl_exchange_flag()
> should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id()
> shouldn't pass NFS4ERR_INVAL through unchanged.
>
> So I propose the following version.
>
> Thanks,
> NeilBrown
>
> ------------------------8<---------------------------
> From: NeilBrown <[email protected]>
> Date: Tue, 20 Mar 2018 11:31:33 +1100
> Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better.
>
> nfs4_proc_exchange_id() can return -EINVAL if the server
> reported NFS4INVAL (which I have seen in a packet trace),
> or nfs4_check_cl_exchange_flags() exchange flags detects
> a problem.
> Each of these mean that NFSv4.1 and later cannot be used,
> but they should not prevent fallback to NFSv4.0. Currently
> they do.
>
> Currently this EINVAL error is returned by
> nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
> and thence to nfs4_discover_server_trunking().
> nfs4_discover_server_trunking() doesn't understand EINVAL,
> so converts it to EIO which causes mount.nfs to think
> something is horribly wrong and to give up.
>
> EINVAL is never a sensible error code here. It means "Invalid
> argument", but is being used when the problem is "Invalid response
> from the server". If we change these two circumstances to report
> EPROTONOSUPPORT to the caller (which seems a reasonable assessment
> when the server gives confusing responses), and if we enhance
> nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an
> expected error to pass through, then the error reported to user-space
> will be more representative of the actual fault.
>
> A failure to negotiate a client ID clearly shows that NFSv4.1 cannot
> be supported, but isn't as general a failure as EIO.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/nfs/nfs4proc.c | 18 +++++++++++++++---
> fs/nfs/nfs4state.c | 1 +
> 2 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 47f3c273245e..97757f646f13 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32
> flags)
> goto out_inval;
> return NFS_OK;
> out_inval:
> - return -NFS4ERR_INVAL;
> + dprintk("NFS: server returns invalid flags for
> EXCHANGE_ID\n");
> + return -EPROTONOSUPPORT;
> }
>
> static bool
> @@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct
> nfs_client *clp, struct rpc_cred *cred,
> int status;
>
> task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL);
> - if (IS_ERR(task))
> - return PTR_ERR(task);
> + if (IS_ERR(task)) {
> + status = PTR_ERR(task);
> + if (status == -NFS4ERR_INVAL) {
> + /* If the server think we did something
> invalid, it is certainly
> + * not the fault of our caller, so it would
> wrong to report
> + * this error back up. So in that case
> simply acknowledge that
> + * we don't seem able to support this
> protocol.
> + */
> + dprintk("NFS: server return NFS4ERR_INVAL to
> EXCHANGE_ID\n");
> + status = -EPROTONOSUPPORT;
> + }
> + return status;
> + }
>
> argp = task->tk_msg.rpc_argp;
> resp = task->tk_msg.rpc_resp;
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 91a4d4eeb235..273c032089c4 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct
> nfs_client *clp,
> clnt = clp->cl_rpcclient;
> goto again;
>
> + case -EPROTONOSUPPORT:
> case -NFS4ERR_MINOR_VERS_MISMATCH:
> status = -EPROTONOSUPPORT;
> break;
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Tue, Mar 20 2018, Trond Myklebust wrote:
> On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
>> On Fri, Mar 16 2018, Trond Myklebust wrote:
>>
>> > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
>> > > Hi Neil,
>> > >
>> > > according to rfc5661, NFS4ERR_INVAL is returned by the server if
>> > > it
>> > > thinks that client sends an invalid request (e.g. points to a
>> > > client
>> > > bug)
>> > > or server misinterpret it (broken server).
>> > >
>> > > With your change instead of failing the mount, client will
>> > > silently
>> > > go for
>> > > v4.0, even v4.1 mount was requested and produce undesirable
>> > > behavior,
>> > > e.g.
>> > > proxy-io instead of pnfs. I fill prefer fail-fast instead of long
>> > > debug
>> > > sessions.
>> > >
>> > > On the other hand, I understand, that it's not always possible to
>> > > fix
>> > > server
>> > > or clients in production environment and time-to-time workarounds
>> > > are
>> > > necessary.
>> > >
>> > >
>> >
>> > I'd tend to agree with Tigran. Hiding server bugs, should not be a
>> > priority and particularly not in this case, where the workaround is
>> > simple: either turn off version negotiation altogether, or edit
>> > /etc/nfsmount.conf to negotiate a different set of versions.
>>
>> Yes, it could be worked-around in nfsmount.conf, but manual
>> configuration should be seen as an optimization or a last
>> resort. If
>> we can make things work without configuration, that provides the best
>> experience.
>> In this case, the kernel has strong evidence that the server
>> isn't responding as expected, but it gives an unhelpful error
>> message.
>
> Server do not spontaneously break their ability to process NFSv4
> operations and so this is not an issue that we need to worry about in
> ordinary operation. It should only ever be an issue when
>
> 1) An insufficiently tested and broken upgrade is applied to an
> existing server, in which case the main workaround should be to revert
> the upgrade until it can be fixed.
> 2) A completely new broken server is introduced to the system.
3) An upgrade to the client defaults to trying 4.2 first, then 4.1 and
only then 4.0. Previous client defaulted to 4.0.
>
>> At the very least, nfs4_discover_server_trunking() should not treat
>> -NFS4ERR_INVAL as unexpect (because there is code in
>> nfs4_check_cl_exchange_flags which explicitly generates it).
>> If it just let this error through, instead of translating it to EIO,
>> then the problem would go away.
>
> The code in nfs4_check_cl_exchange_flags is there to check for
> explicit NFSv4.1 protocol violations. Is it broken?
It is broken in that it reports -EINVAL when no arguments were invalid.
This gets translated to -EIO when there was no IO error.
Thanks,
NeilBrown
>
>> >
>> > What we might want to do, is make it easier to allow the user to
>> > detect
>> > that this is indeed a server bug and is not a problem with the
>> > arguments supplied to the "mount" utility. Perhaps we might have
>> > the
>> > kernel log something in the syslogs?
>>
>> Yes, logging a message might be useful. Most of the messages logged
>> about bad servers are currently going through dprintk(), so they
>> won't
>> often be seen. Is that what we want?? Don't know...
>>
>> Anyway, you point that it "is not a problem with the arguments" is
>> stop-on. If the client gets EINVAL from the server, then it
>> shouldn't
>> blindly report that back to the user as EINVAL means "Invalid
>> argument" and the argements given to the server are probably not the
>> argument given by the user.
>>
>> Following that line of reasoning, I think
>> nfs4_check_cl_exchange_flag()
>> should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id()
>> shouldn't pass NFS4ERR_INVAL through unchanged.
>>
>> So I propose the following version.
>>
>> Thanks,
>> NeilBrown
>>
>> ------------------------8<---------------------------
>> From: NeilBrown <[email protected]>
>> Date: Tue, 20 Mar 2018 11:31:33 +1100
>> Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better.
>>
>> nfs4_proc_exchange_id() can return -EINVAL if the server
>> reported NFS4INVAL (which I have seen in a packet trace),
>> or nfs4_check_cl_exchange_flags() exchange flags detects
>> a problem.
>> Each of these mean that NFSv4.1 and later cannot be used,
>> but they should not prevent fallback to NFSv4.0. Currently
>> they do.
>>
>> Currently this EINVAL error is returned by
>> nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
>> and thence to nfs4_discover_server_trunking().
>> nfs4_discover_server_trunking() doesn't understand EINVAL,
>> so converts it to EIO which causes mount.nfs to think
>> something is horribly wrong and to give up.
>>
>> EINVAL is never a sensible error code here. It means "Invalid
>> argument", but is being used when the problem is "Invalid response
>> from the server". If we change these two circumstances to report
>> EPROTONOSUPPORT to the caller (which seems a reasonable assessment
>> when the server gives confusing responses), and if we enhance
>> nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an
>> expected error to pass through, then the error reported to user-space
>> will be more representative of the actual fault.
>>
>> A failure to negotiate a client ID clearly shows that NFSv4.1 cannot
>> be supported, but isn't as general a failure as EIO.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 18 +++++++++++++++---
>> fs/nfs/nfs4state.c | 1 +
>> 2 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 47f3c273245e..97757f646f13 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32
>> flags)
>> goto out_inval;
>> return NFS_OK;
>> out_inval:
>> - return -NFS4ERR_INVAL;
>> + dprintk("NFS: server returns invalid flags for
>> EXCHANGE_ID\n");
>> + return -EPROTONOSUPPORT;
>> }
>>
>> static bool
>> @@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct
>> nfs_client *clp, struct rpc_cred *cred,
>> int status;
>>
>> task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL);
>> - if (IS_ERR(task))
>> - return PTR_ERR(task);
>> + if (IS_ERR(task)) {
>> + status = PTR_ERR(task);
>> + if (status == -NFS4ERR_INVAL) {
>> + /* If the server think we did something
>> invalid, it is certainly
>> + * not the fault of our caller, so it would
>> wrong to report
>> + * this error back up. So in that case
>> simply acknowledge that
>> + * we don't seem able to support this
>> protocol.
>> + */
>> + dprintk("NFS: server return NFS4ERR_INVAL to
>> EXCHANGE_ID\n");
>> + status = -EPROTONOSUPPORT;
>> + }
>> + return status;
>> + }
>>
>> argp = task->tk_msg.rpc_argp;
>> resp = task->tk_msg.rpc_resp;
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 91a4d4eeb235..273c032089c4 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct
>> nfs_client *clp,
>> clnt = clp->cl_rpcclient;
>> goto again;
>>
>> + case -EPROTONOSUPPORT:
>> case -NFS4ERR_MINOR_VERS_MISMATCH:
>> status = -EPROTONOSUPPORT;
>> break;
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
On Wed, 2018-03-21 at 05:44 +1100, NeilBrown wrote:
> On Tue, Mar 20 2018, Trond Myklebust wrote:
>
> > On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
> > > On Fri, Mar 16 2018, Trond Myklebust wrote:
> > >
> > > > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
> > > > > Hi Neil,
> > > > >
> > > > > according to rfc5661, NFS4ERR_INVAL is returned by the server
> > > > > if
> > > > > it
> > > > > thinks that client sends an invalid request (e.g. points to a
> > > > > client
> > > > > bug)
> > > > > or server misinterpret it (broken server).
> > > > >
> > > > > With your change instead of failing the mount, client will
> > > > > silently
> > > > > go for
> > > > > v4.0, even v4.1 mount was requested and produce undesirable
> > > > > behavior,
> > > > > e.g.
> > > > > proxy-io instead of pnfs. I fill prefer fail-fast instead of
> > > > > long
> > > > > debug
> > > > > sessions.
> > > > >
> > > > > On the other hand, I understand, that it's not always
> > > > > possible to
> > > > > fix
> > > > > server
> > > > > or clients in production environment and time-to-time
> > > > > workarounds
> > > > > are
> > > > > necessary.
> > > > >
> > > > >
> > > >
> > > > I'd tend to agree with Tigran. Hiding server bugs, should not
> > > > be a
> > > > priority and particularly not in this case, where the
> > > > workaround is
> > > > simple: either turn off version negotiation altogether, or edit
> > > > /etc/nfsmount.conf to negotiate a different set of versions.
> > >
> > > Yes, it could be worked-around in nfsmount.conf, but manual
> > > configuration should be seen as an optimization or a last
> > > resort. If
> > > we can make things work without configuration, that provides the
> > > best
> > > experience.
> > > In this case, the kernel has strong evidence that the server
> > > isn't responding as expected, but it gives an unhelpful error
> > > message.
> >
> > Server do not spontaneously break their ability to process NFSv4
> > operations and so this is not an issue that we need to worry about
> > in
> > ordinary operation. It should only ever be an issue when
> >
> > 1) An insufficiently tested and broken upgrade is applied to an
> > existing server, in which case the main workaround should be to
> > revert
> > the upgrade until it can be fixed.
> > 2) A completely new broken server is introduced to the system.
>
> 3) An upgrade to the client defaults to trying 4.2 first, then 4.1
> and
> only then 4.0. Previous client defaulted to 4.0.
I'm sorry, but this still does not sounds like a good case for "fix the
kernel client". The kernel has no opinion on which NFS version you
should try first in a mount attempt: that decision is made in
userspace.
If you upgraded the nfs-utils to something that now tries 4.2 first,
then that is a user space policy issue.
> >
> > > At the very least, nfs4_discover_server_trunking() should not
> > > treat
> > > -NFS4ERR_INVAL as unexpect (because there is code in
> > > nfs4_check_cl_exchange_flags which explicitly generates it).
> > > If it just let this error through, instead of translating it to
> > > EIO,
> > > then the problem would go away.
> >
> > The code in nfs4_check_cl_exchange_flags is there to check for
> > explicit NFSv4.1 protocol violations. Is it broken?
>
> It is broken in that it reports -EINVAL when no arguments were
> invalid.
> This gets translated to -EIO when there was no IO error.
The purpose of that function is to ensure that the server is
advertising itself as a bona fide NFSv4.1 or newer server, and to
ensure that it is not replying to our EXCHANGE_ID request with some
flag or service that we did not expect and that might cause us to
break.
IOW: it is there to check protocol compliance. As far as I can tell, it
is doing that, and doing so correctly.
> Thanks,
> NeilBrown
>
>
> >
> > > >
> > > > What we might want to do, is make it easier to allow the user
> > > > to
> > > > detect
> > > > that this is indeed a server bug and is not a problem with the
> > > > arguments supplied to the "mount" utility. Perhaps we might
> > > > have
> > > > the
> > > > kernel log something in the syslogs?
> > >
> > > Yes, logging a message might be useful. Most of the messages
> > > logged
> > > about bad servers are currently going through dprintk(), so they
> > > won't
> > > often be seen. Is that what we want?? Don't know...
> > >
> > > Anyway, you point that it "is not a problem with the arguments"
> > > is
> > > stop-on. If the client gets EINVAL from the server, then it
> > > shouldn't
> > > blindly report that back to the user as EINVAL means "Invalid
> > > argument" and the argements given to the server are probably not
> > > the
> > > argument given by the user.
> > >
> > > Following that line of reasoning, I think
> > > nfs4_check_cl_exchange_flag()
> > > should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id()
> > > shouldn't pass NFS4ERR_INVAL through unchanged.
> > >
> > > So I propose the following version.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > > ------------------------8<---------------------------
> > > From: NeilBrown <[email protected]>
> > > Date: Tue, 20 Mar 2018 11:31:33 +1100
> > > Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better.
> > >
> > > nfs4_proc_exchange_id() can return -EINVAL if the server
> > > reported NFS4INVAL (which I have seen in a packet trace),
> > > or nfs4_check_cl_exchange_flags() exchange flags detects
> > > a problem.
> > > Each of these mean that NFSv4.1 and later cannot be used,
> > > but they should not prevent fallback to NFSv4.0. Currently
> > > they do.
> > >
> > > Currently this EINVAL error is returned by
> > > nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
> > > and thence to nfs4_discover_server_trunking().
> > > nfs4_discover_server_trunking() doesn't understand EINVAL,
> > > so converts it to EIO which causes mount.nfs to think
> > > something is horribly wrong and to give up.
> > >
> > > EINVAL is never a sensible error code here. It means "Invalid
> > > argument", but is being used when the problem is "Invalid
> > > response
> > > from the server". If we change these two circumstances to report
> > > EPROTONOSUPPORT to the caller (which seems a reasonable
> > > assessment
> > > when the server gives confusing responses), and if we enhance
> > > nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an
> > > expected error to pass through, then the error reported to user-
> > > space
> > > will be more representative of the actual fault.
> > >
> > > A failure to negotiate a client ID clearly shows that NFSv4.1
> > > cannot
> > > be supported, but isn't as general a failure as EIO.
> > >
> > > Signed-off-by: NeilBrown <[email protected]>
> > > ---
> > > fs/nfs/nfs4proc.c | 18 +++++++++++++++---
> > > fs/nfs/nfs4state.c | 1 +
> > > 2 files changed, 16 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index 47f3c273245e..97757f646f13 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32
> > > flags)
> > > goto out_inval;
> > > return NFS_OK;
> > > out_inval:
> > > - return -NFS4ERR_INVAL;
> > > + dprintk("NFS: server returns invalid flags for
> > > EXCHANGE_ID\n");
> > > + return -EPROTONOSUPPORT;
> > > }
> > >
> > > static bool
> > > @@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct
> > > nfs_client *clp, struct rpc_cred *cred,
> > > int status;
> > >
> > > task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL);
> > > - if (IS_ERR(task))
> > > - return PTR_ERR(task);
> > > + if (IS_ERR(task)) {
> > > + status = PTR_ERR(task);
> > > + if (status == -NFS4ERR_INVAL) {
> > > + /* If the server think we did something
> > > invalid, it is certainly
> > > + * not the fault of our caller, so it
> > > would
> > > wrong to report
> > > + * this error back up. So in that case
> > > simply acknowledge that
> > > + * we don't seem able to support this
> > > protocol.
> > > + */
> > > + dprintk("NFS: server return
> > > NFS4ERR_INVAL to
> > > EXCHANGE_ID\n");
> > > + status = -EPROTONOSUPPORT;
> > > + }
> > > + return status;
> > > + }
> > >
> > > argp = task->tk_msg.rpc_argp;
> > > resp = task->tk_msg.rpc_resp;
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 91a4d4eeb235..273c032089c4 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct
> > > nfs_client *clp,
> > > clnt = clp->cl_rpcclient;
> > > goto again;
> > >
> > > + case -EPROTONOSUPPORT:
> > > case -NFS4ERR_MINOR_VERS_MISMATCH:
> > > status = -EPROTONOSUPPORT;
> > > break;
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, PrimaryData
> > [email protected]
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Tue, Mar 20 2018, Trond Myklebust wrote:
> On Wed, 2018-03-21 at 05:44 +1100, NeilBrown wrote:
>> On Tue, Mar 20 2018, Trond Myklebust wrote:
>>
>> > On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
>> > > On Fri, Mar 16 2018, Trond Myklebust wrote:
>> > >
>> > > > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
>> > > > > Hi Neil,
>> > > > >
>> > > > > according to rfc5661, NFS4ERR_INVAL is returned by the server
>> > > > > if
>> > > > > it
>> > > > > thinks that client sends an invalid request (e.g. points to a
>> > > > > client
>> > > > > bug)
>> > > > > or server misinterpret it (broken server).
>> > > > >
>> > > > > With your change instead of failing the mount, client will
>> > > > > silently
>> > > > > go for
>> > > > > v4.0, even v4.1 mount was requested and produce undesirable
>> > > > > behavior,
>> > > > > e.g.
>> > > > > proxy-io instead of pnfs. I fill prefer fail-fast instead of
>> > > > > long
>> > > > > debug
>> > > > > sessions.
>> > > > >
>> > > > > On the other hand, I understand, that it's not always
>> > > > > possible to
>> > > > > fix
>> > > > > server
>> > > > > or clients in production environment and time-to-time
>> > > > > workarounds
>> > > > > are
>> > > > > necessary.
>> > > > >
>> > > > >
>> > > >
>> > > > I'd tend to agree with Tigran. Hiding server bugs, should not
>> > > > be a
>> > > > priority and particularly not in this case, where the
>> > > > workaround is
>> > > > simple: either turn off version negotiation altogether, or edit
>> > > > /etc/nfsmount.conf to negotiate a different set of versions.
>> > >
>> > > Yes, it could be worked-around in nfsmount.conf, but manual
>> > > configuration should be seen as an optimization or a last
>> > > resort. If
>> > > we can make things work without configuration, that provides the
>> > > best
>> > > experience.
>> > > In this case, the kernel has strong evidence that the server
>> > > isn't responding as expected, but it gives an unhelpful error
>> > > message.
>> >
>> > Server do not spontaneously break their ability to process NFSv4
>> > operations and so this is not an issue that we need to worry about
>> > in
>> > ordinary operation. It should only ever be an issue when
>> >
>> > 1) An insufficiently tested and broken upgrade is applied to an
>> > existing server, in which case the main workaround should be to
>> > revert
>> > the upgrade until it can be fixed.
>> > 2) A completely new broken server is introduced to the system.
>>
>> 3) An upgrade to the client defaults to trying 4.2 first, then 4.1
>> and
>> only then 4.0. Previous client defaulted to 4.0.
>
> I'm sorry, but this still does not sounds like a good case for "fix the
> kernel client". The kernel has no opinion on which NFS version you
> should try first in a mount attempt: that decision is made in
> userspace.
Of course. But the kernel should still accurately report what it
found. If it found that it cannot use that protocol, it should report
"I cannot use that protocol". It shouldn't report "IO error".
This allows user space to know what the status is and to make a
correctly informed decision.
> If you upgraded the nfs-utils to something that now tries 4.2 first,
> then that is a user space policy issue.
>
>> >
>> > > At the very least, nfs4_discover_server_trunking() should not
>> > > treat
>> > > -NFS4ERR_INVAL as unexpect (because there is code in
>> > > nfs4_check_cl_exchange_flags which explicitly generates it).
>> > > If it just let this error through, instead of translating it to
>> > > EIO,
>> > > then the problem would go away.
>> >
>> > The code in nfs4_check_cl_exchange_flags is there to check for
>> > explicit NFSv4.1 protocol violations. Is it broken?
>>
>> It is broken in that it reports -EINVAL when no arguments were
>> invalid.
>> This gets translated to -EIO when there was no IO error.
>
> The purpose of that function is to ensure that the server is
> advertising itself as a bona fide NFSv4.1 or newer server, and to
> ensure that it is not replying to our EXCHANGE_ID request with some
> flag or service that we did not expect and that might cause us to
> break.
>
> IOW: it is there to check protocol compliance. As far as I can tell, it
> is doing that, and doing so correctly.
It is checking correctly, but it is not reporting "the protocol is not
compliant". It is reporting "invalid argument".
Thanks,
NeilBrown
>
>
>> Thanks,
>> NeilBrown
>>
>>
>> >
>> > > >
>> > > > What we might want to do, is make it easier to allow the user
>> > > > to
>> > > > detect
>> > > > that this is indeed a server bug and is not a problem with the
>> > > > arguments supplied to the "mount" utility. Perhaps we might
>> > > > have
>> > > > the
>> > > > kernel log something in the syslogs?
>> > >
>> > > Yes, logging a message might be useful. Most of the messages
>> > > logged
>> > > about bad servers are currently going through dprintk(), so they
>> > > won't
>> > > often be seen. Is that what we want?? Don't know...
>> > >
>> > > Anyway, you point that it "is not a problem with the arguments"
>> > > is
>> > > stop-on. If the client gets EINVAL from the server, then it
>> > > shouldn't
>> > > blindly report that back to the user as EINVAL means "Invalid
>> > > argument" and the argements given to the server are probably not
>> > > the
>> > > argument given by the user.
>> > >
>> > > Following that line of reasoning, I think
>> > > nfs4_check_cl_exchange_flag()
>> > > should *not* return -NFS4ERR_INVAL, and _nfs4_proc_exchange_id()
>> > > shouldn't pass NFS4ERR_INVAL through unchanged.
>> > >
>> > > So I propose the following version.
>> > >
>> > > Thanks,
>> > > NeilBrown
>> > >
>> > > ------------------------8<---------------------------
>> > > From: NeilBrown <[email protected]>
>> > > Date: Tue, 20 Mar 2018 11:31:33 +1100
>> > > Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID better.
>> > >
>> > > nfs4_proc_exchange_id() can return -EINVAL if the server
>> > > reported NFS4INVAL (which I have seen in a packet trace),
>> > > or nfs4_check_cl_exchange_flags() exchange flags detects
>> > > a problem.
>> > > Each of these mean that NFSv4.1 and later cannot be used,
>> > > but they should not prevent fallback to NFSv4.0. Currently
>> > > they do.
>> > >
>> > > Currently this EINVAL error is returned by
>> > > nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
>> > > and thence to nfs4_discover_server_trunking().
>> > > nfs4_discover_server_trunking() doesn't understand EINVAL,
>> > > so converts it to EIO which causes mount.nfs to think
>> > > something is horribly wrong and to give up.
>> > >
>> > > EINVAL is never a sensible error code here. It means "Invalid
>> > > argument", but is being used when the problem is "Invalid
>> > > response
>> > > from the server". If we change these two circumstances to report
>> > > EPROTONOSUPPORT to the caller (which seems a reasonable
>> > > assessment
>> > > when the server gives confusing responses), and if we enhance
>> > > nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as an
>> > > expected error to pass through, then the error reported to user-
>> > > space
>> > > will be more representative of the actual fault.
>> > >
>> > > A failure to negotiate a client ID clearly shows that NFSv4.1
>> > > cannot
>> > > be supported, but isn't as general a failure as EIO.
>> > >
>> > > Signed-off-by: NeilBrown <[email protected]>
>> > > ---
>> > > fs/nfs/nfs4proc.c | 18 +++++++++++++++---
>> > > fs/nfs/nfs4state.c | 1 +
>> > > 2 files changed, 16 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> > > index 47f3c273245e..97757f646f13 100644
>> > > --- a/fs/nfs/nfs4proc.c
>> > > +++ b/fs/nfs/nfs4proc.c
>> > > @@ -7364,7 +7364,8 @@ static int nfs4_check_cl_exchange_flags(u32
>> > > flags)
>> > > goto out_inval;
>> > > return NFS_OK;
>> > > out_inval:
>> > > - return -NFS4ERR_INVAL;
>> > > + dprintk("NFS: server returns invalid flags for
>> > > EXCHANGE_ID\n");
>> > > + return -EPROTONOSUPPORT;
>> > > }
>> > >
>> > > static bool
>> > > @@ -7741,8 +7742,19 @@ static int _nfs4_proc_exchange_id(struct
>> > > nfs_client *clp, struct rpc_cred *cred,
>> > > int status;
>> > >
>> > > task = nfs4_run_exchange_id(clp, cred, sp4_how, NULL);
>> > > - if (IS_ERR(task))
>> > > - return PTR_ERR(task);
>> > > + if (IS_ERR(task)) {
>> > > + status = PTR_ERR(task);
>> > > + if (status == -NFS4ERR_INVAL) {
>> > > + /* If the server think we did something
>> > > invalid, it is certainly
>> > > + * not the fault of our caller, so it
>> > > would
>> > > wrong to report
>> > > + * this error back up. So in that case
>> > > simply acknowledge that
>> > > + * we don't seem able to support this
>> > > protocol.
>> > > + */
>> > > + dprintk("NFS: server return
>> > > NFS4ERR_INVAL to
>> > > EXCHANGE_ID\n");
>> > > + status = -EPROTONOSUPPORT;
>> > > + }
>> > > + return status;
>> > > + }
>> > >
>> > > argp = task->tk_msg.rpc_argp;
>> > > resp = task->tk_msg.rpc_resp;
>> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> > > index 91a4d4eeb235..273c032089c4 100644
>> > > --- a/fs/nfs/nfs4state.c
>> > > +++ b/fs/nfs/nfs4state.c
>> > > @@ -2219,6 +2219,7 @@ int nfs4_discover_server_trunking(struct
>> > > nfs_client *clp,
>> > > clnt = clp->cl_rpcclient;
>> > > goto again;
>> > >
>> > > + case -EPROTONOSUPPORT:
>> > > case -NFS4ERR_MINOR_VERS_MISMATCH:
>> > > status = -EPROTONOSUPPORT;
>> > > break;
>> >
>> > --
>> > Trond Myklebust
>> > Linux NFS client maintainer, PrimaryData
>> > [email protected]
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
On Wed, 2018-03-21 at 06:58 +1100, NeilBrown wrote:
> On Tue, Mar 20 2018, Trond Myklebust wrote:
>
> > On Wed, 2018-03-21 at 05:44 +1100, NeilBrown wrote:
> > > On Tue, Mar 20 2018, Trond Myklebust wrote:
> > >
> > > > On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
> > > > > On Fri, Mar 16 2018, Trond Myklebust wrote:
> > > > >
> > > > > > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
> > > > > > > Hi Neil,
> > > > > > >
> > > > > > > according to rfc5661, NFS4ERR_INVAL is returned by the
> > > > > > > server
> > > > > > > if
> > > > > > > it
> > > > > > > thinks that client sends an invalid request (e.g. points
> > > > > > > to a
> > > > > > > client
> > > > > > > bug)
> > > > > > > or server misinterpret it (broken server).
> > > > > > >
> > > > > > > With your change instead of failing the mount, client
> > > > > > > will
> > > > > > > silently
> > > > > > > go for
> > > > > > > v4.0, even v4.1 mount was requested and produce
> > > > > > > undesirable
> > > > > > > behavior,
> > > > > > > e.g.
> > > > > > > proxy-io instead of pnfs. I fill prefer fail-fast instead
> > > > > > > of
> > > > > > > long
> > > > > > > debug
> > > > > > > sessions.
> > > > > > >
> > > > > > > On the other hand, I understand, that it's not always
> > > > > > > possible to
> > > > > > > fix
> > > > > > > server
> > > > > > > or clients in production environment and time-to-time
> > > > > > > workarounds
> > > > > > > are
> > > > > > > necessary.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I'd tend to agree with Tigran. Hiding server bugs, should
> > > > > > not
> > > > > > be a
> > > > > > priority and particularly not in this case, where the
> > > > > > workaround is
> > > > > > simple: either turn off version negotiation altogether, or
> > > > > > edit
> > > > > > /etc/nfsmount.conf to negotiate a different set of
> > > > > > versions.
> > > > >
> > > > > Yes, it could be worked-around in nfsmount.conf, but manual
> > > > > configuration should be seen as an optimization or a last
> > > > > resort. If
> > > > > we can make things work without configuration, that provides
> > > > > the
> > > > > best
> > > > > experience.
> > > > > In this case, the kernel has strong evidence that the server
> > > > > isn't responding as expected, but it gives an unhelpful error
> > > > > message.
> > > >
> > > > Server do not spontaneously break their ability to process
> > > > NFSv4
> > > > operations and so this is not an issue that we need to worry
> > > > about
> > > > in
> > > > ordinary operation. It should only ever be an issue when
> > > >
> > > > 1) An insufficiently tested and broken upgrade is applied to an
> > > > existing server, in which case the main workaround should be to
> > > > revert
> > > > the upgrade until it can be fixed.
> > > > 2) A completely new broken server is introduced to the system.
> > >
> > > 3) An upgrade to the client defaults to trying 4.2 first, then
> > > 4.1
> > > and
> > > only then 4.0. Previous client defaulted to 4.0.
> >
> > I'm sorry, but this still does not sounds like a good case for "fix
> > the
> > kernel client". The kernel has no opinion on which NFS version you
> > should try first in a mount attempt: that decision is made in
> > userspace.
>
> Of course. But the kernel should still accurately report what it
> found. If it found that it cannot use that protocol, it should
> report
> "I cannot use that protocol". It shouldn't report "IO error".
EPROTONOSUPPORT means "that protocol is not supported". In the case you
are arguing, the protocol is reported as being supported by both the
client and the server, so that is not the issue. The existence of a
server bug in the protocol implementation _is_ the issue.
As I said, I have no problems changing what we report so that it is
easier to debug. I do have a problem with a change that makes the
failure silent, and thus makes the problem harder to debug.
Bottom line: if the sysadmin has enabled NFSv4.2 on the server, and the
userspace mount program is asking for NFSv4.2, then we should NOT be
telling it to fail over silently to NFSv4.0.
> This allows user space to know what the status is and to make a
> correctly informed decision.
>
> > If you upgraded the nfs-utils to something that now tries 4.2
> > first,
> > then that is a user space policy issue.
> >
> > > >
> > > > > At the very least, nfs4_discover_server_trunking() should not
> > > > > treat
> > > > > -NFS4ERR_INVAL as unexpect (because there is code in
> > > > > nfs4_check_cl_exchange_flags which explicitly generates it).
> > > > > If it just let this error through, instead of translating it
> > > > > to
> > > > > EIO,
> > > > > then the problem would go away.
> > > >
> > > > The code in nfs4_check_cl_exchange_flags is there to check for
> > > > explicit NFSv4.1 protocol violations. Is it broken?
> > >
> > > It is broken in that it reports -EINVAL when no arguments were
> > > invalid.
> > > This gets translated to -EIO when there was no IO error.
> >
> > The purpose of that function is to ensure that the server is
> > advertising itself as a bona fide NFSv4.1 or newer server, and to
> > ensure that it is not replying to our EXCHANGE_ID request with some
> > flag or service that we did not expect and that might cause us to
> > break.
> >
> > IOW: it is there to check protocol compliance. As far as I can
> > tell, it
> > is doing that, and doing so correctly.
>
> It is checking correctly, but it is not reporting "the protocol is
> not
> compliant". It is reporting "invalid argument".
>
> Thanks,
> NeilBrown
>
>
> >
> >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> > > >
> > > > > >
> > > > > > What we might want to do, is make it easier to allow the
> > > > > > user
> > > > > > to
> > > > > > detect
> > > > > > that this is indeed a server bug and is not a problem with
> > > > > > the
> > > > > > arguments supplied to the "mount" utility. Perhaps we might
> > > > > > have
> > > > > > the
> > > > > > kernel log something in the syslogs?
> > > > >
> > > > > Yes, logging a message might be useful. Most of the messages
> > > > > logged
> > > > > about bad servers are currently going through dprintk(), so
> > > > > they
> > > > > won't
> > > > > often be seen. Is that what we want?? Don't know...
> > > > >
> > > > > Anyway, you point that it "is not a problem with the
> > > > > arguments"
> > > > > is
> > > > > stop-on. If the client gets EINVAL from the server, then it
> > > > > shouldn't
> > > > > blindly report that back to the user as EINVAL means "Invalid
> > > > > argument" and the argements given to the server are probably
> > > > > not
> > > > > the
> > > > > argument given by the user.
> > > > >
> > > > > Following that line of reasoning, I think
> > > > > nfs4_check_cl_exchange_flag()
> > > > > should *not* return -NFS4ERR_INVAL, and
> > > > > _nfs4_proc_exchange_id()
> > > > > shouldn't pass NFS4ERR_INVAL through unchanged.
> > > > >
> > > > > So I propose the following version.
> > > > >
> > > > > Thanks,
> > > > > NeilBrown
> > > > >
> > > > > ------------------------8<---------------------------
> > > > > From: NeilBrown <[email protected]>
> > > > > Date: Tue, 20 Mar 2018 11:31:33 +1100
> > > > > Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID
> > > > > better.
> > > > >
> > > > > nfs4_proc_exchange_id() can return -EINVAL if the server
> > > > > reported NFS4INVAL (which I have seen in a packet trace),
> > > > > or nfs4_check_cl_exchange_flags() exchange flags detects
> > > > > a problem.
> > > > > Each of these mean that NFSv4.1 and later cannot be used,
> > > > > but they should not prevent fallback to NFSv4.0. Currently
> > > > > they do.
> > > > >
> > > > > Currently this EINVAL error is returned by
> > > > > nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
> > > > > and thence to nfs4_discover_server_trunking().
> > > > > nfs4_discover_server_trunking() doesn't understand EINVAL,
> > > > > so converts it to EIO which causes mount.nfs to think
> > > > > something is horribly wrong and to give up.
> > > > >
> > > > > EINVAL is never a sensible error code here. It means
> > > > > "Invalid
> > > > > argument", but is being used when the problem is "Invalid
> > > > > response
> > > > > from the server". If we change these two circumstances to
> > > > > report
> > > > > EPROTONOSUPPORT to the caller (which seems a reasonable
> > > > > assessment
> > > > > when the server gives confusing responses), and if we enhance
> > > > > nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as
> > > > > an
> > > > > expected error to pass through, then the error reported to
> > > > > user-
> > > > > space
> > > > > will be more representative of the actual fault.
> > > > >
> > > > > A failure to negotiate a client ID clearly shows that NFSv4.1
> > > > > cannot
> > > > > be supported, but isn't as general a failure as EIO.
> > > > >
> > > > > Signed-off-by: NeilBrown <[email protected]>
> > > > > ---
> > > > > fs/nfs/nfs4proc.c | 18 +++++++++++++++---
> > > > > fs/nfs/nfs4state.c | 1 +
> > > > > 2 files changed, 16 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > > index 47f3c273245e..97757f646f13 100644
> > > > > --- a/fs/nfs/nfs4proc.c
> > > > > +++ b/fs/nfs/nfs4proc.c
> > > > > @@ -7364,7 +7364,8 @@ static int
> > > > > nfs4_check_cl_exchange_flags(u32
> > > > > flags)
> > > > > goto out_inval;
> > > > > return NFS_OK;
> > > > > out_inval:
> > > > > - return -NFS4ERR_INVAL;
> > > > > + dprintk("NFS: server returns invalid flags for
> > > > > EXCHANGE_ID\n");
> > > > > + return -EPROTONOSUPPORT;
> > > > > }
> > > > >
> > > > > static bool
> > > > > @@ -7741,8 +7742,19 @@ static int
> > > > > _nfs4_proc_exchange_id(struct
> > > > > nfs_client *clp, struct rpc_cred *cred,
> > > > > int status;
> > > > >
> > > > > task = nfs4_run_exchange_id(clp, cred, sp4_how,
> > > > > NULL);
> > > > > - if (IS_ERR(task))
> > > > > - return PTR_ERR(task);
> > > > > + if (IS_ERR(task)) {
> > > > > + status = PTR_ERR(task);
> > > > > + if (status == -NFS4ERR_INVAL) {
> > > > > + /* If the server think we did
> > > > > something
> > > > > invalid, it is certainly
> > > > > + * not the fault of our caller, so
> > > > > it
> > > > > would
> > > > > wrong to report
> > > > > + * this error back up. So in that
> > > > > case
> > > > > simply acknowledge that
> > > > > + * we don't seem able to support
> > > > > this
> > > > > protocol.
> > > > > + */
> > > > > + dprintk("NFS: server return
> > > > > NFS4ERR_INVAL to
> > > > > EXCHANGE_ID\n");
> > > > > + status = -EPROTONOSUPPORT;
> > > > > + }
> > > > > + return status;
> > > > > + }
> > > > >
> > > > > argp = task->tk_msg.rpc_argp;
> > > > > resp = task->tk_msg.rpc_resp;
> > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > > > index 91a4d4eeb235..273c032089c4 100644
> > > > > --- a/fs/nfs/nfs4state.c
> > > > > +++ b/fs/nfs/nfs4state.c
> > > > > @@ -2219,6 +2219,7 @@ int
> > > > > nfs4_discover_server_trunking(struct
> > > > > nfs_client *clp,
> > > > > clnt = clp->cl_rpcclient;
> > > > > goto again;
> > > > >
> > > > > + case -EPROTONOSUPPORT:
> > > > > case -NFS4ERR_MINOR_VERS_MISMATCH:
> > > > > status = -EPROTONOSUPPORT;
> > > > > break;
> > > >
> > > > --
> > > > Trond Myklebust
> > > > Linux NFS client maintainer, PrimaryData
> > > > [email protected]
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, PrimaryData
> > [email protected]
--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
On Tue, Mar 20 2018, Trond Myklebust wrote:
> On Wed, 2018-03-21 at 06:58 +1100, NeilBrown wrote:
>> On Tue, Mar 20 2018, Trond Myklebust wrote:
>>
>> > On Wed, 2018-03-21 at 05:44 +1100, NeilBrown wrote:
>> > > On Tue, Mar 20 2018, Trond Myklebust wrote:
>> > >
>> > > > On Tue, 2018-03-20 at 11:32 +1100, NeilBrown wrote:
>> > > > > On Fri, Mar 16 2018, Trond Myklebust wrote:
>> > > > >
>> > > > > > On Fri, 2018-03-16 at 10:31 +0100, Mkrtchyan, Tigran wrote:
>> > > > > > > Hi Neil,
>> > > > > > >
>> > > > > > > according to rfc5661, NFS4ERR_INVAL is returned by the
>> > > > > > > server
>> > > > > > > if
>> > > > > > > it
>> > > > > > > thinks that client sends an invalid request (e.g. points
>> > > > > > > to a
>> > > > > > > client
>> > > > > > > bug)
>> > > > > > > or server misinterpret it (broken server).
>> > > > > > >
>> > > > > > > With your change instead of failing the mount, client
>> > > > > > > will
>> > > > > > > silently
>> > > > > > > go for
>> > > > > > > v4.0, even v4.1 mount was requested and produce
>> > > > > > > undesirable
>> > > > > > > behavior,
>> > > > > > > e.g.
>> > > > > > > proxy-io instead of pnfs. I fill prefer fail-fast instead
>> > > > > > > of
>> > > > > > > long
>> > > > > > > debug
>> > > > > > > sessions.
>> > > > > > >
>> > > > > > > On the other hand, I understand, that it's not always
>> > > > > > > possible to
>> > > > > > > fix
>> > > > > > > server
>> > > > > > > or clients in production environment and time-to-time
>> > > > > > > workarounds
>> > > > > > > are
>> > > > > > > necessary.
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > > I'd tend to agree with Tigran. Hiding server bugs, should
>> > > > > > not
>> > > > > > be a
>> > > > > > priority and particularly not in this case, where the
>> > > > > > workaround is
>> > > > > > simple: either turn off version negotiation altogether, or
>> > > > > > edit
>> > > > > > /etc/nfsmount.conf to negotiate a different set of
>> > > > > > versions.
>> > > > >
>> > > > > Yes, it could be worked-around in nfsmount.conf, but manual
>> > > > > configuration should be seen as an optimization or a last
>> > > > > resort. If
>> > > > > we can make things work without configuration, that provides
>> > > > > the
>> > > > > best
>> > > > > experience.
>> > > > > In this case, the kernel has strong evidence that the server
>> > > > > isn't responding as expected, but it gives an unhelpful error
>> > > > > message.
>> > > >
>> > > > Server do not spontaneously break their ability to process
>> > > > NFSv4
>> > > > operations and so this is not an issue that we need to worry
>> > > > about
>> > > > in
>> > > > ordinary operation. It should only ever be an issue when
>> > > >
>> > > > 1) An insufficiently tested and broken upgrade is applied to an
>> > > > existing server, in which case the main workaround should be to
>> > > > revert
>> > > > the upgrade until it can be fixed.
>> > > > 2) A completely new broken server is introduced to the system.
>> > >
>> > > 3) An upgrade to the client defaults to trying 4.2 first, then
>> > > 4.1
>> > > and
>> > > only then 4.0. Previous client defaulted to 4.0.
>> >
>> > I'm sorry, but this still does not sounds like a good case for "fix
>> > the
>> > kernel client". The kernel has no opinion on which NFS version you
>> > should try first in a mount attempt: that decision is made in
>> > userspace.
>>
>> Of course. But the kernel should still accurately report what it
>> found. If it found that it cannot use that protocol, it should
>> report
>> "I cannot use that protocol". It shouldn't report "IO error".
>
> EPROTONOSUPPORT means "that protocol is not supported". In the case you
> are arguing, the protocol is reported as being supported by both the
> client and the server, so that is not the issue. The existence of a
> server bug in the protocol implementation _is_ the issue.
I don't see how a server that fails the first transaction of a protocol
can be said to "support" that protocol.
>
> As I said, I have no problems changing what we report so that it is
> easier to debug. I do have a problem with a change that makes the
> failure silent, and thus makes the problem harder to debug.
>
> Bottom line: if the sysadmin has enabled NFSv4.2 on the server, and the
> userspace mount program is asking for NFSv4.2, then we should NOT be
> telling it to fail over silently to NFSv4.0.
It is NOT telling it to fail over. It is tell it that v4.2 doesn't
work.
How mount.nfs responds to that information is not the kernel's problem.
Providing meaningful information is.
But I can see that we aren't going anywhere so let's just not bother.
It isn't that much cost for me to carry a tiny patch that you don't
want.
Thanks,
NeilBrown
>
>> This allows user space to know what the status is and to make a
>> correctly informed decision.
>>
>> > If you upgraded the nfs-utils to something that now tries 4.2
>> > first,
>> > then that is a user space policy issue.
>> >
>> > > >
>> > > > > At the very least, nfs4_discover_server_trunking() should not
>> > > > > treat
>> > > > > -NFS4ERR_INVAL as unexpect (because there is code in
>> > > > > nfs4_check_cl_exchange_flags which explicitly generates it).
>> > > > > If it just let this error through, instead of translating it
>> > > > > to
>> > > > > EIO,
>> > > > > then the problem would go away.
>> > > >
>> > > > The code in nfs4_check_cl_exchange_flags is there to check for
>> > > > explicit NFSv4.1 protocol violations. Is it broken?
>> > >
>> > > It is broken in that it reports -EINVAL when no arguments were
>> > > invalid.
>> > > This gets translated to -EIO when there was no IO error.
>> >
>> > The purpose of that function is to ensure that the server is
>> > advertising itself as a bona fide NFSv4.1 or newer server, and to
>> > ensure that it is not replying to our EXCHANGE_ID request with some
>> > flag or service that we did not expect and that might cause us to
>> > break.
>> >
>> > IOW: it is there to check protocol compliance. As far as I can
>> > tell, it
>> > is doing that, and doing so correctly.
>>
>> It is checking correctly, but it is not reporting "the protocol is
>> not
>> compliant". It is reporting "invalid argument".
>>
>> Thanks,
>> NeilBrown
>>
>>
>> >
>> >
>> > > Thanks,
>> > > NeilBrown
>> > >
>> > >
>> > > >
>> > > > > >
>> > > > > > What we might want to do, is make it easier to allow the
>> > > > > > user
>> > > > > > to
>> > > > > > detect
>> > > > > > that this is indeed a server bug and is not a problem with
>> > > > > > the
>> > > > > > arguments supplied to the "mount" utility. Perhaps we might
>> > > > > > have
>> > > > > > the
>> > > > > > kernel log something in the syslogs?
>> > > > >
>> > > > > Yes, logging a message might be useful. Most of the messages
>> > > > > logged
>> > > > > about bad servers are currently going through dprintk(), so
>> > > > > they
>> > > > > won't
>> > > > > often be seen. Is that what we want?? Don't know...
>> > > > >
>> > > > > Anyway, you point that it "is not a problem with the
>> > > > > arguments"
>> > > > > is
>> > > > > stop-on. If the client gets EINVAL from the server, then it
>> > > > > shouldn't
>> > > > > blindly report that back to the user as EINVAL means "Invalid
>> > > > > argument" and the argements given to the server are probably
>> > > > > not
>> > > > > the
>> > > > > argument given by the user.
>> > > > >
>> > > > > Following that line of reasoning, I think
>> > > > > nfs4_check_cl_exchange_flag()
>> > > > > should *not* return -NFS4ERR_INVAL, and
>> > > > > _nfs4_proc_exchange_id()
>> > > > > shouldn't pass NFS4ERR_INVAL through unchanged.
>> > > > >
>> > > > > So I propose the following version.
>> > > > >
>> > > > > Thanks,
>> > > > > NeilBrown
>> > > > >
>> > > > > ------------------------8<---------------------------
>> > > > > From: NeilBrown <[email protected]>
>> > > > > Date: Tue, 20 Mar 2018 11:31:33 +1100
>> > > > > Subject: [PATCH] NFSv4: handle EINVAL from EXCHANGE_ID
>> > > > > better.
>> > > > >
>> > > > > nfs4_proc_exchange_id() can return -EINVAL if the server
>> > > > > reported NFS4INVAL (which I have seen in a packet trace),
>> > > > > or nfs4_check_cl_exchange_flags() exchange flags detects
>> > > > > a problem.
>> > > > > Each of these mean that NFSv4.1 and later cannot be used,
>> > > > > but they should not prevent fallback to NFSv4.0. Currently
>> > > > > they do.
>> > > > >
>> > > > > Currently this EINVAL error is returned by
>> > > > > nfs4_proc_exchange_id() to nfs41_discover_server_trunking(),
>> > > > > and thence to nfs4_discover_server_trunking().
>> > > > > nfs4_discover_server_trunking() doesn't understand EINVAL,
>> > > > > so converts it to EIO which causes mount.nfs to think
>> > > > > something is horribly wrong and to give up.
>> > > > >
>> > > > > EINVAL is never a sensible error code here. It means
>> > > > > "Invalid
>> > > > > argument", but is being used when the problem is "Invalid
>> > > > > response
>> > > > > from the server". If we change these two circumstances to
>> > > > > report
>> > > > > EPROTONOSUPPORT to the caller (which seems a reasonable
>> > > > > assessment
>> > > > > when the server gives confusing responses), and if we enhance
>> > > > > nfs4_discover_server_trunking() to treat -EPROTONOSUPPORT as
>> > > > > an
>> > > > > expected error to pass through, then the error reported to
>> > > > > user-
>> > > > > space
>> > > > > will be more representative of the actual fault.
>> > > > >
>> > > > > A failure to negotiate a client ID clearly shows that NFSv4.1
>> > > > > cannot
>> > > > > be supported, but isn't as general a failure as EIO.
>> > > > >
>> > > > > Signed-off-by: NeilBrown <[email protected]>
>> > > > > ---
>> > > > > fs/nfs/nfs4proc.c | 18 +++++++++++++++---
>> > > > > fs/nfs/nfs4state.c | 1 +
>> > > > > 2 files changed, 16 insertions(+), 3 deletions(-)
>> > > > >
>> > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> > > > > index 47f3c273245e..97757f646f13 100644
>> > > > > --- a/fs/nfs/nfs4proc.c
>> > > > > +++ b/fs/nfs/nfs4proc.c
>> > > > > @@ -7364,7 +7364,8 @@ static int
>> > > > > nfs4_check_cl_exchange_flags(u32
>> > > > > flags)
>> > > > > goto out_inval;
>> > > > > return NFS_OK;
>> > > > > out_inval:
>> > > > > - return -NFS4ERR_INVAL;
>> > > > > + dprintk("NFS: server returns invalid flags for
>> > > > > EXCHANGE_ID\n");
>> > > > > + return -EPROTONOSUPPORT;
>> > > > > }
>> > > > >
>> > > > > static bool
>> > > > > @@ -7741,8 +7742,19 @@ static int
>> > > > > _nfs4_proc_exchange_id(struct
>> > > > > nfs_client *clp, struct rpc_cred *cred,
>> > > > > int status;
>> > > > >
>> > > > > task = nfs4_run_exchange_id(clp, cred, sp4_how,
>> > > > > NULL);
>> > > > > - if (IS_ERR(task))
>> > > > > - return PTR_ERR(task);
>> > > > > + if (IS_ERR(task)) {
>> > > > > + status = PTR_ERR(task);
>> > > > > + if (status == -NFS4ERR_INVAL) {
>> > > > > + /* If the server think we did
>> > > > > something
>> > > > > invalid, it is certainly
>> > > > > + * not the fault of our caller, so
>> > > > > it
>> > > > > would
>> > > > > wrong to report
>> > > > > + * this error back up. So in that
>> > > > > case
>> > > > > simply acknowledge that
>> > > > > + * we don't seem able to support
>> > > > > this
>> > > > > protocol.
>> > > > > + */
>> > > > > + dprintk("NFS: server return
>> > > > > NFS4ERR_INVAL to
>> > > > > EXCHANGE_ID\n");
>> > > > > + status = -EPROTONOSUPPORT;
>> > > > > + }
>> > > > > + return status;
>> > > > > + }
>> > > > >
>> > > > > argp = task->tk_msg.rpc_argp;
>> > > > > resp = task->tk_msg.rpc_resp;
>> > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> > > > > index 91a4d4eeb235..273c032089c4 100644
>> > > > > --- a/fs/nfs/nfs4state.c
>> > > > > +++ b/fs/nfs/nfs4state.c
>> > > > > @@ -2219,6 +2219,7 @@ int
>> > > > > nfs4_discover_server_trunking(struct
>> > > > > nfs_client *clp,
>> > > > > clnt = clp->cl_rpcclient;
>> > > > > goto again;
>> > > > >
>> > > > > + case -EPROTONOSUPPORT:
>> > > > > case -NFS4ERR_MINOR_VERS_MISMATCH:
>> > > > > status = -EPROTONOSUPPORT;
>> > > > > break;
>> > > >
>> > > > --
>> > > > Trond Myklebust
>> > > > Linux NFS client maintainer, PrimaryData
>> > > > [email protected]
>> >
>> > --
>> > Trond Myklebust
>> > Linux NFS client maintainer, PrimaryData
>> > [email protected]
> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
On Fri, Mar 16, 2018 at 10:44:23AM +1100, NeilBrown wrote:
>
> nfs4_proc_exchange_id() can return -EINVAL if the server
> reported NFS4INVAL (which I have seen in a packet trace),
Can you say which server this was? (One we can fix?)
--b.
On Tue, Mar 20 2018, J. Bruce Fields wrote:
> On Fri, Mar 16, 2018 at 10:44:23AM +1100, NeilBrown wrote:
>>
>> nfs4_proc_exchange_id() can return -EINVAL if the server
>> reported NFS4INVAL (which I have seen in a packet trace),
>
> Can you say which server this was? (One we can fix?)
Unfortunately not - all I know is that it is a bit old and isn't SUSE
(which doen't really narrow it down at all).
I've asked for more details and will pass on anything I discover.
Thanks,
NeilBrown
On Tue, Mar 20 2018, J. Bruce Fields wrote:
> On Fri, Mar 16, 2018 at 10:44:23AM +1100, NeilBrown wrote:
>>
>> nfs4_proc_exchange_id() can return -EINVAL if the server
>> reported NFS4INVAL (which I have seen in a packet trace),
>
> Can you say which server this was? (One we can fix?)
It was Linux 2.6.32 (on armv5), and presumably anything before
Commit: 357f54d6b382 ("NFS fix the setting of exchange id flag")
which landed in 2.6.38.
These servers return NFS4ERR_INVAL when EXCHGID4_FLAG_BIND_PRINC_STATEID
is set in the exchange_id flags, which Linux has done since
Commit: 4f0b429df104 ("NFSv4.1: Enable state protection")
in 3.11.
Maybe NFS should retry without that flag if it gets EINVAL?? Or it could
just say that NFSv4.1 isn't supported. I still don't think EIO is justified.
Thanks,
NeilBrown
On Tue, Apr 03, 2018 at 10:41:56AM +1000, NeilBrown wrote:
> On Tue, Mar 20 2018, J. Bruce Fields wrote:
>
> > On Fri, Mar 16, 2018 at 10:44:23AM +1100, NeilBrown wrote:
> >>
> >> nfs4_proc_exchange_id() can return -EINVAL if the server
> >> reported NFS4INVAL (which I have seen in a packet trace),
> >
> > Can you say which server this was? (One we can fix?)
>
> It was Linux 2.6.32 (on armv5), and presumably anything before
> Commit: 357f54d6b382 ("NFS fix the setting of exchange id flag")
> which landed in 2.6.38.
The kernel defaulted to keeping 4.1 off at that point, probably for good
reason--that default didn't change for another 4 or 5 years. I suspect
this is only the first of multiple 4.1 bugs you'd run into in a server
of that vintage. So the solution is probably just to keep 4.1 off.
--b.
>
> These servers return NFS4ERR_INVAL when EXCHGID4_FLAG_BIND_PRINC_STATEID
> is set in the exchange_id flags, which Linux has done since
> Commit: 4f0b429df104 ("NFSv4.1: Enable state protection")
> in 3.11.
>
> Maybe NFS should retry without that flag if it gets EINVAL?? Or it could
> just say that NFSv4.1 isn't supported. I still don't think EIO is justified.
>
> Thanks,
> NeilBrown