Hi Bruce,
I've also found different failures on NFS 4.0:
SEC7 st_secinfo.testRPCSEC_GSS : FAILURE
SECINFO returned mechanism list without RPCSEC_GSS
LOCK24 st_lock.testOpenUpgradeLock : FAILURE
OP_LOCK should return NFS4_OK, instead got
NFS4ERR_BAD_SEQID
They're on stable kernel 5.12.3-1-default (openSUSE). I also saw them on the
older kernel 4.19.0-16-amd64 (Debian).
Any idea how to find out whether these are a wrong setup, test bugs, or real
kernel bugs?
Upstreaming your scripts or better documenting the setup would be great.
Kind regards,
Petr
On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> I've also found different failures on NFS 4.0:
>
> SEC7 st_secinfo.testRPCSEC_GSS : FAILURE
> SECINFO returned mechanism list without RPCSEC_GSS
That shouldn't be run by default; see patch, appended.
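(With that applied, SEC7 is only run when you explicitly ask for gss-flagged
tests; if I recall the invocation correctly, something like

    ./testserver.py server:/export gss

-- see testserver.py --help for the exact syntax.)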
> LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> OP_LOCK should return NFS4_OK, instead got
> NFS4ERR_BAD_SEQID
I suspect the server's actually OK here, but I need to look more
closely.
> They're on stable kernel 5.12.3-1-default (openSUSE). I also saw them on the
> older kernel 4.19.0-16-amd64 (Debian).
>
> Any idea how to find out whether these are a wrong setup, test bugs, or real
> kernel bugs?
For what it's worth, this is what I do as part of my regular regression
tests, for 4.0:
http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=bin/do-pynfs;h=4ed0f7942b9ff0907cbd3bb0ec1643dad02758f5;hb=HEAD
and for 4.1:
http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=bin/do-pynfs41;h=b3afc60dfab17aa5037d3f587d3d113bc772970e;hb=HEAD
There are some known 4.0 failures that I skip:
http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=data/pynfs-skip;h=44256bb26e3fae86572e7c7057b1889652fa014b;hb=HEAD
(But LOCK24 isn't on that list because I keep saying I'm going to triage
it....)
And for 4.1:
http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=data/pynfs41-skip;h=c682bed97742cf799b94364872c7575ac9fc188c;hb=HEAD
--b.
commit 98f4ce2e6418
Author: J. Bruce Fields <[email protected]>
Date: Tue Jun 1 11:10:06 2021 -0400
nfs4.0 secinfo: auth_gss not required
RPCSEC_GSS is mandatory to implement, but that doesn't mean every server
will have it enabled.
I try to only add the "all" tag to tests whose failure indicates the
server is really out of spec, or at least is very unusual and likely to
cause problems for clients.
Signed-off-by: J. Bruce Fields <[email protected]>
diff --git a/nfs4.0/servertests/st_secinfo.py b/nfs4.0/servertests/st_secinfo.py
index d9363de36969..4c4a44b3c919 100644
--- a/nfs4.0/servertests/st_secinfo.py
+++ b/nfs4.0/servertests/st_secinfo.py
@@ -102,7 +102,7 @@ def testRPCSEC_GSS(t, env):
     per section 3.2.1.1 of RFC
 
-    FLAGS: secinfo all
+    FLAGS: secinfo gss
     DEPEND: SEC1
     CODE: SEC7
     """
Hi Bruce,
thanks a lot for valuable info.
> On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> > I've also found different failures on NFS 4.0:
> > SEC7 st_secinfo.testRPCSEC_GSS : FAILURE
> > SECINFO returned mechanism list without RPCSEC_GSS
> That shouldn't be run by default; see patch, appended.
+1, thanks for the quick fix (removing it from the default "all" set).
I'll have a look into testd to see what needs to be enabled (I have
CONFIG_RPCSEC_GSS_KRB5=m and changed the default sec to
sec=sys,krb5,krb5i,krb5p, but that didn't help).
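For completeness, the export I'm testing against looks roughly like this
(path and host pattern are just an example):

    /export *(rw,sec=sys:krb5:krb5i:krb5p,no_subtree_check)

with rpc.svcgssd (or gssproxy) running on the server and an nfs/ keytab in
place -- and SEC7 still reports no RPCSEC_GSS in the SECINFO reply.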
> > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > OP_LOCK should return NFS4_OK, instead got
> > NFS4ERR_BAD_SEQID
> I suspect the server's actually OK here, but I need to look more
> closely.
Thanks a lot!
> > They're on stable kernel 5.12.3-1-default (openSUSE). I also saw them on the
> > older kernel 4.19.0-16-amd64 (Debian).
> > Any idea how to find out whether these are a wrong setup, test bugs, or real
> > kernel bugs?
> For what it's worth, this is what I do as part of my regular regression
> tests, for 4.0:
> http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=bin/do-pynfs;h=4ed0f7942b9ff0907cbd3bb0ec1643dad02758f5;hb=HEAD
> and for 4.1:
> http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=bin/do-pynfs41;h=b3afc60dfab17aa5037d3f587d3d113bc772970e;hb=HEAD
> There are some known 4.0 failures that I skip:
> http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=data/pynfs-skip;h=44256bb26e3fae86572e7c7057b1889652fa014b;hb=HEAD
> (But LOCK24 isn't on that list because I keep saying I'm going to triage
> it....)
> And for 4.1:
> http://git.linux-nfs.org/?p=bfields/testd.git;a=blob;f=data/pynfs41-skip;h=c682bed97742cf799b94364872c7575ac9fc188c;hb=HEAD
Thank you, having scripts with comments is very much appreciated :).
Maybe some of the info or even the setup could be moved into the pynfs git
repository (so that the tests do some of the setup themselves). Or testd
could be mentioned in the pynfs README, as testd is not pynfs specific (it
also drives lock-tests, cthon04 and even xfstests-dev).
...
Kind regards,
Petr
On Wed, 02 Jun 2021, J. Bruce Fields wrote:
> On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
>
> > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > OP_LOCK should return NFS4_OK, instead got
> > NFS4ERR_BAD_SEQID
>
> I suspect the server's actually OK here, but I need to look more
> closely.
>
I agree.
I think this patch fixes the test.
NeilBrown
From: NeilBrown <[email protected]>
Date: Tue, 18 Jan 2022 15:50:37 +1100
Subject: [PATCH] Fix NFSv4.0 LOCK24 test
Only the first lock request for a given open-owner can use lock_file,
which sends LOCK with an open_to_lock_owner4 (open stateid, open seqid,
and a new lock owner).  Subsequent lock requests must use relock_file,
which sends an exist_lock_owner4 (the existing lock stateid and lock
seqid).
Signed-off-by: NeilBrown <[email protected]>
---
nfs4.0/servertests/st_lock.py | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/nfs4.0/servertests/st_lock.py b/nfs4.0/servertests/st_lock.py
index 468672403ffe..db08fbeedac4 100644
--- a/nfs4.0/servertests/st_lock.py
+++ b/nfs4.0/servertests/st_lock.py
@@ -886,6 +886,7 @@ class open_sequence:
         self.client = client
         self.owner = owner
         self.lockowner = lockowner
+        self.lockseq = 0
     def open(self, access):
         self.fh, self.stateid = self.client.create_confirm(self.owner,
                                                            access=access,
@@ -899,15 +900,21 @@ class open_sequence:
     def close(self):
         self.client.close_file(self.owner, self.fh, self.stateid)
     def lock(self, type):
-        res = self.client.lock_file(self.owner, self.fh, self.stateid,
-                                    type=type, lockowner=self.lockowner)
+        if self.lockseq == 0:
+            res = self.client.lock_file(self.owner, self.fh, self.stateid,
+                                        type=type, lockowner=self.lockowner)
+        else:
+            res = self.client.relock_file(self.lockseq, self.fh, self.lockstateid,
+                                          type=type)
         check(res)
         if res.status == NFS4_OK:
             self.lockstateid = res.lockid
+            self.lockseq = self.lockseq + 1
     def unlock(self):
         res = self.client.unlock_file(1, self.fh, self.lockstateid)
         if res.status == NFS4_OK:
             self.lockstateid = res.lockid
+            self.lockseq = self.lockseq + 1
 
 def testOpenUpgradeLock(t, env):
     """Try open, lock, open, downgrade, close
--
2.34.1
Hi Neil,
> On Wed, 02 Jun 2021, J. Bruce Fields wrote:
> > On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> > > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > > OP_LOCK should return NFS4_OK, instead got
> > > NFS4ERR_BAD_SEQID
> > I suspect the server's actually OK here, but I need to look more
> > closely.
> I agree.
> I think this patch fixes the test.
> NeilBrown
> From: NeilBrown <[email protected]>
> Date: Tue, 18 Jan 2022 15:50:37 +1100
> Subject: [PATCH] Fix NFSv4.0 LOCK24 test
> Only the first lock request for a given open-owner can use lock_file.
> Subsequent lock requests must use relock_file.
Reviewed-by: Petr Vorel <[email protected]>
Tested-by: Petr Vorel <[email protected]>
Thanks!
> Signed-off-by: NeilBrown <[email protected]>
> ---
> nfs4.0/servertests/st_lock.py | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
> diff --git a/nfs4.0/servertests/st_lock.py b/nfs4.0/servertests/st_lock.py
> index 468672403ffe..db08fbeedac4 100644
> --- a/nfs4.0/servertests/st_lock.py
> +++ b/nfs4.0/servertests/st_lock.py
> @@ -886,6 +886,7 @@ class open_sequence:
>          self.client = client
>          self.owner = owner
>          self.lockowner = lockowner
> +        self.lockseq = 0
>      def open(self, access):
>          self.fh, self.stateid = self.client.create_confirm(self.owner,
>                                                             access=access,
> @@ -899,15 +900,21 @@ class open_sequence:
>      def close(self):
>          self.client.close_file(self.owner, self.fh, self.stateid)
>      def lock(self, type):
> -        res = self.client.lock_file(self.owner, self.fh, self.stateid,
> -                                    type=type, lockowner=self.lockowner)
> +        if self.lockseq == 0:
> +            res = self.client.lock_file(self.owner, self.fh, self.stateid,
> +                                        type=type, lockowner=self.lockowner)
> +        else:
> +            res = self.client.relock_file(self.lockseq, self.fh, self.lockstateid,
> +                                          type=type)
>          check(res)
>          if res.status == NFS4_OK:
>              self.lockstateid = res.lockid
> +            self.lockseq = self.lockseq + 1
I'd just: self.lockseq += 1
(supported even on python 2.x)
>      def unlock(self):
>          res = self.client.unlock_file(1, self.fh, self.lockstateid)
>          if res.status == NFS4_OK:
>              self.lockstateid = res.lockid
> +            self.lockseq = self.lockseq + 1
And here.
Kind regards,
Petr
>  def testOpenUpgradeLock(t, env):
>      """Try open, lock, open, downgrade, close
Frank added this test in 4299316fb357, and I don't really understand
his description, but it *sounds* like he really wanted it to do the
new-lockowner case. Frank?
--b.
On Tue, Jan 18, 2022 at 12:01 AM NeilBrown <[email protected]> wrote:
>
> On Wed, 02 Jun 2021, J. Bruce Fields wrote:
> > On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> >
> > > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > > OP_LOCK should return NFS4_OK, instead got
> > > NFS4ERR_BAD_SEQID
> >
> > I suspect the server's actually OK here, but I need to look more
> > closely.
> >
> I agree.
> I think this patch fixes the test.
>
> NeilBrown
>
> From: NeilBrown <[email protected]>
> Date: Tue, 18 Jan 2022 15:50:37 +1100
> Subject: [PATCH] Fix NFSv4.0 LOCK24 test
>
> Only the first lock request for a given open-owner can use lock_file.
> Subsequent lock requests must use relock_file.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> nfs4.0/servertests/st_lock.py | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/nfs4.0/servertests/st_lock.py b/nfs4.0/servertests/st_lock.py
> index 468672403ffe..db08fbeedac4 100644
> --- a/nfs4.0/servertests/st_lock.py
> +++ b/nfs4.0/servertests/st_lock.py
> @@ -886,6 +886,7 @@ class open_sequence:
>          self.client = client
>          self.owner = owner
>          self.lockowner = lockowner
> +        self.lockseq = 0
>      def open(self, access):
>          self.fh, self.stateid = self.client.create_confirm(self.owner,
>                                                             access=access,
> @@ -899,15 +900,21 @@ class open_sequence:
>      def close(self):
>          self.client.close_file(self.owner, self.fh, self.stateid)
>      def lock(self, type):
> -        res = self.client.lock_file(self.owner, self.fh, self.stateid,
> -                                    type=type, lockowner=self.lockowner)
> +        if self.lockseq == 0:
> +            res = self.client.lock_file(self.owner, self.fh, self.stateid,
> +                                        type=type, lockowner=self.lockowner)
> +        else:
> +            res = self.client.relock_file(self.lockseq, self.fh, self.lockstateid,
> +                                          type=type)
>          check(res)
>          if res.status == NFS4_OK:
>              self.lockstateid = res.lockid
> +            self.lockseq = self.lockseq + 1
>      def unlock(self):
>          res = self.client.unlock_file(1, self.fh, self.lockstateid)
>          if res.status == NFS4_OK:
>              self.lockstateid = res.lockid
> +            self.lockseq = self.lockseq + 1
> 
>  def testOpenUpgradeLock(t, env):
>      """Try open, lock, open, downgrade, close
> --
> 2.34.1
>
On Wed, 26 Jan 2022, Bruce Fields wrote:
> Frank added this test in 4299316fb357, and I don't really understand
> his description, but it *sounds* like he really wanted it to do the
> new-lockowner case. Frank?
The way I read that commit message, there needs to be a second lock
owner (as you suggest), but there clearly isn't one.
Maybe there needs to be a second open_sequence() created ... I'm not
sure.
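(Sketch of what I mean, reusing the constructor quoted above -- names made
up, details approximate:

    os1 = open_sequence(c, t.code, lockowner=b"lockowner1")
    os2 = open_sequence(c, t.code, lockowner=b"lockowner2")

with the second LOCK issued via os2, so that it really is a new lock
owner.)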
NeilBrown
Yes, I believe I wrote this test to recreate a condition we saw in the wild. There is no guarantee the client won't send a LOCK with an OPEN stateid requesting a new lock owner when the server already has a LOCK stateid for that lock owner. This test case forces that condition.
It looks like we were having trouble with FREE_STATEID racing with LOCK. A LOCK following a FREE_STATEID MUST use the OPEN stateid and ask for a new lock owner (since the LOCK stateid was freed), but if the LOCK wins the race, the old LOCK stateid still exists, so we get a LOCK with an OPEN stateid requesting a new lock owner for which a LOCK stateid already exists.
Now maybe there's a different way to resolve the race, but if the LOCK truly arrives before Ganesha even sees the FREE_STATEID, then it has no knowledge that would allow it to delay the LOCK request. Before we made changes to allow this, I believe we replied with an error that broke things client-side.
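Roughly, the condition the test forces looks like this (a pynfs-style
sketch, not the actual LOCK24 body -- helper names follow the client code
quoted elsewhere in this thread, but treat the details as approximate):

    def force_new_lockowner_twice(t, env):
        c = env.c1
        c.init_connection()
        fh, stateid = c.create_confirm(t.code)
        # First LOCK: OPEN stateid + new lock owner; the server
        # creates a LOCK stateid for this lock owner.
        res = c.lock_file(t.code, fh, stateid, type=WRITE_LT,
                          lockowner=b"lockowner")
        check(res)
        # Second LOCK: again OPEN stateid + "new" lock owner, even
        # though a LOCK stateid for that owner already exists.  This
        # is what the server sees when the LOCK wins the race against
        # FREE_STATEID.  Ganesha accepts it; knfsd currently answers
        # NFS4ERR_BAD_SEQID, which is the LOCK24 failure above.
        res = c.lock_file(t.code, fh, stateid, type=WRITE_LT,
                          lockowner=b"lockowner")
        check(res)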
Here's a Ganesha patch trying to resolve the race and creating the condition that LOCK24 was then written to test:
https://github.com/nfs-ganesha/nfs-ganesha/commit/7d0fb8e9328c40fcfae03ac950a854f56689bb44
Of course the client may have changed to eliminate the race...
If need be, just change this from an "all" test to a "ganesha" test.
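(That would be the same kind of one-line flag change as Bruce's SEC7 patch
above -- roughly, in the testOpenUpgradeLock docstring in st_lock.py:

    -    FLAGS: all
    +    FLAGS: ganesha

modulo whatever flags the test actually carries.)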
Frank
> -----Original Message-----
> From: Bruce Fields [mailto:[email protected]]
> Sent: Tuesday, January 25, 2022 2:47 PM
> To: NeilBrown <[email protected]>
> Cc: Petr Vorel <[email protected]>; Linux NFS Mailing List <linux-
> [email protected]>; Yong Sun <[email protected]>; Frank S. Filz
> <[email protected]>
> Subject: Re: pynfs: [NFS 4.0] SEC7, LOCK24 test failures
>
> Frank added this test in 4299316fb357, and I don't really understand his
> description, but it *sounds* like he really wanted it to do the new-lockowner
> case. Frank?
>
> --b.
>
> On Tue, Jan 18, 2022 at 12:01 AM NeilBrown <[email protected]> wrote:
> >
> > On Wed, 02 Jun 2021, J. Bruce Fields wrote:
> > > On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> > >
> > > > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > > > OP_LOCK should return NFS4_OK, instead got
> > > > NFS4ERR_BAD_SEQID
> > >
> > > I suspect the server's actually OK here, but I need to look more
> > > closely.
> > >
> > I agree.
> > I think this patch fixes the test.
> >
> > NeilBrown
> >
> > From: NeilBrown <[email protected]>
> > Date: Tue, 18 Jan 2022 15:50:37 +1100
> > Subject: [PATCH] Fix NFSv4.0 LOCK24 test
> >
> > Only the first lock request for a given open-owner can use lock_file.
> > Subsequent lock requests must use relock_file.
> >
> > Signed-off-by: NeilBrown <[email protected]>
> > ---
> > nfs4.0/servertests/st_lock.py | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/nfs4.0/servertests/st_lock.py b/nfs4.0/servertests/st_lock.py
> > index 468672403ffe..db08fbeedac4 100644
> > --- a/nfs4.0/servertests/st_lock.py
> > +++ b/nfs4.0/servertests/st_lock.py
> > @@ -886,6 +886,7 @@ class open_sequence:
> >          self.client = client
> >          self.owner = owner
> >          self.lockowner = lockowner
> > +        self.lockseq = 0
> >      def open(self, access):
> >          self.fh, self.stateid = self.client.create_confirm(self.owner,
> >                                                             access=access,
> > @@ -899,15 +900,21 @@ class open_sequence:
> >      def close(self):
> >          self.client.close_file(self.owner, self.fh, self.stateid)
> >      def lock(self, type):
> > -        res = self.client.lock_file(self.owner, self.fh, self.stateid,
> > -                                    type=type, lockowner=self.lockowner)
> > +        if self.lockseq == 0:
> > +            res = self.client.lock_file(self.owner, self.fh, self.stateid,
> > +                                        type=type, lockowner=self.lockowner)
> > +        else:
> > +            res = self.client.relock_file(self.lockseq, self.fh, self.lockstateid,
> > +                                          type=type)
> >          check(res)
> >          if res.status == NFS4_OK:
> >              self.lockstateid = res.lockid
> > +            self.lockseq = self.lockseq + 1
> >      def unlock(self):
> >          res = self.client.unlock_file(1, self.fh, self.lockstateid)
> >          if res.status == NFS4_OK:
> >              self.lockstateid = res.lockid
> > +            self.lockseq = self.lockseq + 1
> > 
> >  def testOpenUpgradeLock(t, env):
> >      """Try open, lock, open, downgrade, close
> > --
> > 2.34.1
> >
Hi all,
> Yes, I believe I wrote this test to recreate a condition we saw in the wild. There is no guarantee the client won't send a LOCK with an OPEN stateid requesting a new lock owner when the server already has a LOCK stateid for that lock owner. This test case forces that condition.
> It looks like we were having trouble with FREE_STATEID racing with LOCK. A LOCK following a FREE_STATEID MUST use the OPEN stateid and ask for a new lock owner (since the LOCK stateid was freed), but if the LOCK wins the race, the old LOCK stateid still exists, so we get a LOCK with an OPEN stateid requesting a new lock owner for which a LOCK stateid already exists.
> Now maybe there's a different way to resolve the race, but if the LOCK truly arrives before Ganesha even sees the FREE_STATEID, then it has no knowledge that would allow it to delay the LOCK request. Before we made changes to allow this, I believe we replied with an error that broke things client-side.
> Here's a Ganesha patch trying to resolve the race and creating the condition that LOCK24 was then written to test:
> https://github.com/nfs-ganesha/nfs-ganesha/commit/7d0fb8e9328c40fcfae03ac950a854f56689bb44
> Of course the client may have changed to eliminate the race...
> If need be, just change this from an "all" test to a "ganesha" test.
Bruce, could this be done to solve problems for other clients?
> Frank
> > -----Original Message-----
> > From: Bruce Fields [mailto:[email protected]]
> > Sent: Tuesday, January 25, 2022 2:47 PM
> > To: NeilBrown <[email protected]>
> > Cc: Petr Vorel <[email protected]>; Linux NFS Mailing List <linux-
> > [email protected]>; Yong Sun <[email protected]>; Frank S. Filz
> > <[email protected]>
> > Subject: Re: pynfs: [NFS 4.0] SEC7, LOCK24 test failures
> > Frank added this test in 4299316fb357, and I don't really understand his
> > description, but it *sounds* like he really wanted it to do the new-lockowner
> > case. Frank?
> > --b.
> > On Tue, Jan 18, 2022 at 12:01 AM NeilBrown <[email protected]> wrote:
> > > On Wed, 02 Jun 2021, J. Bruce Fields wrote:
> > > > On Tue, Jun 01, 2021 at 04:01:08PM +0200, Petr Vorel wrote:
> > > > > LOCK24 st_lock.testOpenUpgradeLock : FAILURE
> > > > > OP_LOCK should return NFS4_OK, instead got
> > > > > NFS4ERR_BAD_SEQID
> > > > I suspect the server's actually OK here, but I need to look more
> > > > closely.
> > > I agree.
> > > I think this patch fixes the test.
> > > NeilBrown
> > > From: NeilBrown <[email protected]>
> > > Date: Tue, 18 Jan 2022 15:50:37 +1100
> > > Subject: [PATCH] Fix NFSv4.0 LOCK24 test
> > > Only the first lock request for a given open-owner can use lock_file.
> > > Subsequent lock requests must use relock_file.
> > > Signed-off-by: NeilBrown <[email protected]>
> > > ---
> > > nfs4.0/servertests/st_lock.py | 11 +++++++++--
> > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > diff --git a/nfs4.0/servertests/st_lock.py b/nfs4.0/servertests/st_lock.py
> > > index 468672403ffe..db08fbeedac4 100644
> > > --- a/nfs4.0/servertests/st_lock.py
> > > +++ b/nfs4.0/servertests/st_lock.py
> > > @@ -886,6 +886,7 @@ class open_sequence:
> > >          self.client = client
> > >          self.owner = owner
> > >          self.lockowner = lockowner
> > > +        self.lockseq = 0
> > >      def open(self, access):
> > >          self.fh, self.stateid = self.client.create_confirm(self.owner,
> > >                                                             access=access,
> > > @@ -899,15 +900,21 @@ class open_sequence:
> > >      def close(self):
> > >          self.client.close_file(self.owner, self.fh, self.stateid)
> > >      def lock(self, type):
> > > -        res = self.client.lock_file(self.owner, self.fh, self.stateid,
> > > -                                    type=type, lockowner=self.lockowner)
> > > +        if self.lockseq == 0:
> > > +            res = self.client.lock_file(self.owner, self.fh, self.stateid,
> > > +                                        type=type, lockowner=self.lockowner)
> > > +        else:
> > > +            res = self.client.relock_file(self.lockseq, self.fh, self.lockstateid,
> > > +                                          type=type)
> > >          check(res)
> > >          if res.status == NFS4_OK:
> > >              self.lockstateid = res.lockid
> > > +            self.lockseq = self.lockseq + 1
> > >      def unlock(self):
> > >          res = self.client.unlock_file(1, self.fh, self.lockstateid)
> > >          if res.status == NFS4_OK:
> > >              self.lockstateid = res.lockid
> > > +            self.lockseq = self.lockseq + 1
> > > 
> > >  def testOpenUpgradeLock(t, env):
> > >      """Try open, lock, open, downgrade, close
> > > --
> > > 2.34.1