2019-04-18 13:37:29

by Scott Mayhew

Subject: Question about open(CLAIM_FH)

When the client does an open(CLAIM_FH) and the server already has open
state for that open owner and file, what's supposed to happen?
Currently the server returns the existing stateid with the seqid bumped,
but it looks like the client is expecting a new stateid (I'm seeing the
state manager spending a lot of time waiting in
nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being set in
the state flags by nfs_need_update_open_stateid()).
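
If I'm reading the client code right, that happens because the client
treats a stateid whose "other" field it doesn't already hold as brand
new and expects its seqid to start at 1; a larger seqid looks like an
out-of-order update.  A simplified sketch of that logic (illustration
only, with made-up names -- not the actual
nfs_need_update_open_stateid() body):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustration only.  When the OPEN reply carries a stateid the client
 * doesn't currently hold (or it holds no open stateid at all), the
 * client assumes the stateid is freshly created and expects seqid == 1.
 * A larger seqid is treated as an out-of-order update, so
 * NFS_STATE_CHANGE_WAIT gets set and nfs_set_open_stateid_locked()
 * waits for intervening updates that, in this case, never arrive.
 */
static bool open_reply_looks_out_of_order(bool have_open_stateid,
                                          bool stateid_other_matches,
                                          uint32_t reply_seqid)
{
        if (!have_open_stateid || !stateid_other_matches)
                return reply_seqid != 1; /* a "new" stateid should start at 1 */
        return false;                    /* known stateid: a seqid bump is fine */
}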

Looking at rfc5661 section 18.16.3, I see:

| CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
| | and there is no previous state associated |
| | with the file for the client. With |
| | CLAIM_NULL, the file is identified by the |
| | current filehandle and the specified |
| | component name. With CLAIM_FH (new to |
| | NFSv4.1), the file is identified by just |
| | the current filehandle.

So it seems like maybe the server should be tossing the old state and
returning a new stateid?

-Scott


2019-04-18 14:38:36

by Trond Myklebust

Subject: Re: Question about open(CLAIM_FH)

Hi Scott,

On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> When the client does an open(CLAIM_FH) and the server already has
> open
> state for that open owner and file, what's supposed to happen?
> Currently the server returns the existing stateid with the seqid
> bumped,
> but it looks like the client is expecting a new stateid (I'm seeing
> the
> state manager spending a lot of time waiting in
> nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being set
> in
> the state flags by nfs_need_update_open_stateid()).
>
> Looking at rfc5661 section 18.16.3, I see:
>
> | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> | | and there is no previous state associated |
> | | with the file for the client. With |
> | | CLAIM_NULL, the file is identified by the |
> | | current filehandle and the specified |
> | | component name. With CLAIM_FH (new to |
> | | NFSv4.1), the file is identified by just |
> | | the current filehandle.
>
> So it seems like maybe the server should be tossing the old state and
> returning a new stateid?
>

No. As far as the protocol is concerned, the only difference between
CLAIM_NULL and CLAIM_FH is how the client identifies the file (in the
first case through an implicit lookup, and in the second case through a
file handle). The client should be free to intermix the two
types of OPEN, and it should expect the resulting stateids to depend
only on whether or not the open_owner matches. If the open_owner
matches an existing stateid, then that stateid is bumped and returned.
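
To put it another way, the server side is expected to look roughly like
this (a pseudo-C sketch -- the types and helpers are made up for
illustration, this is not the nfsd code):

#include <stddef.h>

struct open_owner;                      /* hypothetical opaque types */
struct filehandle;

struct open_stateid {
        unsigned int seqid;
        /* ... stateid "other" field, share/deny modes, etc. ... */
};

/* hypothetical lookup/creation helpers */
struct open_stateid *find_open_stateid(const struct open_owner *owner,
                                       const struct filehandle *fh);
struct open_stateid *new_open_stateid(const struct open_owner *owner,
                                      const struct filehandle *fh);

/*
 * CLAIM_NULL vs CLAIM_FH only changes how 'fh' was obtained (a component
 * name lookup vs the current filehandle); the stateid handling afterwards
 * is the same.
 */
struct open_stateid *server_process_open(const struct open_owner *owner,
                                         const struct filehandle *fh)
{
        struct open_stateid *st = find_open_stateid(owner, fh);

        if (st != NULL) {               /* same open_owner + file: reuse it */
                st->seqid++;            /* bump the seqid */
                return st;
        }
        return new_open_stateid(owner, fh);     /* otherwise mint a new one */
}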

I'm not aware of any expectation in the client that this should not be
the case, so if you are seeing different behaviour, then something else
must be at work here. Is the client perhaps mounting the same
filesystem in two different places in such a way that the super block
is not being shared?

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-18 15:26:48

by Trond Myklebust

Subject: Re: Question about open(CLAIM_FH)

On Thu, 2019-04-18 at 14:38 +0000, Trond Myklebust wrote:
> Hi Scott,
>
> On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > When the client does an open(CLAIM_FH) and the server already has
> > open
> > state for that open owner and file, what's supposed to happen?
> > Currently the server returns the existing stateid with the seqid
> > bumped,
> > but it looks like the client is expecting a new stateid (I'm seeing
> > the
> > state manager spending a lot of time waiting in
> > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being
> > set
> > in
> > the state flags by nfs_need_update_open_stateid()).
> >
> > Looking at rfc5661 section 18.16.3, I see:
> >
> > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > | | and there is no previous state associated |
> > | | with the file for the client. With |
> > | | CLAIM_NULL, the file is identified by the |
> > | | current filehandle and the specified |
> > | | component name. With CLAIM_FH (new to |
> > | | NFSv4.1), the file is identified by just |
> > | | the current filehandle.
> >
> > So it seems like maybe the server should be tossing the old state
> > and
> > returning a new stateid?
> >
>
> No. As far as the protocol is concerned, the only difference between
> CLAIM_NULL and CLAIM_FH is through how the client identifies the file
> (in the first case, through an implicit lookup, and in the second
> case
> through a file handle). The client should be free to intermix the two
> types of OPEN, and it should expect the resulting stateids to depend
> only on whether or not the open_owner matches. If the open_owner
> matches an existing stateid, then that stateid is bumped and
> returned.
>
> I'm not aware of any expectation in the client that this should not
> be
> the case, so if you are seeing different behaviour, then something
> else
> must be at work here. Is the client perhaps mounting the same
> filesystem in two different places in such a way that the super block
> is not being shared?

Oh, one thing to note that is both a protocol and a client expectation:
if the server returns multiple different filehandles for the same file
(i.e. same fsid:fileid combination), then the stateid is attached to
the _filehandle_ and not the fsid:fileid combination. So if the server
is coded to return a different filehandle in the CLAIM_NULL case than
the one used for CLAIM_FH, then that would explain the expectation
above.
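
In other words, the client keys its open state on the filehandle
itself, something like the following (again just an illustration, not
the client code):

#include <string.h>

struct open_owner;                      /* hypothetical types */

struct filehandle {
        unsigned int  len;
        unsigned char data[128];
};

/* The key is (open_owner, filehandle) -- NOT (open_owner, fsid:fileid) --
 * so a server that hands out different filehandles for CLAIM_NULL and
 * CLAIM_FH would make the client track two separate states for what is
 * really the same file. */
struct open_state_key {
        const struct open_owner *owner;
        struct filehandle        fh;
};

static int open_state_key_match(const struct open_state_key *a,
                                const struct open_state_key *b)
{
        return a->owner == b->owner &&
               a->fh.len == b->fh.len &&
               memcmp(a->fh.data, b->fh.data, a->fh.len) == 0;
}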

This should never be the case for knfsd.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-18 20:44:00

by Scott Mayhew

Subject: Re: Question about open(CLAIM_FH)

On Thu, 18 Apr 2019, Trond Myklebust wrote:

> Hi Scott,
>
> On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > When the client does an open(CLAIM_FH) and the server already has
> > open
> > state for that open owner and file, what's supposed to happen?
> > Currently the server returns the existing stateid with the seqid
> > bumped,
> > but it looks like the client is expecting a new stateid (I'm seeing
> > the
> > state manager spending a lot of time waiting in
> > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being set
> > in
> > the state flags by nfs_need_update_open_stateid()).
> >
> > Looking at rfc5661 section 18.16.3, I see:
> >
> > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > | | and there is no previous state associated |
> > | | with the file for the client. With |
> > | | CLAIM_NULL, the file is identified by the |
> > | | current filehandle and the specified |
> > | | component name. With CLAIM_FH (new to |
> > | | NFSv4.1), the file is identified by just |
> > | | the current filehandle.
> >
> > So it seems like maybe the server should be tossing the old state and
> > returning a new stateid?
> >
>
> No. As far as the protocol is concerned, the only difference between
> CLAIM_NULL and CLAIM_FH is through how the client identifies the file
> (in the first case, through an implicit lookup, and in the second case
> through a file handle). The client should be free to intermix the two
> types of OPEN, and it should expect the resulting stateids to depend
> only on whether or not the open_owner matches. If the open_owner
> matches an existing stateid, then that stateid is bumped and returned.
>
> I'm not aware of any expectation in the client that this should not be
> the case, so if you are seeing different behaviour, then something else
> must be at work here. Is the client perhaps mounting the same
> filesystem in two different places in such a way that the super block
> is not being shared?

No, it's just a single 4.1 mount w/ the default mount options.

For a bit of background, I've been trying to track down a problem in
RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag is getting
permanently set because the nfs4_client->cl_revoked list on the server
is non-empty... yet there's no longer open state on the client.

I can reproduce it pretty easily in RHEL using 2 VMs, each with 2-4 CPUs
and 4-8G of memory. The server has 64 nfsd threads and a 15 second
lease time.

On the client I'm running the following to add a 10ms delay to CB_RECALL
replies:
# stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'

then in another window I open a bunch of files:
# for i in `seq -w 1 5000`; do sleep 2m </mnt/t/dir1/file.$i & done

(Note: I already created the files ahead of time)

As soon as the bash prompt returns on the client, I run the following on
the server:
# for i in `seq -w 1 5000`; do date >/export/dir1/file.$i & done

At that point, any further SEQUENCE ops will have the recallable state
revoked flag set on the client until the fs is unmounted.

If I run the same steps on Fedora clients with recent kernels, I don't
have the problem with the recallable state revoked flag, but I'm getting
some other strangeness. Everything starts out fine with
nfs_reap_expired_delegations() doing TEST_STATEID and FREE_STATEID, but
once the state manager starts calling nfs41_open_expired(), things sort
of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops every 5
seconds in Wireshark. It stays that way until the files are closed on
the client, when I see a slew of DELEGRETURNs and FREE_STATEIDs... but
I'm only seeing 3 or 4 CLOSE ops. If I poke around in crash on the
server, I see a ton of open stateids:

crash> epython fs/nfsd/print-client-state-info.py
nfsd_net = 0xffff93e473511000
nfs4_client = 0xffff93e3f7954980
nfs4_stateowner = 0xffff93e4058cc360 num_stateids = 4997 <---- only 3 CLOSE ops were received
num_openowners = 1
num_layouts = 0
num_delegations = 0
num_sessions = 1
num_copies = 0
num_revoked = 0
cl_cb_waitq_qlen = 0

Those stateids stick around until the fs is unmounted (and the
DESTROY_CLIENTID ops return NFS4ERR_CLIENTID_BUSY while doing so).

Both VMs are running 5.0.6-200.fc29.x86_64, but the server also has the
"nfsd: Don't release the callback slot unless it was actually held"
patch you sent a few weeks ago as well as the "nfsd: CB_RECALL can race
with FREE_STATEID" patch I sent today.

-Scott

>
> Cheers
> Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2019-04-18 21:31:29

by Trond Myklebust

Subject: Re: Question about open(CLAIM_FH)

On Thu, 2019-04-18 at 16:43 -0400, Scott Mayhew wrote:
> On Thu, 18 Apr 2019, Trond Myklebust wrote:
>
> > Hi Scott,
> >
> > On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > > When the client does an open(CLAIM_FH) and the server already has
> > > open
> > > state for that open owner and file, what's supposed to happen?
> > > Currently the server returns the existing stateid with the seqid
> > > bumped,
> > > but it looks like the client is expecting a new stateid (I'm
> > > seeing
> > > the
> > > state manager spending a lot of time waiting in
> > > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being
> > > set
> > > in
> > > the state flags by nfs_need_update_open_stateid()).
> > >
> > > Looking at rfc5661 section 18.16.3, I see:
> > >
> > > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > > | | and there is no previous state associated |
> > > | | with the file for the client. With |
> > > | | CLAIM_NULL, the file is identified by the |
> > > | | current filehandle and the specified |
> > > | | component name. With CLAIM_FH (new to |
> > > | | NFSv4.1), the file is identified by just |
> > > | | the current filehandle.
> > >
> > > So it seems like maybe the server should be tossing the old state
> > > and
> > > returning a new stateid?
> > >
> >
> > No. As far as the protocol is concerned, the only difference
> > between
> > CLAIM_NULL and CLAIM_FH is through how the client identifies the
> > file
> > (in the first case, through an implicit lookup, and in the second
> > case
> > through a file handle). The client should be free to intermix the
> > two
> > types of OPEN, and it should expect the resulting stateids to
> > depend
> > only on whether or not the open_owner matches. If the open_owner
> > matches an existing stateid, then that stateid is bumped and
> > returned.
> >
> > I'm not aware of any expectation in the client that this should not
> > be
> > the case, so if you are seeing different behaviour, then something
> > else
> > must be at work here. Is the client perhaps mounting the same
> > filesystem in two different places in such a way that the super
> > block
> > is not being shared?
>
> No, it's just a single 4.1 mount w/ the default mount options.
>
> For a bit of background, I've been trying to track down a problem in
> RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flags is getting
> permanently set because the nfs4_client->cl_revoked list on the
> server
> is non-empty... yet there's no longer open state on the client.
>
> I can reproduce it pretty easily in RHEL using 2 VMs, each with 2-4
> CPUs
> and 4-8G of memory. The server has 64 nfsd threads and a 15 second
> lease time.
>
> On the client I'm running the following to add a 10ms delay to
> CB_RECALL
> replies:
> # stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'
>
> then in another window I open a bunch of files:
> # for i in `seq -w 1 5000`; do sleep 2m </mnt/t/dir1/file.$i & done
>
> (Note: I already created the files ahead of time)
>
> As soon as the bash prompt returns on the client, I run the following
> on
> the server:
> # for i in `seq -w 1 5000`; do date >/export/dir1/file.$i & done
>
> At that point, any further SEQUENCE ops will have the recallable
> state
> revoked flag set on the client until the fs is unmounted.
>
> If I run the same steps on Fedora clients with recent kernels, I
> don't
> have the problem with the recallable state revoked flag, but I'm
> getting
> some other strangeness. Everything starts out fine with
> nfs_reap_expired_delegations() doing TEST_STATEID and FREE_STATEID,
> but
> once the state manager starts callings nfs41_open_expired(), things
> sort
> of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops every
> 5
> seconds in wireshark. It stays that way until the files are closed
> on
> the client, when I see a slew of DELEGRETURNs and FREE_STATEIDs...
> but
> I'm only seeing 3 or 4 CLOSE ops. If I poke around in crash on the
> server, I see a ton of open stateids:
>
> crash> epython fs/nfsd/print-client-state-info.py
> nfsd_net = 0xffff93e473511000
> nfs4_client = 0xffff93e3f7954980
> nfs4_stateowner = 0xffff93e4058cc360 num_stateids = 4997 <---- only 3 CLOSE ops were received
> num_openowners = 1
> num_layouts = 0
> num_delegations = 0
> num_sessions = 1
> num_copies = 0
> num_revoked = 0
> cl_cb_waitq_qlen = 0
>
> Those stateids stick around until the fs is unmounted (and the
> DESTROY_STATEID ops return NFS4ERR_CLIENTID_BUSY while doing so).
>
> Both VMs are running 5.0.6-200.fc29.x86_64, but the server also has
> the
> "nfsd: Don't release the callback slot unless it was actually held"
> patch you sent a few weeks ago as well as the "nfsd: CB_RECALL can
> race
> with FREE_STATEID" patch I sent today.

Are the calls to nfs41_open_expired() succeeding? It sounds like they
might not be.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-30 18:45:16

by Scott Mayhew

Subject: Re: Question about open(CLAIM_FH)

On Thu, 18 Apr 2019, Trond Myklebust wrote:

> On Thu, 2019-04-18 at 16:43 -0400, Scott Mayhew wrote:
> > On Thu, 18 Apr 2019, Trond Myklebust wrote:
> >
> > > Hi Scott,
> > >
> > > On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > > > When the client does an open(CLAIM_FH) and the server already has
> > > > open
> > > > state for that open owner and file, what's supposed to happen?
> > > > Currently the server returns the existing stateid with the seqid
> > > > bumped,
> > > > but it looks like the client is expecting a new stateid (I'm
> > > > seeing
> > > > the
> > > > state manager spending a lot of time waiting in
> > > > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT being
> > > > set
> > > > in
> > > > the state flags by nfs_need_update_open_stateid()).
> > > >
> > > > Looking at rfc5661 section 18.16.3, I see:
> > > >
> > > > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > > > | | and there is no previous state associated |
> > > > | | with the file for the client. With |
> > > > | | CLAIM_NULL, the file is identified by the |
> > > > | | current filehandle and the specified |
> > > > | | component name. With CLAIM_FH (new to |
> > > > | | NFSv4.1), the file is identified by just |
> > > > | | the current filehandle.
> > > >
> > > > So it seems like maybe the server should be tossing the old state
> > > > and
> > > > returning a new stateid?
> > > >
> > >
> > > No. As far as the protocol is concerned, the only difference
> > > between
> > > CLAIM_NULL and CLAIM_FH is through how the client identifies the
> > > file
> > > (in the first case, through an implicit lookup, and in the second
> > > case
> > > through a file handle). The client should be free to intermix the
> > > two
> > > types of OPEN, and it should expect the resulting stateids to
> > > depend
> > > only on whether or not the open_owner matches. If the open_owner
> > > matches an existing stateid, then that stateid is bumped and
> > > returned.
> > >
> > > I'm not aware of any expectation in the client that this should not
> > > be
> > > the case, so if you are seeing different behaviour, then something
> > > else
> > > must be at work here. Is the client perhaps mounting the same
> > > filesystem in two different places in such a way that the super
> > > block
> > > is not being shared?
> >
> > No, it's just a single 4.1 mount w/ the default mount options.
> >
> > For a bit of background, I've been trying to track down a problem in
> > RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flags is getting
> > permanently set because the nfs4_client->cl_revoked list on the
> > server
> > is non-empty... yet there's no longer open state on the client.
> >
> > I can reproduce it pretty easily in RHEL using 2 VMs, each with 2-4
> > CPUs
> > and 4-8G of memory. The server has 64 nfsd threads and a 15 second
> > lease time.
> >
> > On the client I'm running the following to add a 10ms delay to
> > CB_RECALL
> > replies:
> > # stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'
> >
> > then in another window I open a bunch of files:
> > # for i in `seq -w 1 5000`; do sleep 2m </mnt/t/dir1/file.$i & done
> >
> > (Note: I already created the files ahead of time)
> >
> > As soon as the bash prompt returns on the client, I run the following
> > on
> > the server:
> > # for i in `seq -w 1 5000`; do date >/export/dir1/file.$i & done
> >
> > At that point, any further SEQUENCE ops will have the recallable
> > state
> > revoked flag set on the client until the fs is unmounted.
> >
> > If I run the same steps on Fedora clients with recent kernels, I
> > don't
> > have the problem with the recallable state revoked flag, but I'm
> > getting
> > some other strangeness. Everything starts out fine with
> > nfs_reap_expired_delegations() doing TEST_STATEID and FREE_STATEID,
> > but
> > once the state manager starts callings nfs41_open_expired(), things
> > sort
> > of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops every
> > 5
> > seconds in wireshark. It stays that way until the files are closed
> > on
> > the client, when I see a slew of DELEGRETURNs and FREE_STATEIDs...
> > but
> > I'm only seeing 3 or 4 CLOSE ops. If I poke around in crash on the
> > server, I see a ton of open stateids:
> >
> > crash> epython fs/nfsd/print-client-state-info.py
> > nfsd_net = 0xffff93e473511000
> > nfs4_client = 0xffff93e3f7954980
> > nfs4_stateowner = 0xffff93e4058cc360 num_stateids = 4997 <---- only 3 CLOSE ops were received
> > num_openowners = 1
> > num_layouts = 0
> > num_delegations = 0
> > num_sessions = 1
> > num_copies = 0
> > num_revoked = 0
> > cl_cb_waitq_qlen = 0
> >
> > Those stateids stick around until the fs is unmounted (and the
> > DESTROY_STATEID ops return NFS4ERR_CLIENTID_BUSY while doing so).
> >
> > Both VMs are running 5.0.6-200.fc29.x86_64, but the server also has
> > the
> > "nfsd: Don't release the callback slot unless it was actually held"
> > patch you sent a few weeks ago as well as the "nfsd: CB_RECALL can
> > race
> > with FREE_STATEID" patch I sent today.
>
> Are the calls to nfs41_open_expired() succeeding? It sounds like they
> might not be.

They're succeeding; they're just taking 5 seconds each.

To make matters worse, due to the aggressive lease time on the server,
the client was doing a SEQUENCE op periodically while the OPENs were
occurring. The SEQUENCE replies also had the RECALLABLE_STATE_REVOKED
flag set, and since they weren't coming from the state manager the
nfs4_slot->privileged field was unset, so we'd wind up calling
nfs41_handle_recallable_state_revoked() again, undoing what little
progress was made. Bumping the lease time to 20 seconds made that go
away, but the overall problem is still there.

Is it really necessary for nfs41_handle_recallable_state_revoked() to
mark all open state as needing recovery? After all, the only states
that can be revoked that way are layouts and delegations. If I change
nfs41_handle_recallable_state_revoked() so that, instead of calling
nfs41_handle_some_state_revoked(), it calls
nfs_mark_test_expired_all_delegations() and
nfs4_schedule_state_manager(), the problem seems to go away.
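
Roughly, the change I've been testing looks like this (a sketch of the
idea, not the exact diff):

/* fs/nfs/nfs4state.c -- sketch only */
static void nfs41_handle_recallable_state_revoked(struct nfs_client *clp)
{
        /* Previously: mark *all* open state for recovery via
         * nfs41_handle_some_state_revoked().  Only delegations and
         * layouts are recallable, so instead just mark the delegations
         * as needing TEST_STATEID/FREE_STATEID and kick the state
         * manager. */
        nfs_mark_test_expired_all_delegations(clp);
        nfs4_schedule_state_manager(clp);
}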

-Scott

>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2019-04-30 18:57:40

by Trond Myklebust

Subject: Re: Question about open(CLAIM_FH)

On Tue, 2019-04-30 at 14:44 -0400, Scott Mayhew wrote:
> On Thu, 18 Apr 2019, Trond Myklebust wrote:
>
> > On Thu, 2019-04-18 at 16:43 -0400, Scott Mayhew wrote:
> > > On Thu, 18 Apr 2019, Trond Myklebust wrote:
> > >
> > > > Hi Scott,
> > > >
> > > > On Thu, 2019-04-18 at 09:37 -0400, Scott Mayhew wrote:
> > > > > When the client does an open(CLAIM_FH) and the server already
> > > > > has
> > > > > open
> > > > > state for that open owner and file, what's supposed to
> > > > > happen?
> > > > > Currently the server returns the existing stateid with the
> > > > > seqid
> > > > > bumped,
> > > > > but it looks like the client is expecting a new stateid (I'm
> > > > > seeing
> > > > > the
> > > > > state manager spending a lot of time waiting in
> > > > > nfs_set_open_stateid_locked() due to NFS_STATE_CHANGE_WAIT
> > > > > being
> > > > > set
> > > > > in
> > > > > the state flags by nfs_need_update_open_stateid()).
> > > > >
> > > > > Looking at rfc5661 section 18.16.3, I see:
> > > > >
> > > > > | CLAIM_NULL, CLAIM_FH | For the client, this is a new OPEN request |
> > > > > | | and there is no previous state associated |
> > > > > | | with the file for the client. With |
> > > > > | | CLAIM_NULL, the file is identified by the |
> > > > > | | current filehandle and the specified |
> > > > > | | component name. With CLAIM_FH (new to |
> > > > > | | NFSv4.1), the file is identified by just |
> > > > > | | the current filehandle.
> > > > >
> > > > > So it seems like maybe the server should be tossing the old
> > > > > state
> > > > > and
> > > > > returning a new stateid?
> > > > >
> > > >
> > > > No. As far as the protocol is concerned, the only difference
> > > > between
> > > > CLAIM_NULL and CLAIM_FH is through how the client identifies
> > > > the
> > > > file
> > > > (in the first case, through an implicit lookup, and in the
> > > > second
> > > > case
> > > > through a file handle). The client should be free to intermix
> > > > the
> > > > two
> > > > types of OPEN, and it should expect the resulting stateids to
> > > > depend
> > > > only on whether or not the open_owner matches. If the
> > > > open_owner
> > > > matches an existing stateid, then that stateid is bumped and
> > > > returned.
> > > >
> > > > I'm not aware of any expectation in the client that this should
> > > > not
> > > > be
> > > > the case, so if you are seeing different behaviour, then
> > > > something
> > > > else
> > > > must be at work here. Is the client perhaps mounting the same
> > > > filesystem in two different places in such a way that the super
> > > > block
> > > > is not being shared?
> > >
> > > No, it's just a single 4.1 mount w/ the default mount options.
> > >
> > > For a bit of background, I've been trying to track down a problem
> > > in
> > > RHEL where the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flags is
> > > getting
> > > permanently set because the nfs4_client->cl_revoked list on the
> > > server
> > > is non-empty... yet there's no longer open state on the client.
> > >
> > > I can reproduce it pretty easily in RHEL using 2 VMs, each with
> > > 2-4
> > > CPUs
> > > and 4-8G of memory. The server has 64 nfsd threads and a 15
> > > second
> > > lease time.
> > >
> > > On the client I'm running the following to add a 10ms delay to
> > > CB_RECALL
> > > replies:
> > > # stap -gve 'global count = 0; probe module("nfsv4").function("nfs4_callback_recall") { printf("%s: %d\n", ppfunc(), ++count); mdelay(10); }'
> > >
> > > then in another window I open a bunch of files:
> > > # for i in `seq -w 1 5000`; do sleep 2m </mnt/t/dir1/file.$i &
> > > done
> > >
> > > (Note: I already created the files ahead of time)
> > >
> > > As soon as the bash prompt returns on the client, I run the
> > > following
> > > on
> > > the server:
> > > # for i in `seq -w 1 5000`; do date >/export/dir1/file.$i & done
> > >
> > > At that point, any further SEQUENCE ops will have the recallable
> > > state
> > > revoked flag set on the client until the fs is unmounted.
> > >
> > > If I run the same steps on Fedora clients with recent kernels, I
> > > don't
> > > have the problem with the recallable state revoked flag, but I'm
> > > getting
> > > some other strangeness. Everything starts out fine with
> > > nfs_reap_expired_delegations() doing TEST_STATEID and
> > > FREE_STATEID,
> > > but
> > > once the state manager starts callings nfs41_open_expired(),
> > > things
> > > sort
> > > of grind to a halt and I see 1 OPEN and 1 or 2 TEST_STATEID ops
> > > every
> > > 5
> > > seconds in wireshark. It stays that way until the files are
> > > closed
> > > on
> > > the client, when I see a slew of DELEGRETURNs and
> > > FREE_STATEIDs...
> > > but
> > > I'm only seeing 3 or 4 CLOSE ops. If I poke around in crash on
> > > the
> > > server, I see a ton of open stateids:
> > >
> > > crash> epython fs/nfsd/print-client-state-info.py
> > > nfsd_net = 0xffff93e473511000
> > > nfs4_client = 0xffff93e3f7954980
> > > nfs4_stateowner = 0xffff93e4058cc360 num_stateids = 4997 <---- only 3 CLOSE ops were received
> > > num_openowners = 1
> > > num_layouts = 0
> > > num_delegations = 0
> > > num_sessions = 1
> > > num_copies = 0
> > > num_revoked = 0
> > > cl_cb_waitq_qlen = 0
> > >
> > > Those stateids stick around until the fs is unmounted (and the
> > > DESTROY_STATEID ops return NFS4ERR_CLIENTID_BUSY while doing so).
> > >
> > > Both VMs are running 5.0.6-200.fc29.x86_64, but the server also
> > > has
> > > the
> > > "nfsd: Don't release the callback slot unless it was actually
> > > held"
> > > patch you sent a few weeks ago as well as the "nfsd: CB_RECALL
> > > can
> > > race
> > > with FREE_STATEID" patch I sent today.
> >
> > Are the calls to nfs41_open_expired() succeeding? It sounds like
> > they
> > might not be.
>
> They're succeeding, they're just taking 5 seconds each.
>
> To make matters worse, due to the aggressive lease time on the
> server,
> the client was doing a SEQUENCE op periodically while the OPENs were
> occurring. The SEQUENCE replies also had the
> RECALLABLE_STATE_REVOKED
> flag set, and since they weren't coming from the state manager the
> nfs4_slot->privileged field was unset, so we'd wind up calling
> nfs41_handle_recallable_state_revoked() again, undoing what little
> progress was made. Bumping the lease time to 20 seconds made that go
> away, but the overall problem is still there.
>
> Is it really necessary for nfs41_handle_recallable_state_revoked() to
> mark all open state as needing recovery? After all, the only state
> that
> can be revoked are layouts and delegations. If I change
> nfs41_handle_recallable_state_revoked(), so that instead of calling
> nfs41_handle_some_state_revoked() we instead call
> nfs_mark_test_expired_all_delegations() and
> nfs4_schedule_state_manager(), then the problem seems to go away.

That would be OK with me. That code probably pre-dates the existence of
nfs_mark_test_expired_all_delegations().

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]