The protocol for creating a new state in nfsd is to allocated the state
leaving it largely uninitialised, add that state to the ->cl_stateids
idr so as to reserve a state id, then complete initialisation of the
state and only set ->sc_type to non-zero once the state is fully
initialised.
If a state is found in the idr with ->sc_type == 0, it is ignored.
The ->cl_lock list is used to avoid races - it is held while checking
sc_type during lookup, and held when a non-zero value is stored in
->sc_type.
... except... hash_delegation_locked() finalises the initialisation of a
delegation state, but does NOT hold ->cl_lock.
So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
and so ensures there are no races (which are extremely unlikely in any
case).
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 65fd5510323a..6368788a7d4e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
lockdep_assert_held(&state_lock);
lockdep_assert_held(&fp->fi_lock);
+ lockdep_assert_held(&clp->cl_lock);
if (nfs4_delegation_exists(clp, fp))
return -EAGAIN;
@@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
goto out_unlock;
spin_lock(&state_lock);
+ spin_lock(&clp->cl_lock);
spin_lock(&fp->fi_lock);
if (fp->fi_had_conflict)
status = -EAGAIN;
else
status = hash_delegation_locked(dp, fp);
spin_unlock(&fp->fi_lock);
+ spin_unlock(&clp->cl_lock);
spin_unlock(&state_lock);
if (status)
--
2.42.0
On Fri, 2023-11-17 at 13:18 +1100, NeilBrown wrote:
> The protocol for creating a new state in nfsd is to allocated the state
> leaving it largely uninitialised, add that state to the ->cl_stateids
> idr so as to reserve a state id, then complete initialisation of the
> state and only set ->sc_type to non-zero once the state is fully
> initialised.
>
> If a state is found in the idr with ->sc_type == 0, it is ignored.
> The ->cl_lock list is used to avoid races - it is held while checking
> sc_type during lookup, and held when a non-zero value is stored in
> ->sc_type.
>
> ... except... hash_delegation_locked() finalises the initialisation of a
> delegation state, but does NOT hold ->cl_lock.
>
> So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> and so ensures there are no races (which are extremely unlikely in any
> case).
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 65fd5510323a..6368788a7d4e 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
>
> lockdep_assert_held(&state_lock);
> lockdep_assert_held(&fp->fi_lock);
> + lockdep_assert_held(&clp->cl_lock);
>
> if (nfs4_delegation_exists(clp, fp))
> return -EAGAIN;
> @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> goto out_unlock;
>
> spin_lock(&state_lock);
> + spin_lock(&clp->cl_lock);
> spin_lock(&fp->fi_lock);
> if (fp->fi_had_conflict)
> status = -EAGAIN;
> else
> status = hash_delegation_locked(dp, fp);
> spin_unlock(&fp->fi_lock);
> + spin_unlock(&clp->cl_lock);
> spin_unlock(&state_lock);
>
> if (status)
I know it's (supposedly) an unlikely race, but should we send this to
stable?
Reviewed-by: Jeff Layton <[email protected]>
On Fri, Nov 17, 2023 at 01:18:47PM +1100, NeilBrown wrote:
> The protocol for creating a new state in nfsd is to allocated the state
> leaving it largely uninitialised, add that state to the ->cl_stateids
> idr so as to reserve a state id, then complete initialisation of the
> state and only set ->sc_type to non-zero once the state is fully
> initialised.
>
> If a state is found in the idr with ->sc_type == 0, it is ignored.
> The ->cl_lock list is used to avoid races - it is held while checking
s/->cl_lock list/->cl_lock lock
> sc_type during lookup,
In particular, find_stateid_locked(), but yet, not in nfs4_find_file()
Can you help me understand why it's not needed in the latter case?
> and held when a non-zero value is stored in ->sc_type.
I see a few additional spots where an sc_type value is set and
cl_lock is not held:
init_open_stateid
nfsd4_process_open2
> ... except... hash_delegation_locked() finalises the initialisation of a
> delegation state, but does NOT hold ->cl_lock.
>
> So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> and so ensures there are no races (which are extremely unlikely in any
> case).
I would have expected that cl_lock should be taken first. Can the
patch description provide some rationale for the lock ordering
you chose?
Jeff asks in another email whether this fix should get copied to
stable. Since the race is unlikely, I'm inclined to wait for an
explicit problem report.
> Signed-off-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 65fd5510323a..6368788a7d4e 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
>
> lockdep_assert_held(&state_lock);
> lockdep_assert_held(&fp->fi_lock);
> + lockdep_assert_held(&clp->cl_lock);
>
> if (nfs4_delegation_exists(clp, fp))
> return -EAGAIN;
> @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> goto out_unlock;
>
> spin_lock(&state_lock);
> + spin_lock(&clp->cl_lock);
> spin_lock(&fp->fi_lock);
> if (fp->fi_had_conflict)
> status = -EAGAIN;
> else
> status = hash_delegation_locked(dp, fp);
> spin_unlock(&fp->fi_lock);
> + spin_unlock(&clp->cl_lock);
> spin_unlock(&state_lock);
>
> if (status)
> --
> 2.42.0
>
--
Chuck Lever
On Fri, 17 Nov 2023, Jeff Layton wrote:
> On Fri, 2023-11-17 at 13:18 +1100, NeilBrown wrote:
> > The protocol for creating a new state in nfsd is to allocated the state
> > leaving it largely uninitialised, add that state to the ->cl_stateids
> > idr so as to reserve a state id, then complete initialisation of the
> > state and only set ->sc_type to non-zero once the state is fully
> > initialised.
> >
> > If a state is found in the idr with ->sc_type == 0, it is ignored.
> > The ->cl_lock list is used to avoid races - it is held while checking
> > sc_type during lookup, and held when a non-zero value is stored in
> > ->sc_type.
> >
> > ... except... hash_delegation_locked() finalises the initialisation of a
> > delegation state, but does NOT hold ->cl_lock.
> >
> > So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> > and so ensures there are no races (which are extremely unlikely in any
> > case).
> >
> > Signed-off-by: NeilBrown <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 65fd5510323a..6368788a7d4e 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> >
> > lockdep_assert_held(&state_lock);
> > lockdep_assert_held(&fp->fi_lock);
> > + lockdep_assert_held(&clp->cl_lock);
> >
> > if (nfs4_delegation_exists(clp, fp))
> > return -EAGAIN;
> > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> > goto out_unlock;
> >
> > spin_lock(&state_lock);
> > + spin_lock(&clp->cl_lock);
> > spin_lock(&fp->fi_lock);
> > if (fp->fi_had_conflict)
> > status = -EAGAIN;
> > else
> > status = hash_delegation_locked(dp, fp);
> > spin_unlock(&fp->fi_lock);
> > + spin_unlock(&clp->cl_lock);
> > spin_unlock(&state_lock);
> >
> > if (status)
>
> I know it's (supposedly) an unlikely race, but should we send this to
> stable?
I don't know. Once upon a time, "stable" meant something. There was a
clear list of rules. Those seem to have been torn up. Now it seems to
be whatever some machine-learning tool chooses.
If that tool chooses this patch (which I suspect it will), I won't
object. But I don't think it is worth encouraging it.
>
> Reviewed-by: Jeff Layton <[email protected]>
>
Thanks,
NelBrown
On Sat, 18 Nov 2023, Chuck Lever wrote:
> On Fri, Nov 17, 2023 at 01:18:47PM +1100, NeilBrown wrote:
> > The protocol for creating a new state in nfsd is to allocated the state
> > leaving it largely uninitialised, add that state to the ->cl_stateids
> > idr so as to reserve a state id, then complete initialisation of the
> > state and only set ->sc_type to non-zero once the state is fully
> > initialised.
> >
> > If a state is found in the idr with ->sc_type == 0, it is ignored.
> > The ->cl_lock list is used to avoid races - it is held while checking
>
> s/->cl_lock list/->cl_lock lock
>
>
> > sc_type during lookup,
>
> In particular, find_stateid_locked(), but yet, not in nfs4_find_file()
>
> Can you help me understand why it's not needed in the latter case?
nfs4_find_file() is called from nfs4_check_file() which is called from
nfs4_preprocess_stateid_op(), which gets the nfs4_stid from
nfsd4_lookup_stateid().
That in turn gets the stateid from find_stateid_by_type() which gets it
from find_stateid_locked().
If find_stateid_locked() returns a stateid, then ->sc_type is not 0, and
it can never be set to zero (At least after subsequent patches land).
Or, more succinctly, nfs4_find_file() does not do lookup, so it doesn't
check sc_type against zero, so it doesn't need a lock.
>
>
> > and held when a non-zero value is stored in ->sc_type.
>
> I see a few additional spots where an sc_type value is set and
> cl_lock is not held:
>
> init_open_stateid
->cl_lock is taken 9 lines before NFS4_OPEN_STID is assigned to
>sc_type, and it is dropped 13 lines later.
> nfsd4_process_open2
This assignment does not change from zero to non-zero. So it cannot
race with lookup, which tests for zero.
A later patch changes this assignment to be a change to the new
sc_status.
>
>
> > ... except... hash_delegation_locked() finalises the initialisation of a
> > delegation state, but does NOT hold ->cl_lock.
> >
> > So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> > and so ensures there are no races (which are extremely unlikely in any
> > case).
>
> I would have expected that cl_lock should be taken first. Can the
> patch description provide some rationale for the lock ordering
> you chose?
I've added
As ->fi_lock is often taken when ->cl_lock is held, we need to take
->cl_lock first of those two.
Currently ->cl_lock and state_lock are never both taken at the same time.
We need both for this patch so an arbitrary choice is needed concerning
which to take first. As state_lock is more global, it might be more
contended, so take it first.
I'm happy to choose a different ordering for ->cl_lock and state_lock if
you have a different justification - I accept that mine isn't
particularly strong.
>
> Jeff asks in another email whether this fix should get copied to
> stable. Since the race is unlikely, I'm inclined to wait for an
> explicit problem report.
I agree.
Thanks,
NeilBrown
>
>
> > Signed-off-by: NeilBrown <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 65fd5510323a..6368788a7d4e 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> >
> > lockdep_assert_held(&state_lock);
> > lockdep_assert_held(&fp->fi_lock);
> > + lockdep_assert_held(&clp->cl_lock);
> >
> > if (nfs4_delegation_exists(clp, fp))
> > return -EAGAIN;
> > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> > goto out_unlock;
> >
> > spin_lock(&state_lock);
> > + spin_lock(&clp->cl_lock);
> > spin_lock(&fp->fi_lock);
> > if (fp->fi_had_conflict)
> > status = -EAGAIN;
> > else
> > status = hash_delegation_locked(dp, fp);
> > spin_unlock(&fp->fi_lock);
> > + spin_unlock(&clp->cl_lock);
> > spin_unlock(&state_lock);
> >
> > if (status)
> > --
> > 2.42.0
> >
>
> --
> Chuck Lever
>
On Mon, 20 Nov 2023, Chuck Lever wrote:
> On Mon, Nov 20, 2023 at 08:37:55AM +1100, NeilBrown wrote:
> > On Fri, 17 Nov 2023, Jeff Layton wrote:
> > > On Fri, 2023-11-17 at 13:18 +1100, NeilBrown wrote:
> > > > The protocol for creating a new state in nfsd is to allocated the state
> > > > leaving it largely uninitialised, add that state to the ->cl_stateids
> > > > idr so as to reserve a state id, then complete initialisation of the
> > > > state and only set ->sc_type to non-zero once the state is fully
> > > > initialised.
> > > >
> > > > If a state is found in the idr with ->sc_type == 0, it is ignored.
> > > > The ->cl_lock list is used to avoid races - it is held while checking
> > > > sc_type during lookup, and held when a non-zero value is stored in
> > > > ->sc_type.
> > > >
> > > > ... except... hash_delegation_locked() finalises the initialisation of a
> > > > delegation state, but does NOT hold ->cl_lock.
> > > >
> > > > So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> > > > and so ensures there are no races (which are extremely unlikely in any
> > > > case).
> > > >
> > > > Signed-off-by: NeilBrown <[email protected]>
> > > > ---
> > > > fs/nfsd/nfs4state.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index 65fd5510323a..6368788a7d4e 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > > >
> > > > lockdep_assert_held(&state_lock);
> > > > lockdep_assert_held(&fp->fi_lock);
> > > > + lockdep_assert_held(&clp->cl_lock);
> > > >
> > > > if (nfs4_delegation_exists(clp, fp))
> > > > return -EAGAIN;
> > > > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> > > > goto out_unlock;
> > > >
> > > > spin_lock(&state_lock);
> > > > + spin_lock(&clp->cl_lock);
> > > > spin_lock(&fp->fi_lock);
> > > > if (fp->fi_had_conflict)
> > > > status = -EAGAIN;
> > > > else
> > > > status = hash_delegation_locked(dp, fp);
> > > > spin_unlock(&fp->fi_lock);
> > > > + spin_unlock(&clp->cl_lock);
> > > > spin_unlock(&state_lock);
> > > >
> > > > if (status)
> > >
> > > I know it's (supposedly) an unlikely race, but should we send this to
> > > stable?
> >
> > I don't know. Once upon a time, "stable" meant something. There was a
> > clear list of rules. Those seem to have been torn up. Now it seems to
> > be whatever some machine-learning tool chooses.
> > If that tool chooses this patch (which I suspect it will), I won't
> > object. But I don't think it is worth encouraging it.
>
> We've asked Sasha and GregKH not to use AUTOSEL on NFSD patches,
> promising that we will explicitly mark anything that should be
> back-ported.
Oh good - I didn't know that.
Thanks.
NeilBrown
On Mon, Nov 20, 2023 at 08:37:55AM +1100, NeilBrown wrote:
> On Fri, 17 Nov 2023, Jeff Layton wrote:
> > On Fri, 2023-11-17 at 13:18 +1100, NeilBrown wrote:
> > > The protocol for creating a new state in nfsd is to allocated the state
> > > leaving it largely uninitialised, add that state to the ->cl_stateids
> > > idr so as to reserve a state id, then complete initialisation of the
> > > state and only set ->sc_type to non-zero once the state is fully
> > > initialised.
> > >
> > > If a state is found in the idr with ->sc_type == 0, it is ignored.
> > > The ->cl_lock list is used to avoid races - it is held while checking
> > > sc_type during lookup, and held when a non-zero value is stored in
> > > ->sc_type.
> > >
> > > ... except... hash_delegation_locked() finalises the initialisation of a
> > > delegation state, but does NOT hold ->cl_lock.
> > >
> > > So this patch takes ->cl_lock at the appropriate time w.r.t other locks,
> > > and so ensures there are no races (which are extremely unlikely in any
> > > case).
> > >
> > > Signed-off-by: NeilBrown <[email protected]>
> > > ---
> > > fs/nfsd/nfs4state.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 65fd5510323a..6368788a7d4e 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1317,6 +1317,7 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > >
> > > lockdep_assert_held(&state_lock);
> > > lockdep_assert_held(&fp->fi_lock);
> > > + lockdep_assert_held(&clp->cl_lock);
> > >
> > > if (nfs4_delegation_exists(clp, fp))
> > > return -EAGAIN;
> > > @@ -5609,12 +5610,14 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
> > > goto out_unlock;
> > >
> > > spin_lock(&state_lock);
> > > + spin_lock(&clp->cl_lock);
> > > spin_lock(&fp->fi_lock);
> > > if (fp->fi_had_conflict)
> > > status = -EAGAIN;
> > > else
> > > status = hash_delegation_locked(dp, fp);
> > > spin_unlock(&fp->fi_lock);
> > > + spin_unlock(&clp->cl_lock);
> > > spin_unlock(&state_lock);
> > >
> > > if (status)
> >
> > I know it's (supposedly) an unlikely race, but should we send this to
> > stable?
>
> I don't know. Once upon a time, "stable" meant something. There was a
> clear list of rules. Those seem to have been torn up. Now it seems to
> be whatever some machine-learning tool chooses.
> If that tool chooses this patch (which I suspect it will), I won't
> object. But I don't think it is worth encouraging it.
We've asked Sasha and GregKH not to use AUTOSEL on NFSD patches,
promising that we will explicitly mark anything that should be
back-ported.
--
Chuck Lever