2016-08-06 23:34:24

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v1] nfsd: Fix race between FREE_STATEID and LOCK

Using LTP's nfslock01 test, one of our internal testers found that
the Linux client can send a LOCK and a FREE_STATEID request at the
same time. The LOCK uses the same lockowner as the stateid sent in
the FREE_STATEID request.

The outcome is:

Frame 115025 C FREE_STATEID stateid 2/A
Frame 115026 C LOCK offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid 3/A
Frame 115034 C WRITE stateid 0/A offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid A in the LOCK reply, but
it has already released it. Subsequent uses of the stateid fail.

To address this, protect the logic in nfsd4_free_stateid with the
st_mutex. This should guarantee that only one of two outcomes
occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID
returns NFS4ERR_LOCKS_HELD.

Reported-by: Alexey Kodanev <[email protected]>
Fix-suggested-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---

Before I pass this along to Alexey for testing, I'd appreciate some
review of the proposed fix.


fs/nfsd/nfs4state.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b921123..a9e0606 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4911,16 +4911,24 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
ret = nfserr_locks_held;
break;
case NFS4_LOCK_STID:
- ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
- if (ret)
- break;
+ spin_unlock(&cl->cl_lock);
stp = openlockstateid(s);
+ mutex_lock(&stp->st_mutex);
+ ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
+ if (ret) {
+ mutex_unlock(&stp->st_mutex);
+ goto out;
+ }
ret = nfserr_locks_held;
if (check_for_locks(stp->st_stid.sc_file,
- lockowner(stp->st_stateowner)))
- break;
+ lockowner(stp->st_stateowner))) {
+ mutex_unlock(&stp->st_mutex);
+ goto out;
+ }
+ spin_lock(&cl->cl_lock);
WARN_ON(!unhash_lock_stateid(stp));
spin_unlock(&cl->cl_lock);
+ mutex_unlock(&stp->st_mutex);
nfs4_put_stid(s);
ret = nfs_ok;
goto out;



2016-08-07 02:01:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1] nfsd: Fix race between FREE_STATEID and LOCK

On Fri, 2016-08-05 at 22:26 -0400, Chuck Lever wrote:
> Using LTP's nfslock01 test, one of our internal testers found that
> the Linux client can send a LOCK and a FREE_STATEID request at the
> same time. The LOCK uses the same lockowner as the stateid sent in
> the FREE_STATEID request.
>
> The outcome is:
>
> Frame 115025 C FREE_STATEID stateid 2/A
> Frame 115026 C LOCK offset 672128 len 64
> Frame 115029 R FREE_STATEID NFS4_OK
> Frame 115030 R LOCK stateid 3/A
> Frame 115034 C WRITE stateid 0/A offset 672128 len 64
> Frame 115038 R WRITE NFS4ERR_BAD_STATEID
>
> In other words, the server returns stateid A in the LOCK reply, but
> it has already released it. Subsequent uses of the stateid fail.
>
> To address this, protect the logic in nfsd4_free_stateid with the
> st_mutex. This should guarantee that only one of two outcomes
> occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID
> returns NFS4ERR_LOCKS_HELD.
>
> > Reported-by: Alexey Kodanev <[email protected]>
> > Fix-suggested-by: Jeff Layton <[email protected]>
> > Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> Before I pass this along to Alexey for testing, I'd appreciate some
> review of the proposed fix.
>
>
>  fs/nfsd/nfs4state.c |   18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index b921123..a9e0606 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4911,16 +4911,24 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> >   ret = nfserr_locks_held;
> >   break;
> >   case NFS4_LOCK_STID:
> > - ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
> > - if (ret)
> > - break;
> > > + spin_unlock(&cl->cl_lock);

Once you drop the spinlock, you don't hold a reference to the stateid
anymore. You'll want to bump the refcount and then put the extra
reference when you're done.

> > >   stp = openlockstateid(s);
> > + mutex_lock(&stp->st_mutex);
> > + ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
> > + if (ret) {
> > + mutex_unlock(&stp->st_mutex);
> > + goto out;
> > + }
> >   ret = nfserr_locks_held;
> >   if (check_for_locks(stp->st_stid.sc_file,
> > -     lockowner(stp->st_stateowner)))
> > - break;
> > +     lockowner(stp->st_stateowner))) {
> > + mutex_unlock(&stp->st_mutex);
> > + goto out;
> > + }
> > + spin_lock(&cl->cl_lock);
> >   WARN_ON(!unhash_lock_stateid(stp));
> > >   spin_unlock(&cl->cl_lock);

Now that you're dropping the spinlock, it could be unhashed before you
take it again. Probably should convert this and the following put to a
release_lock_stateid call.

> > > + mutex_unlock(&stp->st_mutex);
> >   nfs4_put_stid(s);
> >   ret = nfs_ok;
> >   goto out;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
Jeff Layton <[email protected]>

2016-08-07 16:18:11

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1] nfsd: Fix race between FREE_STATEID and LOCK

Hi Jeff-

Thanks for your comments. Responses below:


> On Aug 6, 2016, at 10:01 PM, Jeff Layton <[email protected]> wrote:
>
> On Fri, 2016-08-05 at 22:26 -0400, Chuck Lever wrote:
>> Using LTP's nfslock01 test, one of our internal testers found that
>> the Linux client can send a LOCK and a FREE_STATEID request at the
>> same time. The LOCK uses the same lockowner as the stateid sent in
>> the FREE_STATEID request.
>>
>> The outcome is:
>>
>> Frame 115025 C FREE_STATEID stateid 2/A
>> Frame 115026 C LOCK offset 672128 len 64
>> Frame 115029 R FREE_STATEID NFS4_OK
>> Frame 115030 R LOCK stateid 3/A
>> Frame 115034 C WRITE stateid 0/A offset 672128 len 64
>> Frame 115038 R WRITE NFS4ERR_BAD_STATEID
>>
>> In other words, the server returns stateid A in the LOCK reply, but
>> it has already released it. Subsequent uses of the stateid fail.
>>
>> To address this, protect the logic in nfsd4_free_stateid with the
>> st_mutex. This should guarantee that only one of two outcomes
>> occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID
>> returns NFS4ERR_LOCKS_HELD.
>>
>>> Reported-by: Alexey Kodanev <[email protected]>
>>> Fix-suggested-by: Jeff Layton <[email protected]>
>>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> Before I pass this along to Alexey for testing, I'd appreciate some
>> review of the proposed fix.
>>
>>
>> fs/nfsd/nfs4state.c | 18 +++++++++++++-----
>> 1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index b921123..a9e0606 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -4911,16 +4911,24 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> ret = nfserr_locks_held;
>>> break;
>>> case NFS4_LOCK_STID:
>>> - ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
>>> - if (ret)
>>> - break;
>>>> + spin_unlock(&cl->cl_lock);
>
> Once you drop the spinlock, you don't hold a reference to the stateid
> anymore. You'll want to bump the refcount and then put the extra
> reference when you're done.

Ooops. find_stateid_by_type does bump the reference count,
but indeed, find_stateid_locked does not. I can add

atomic_inc(&s->sc_count);

here, and do something about putting sc_count in the exit
paths below.


>
>>>> stp = openlockstateid(s);
>>> + mutex_lock(&stp->st_mutex);
>>> + ret = check_stateid_generation(stateid, &s->sc_stateid, 1);
>>> + if (ret) {
>>> + mutex_unlock(&stp->st_mutex);
>>> + goto out;
>>> + }
>>> ret = nfserr_locks_held;
>>> if (check_for_locks(stp->st_stid.sc_file,
>>> - lockowner(stp->st_stateowner)))
>>> - break;
>>> + lockowner(stp->st_stateowner))) {
>>> + mutex_unlock(&stp->st_mutex);
>>> + goto out;
>>> + }
>>> + spin_lock(&cl->cl_lock);
>>> WARN_ON(!unhash_lock_stateid(stp));
>>>> spin_unlock(&cl->cl_lock);
>
> Now that you're dropping the spinlock, it could be unhashed before you
> take it again. Probably should convert this and the following put to a
> release_lock_stateid call.

Something like:

goto out;
}

release_lock_stateid
spin_lock(cl_lock)
unhash
spin_unlock(cl_lock)
maybe put_stid (now called while st_mutex is still held)

mutex_unlock
put_stid (since now an extra reference count is taken above)

I guess we decided the ordering of mutex_unlock and
put_stid ultimately doesn't matter.


>>>> + mutex_unlock(&stp->st_mutex);
>>> nfs4_put_stid(s);
>>> ret = nfs_ok;
>>> goto out;


--
Chuck Lever




2016-08-07 18:17:18

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v1] nfsd: Fix race between FREE_STATEID and LOCK

On Sun, 2016-08-07 at 12:18 -0400, Chuck Lever wrote:
> Hi Jeff-
>
> Thanks for your comments. Responses below:
>
>
> >
> > On Aug 6, 2016, at 10:01 PM, Jeff Layton <[email protected]>
> > wrote:
> >
> > On Fri, 2016-08-05 at 22:26 -0400, Chuck Lever wrote:
> > >
> > > Using LTP's nfslock01 test, one of our internal testers found
> > > that
> > > the Linux client can send a LOCK and a FREE_STATEID request at
> > > the
> > > same time. The LOCK uses the same lockowner as the stateid sent
> > > in
> > > the FREE_STATEID request.
> > >
> > > The outcome is:
> > >
> > > Frame 115025 C FREE_STATEID stateid 2/A
> > > Frame 115026 C LOCK offset 672128 len 64
> > > Frame 115029 R FREE_STATEID NFS4_OK
> > > Frame 115030 R LOCK stateid 3/A
> > > Frame 115034 C WRITE stateid 0/A offset 672128 len 64
> > > Frame 115038 R WRITE NFS4ERR_BAD_STATEID
> > >
> > > In other words, the server returns stateid A in the LOCK reply,
> > > but
> > > it has already released it. Subsequent uses of the stateid fail.
> > >
> > > To address this, protect the logic in nfsd4_free_stateid with the
> > > st_mutex. This should guarantee that only one of two outcomes
> > > occurs: either LOCK returns a fresh valid stateid, or
> > > FREE_STATEID
> > > returns NFS4ERR_LOCKS_HELD.
> > >
> > > >
> > > > Reported-by: Alexey Kodanev <[email protected]>
> > > > Fix-suggested-by: Jeff Layton <[email protected]>
> > > > Signed-off-by: Chuck Lever <[email protected]>
> > > ---
> > >
> > > Before I pass this along to Alexey for testing, I'd appreciate
> > > some
> > > review of the proposed fix.
> > >
> > >
> > >  fs/nfsd/nfs4state.c |   18 +++++++++++++-----
> > >  1 file changed, 13 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index b921123..a9e0606 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -4911,16 +4911,24 @@ nfsd4_free_stateid(struct svc_rqst
> > > *rqstp, struct nfsd4_compound_state *cstate,
> > > >
> > > >   ret = nfserr_locks_held;
> > > >   break;
> > > >   case NFS4_LOCK_STID:
> > > > - ret = check_stateid_generation(stateid, &s-
> > > > >sc_stateid, 1);
> > > > - if (ret)
> > > > - break;
> > > > >
> > > > > + spin_unlock(&cl->cl_lock);
> >
> > Once you drop the spinlock, you don't hold a reference to the
> > stateid
> > anymore. You'll want to bump the refcount and then put the extra
> > reference when you're done.
>
> Ooops. find_stateid_by_type does bump the reference count,
> but indeed, find_stateid_locked does not. I can add
>
>    atomic_inc(&s->sc_count);
>
> here, and do something about putting sc_count in the exit
> paths below.
>
>

Yes. Just call nfs4_put_stid once you're done and that will put the
reference (and start freeing the stateid if it was the last one).

> >
> >
> > >
> > > >
> > > > >
> > > > >   stp = openlockstateid(s);
> > > > + mutex_lock(&stp->st_mutex);
> > > > + ret = check_stateid_generation(stateid, &s-
> > > > >sc_stateid, 1);
> > > > + if (ret) {
> > > > + mutex_unlock(&stp->st_mutex);
> > > > + goto out;
> > > > + }
> > > >   ret = nfserr_locks_held;
> > > >   if (check_for_locks(stp->st_stid.sc_file,
> > > > -     lockowner(stp-
> > > > >st_stateowner)))
> > > > - break;
> > > > +     lockowner(stp-
> > > > >st_stateowner))) {
> > > > + mutex_unlock(&stp->st_mutex);
> > > > + goto out;
> > > > + }
> > > > + spin_lock(&cl->cl_lock);
> > > >   WARN_ON(!unhash_lock_stateid(stp));
> > > > >
> > > > >   spin_unlock(&cl->cl_lock);
> >
> > Now that you're dropping the spinlock, it could be unhashed before
> > you
> > take it again. Probably should convert this and the following put
> > to a
> > release_lock_stateid call.
>
> Something like:
>
>           goto out;
>      }
>
>      release_lock_stateid
>           spin_lock(cl_lock)
>           unhash
>           spin_unlock(cl_lock)
>           maybe put_stid (now called while st_mutex is still held)
>
>      mutex_unlock
>      put_stid (since now an extra reference count is taken above)
>

release_lock_stateid will unhash it and put the hashtable reference if
it did the unhashing. So assuming you take the reference above while
still holding the spinlock:

release_lock_stateid(); /* unhash and release hashtable reference */
mutex_unlock(); /* unlock the stateid */
nfs4_put_stid(); /* put the reference you acquired before dropping the spinlock */


> I guess we decided the ordering of mutex_unlock and
> put_stid ultimately doesn't matter.
>

You definitely want to mutex_unlock before you put the reference you're
taking in this function. Otherwise you have no guarantee that the
pointer will still be good...

>
> >
> > >
> > > >
> > > > >
> > > > > + mutex_unlock(&stp->st_mutex);
> > > >   nfs4_put_stid(s);
> > > >   ret = nfs_ok;
> > > >   goto out;
>
>
> --
> Chuck Lever
>
>
>

--
Jeff Layton <[email protected]>