LinuxLists.cc - [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

2010-05-04 22:37:36

Subject: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

Bruce,

The following patchset changes the scope of the sessionid spin lock
to cover both sessions and the client lru list and it introduces a
new reference count on the client that's manipulated under that new
client lock (not requiring the state mutex).

It's tested to pass connectathon tests as well as explicit session destroy
and implicit client expiry when the client is blown away.
However, I haven't tested the gist of this patchset which is to get
the client to perform a long enough compound during which it might time out...

[PATCH 1/8] nfsd4: rename sessionid_lock to client_lock
[PATCH 2/8] nfsd4: fold release_session into expire_client
[PATCH 3/8] nfsd4: use list_move in move_to_confirmed
[PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru
[PATCH 5/8] nfsd4: refactor expire_client
[PATCH 6/8] nfsd4: introduce nfs4_client.cl_refcount
[PATCH 7/8] nfsd4: keep a reference count on client while in use

[PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session
I think this was introduced in: 26c0c75 nfsd4: fix unlikely race in session replay case
though I'm not sure how it ever worked correctly...

Benny

2010-05-10 19:01:13

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On Mon, May 10, 2010 at 05:15:43PM +0300, Benny Halevy wrote:
> On May. 09, 2010, 19:55 +0300, "J. Bruce Fields" <[email protected]> wrote:
> > On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
> >> Correct. The intentions are:
> >> 1. Make the laundromat process ignore clients that are in
> >> use by a 4.1 session.
> >> 2. Renew the client when the compound ends, rather than when it begins.
> >> 3. Unhash the client when it's expired explicitly but don't destroy it
> >> until there's no reference to it.
> >
> > OK. My one slight worry there is to make sure that code getting a
> > pointer to a client through a sessionid won't inadvertently assume it's
> > still hashed.
>
> That's a good point.
> In fact, the client should not be renewed when the compound is done
> if it was already explicitly expired.

Hm, and we've still got a lot of renew_client()'s sprinkled around that
will try to add the expired client back to client_lru.

> Another way to deal with this which may be safer but is less optimal
> is to keep only the sessionid and look it up on each use. Then, using
> it while holding the state lock will make sure it's valid when used.

That seems overkill. Instead of making ops look up the sessionid from
state each time I guess we could have a revalidate_sessionid() that
checked the associated client to see if it was still good.

If we continue to reduce the scope of the state lock, isn't that going
to be a pain, though? Will we end up having to do that sort of
revalidation every time we drop the state lock and reacquire it?

Perhaps the simplest would be to make the clientid-destroyer wait; it
could set some sort of CLIENTID_DEAD flag on the client, wait for a
reference count to go to zero, then destroy the client.

Then other code would be guaranteed that nothing will change underneath
them as long as they hold either the state lock or a reference to a
session.

So hopefully we'd only need worry about client shutdown in well-defined
places:
- when put()'ing a session, to check whether the client
is ready to be destroyed now.
- when looking up a session, in which case we should check
whether the client is dead and fail the lookup?

--b.

2010-05-11 14:39:26

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On Tue, May 11, 2010 at 10:27:27AM +0300, Benny Halevy wrote:
> On May. 10, 2010, 22:01 +0300, "J. Bruce Fields" <[email protected]> wrote:
> > On Mon, May 10, 2010 at 05:15:43PM +0300, Benny Halevy wrote:
> >> On May. 09, 2010, 19:55 +0300, "J. Bruce Fields" <[email protected]> wrote:
> >>> On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
> >>>> Correct. The intentions are:
> >>>> 1. Make the laundromat process ignore clients that are in
> >>>> use by a 4.1 session.
> >>>> 2. Renew the client when the compound ends, rather than when it begins.
> >>>> 3. Unhash the client when it's expired explicitly but don't destroy it
> >>>> until there's no reference to it.
> >>> OK. My one slight worry there is to make sure that code getting a
> >>> pointer to a client through a sessionid won't inadvertently assume it's
> >>> still hashed.
> >> That's a good point.
> >> In fact, the client should not be renewed when the compound is done
> >> if it was already explicitly expired.
> >
> > Hm, and we've still got a lot of renew_client()'s sprinkled around that
> > will try to add the expired client back to client_lru.
> >
>
> For that I've already have a fix in my branch to not renew the client
> if it's marked as expired (cl_time == 0).

OK.

> >> Another way to deal with this which may be safer but is less optimal
> >> is to keep only the sessionid and look it up on each use. Then, using
> >> it while holding the state lock will make sure it's valid when used.
> >
> > That seems overkill. Instead of making ops look up the sessionid from
> > state each time I guess we could have a revalidate_sessionid() that
> > checked the associated client to see if it was still good.
>
> Yeah. Let me see if this is a quick fix and if so I'll send it as part
> of version 2 of this patchset, otherwise I'll send it as is.
>
> > If we continue to reduce the scope of the state lock, isn't that going
> > to be a pain, though? Will we end up having to do that sort of
> > revalidation every time we drop the state lock and reacquire it?
>
> It's very little pain, just verifying that cl_time != 0.

It's not the trouble of the revalidation I worry about, so much as a)
remembering to do it every time, and b) always being in a position to
return an error if it fails.

For now it doesn't seem like a problem, so OK. We can always reconsider
if it turns out to be.

> > Perhaps the simplest would be to make the clientid-destroyer wait; it
> > could set some sort of CLIENTID_DEAD flag on the client, wait for a
> > reference count to go to zero, then destroy the client.
> >
>
> I'm not sure about this. Setting this flag is equivalent
> to what I propose to do today (with marking the client as expired AND
> unhashing it) as you want to prevent any more references to it.
>
> The question is more about the semantics of the operation that explicitly
> destroys the old client, e.g. EXCHANGE_ID or DESTROY_SESSION.
> Can the client tolerate responses for the old clientid/sessionid
> after the client-destroying operation has succeeded?

I can't see what would give the client to expect any defined behavior
for a compound executed concurrently with an explicit destruction of the
associated clientid.

> > Then other code would be guaranteed that nothing will change underneath
> > them as long as they hold either the state lock or a reference to a
> > session.
> >
> > So hopefully we'd only need worry about client shutdown in well-defined
> > places:
> > - when put()'ing a session, to check whether the client
> > is ready to be destroyed now.
>
> That's in v2.
>
> > - when looking up a session, in which case we should check
> > whether the client is dead and fail the lookup?
>
> If the client is dead we won't find the sessionid as we unhash the session
> structure atomically with the client so that isn't an issue so that's also
> dealt with in v2.

OK, thanks.

--b.

2010-05-11 07:27:31

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On May. 10, 2010, 22:01 +0300, "J. Bruce Fields" <[email protected]> wrote:
> On Mon, May 10, 2010 at 05:15:43PM +0300, Benny Halevy wrote:
>> On May. 09, 2010, 19:55 +0300, "J. Bruce Fields" <[email protected]> wrote:
>>> On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
>>>> Correct. The intentions are:
>>>> 1. Make the laundromat process ignore clients that are in
>>>> use by a 4.1 session.
>>>> 2. Renew the client when the compound ends, rather than when it begins.
>>>> 3. Unhash the client when it's expired explicitly but don't destroy it
>>>> until there's no reference to it.
>>> OK. My one slight worry there is to make sure that code getting a
>>> pointer to a client through a sessionid won't inadvertently assume it's
>>> still hashed.
>> That's a good point.
>> In fact, the client should not be renewed when the compound is done
>> if it was already explicitly expired.
>
> Hm, and we've still got a lot of renew_client()'s sprinkled around that
> will try to add the expired client back to client_lru.
>

For that I've already have a fix in my branch to not renew the client
if it's marked as expired (cl_time == 0).

>> Another way to deal with this which may be safer but is less optimal
>> is to keep only the sessionid and look it up on each use. Then, using
>> it while holding the state lock will make sure it's valid when used.
>
> That seems overkill. Instead of making ops look up the sessionid from
> state each time I guess we could have a revalidate_sessionid() that
> checked the associated client to see if it was still good.
>

Yeah. Let me see if this is a quick fix and if so I'll send it as part
of version 2 of this patchset, otherwise I'll send it as is.

> If we continue to reduce the scope of the state lock, isn't that going
> to be a pain, though? Will we end up having to do that sort of
> revalidation every time we drop the state lock and reacquire it?

It's very little pain, just verifying that cl_time != 0.

>
> Perhaps the simplest would be to make the clientid-destroyer wait; it
> could set some sort of CLIENTID_DEAD flag on the client, wait for a
> reference count to go to zero, then destroy the client.
>

I'm not sure about this. Setting this flag is equivalent
to what I propose to do today (with marking the client as expired AND
unhashing it) as you want to prevent any more references to it.

The question is more about the semantics of the operation that explicitly
destroys the old client, e.g. EXCHANGE_ID or DESTROY_SESSION.
Can the client tolerate responses for the old clientid/sessionid
after the client-destroying operation has succeeded?

> Then other code would be guaranteed that nothing will change underneath
> them as long as they hold either the state lock or a reference to a
> session.
>
> So hopefully we'd only need worry about client shutdown in well-defined
> places:
> - when put()'ing a session, to check whether the client
> is ready to be destroyed now.

That's in v2.

> - when looking up a session, in which case we should check
> whether the client is dead and fail the lookup?

If the client is dead we won't find the sessionid as we unhash the session
structure atomically with the client so that isn't an issue so that's also
dealt with in v2.

Benny

>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-05-07 22:29:27

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru

On Wed, May 05, 2010 at 01:44:17AM +0300, Benny Halevy wrote:
> @@ -2557,6 +2562,7 @@ nfs4_laundromat(void)
> dprintk("NFSD: laundromat service - starting\n");
> if (locks_in_grace())
> nfsd4_end_grace();
> + spin_lock(&client_lock);
> list_for_each_safe(pos, next, &client_lru) {
> clp = list_entry(pos, struct nfs4_client, cl_lru);
> if (time_after((unsigned long)clp->cl_time, (unsigned long)cutoff)) {
> @@ -2565,11 +2571,15 @@ nfs4_laundromat(void)
> clientid_val = t;
> break;
> }
> + list_del_init(&clp->cl_lru);
> + spin_unlock(&client_lock);
> dprintk("NFSD: purging unused client (clientid %08x)\n",
> clp->cl_clientid.cl_id);
> nfsd4_remove_clid_dir(clp);
> expire_client(clp);
> + spin_lock(&client_lock);
> }
> + spin_unlock(&client_lock);
> INIT_LIST_HEAD(&reaplist);
> spin_lock(&recall_lock);
> list_for_each_safe(pos, next, &del_recall_lru) {

Careful--the list_for_each_safe() isn't enough to handle the results of
concurrent modifications that might occur while the client_lock is
dropped.

Maybe just use the trick of moving everything to a temporary list, then
doing the real work on that list after dropping the lock?

--b.

2010-05-07 22:38:36

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On Wed, May 05, 2010 at 01:37:34AM +0300, Benny Halevy wrote:
> Bruce,
>
> The following patchset changes the scope of the sessionid spin lock
> to cover both sessions and the client lru list and it introduces a
> new reference count on the client that's manipulated under that new
> client lock (not requiring the state mutex).
>
> It's tested to pass connectathon tests as well as explicit session destroy
> and implicit client expiry when the client is blown away.
> However, I haven't tested the gist of this patchset which is to get
> the client to perform a long enough compound during which it might time out...

Yeah, I'm not sure how to test that. Create a temporary patch
introducign a "delay X seconds" compound op, then teach pynfs to send
those timed to coincide with client-reboot exchangeid's or the end of a
client lease?

So if I understand the intention of these patches right: behavior in the
case of something explicitly destroys a client (e.g. client-rebooting
exchangeid) is to partially destroy the client, but allow any concurrent
compound to attempt to continue processing with the near-dead client?

> [PATCH 1/8] nfsd4: rename sessionid_lock to client_lock
> [PATCH 2/8] nfsd4: fold release_session into expire_client
> [PATCH 3/8] nfsd4: use list_move in move_to_confirmed
> [PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru
> [PATCH 5/8] nfsd4: refactor expire_client
> [PATCH 6/8] nfsd4: introduce nfs4_client.cl_refcount
> [PATCH 7/8] nfsd4: keep a reference count on client while in use
>
> [PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session
> I think this was introduced in: 26c0c75 nfsd4: fix unlikely race in session replay case
> though I'm not sure how it ever worked correctly...

Me neither. I've got a similar patch in my tree.

--b.

2010-05-09 06:18:34

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru

On May. 08, 2010, 1:29 +0300, " J. Bruce Fields" <[email protected]> wrote:
> On Wed, May 05, 2010 at 01:44:17AM +0300, Benny Halevy wrote:
>> @@ -2557,6 +2562,7 @@ nfs4_laundromat(void)
>> dprintk("NFSD: laundromat service - starting\n");
>> if (locks_in_grace())
>> nfsd4_end_grace();
>> + spin_lock(&client_lock);
>> list_for_each_safe(pos, next, &client_lru) {
>> clp = list_entry(pos, struct nfs4_client, cl_lru);
>> if (time_after((unsigned long)clp->cl_time, (unsigned long)cutoff)) {
>> @@ -2565,11 +2571,15 @@ nfs4_laundromat(void)
>> clientid_val = t;
>> break;
>> }
>> + list_del_init(&clp->cl_lru);
>> + spin_unlock(&client_lock);
>> dprintk("NFSD: purging unused client (clientid %08x)\n",
>> clp->cl_clientid.cl_id);
>> nfsd4_remove_clid_dir(clp);
>> expire_client(clp);
>> + spin_lock(&client_lock);
>> }
>> + spin_unlock(&client_lock);
>> INIT_LIST_HEAD(&reaplist);
>> spin_lock(&recall_lock);
>> list_for_each_safe(pos, next, &del_recall_lru) {
>
> Careful--the list_for_each_safe() isn't enough to handle the results of
> concurrent modifications that might occur while the client_lock is
> dropped.
>
> Maybe just use the trick of moving everything to a temporary list, then
> doing the real work on that list after dropping the lock?

Yeah, that should be better.
Will do.

Benny

>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-05-09 06:30:35

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On May. 08, 2010, 1:38 +0300, "J. Bruce Fields" <[email protected]> wrote:
> On Wed, May 05, 2010 at 01:37:34AM +0300, Benny Halevy wrote:
>> Bruce,
>>
>> The following patchset changes the scope of the sessionid spin lock
>> to cover both sessions and the client lru list and it introduces a
>> new reference count on the client that's manipulated under that new
>> client lock (not requiring the state mutex).
>>
>> It's tested to pass connectathon tests as well as explicit session destroy
>> and implicit client expiry when the client is blown away.
>> However, I haven't tested the gist of this patchset which is to get
>> the client to perform a long enough compound during which it might time out...
>
> Yeah, I'm not sure how to test that. Create a temporary patch
> introducign a "delay X seconds" compound op, then teach pynfs to send
> those timed to coincide with client-reboot exchangeid's or the end of a
> client lease?

That should do for a one time test.
For regression testing I think we need a better way of injecting
a timeout longer than the lease period.

>
> So if I understand the intention of these patches right: behavior in the
> case of something explicitly destroys a client (e.g. client-rebooting
> exchangeid) is to partially destroy the client, but allow any concurrent
> compound to attempt to continue processing with the near-dead client?
>

Correct. The intentions are:
1. Make the laundromat process ignore clients that are in
use by a 4.1 session.
2. Renew the client when the compound ends, rather than when it begins.
3. Unhash the client when it's expired explicitly but don't destroy it
until there's no reference to it.

>> [PATCH 1/8] nfsd4: rename sessionid_lock to client_lock
>> [PATCH 2/8] nfsd4: fold release_session into expire_client
>> [PATCH 3/8] nfsd4: use list_move in move_to_confirmed
>> [PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru
>> [PATCH 5/8] nfsd4: refactor expire_client
>> [PATCH 6/8] nfsd4: introduce nfs4_client.cl_refcount
>> [PATCH 7/8] nfsd4: keep a reference count on client while in use
>>
>> [PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session
>> I think this was introduced in: 26c0c75 nfsd4: fix unlikely race in session replay case
>> though I'm not sure how it ever worked correctly...
>
> Me neither. I've got a similar patch in my tree.

Heh, I see.
5d4cec2 nfsd4: fix bare destroy_session null dereference

Benny

>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-05-09 16:55:53

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
> Correct. The intentions are:
> 1. Make the laundromat process ignore clients that are in
> use by a 4.1 session.
> 2. Renew the client when the compound ends, rather than when it begins.
> 3. Unhash the client when it's expired explicitly but don't destroy it
> until there's no reference to it.

OK. My one slight worry there is to make sure that code getting a
pointer to a client through a sessionid won't inadvertently assume it's
still hashed.

By the way, the current stateid may present a similar problem to the
session, since it also lasts across compound ops and references a
client (and other state).

Looking through the code.... We don't implement the current stateid at
all yet. Ouch.

I'll add that to the wiki.

> >> [PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session
> >> I think this was introduced in: 26c0c75 nfsd4: fix unlikely race in session replay case
> >> though I'm not sure how it ever worked correctly...
> >
> > Me neither. I've got a similar patch in my tree.
>
> Heh, I see.
> 5d4cec2 nfsd4: fix bare destroy_session null dereference

Apologies, I'd applied it privately but not pushed it out yet.... I
should have gotten that out faster.

--b.

2010-05-10 14:16:08

by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On May. 09, 2010, 19:55 +0300, "J. Bruce Fields" <[email protected]> wrote:
> On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
>> Correct. The intentions are:
>> 1. Make the laundromat process ignore clients that are in
>> use by a 4.1 session.
>> 2. Renew the client when the compound ends, rather than when it begins.
>> 3. Unhash the client when it's expired explicitly but don't destroy it
>> until there's no reference to it.
>
> OK. My one slight worry there is to make sure that code getting a
> pointer to a client through a sessionid won't inadvertently assume it's
> still hashed.

That's a good point.
In fact, the client should not be renewed when the compound is done
if it was already explicitly expired.

Another way to deal with this which may be safer but is less optimal
is to keep only the sessionid and look it up on each use. Then, using
it while holding the state lock will make sure it's valid when used.

Benny

>
> By the way, the current stateid may present a similar problem to the
> session, since it also lasts across compound ops and references a
> client (and other state).
>
> Looking through the code.... We don't implement the current stateid at
> all yet. Ouch.
>
> I'll add that to the wiki.
>
>>>> [PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session
>>>> I think this was introduced in: 26c0c75 nfsd4: fix unlikely race in session replay case
>>>> though I'm not sure how it ever worked correctly...
>>> Me neither. I've got a similar patch in my tree.
>> Heh, I see.
>> 5d4cec2 nfsd4: fix bare destroy_session null dereference
>
> Apologies, I'd applied it privately but not pushed it out yet.... I
> should have gotten that out faster.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-05-11 16:05:46

by J.Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/8] nfsd4: keep the client from expiring while in use by nfs41 compounds

On Tue, May 11, 2010 at 10:39:25AM -0400, J. Bruce Fields wrote:
> On Tue, May 11, 2010 at 10:27:27AM +0300, Benny Halevy wrote:
> > On May. 10, 2010, 22:01 +0300, "J. Bruce Fields" <[email protected]> wrote:
> > > On Mon, May 10, 2010 at 05:15:43PM +0300, Benny Halevy wrote:
> > >> On May. 09, 2010, 19:55 +0300, "J. Bruce Fields" <[email protected]> wrote:
> > >>> On Sun, May 09, 2010 at 09:30:31AM +0300, Benny Halevy wrote:
> > >>>> Correct. The intentions are:
> > >>>> 1. Make the laundromat process ignore clients that are in
> > >>>> use by a 4.1 session.
> > >>>> 2. Renew the client when the compound ends, rather than when it begins.
> > >>>> 3. Unhash the client when it's expired explicitly but don't destroy it
> > >>>> until there's no reference to it.
> > >>> OK. My one slight worry there is to make sure that code getting a
> > >>> pointer to a client through a sessionid won't inadvertently assume it's
> > >>> still hashed.
> > >> That's a good point.
> > >> In fact, the client should not be renewed when the compound is done
> > >> if it was already explicitly expired.
> > >
> > > Hm, and we've still got a lot of renew_client()'s sprinkled around that
> > > will try to add the expired client back to client_lru.
> > >
> >
> > For that I've already have a fix in my branch to not renew the client
> > if it's marked as expired (cl_time == 0).
>
> OK.
>
> > >> Another way to deal with this which may be safer but is less optimal
> > >> is to keep only the sessionid and look it up on each use. Then, using
> > >> it while holding the state lock will make sure it's valid when used.
> > >
> > > That seems overkill. Instead of making ops look up the sessionid from
> > > state each time I guess we could have a revalidate_sessionid() that
> > > checked the associated client to see if it was still good.
> >
> > Yeah. Let me see if this is a quick fix and if so I'll send it as part
> > of version 2 of this patchset, otherwise I'll send it as is.
> >
> > > If we continue to reduce the scope of the state lock, isn't that going
> > > to be a pain, though? Will we end up having to do that sort of
> > > revalidation every time we drop the state lock and reacquire it?
> >
> > It's very little pain, just verifying that cl_time != 0.
>
> It's not the trouble of the revalidation I worry about, so much as a)
> remembering to do it every time, and b) always being in a position to
> return an error if it fails.
>
> For now it doesn't seem like a problem, so OK. We can always reconsider
> if it turns out to be.
>
> > > Perhaps the simplest would be to make the clientid-destroyer wait; it
> > > could set some sort of CLIENTID_DEAD flag on the client, wait for a
> > > reference count to go to zero, then destroy the client.
> > >
> >
> > I'm not sure about this. Setting this flag is equivalent
> > to what I propose to do today (with marking the client as expired AND
> > unhashing it) as you want to prevent any more references to it.
> >
> > The question is more about the semantics of the operation that explicitly
> > destroys the old client, e.g. EXCHANGE_ID or DESTROY_SESSION.
> > Can the client tolerate responses for the old clientid/sessionid
> > after the client-destroying operation has succeeded?
>
> I can't see what would give the client to expect any defined behavior

(Sorry, I meant to say "would give the client the right to expect...",
or words to that effect.)

--b.

> for a compound executed concurrently with an explicit destruction of the
> associated clientid.
>
> > > Then other code would be guaranteed that nothing will change underneath
> > > them as long as they hold either the state lock or a reference to a
> > > session.
> > >
> > > So hopefully we'd only need worry about client shutdown in well-defined
> > > places:
> > > - when put()'ing a session, to check whether the client
> > > is ready to be destroyed now.
> >
> > That's in v2.
> >
> > > - when looking up a session, in which case we should check
> > > whether the client is dead and fail the lookup?
> >
> > If the client is dead we won't find the sessionid as we unhash the session
> > structure atomically with the client so that isn't an issue so that's also
> > dealt with in v2.
>
> OK, thanks.
>
> --b.

2010-05-04 22:43:43

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 1/8] nfsd4: rename sessionid_lock to client_lock

In preparation to share the lock's scope to both client
and session hash tables.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 26 ++++++++++++++------------
1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 737315c..2d77ae8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -249,6 +249,9 @@ unhash_delegation(struct nfs4_delegation *dp)
* SETCLIENTID state
*/

+/* client_lock protects the session hash table */
+static DEFINE_SPINLOCK(client_lock);
+
/* Hash tables for nfs4_clientid state */
#define CLIENT_HASH_BITS 4
#define CLIENT_HASH_SIZE (1 << CLIENT_HASH_BITS)
@@ -367,7 +370,6 @@ static void release_openowner(struct nfs4_stateowner *sop)
nfs4_put_stateowner(sop);
}

-static DEFINE_SPINLOCK(sessionid_lock);
#define SESSION_HASH_SIZE 512
static struct list_head sessionid_hashtbl[SESSION_HASH_SIZE];

@@ -565,10 +567,10 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,

new->se_flags = cses->flags;
kref_init(&new->se_ref);
- spin_lock(&sessionid_lock);
+ spin_lock(&client_lock);
list_add(&new->se_hash, &sessionid_hashtbl[idx]);
list_add(&new->se_perclnt, &clp->cl_sessions);
- spin_unlock(&sessionid_lock);
+ spin_unlock(&client_lock);

status = nfs_ok;
out:
@@ -579,7 +581,7 @@ out_free:
goto out;
}

-/* caller must hold sessionid_lock */
+/* caller must hold client_lock */
static struct nfsd4_session *
find_in_sessionid_hashtbl(struct nfs4_sessionid *sessionid)
{
@@ -602,7 +604,7 @@ find_in_sessionid_hashtbl(struct nfs4_sessionid *sessionid)
return NULL;
}

-/* caller must hold sessionid_lock */
+/* caller must hold client_lock */
static void
unhash_session(struct nfsd4_session *ses)
{
@@ -613,9 +615,9 @@ unhash_session(struct nfsd4_session *ses)
static void
release_session(struct nfsd4_session *ses)
{
- spin_lock(&sessionid_lock);
+ spin_lock(&client_lock);
unhash_session(ses);
- spin_unlock(&sessionid_lock);
+ spin_unlock(&client_lock);
nfsd4_put_session(ses);
}

@@ -1372,15 +1374,15 @@ nfsd4_destroy_session(struct svc_rqst *r,
return nfserr_not_only_op;
}
dump_sessionid(__func__, &sessionid->sessionid);
- spin_lock(&sessionid_lock);
+ spin_lock(&client_lock);
ses = find_in_sessionid_hashtbl(&sessionid->sessionid);
if (!ses) {
- spin_unlock(&sessionid_lock);
+ spin_unlock(&client_lock);
goto out;
}

unhash_session(ses);
- spin_unlock(&sessionid_lock);
+ spin_unlock(&client_lock);

/* wait for callbacks */
nfsd4_set_callback_client(ses->se_client, NULL);
@@ -1404,7 +1406,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
if (resp->opcnt != 1)
return nfserr_sequence_pos;

- spin_lock(&sessionid_lock);
+ spin_lock(&client_lock);
status = nfserr_badsession;
session = find_in_sessionid_hashtbl(&seq->sessionid);
if (!session)
@@ -1447,7 +1449,7 @@ out:
/* Hold a session reference until done processing the compound. */
if (cstate->session)
nfsd4_get_session(cstate->session);
- spin_unlock(&sessionid_lock);
+ spin_unlock(&client_lock);
/* Renew the clientid on success and on replay */
if (cstate->session) {
nfs4_lock_state();
--
1.6.5.1

2010-05-04 22:43:55

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 2/8] nfsd4: fold release_session into expire_client

and grab the client lock once for all the client's sessions.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 14 ++++----------
1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2d77ae8..388d244 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -612,15 +612,6 @@ unhash_session(struct nfsd4_session *ses)
list_del(&ses->se_perclnt);
}

-static void
-release_session(struct nfsd4_session *ses)
-{
- spin_lock(&client_lock);
- unhash_session(ses);
- spin_unlock(&client_lock);
- nfsd4_put_session(ses);
-}
-
void
free_session(struct kref *kref)
{
@@ -721,12 +712,15 @@ expire_client(struct nfs4_client *clp)
sop = list_entry(clp->cl_openowners.next, struct nfs4_stateowner, so_perclient);
release_openowner(sop);
}
+ spin_lock(&client_lock);
while (!list_empty(&clp->cl_sessions)) {
struct nfsd4_session *ses;
ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
se_perclnt);
- release_session(ses);
+ unhash_session(ses);
+ nfsd4_put_session(ses);
}
+ spin_unlock(&client_lock);
nfsd4_set_callback_client(clp, NULL);
if (clp->cl_cb_conn.cb_xprt)
svc_xprt_put(clp->cl_cb_conn.cb_xprt);
--
1.6.5.1

2010-05-04 22:44:07

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 3/8] nfsd4: use list_move in move_to_confirmed

rather than list_del_init, list_add

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 388d244..53cb5e4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -858,10 +858,9 @@ move_to_confirmed(struct nfs4_client *clp)
unsigned int strhashval;

dprintk("NFSD: move_to_confirm nfs4_client %p\n", clp);
- list_del_init(&clp->cl_strhash);
list_move(&clp->cl_idhash, &conf_id_hashtbl[idhashval]);
strhashval = clientstr_hashval(clp->cl_recdir);
- list_add(&clp->cl_strhash, &conf_str_hashtbl[strhashval]);
+ list_move(&clp->cl_strhash, &conf_str_hashtbl[strhashval]);
renew_client(clp);
}

--
1.6.5.1

2010-05-04 22:44:20

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 4/8] nfsd4: extend the client_lock to cover cl_lru

To be used later on to hold a reference count on the client while in use by a
nfsv4.1 compound.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 38 ++++++++++++++++++++++++--------------
1 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 53cb5e4..6720081 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -249,7 +249,7 @@ unhash_delegation(struct nfs4_delegation *dp)
* SETCLIENTID state
*/

-/* client_lock protects the session hash table */
+/* client_lock protects the client lru list and session hash table */
static DEFINE_SPINLOCK(client_lock);

/* Hash tables for nfs4_clientid state */
@@ -627,8 +627,9 @@ free_session(struct kref *kref)
kfree(ses);
}

+/* must be called under the client_lock */
static inline void
-renew_client(struct nfs4_client *clp)
+renew_client_locked(struct nfs4_client *clp)
{
/*
* Move client to the end to the LRU list.
@@ -640,6 +641,14 @@ renew_client(struct nfs4_client *clp)
clp->cl_time = get_seconds();
}

+static inline void
+renew_client(struct nfs4_client *clp)
+{
+ spin_lock(&client_lock);
+ renew_client_locked(clp);
+ spin_unlock(&client_lock);
+}
+
/* SETCLIENTID and SETCLIENTID_CONFIRM Helper functions */
static int
STALE_CLIENTID(clientid_t *clid)
@@ -705,14 +714,14 @@ expire_client(struct nfs4_client *clp)
list_del_init(&dp->dl_recall_lru);
unhash_delegation(dp);
}
- list_del(&clp->cl_idhash);
- list_del(&clp->cl_strhash);
- list_del(&clp->cl_lru);
while (!list_empty(&clp->cl_openowners)) {
sop = list_entry(clp->cl_openowners.next, struct nfs4_stateowner, so_perclient);
release_openowner(sop);
}
+ list_del(&clp->cl_idhash);
+ list_del(&clp->cl_strhash);
spin_lock(&client_lock);
+ list_del(&clp->cl_lru);
while (!list_empty(&clp->cl_sessions)) {
struct nfsd4_session *ses;
ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
@@ -847,8 +856,7 @@ add_to_unconfirmed(struct nfs4_client *clp, unsigned int strhashval)
list_add(&clp->cl_strhash, &unconf_str_hashtbl[strhashval]);
idhashval = clientid_hashval(clp->cl_clientid.cl_id);
list_add(&clp->cl_idhash, &unconf_id_hashtbl[idhashval]);
- list_add_tail(&clp->cl_lru, &client_lru);
- clp->cl_time = get_seconds();
+ renew_client(clp);
}

static void
@@ -1440,15 +1448,12 @@ nfsd4_sequence(struct svc_rqst *rqstp,

out:
/* Hold a session reference until done processing the compound. */
- if (cstate->session)
- nfsd4_get_session(cstate->session);
- spin_unlock(&client_lock);
- /* Renew the clientid on success and on replay */
if (cstate->session) {
- nfs4_lock_state();
- renew_client(session->se_client);
- nfs4_unlock_state();
+ nfsd4_get_session(cstate->session);
+ /* Renew the clientid on success and on replay */
+ renew_client_locked(session->se_client);
}
+ spin_unlock(&client_lock);
dprintk("%s: return %d\n", __func__, ntohl(status));
return status;
}
@@ -2557,6 +2562,7 @@ nfs4_laundromat(void)
dprintk("NFSD: laundromat service - starting\n");
if (locks_in_grace())
nfsd4_end_grace();
+ spin_lock(&client_lock);
list_for_each_safe(pos, next, &client_lru) {
clp = list_entry(pos, struct nfs4_client, cl_lru);
if (time_after((unsigned long)clp->cl_time, (unsigned long)cutoff)) {
@@ -2565,11 +2571,15 @@ nfs4_laundromat(void)
clientid_val = t;
break;
}
+ list_del_init(&clp->cl_lru);
+ spin_unlock(&client_lock);
dprintk("NFSD: purging unused client (clientid %08x)\n",
clp->cl_clientid.cl_id);
nfsd4_remove_clid_dir(clp);
expire_client(clp);
+ spin_lock(&client_lock);
}
+ spin_unlock(&client_lock);
INIT_LIST_HEAD(&reaplist);
spin_lock(&recall_lock);
list_for_each_safe(pos, next, &del_recall_lru) {
--
1.6.5.1

2010-05-04 22:44:32

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 5/8] nfsd4: refactor expire_client

Separate out unhashing of the client and session.
To be used later by the laundromat.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 29 ++++++++++++++++++-----------
1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6720081..cd02567 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -692,6 +692,20 @@ free_client(struct nfs4_client *clp)
kfree(clp);
}

+/* must be called under the client_lock */
+static inline void
+unhash_client_locked(struct nfs4_client *clp)
+{
+ list_del(&clp->cl_lru);
+ while (!list_empty(&clp->cl_sessions)) {
+ struct nfsd4_session *ses;
+ ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
+ se_perclnt);
+ unhash_session(ses);
+ nfsd4_put_session(ses);
+ }
+}
+
static void
expire_client(struct nfs4_client *clp)
{
@@ -718,21 +732,14 @@ expire_client(struct nfs4_client *clp)
sop = list_entry(clp->cl_openowners.next, struct nfs4_stateowner, so_perclient);
release_openowner(sop);
}
+ nfsd4_set_callback_client(clp, NULL);
+ if (clp->cl_cb_conn.cb_xprt)
+ svc_xprt_put(clp->cl_cb_conn.cb_xprt);
list_del(&clp->cl_idhash);
list_del(&clp->cl_strhash);
spin_lock(&client_lock);
- list_del(&clp->cl_lru);
- while (!list_empty(&clp->cl_sessions)) {
- struct nfsd4_session *ses;
- ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
- se_perclnt);
- unhash_session(ses);
- nfsd4_put_session(ses);
- }
+ unhash_client_locked(clp);
spin_unlock(&client_lock);
- nfsd4_set_callback_client(clp, NULL);
- if (clp->cl_cb_conn.cb_xprt)
- svc_xprt_put(clp->cl_cb_conn.cb_xprt);
free_client(clp);
}

--
1.6.5.1

2010-05-04 22:44:44

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 6/8] nfsd4: introduce nfs4_client.cl_refcount

Currently just initialize the cl_refcount to 1
and decrement in expire_client(), conditionally freeing the
client when the refcount reaches 0.

To be used later by nfsv4.1 compounds to keep the client from
timing out while in use.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 5 ++++-
fs/nfsd/state.h | 1 +
2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index cd02567..4b1d39f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -739,8 +739,10 @@ expire_client(struct nfs4_client *clp)
list_del(&clp->cl_strhash);
spin_lock(&client_lock);
unhash_client_locked(clp);
+ BUG_ON(clp->cl_refcount <= 0);
+ if (--clp->cl_refcount <= 0)
+ free_client(clp);
spin_unlock(&client_lock);
- free_client(clp);
}

static void copy_verf(struct nfs4_client *target, nfs4_verifier *source)
@@ -826,6 +828,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, char *recdir,
}

memcpy(clp->cl_recdir, recdir, HEXDIR_LEN);
+ clp->cl_refcount = 1;
atomic_set(&clp->cl_cb_set, 0);
INIT_LIST_HEAD(&clp->cl_idhash);
INIT_LIST_HEAD(&clp->cl_strhash);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 98836fd..09c827d 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -233,6 +233,7 @@ struct nfs4_client {
struct nfsd4_clid_slot cl_cs_slot; /* create_session slot */
u32 cl_exchange_flags;
struct nfs4_sessionid cl_sessionid;
+ int cl_refcount; /* use under client_lock */

/* for nfs41 callbacks */
/* We currently support a single back channel with a single slot */
--
1.6.5.1

2010-05-04 22:44:57

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 7/8] nfsd4: keep a reference count on client while in use

Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.

Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 27 +++++++++++++++++++++++----
fs/nfsd/nfs4xdr.c | 3 ++-
fs/nfsd/state.h | 1 +
3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4b1d39f..3b46994 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -692,11 +692,26 @@ free_client(struct nfs4_client *clp)
kfree(clp);
}

+void
+release_session_client(struct nfsd4_session *session)
+{
+ struct nfs4_client *clp = session->se_client;
+
+ spin_lock(&client_lock);
+ BUG_ON(clp->cl_refcount <= 0);
+ if (--clp->cl_refcount <= 0)
+ free_client(clp);
+ else if (clp->cl_refcount == 1)
+ renew_client_locked(clp);
+ spin_unlock(&client_lock);
+ nfsd4_put_session(session);
+}
+
/* must be called under the client_lock */
static inline void
unhash_client_locked(struct nfs4_client *clp)
{
- list_del(&clp->cl_lru);
+ list_del_init(&clp->cl_lru);
while (!list_empty(&clp->cl_sessions)) {
struct nfsd4_session *ses;
ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
@@ -1460,8 +1475,7 @@ out:
/* Hold a session reference until done processing the compound. */
if (cstate->session) {
nfsd4_get_session(cstate->session);
- /* Renew the clientid on success and on replay */
- renew_client_locked(session->se_client);
+ session->se_client->cl_refcount++;
}
spin_unlock(&client_lock);
dprintk("%s: return %d\n", __func__, ntohl(status));
@@ -2581,7 +2595,12 @@ nfs4_laundromat(void)
clientid_val = t;
break;
}
- list_del_init(&clp->cl_lru);
+ if (clp->cl_refcount > 1) {
+ dprintk("NFSD: client in use (clientid %08x)\n",
+ clp->cl_clientid.cl_id);
+ continue;
+ }
+ unhash_client_locked(clp);
spin_unlock(&client_lock);
dprintk("NFSD: purging unused client (clientid %08x)\n",
clp->cl_clientid.cl_id);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b27bcf3..05536e8 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3312,7 +3312,8 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
dprintk("%s: SET SLOT STATE TO AVAILABLE\n", __func__);
cs->slot->sl_inuse = false;
}
- nfsd4_put_session(cs->session);
+ /* Renew the clientid on success and on replay */
+ release_session_client(cs->session);
}
return 1;
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 09c827d..30121c3 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -407,6 +407,7 @@ extern int nfs4_has_reclaimed_state(const char *name, bool use_exchange_id);
extern void nfsd4_recdir_purge_old(void);
extern int nfsd4_create_clid_dir(struct nfs4_client *clp);
extern void nfsd4_remove_clid_dir(struct nfs4_client *clp);
+extern void release_session_client(struct nfsd4_session *);

static inline void
nfs4_put_stateowner(struct nfs4_stateowner *so)
--
1.6.5.1

2010-05-04 22:45:09

by Benny Halevy

[permalink] [raw]

Subject: [PATCH 8/8] nfsd41: cstate->session can NULL in nfsd4_destroy_session

If DESTROY_SESSION arrives on a singleton compound that has no SEQUENCE
operation cstate->session is NULL.

Signed-off-by: Benny Halevy <[email protected]>
---
fs/nfsd/nfs4state.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3b46994..8da730f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1394,8 +1394,9 @@ nfsd4_destroy_session(struct svc_rqst *r,
* - Do we need to clear any callback info from previous session?
*/

- if (!memcmp(&sessionid->sessionid, &cstate->session->se_sessionid,
- sizeof(struct nfs4_sessionid))) {
+ if (cstate->session &&
+ !memcmp(&sessionid->sessionid, &cstate->session->se_sessionid,
+ sizeof(struct nfs4_sessionid))) {
if (!nfsd4_last_compound_op(r))
return nfserr_not_only_op;
}
--
1.6.5.1