LinuxLists.cc - Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

2022-01-12 23:56:53

Subject: Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

On Mon, Jan 10, 2022 at 10:50:53AM -0800, Dai Ngo wrote:
> static time64_t
> nfs4_laundromat(struct nfsd_net *nn)
> {
> @@ -5587,7 +5834,9 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	};
>  	struct nfs4_cpntf_state *cps;
>  	copy_stateid_t *cps_t;
> +	struct nfs4_stid *stid;
>  	int i;
> +	int id;
>
>  	if (clients_still_reclaiming(nn)) {
>  		lt.new_timeo = 0;
> @@ -5608,8 +5857,41 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	spin_lock(&nn->client_lock);
>  	list_for_each_safe(pos, next, &nn->client_lru) {
>  		clp = list_entry(pos, struct nfs4_client, cl_lru);
> -		if (!state_expired(&lt, clp->cl_time))
> +		spin_lock(&clp->cl_cs_lock);
> +		if (test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags))
> +			goto exp_client;
> +		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
> +			if (ktime_get_boottime_seconds() >= clp->courtesy_client_expiry)
> +				goto exp_client;
> +			/*
> +			 * after umount, v4.0 client is still around
> +			 * waiting to be expired. Check again and if
> +			 * it has no state then expire it.
> +			 */
> +			if (clp->cl_minorversion) {
> +				spin_unlock(&clp->cl_cs_lock);
> +				continue;
> +			}

I'm not following that comment or that logic.

> +		}
> +		if (!state_expired(&lt, clp->cl_time)) {
> +			spin_unlock(&clp->cl_cs_lock);
>  			break;
> +		}
> +		id = 0;
> +		spin_lock(&clp->cl_lock);
> +		stid = idr_get_next(&clp->cl_stateids, &id);
> +		if (stid && !nfs4_anylock_conflict(clp)) {
> +			/* client still has states */

I'm a little confused by that comment. I think what you just checked is
that the client has some state, *and* nobody is waiting for one of its
locks. For me, that comment just confuses things.

> +			spin_unlock(&clp->cl_lock);

Is nn->client_lock enough to guarantee that the condition you just
checked still holds? (Honest question, I'm not sure.)

> +			clp->courtesy_client_expiry =
> +				ktime_get_boottime_seconds() + courtesy_client_expiry;
> +			set_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
> +			spin_unlock(&clp->cl_cs_lock);
> +			continue;
> +		}
> +		spin_unlock(&clp->cl_lock);
> +exp_client:
> +		spin_unlock(&clp->cl_cs_lock);
>  		if (mark_client_expired_locked(clp))
>  			continue;
>  		list_add(&clp->cl_lru, &reaplist);

In general this loop is more complicated than the rest of the logic in
nfs4_laundromat(). I'd be looking for ways to simplify it and/or move some
of it into a helper function.

--b.

> @@ -5689,9 +5971,6 @@ nfs4_laundromat(struct nfsd_net *nn)
>  	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
>  }

2022-01-13 12:25:24

by Dai Ngo

Subject: Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

On 1/12/22 11:40 AM, J. Bruce Fields wrote:
> On Mon, Jan 10, 2022 at 10:50:53AM -0800, Dai Ngo wrote:
>> static time64_t
>> nfs4_laundromat(struct nfsd_net *nn)
>> {
>> @@ -5587,7 +5834,9 @@ nfs4_laundromat(struct nfsd_net *nn)
>>  	};
>>  	struct nfs4_cpntf_state *cps;
>>  	copy_stateid_t *cps_t;
>> +	struct nfs4_stid *stid;
>>  	int i;
>> +	int id;
>>
>>  	if (clients_still_reclaiming(nn)) {
>>  		lt.new_timeo = 0;
>> @@ -5608,8 +5857,41 @@ nfs4_laundromat(struct nfsd_net *nn)
>>  	spin_lock(&nn->client_lock);
>>  	list_for_each_safe(pos, next, &nn->client_lru) {
>>  		clp = list_entry(pos, struct nfs4_client, cl_lru);
>> -		if (!state_expired(&lt, clp->cl_time))
>> +		spin_lock(&clp->cl_cs_lock);
>> +		if (test_bit(NFSD4_DESTROY_COURTESY_CLIENT, &clp->cl_flags))
>> +			goto exp_client;
>> +		if (test_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags)) {
>> +			if (ktime_get_boottime_seconds() >= clp->courtesy_client_expiry)
>> +				goto exp_client;
>> +			/*
>> +			 * after umount, v4.0 client is still around
>> +			 * waiting to be expired. Check again and if
>> +			 * it has no state then expire it.
>> +			 */
>> +			if (clp->cl_minorversion) {
>> +				spin_unlock(&clp->cl_cs_lock);
>> +				continue;
>> +			}
> I'm not following that comment or that logic.

When unmounting an export, a v4.0 client closes all of its state. That
state is kept around on nn->close_lru to handle CLOSE replay. It remains
on the queue even after the client's lease (clp->cl_time) has expired and
the client has become a courtesy client.

Eventually the laundromat frees that state when it expires. This is why
we check a v4.0 courtesy client again: if there is no state associated
with it, we expire the client.

>> +		}
>> +		if (!state_expired(&lt, clp->cl_time)) {
>> +			spin_unlock(&clp->cl_cs_lock);
>>  			break;
>> +		}
>> +		id = 0;
>> +		spin_lock(&clp->cl_lock);
>> +		stid = idr_get_next(&clp->cl_stateids, &id);
>> +		if (stid && !nfs4_anylock_conflict(clp)) {
>> +			/* client still has states */
> I'm a little confused by that comment. I think what you just checked is
> that the client has some state, *and* nobody is waiting for one of its
> locks. For me, that comment just confuses things.

will remove.

>
>> +			spin_unlock(&clp->cl_lock);
> Is nn->client_lock enough to guarantee that the condition you just
> checked still holds? (Honest question, I'm not sure.)

nfs4_anylock_conflict_locked scans cl_ownerstr_hashtbl which is protected
by the cl_lock.

>
>> +			clp->courtesy_client_expiry =
>> +				ktime_get_boottime_seconds() + courtesy_client_expiry;
>> +			set_bit(NFSD4_COURTESY_CLIENT, &clp->cl_flags);
>> +			spin_unlock(&clp->cl_cs_lock);
>> +			continue;
>> +		}
>> +		spin_unlock(&clp->cl_lock);
>> +exp_client:
>> +		spin_unlock(&clp->cl_cs_lock);
>>  		if (mark_client_expired_locked(clp))
>>  			continue;
>>  		list_add(&clp->cl_lru, &reaplist);
> In general this loop is more complicated than the rest of the logic in
> nfs4_laundromat(). I'd be looking for ways to simplify it and/or move some
> of it into a helper function.

I will move it to a function.

-Dai

2022-01-13 15:42:09

by J. Bruce Fields

Subject: Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

On Thu, Jan 13, 2022 at 12:51:57AM -0800, Dai Ngo wrote:
>
> On 1/12/22 11:40 AM, J. Bruce Fields wrote:
> >On Mon, Jan 10, 2022 at 10:50:53AM -0800, Dai Ngo wrote:
> >>+		}
> >>+		if (!state_expired(&lt, clp->cl_time)) {
> >>+			spin_unlock(&clp->cl_cs_lock);
> >> 			break;
> >>+		}
> >>+		id = 0;
> >>+		spin_lock(&clp->cl_lock);
> >>+		stid = idr_get_next(&clp->cl_stateids, &id);
> >>+		if (stid && !nfs4_anylock_conflict(clp)) {
> >>+			/* client still has states */
> >I'm a little confused by that comment. I think what you just checked is
> >that the client has some state, *and* nobody is waiting for one of its
> >locks. For me, that comment just confuses things.
>
> will remove.
>
> >
> >>+			spin_unlock(&clp->cl_lock);
> >Is nn->client_lock enough to guarantee that the condition you just
> >checked still holds? (Honest question, I'm not sure.)
>
> nfs4_anylock_conflict_locked scans cl_ownerstr_hashtbl which is protected
> by the cl_lock.

That doesn't answer the question. Which, I confess, was muddled (I
should have said "clp->cl_cs_lock", not "nn->client_lock".)

Let me try it a different way. You just checked that the client has
some state, and that nobody is waiting for one of its locks.

After you drop the cl_lock, how do you know that both of those things
are still true?

--b.

2022-01-14 08:23:49

by Dai Ngo

Subject: Re: [PATCH RFC v9 2/2] nfsd: Initial implementation of NFSv4 Courteous Server

On 1/13/22 7:42 AM, J. Bruce Fields wrote:
> On Thu, Jan 13, 2022 at 12:51:57AM -0800, Dai Ngo wrote:
>> On 1/12/22 11:40 AM, J. Bruce Fields wrote:
>>> On Mon, Jan 10, 2022 at 10:50:53AM -0800, Dai Ngo wrote:
>>>> +		}
>>>> +		if (!state_expired(&lt, clp->cl_time)) {
>>>> +			spin_unlock(&clp->cl_cs_lock);
>>>>  			break;
>>>> +		}
>>>> +		id = 0;
>>>> +		spin_lock(&clp->cl_lock);
>>>> +		stid = idr_get_next(&clp->cl_stateids, &id);
>>>> +		if (stid && !nfs4_anylock_conflict(clp)) {
>>>> +			/* client still has states */
>>> I'm a little confused by that comment. I think what you just checked is
>>> that the client has some state, *and* nobody is waiting for one of its
>>> locks. For me, that comment just confuses things.
>> will remove.
>>
>>>> +			spin_unlock(&clp->cl_lock);
>>> Is nn->client_lock enough to guarantee that the condition you just
>>> checked still holds? (Honest question, I'm not sure.)
>> nfs4_anylock_conflict_locked scans cl_ownerstr_hashtbl which is protected
>> by the cl_lock.
> That doesn't answer the question. Which, I confess, was muddled (I
> should have said "clp->cl_cs_lock", not "nn->client_lock".)
>
> Let me try it a different way. You just checked that the client has
> some state, and that nobody is waiting for one of its locks.
>
> After you drop the cl_lock, how do you know that both of those things
> are still true?

After we drop the lock, if the client now has no state, it just remains
in memory until the courtesy client timeout expires, and then we get rid
of it.

For the lock-conflict race, we use the client's cl_cs_lock to synchronize
the laundromat with lm_lock_conflict/nfsd4_fl_lock_conflict. If the
locking thread acquires the cl_cs_lock before the laundromat does, the
thread will be blocked, and the laundromat detects that there is a
blocker and expires the client. If the laundromat acquires the cl_cs_lock
first, NFSD4_COURTESY_CLIENT is set; nfsd4_fl_lock_conflict then detects
this flag and sets NFSD4_DESTROY_COURTESY_CLIENT on the client.

-Dai
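
The two orderings described above can be modelled in userspace C roughly as follows. This is a simplified sketch under stated assumptions, not the patch's code: struct cs_client, CS_CLIENT_INIT, model_lock_conflict() and model_laundromat_visit() are hypothetical names, a pthread mutex stands in for cl_cs_lock, and the has_blocker field stands in for what nfs4_anylock_conflict() would find once the locking thread is blocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; the field names are illustrative only. */
struct cs_client {
	pthread_mutex_t cs_lock;	/* models clp->cl_cs_lock */
	bool courtesy;			/* models NFSD4_COURTESY_CLIENT */
	bool destroy;			/* models NFSD4_DESTROY_COURTESY_CLIENT */
	bool has_blocker;		/* models "someone waits on one of its locks" */
};

#define CS_CLIENT_INIT { PTHREAD_MUTEX_INITIALIZER, false, false, false }

/* Lock-conflict side, standing in for nfsd4_fl_lock_conflict. */
static void model_lock_conflict(struct cs_client *clp)
{
	assert(clp != NULL);
	pthread_mutex_lock(&clp->cs_lock);
	if (clp->courtesy)
		clp->destroy = true;	/* laundromat got there first */
	else
		clp->has_blocker = true; /* this thread will end up blocked */
	pthread_mutex_unlock(&clp->cs_lock);
}

/* Laundromat side: returns true when the client should be expired. */
static bool model_laundromat_visit(struct cs_client *clp)
{
	bool expire;

	assert(clp != NULL);
	pthread_mutex_lock(&clp->cs_lock);
	expire = clp->destroy || clp->has_blocker;
	if (!expire)
		clp->courtesy = true;	/* keep it around as a courtesy client */
	pthread_mutex_unlock(&clp->cs_lock);
	return expire;
}
```

Because both sides read and write the flags only while holding the same lock, either order ends with the client marked for expiry once a conflicting lock request has shown up. In the second order that happens on the laundromat's next pass.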