Return-Path: Received: from daytona.panasas.com ([67.152.220.89]:36665 "EHLO daytona.int.panasas.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1755027Ab0EMOgS (ORCPT ); Thu, 13 May 2010 10:36:18 -0400 Message-ID: <4BEC0E5F.2030608@panasas.com> Date: Thu, 13 May 2010 17:36:15 +0300 From: Benny Halevy To: "J. Bruce Fields" CC: linux-nfs@vger.kernel.org Subject: Re: [PATCH v2 8/9] nfsd4: keep a reference count on client while in use References: <4BE9C7A2.8030801@panasas.com> <1273612434-13726-1-git-send-email-bhalevy@panasas.com> <20100512024030.GA27184@fieldses.org> <4BEA2E06.7070902@panasas.com> <4BEA4876.3030802@panasas.com> <20100512222946.GE11389@fieldses.org> In-Reply-To: <20100512222946.GE11389@fieldses.org> Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-nfs-owner@vger.kernel.org List-ID: MIME-Version: 1.0 On May. 13, 2010, 1:29 +0300, "J. Bruce Fields" wrote: > On Wed, May 12, 2010 at 09:19:34AM +0300, Benny Halevy wrote: >> On May. 12, 2010, 7:26 +0300, Benny Halevy wrote: >>> On May. 12, 2010, 5:40 +0300, " J. Bruce Fields" wrote: >>>> Something still doesn't look quite right: a sequence operation >>>> increments cl_count from 1 to 2; then, say, an exchangeid in another >>>> thread expires the client, dropping cl_count from 2 to 1; then the >>>> laundromat runs, sees cl_count 1, and decides it can expire the >>>> client. Meanwhile, the original compound is still running. >>>> >>>> Am I missing something? >>> >>> Yeah. exchange_id doesn't touch cl_refcount. It may call expire_client >>> explicitly which will unhash the client but will not destroy it if cl_refcount > 0 >>> the laundromat won't the client either after that since it's not on the lru list >>> any more (and even if it would, it's refcount is still great than zero so it would >>> have been ignored) >> >> Sorry, I misspoken. exchange_id may decrement cl_refcount via expire_client() >> but still the laundromat won't see it as expire_client will have already removed the >> client from client_lru. > > Yep, sorry for my confusion! > > The special treatment of refcount == 1 still seems odd to me; with > expiry==0 trick it no longer seems necessary. > > Another case: > > - Two 4.1 compounds arrive, both their sequence operations are > processed. > - Independently, an exchange_id expires the client. > - At this point, the reference count is 2. > - One of the original compounds completes. It renews the client > (because it hits reference count 1). and that will be a no-op as the client is marked as expired. > - The second of the two original compounds completes. It frees > the client. right > > I guess there's nothing fundamentally wrong with that, but it's a little > odd. > > Wouldn't it be more straightforward to let cl_refcount be a count of the > number of outstanding compounds, and make release_session_client do: > > if cl_refcount >= 0 > return; > if (client_is_expired(clp)) > free client; > else > renew client; > > ? > > The following works for me. If you don't see any objection, I'll squash > this in and push out the result. No objection, looks good. Thanks! Benny > > --b. > > commit 9d1b08f5cd9f2367f76f8350595c2bc00876d57f > Author: J. Bruce Fields > Date: Wed May 12 18:17:58 2010 -0400 > > tmp-squashme: count only session users in cl_refcount > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 89b77fe..05ad0ae 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -706,12 +706,12 @@ release_session_client(struct nfsd4_session *session) > { > struct nfs4_client *clp = session->se_client; > > - spin_lock(&client_lock); > - BUG_ON(clp->cl_refcount <= 0); > - if (--clp->cl_refcount <= 0) { > + if (!atomic_dec_and_lock(&clp->cl_refcount, &client_lock)) > + return; > + if (is_client_expired(clp)) { > free_client(clp); > session->se_client = NULL; > - } else if (clp->cl_refcount == 1) > + } else > renew_client_locked(clp); > spin_unlock(&client_lock); > nfsd4_put_session(session); > @@ -765,8 +765,7 @@ expire_client(struct nfs4_client *clp) > list_del(&clp->cl_strhash); > spin_lock(&client_lock); > unhash_client_locked(clp); > - BUG_ON(clp->cl_refcount <= 0); > - if (--clp->cl_refcount <= 0) > + if (atomic_dec_and_test(&clp->cl_refcount)) > free_client(clp); > spin_unlock(&client_lock); > } > @@ -854,7 +853,7 @@ static struct nfs4_client *create_client(struct xdr_netobj name, char *recdir, > } > > memcpy(clp->cl_recdir, recdir, HEXDIR_LEN); > - clp->cl_refcount = 1; > + atomic_set(&clp->cl_refcount, 1); > atomic_set(&clp->cl_cb_set, 0); > INIT_LIST_HEAD(&clp->cl_idhash); > INIT_LIST_HEAD(&clp->cl_strhash); > @@ -1495,7 +1494,7 @@ out: > /* Hold a session reference until done processing the compound. */ > if (cstate->session) { > nfsd4_get_session(cstate->session); > - session->se_client->cl_refcount++; > + atomic_inc(&session->se_client->cl_refcount); > } > spin_unlock(&client_lock); > dprintk("%s: return %d\n", __func__, ntohl(status)); > @@ -2643,7 +2642,7 @@ nfs4_laundromat(void) > clientid_val = t; > break; > } > - if (clp->cl_refcount > 1) { > + if (atomic_read(&clp->cl_refcount)) { > dprintk("NFSD: client in use (clientid %08x)\n", > clp->cl_clientid.cl_id); > continue; > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index f263174..006c842 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -233,7 +233,8 @@ struct nfs4_client { > struct nfsd4_clid_slot cl_cs_slot; /* create_session slot */ > u32 cl_exchange_flags; > struct nfs4_sessionid cl_sessionid; > - int cl_refcount; /* use under client_lock */ > + /* number of rpc's in progress over an associated session: */ > + atomic_t cl_refcount; > > /* for nfs41 callbacks */ > /* We currently support a single back channel with a single slot */