From: Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH 6/6] NLM: Add reference counting to lockd
Date: Fri, 21 Dec 2007 15:46:05 -0500
Message-ID: <20071221154605.5c852a8b@tleilax.poochiereds.net>
References: <1198250890-25571-1-git-send-email-jlayton@redhat.com>
	<1198250890-25571-2-git-send-email-jlayton@redhat.com>
	<1198250890-25571-3-git-send-email-jlayton@redhat.com>
	<1198250890-25571-4-git-send-email-jlayton@redhat.com>
	<1198250890-25571-5-git-send-email-jlayton@redhat.com>
	<1198250890-25571-6-git-send-email-jlayton@redhat.com>
	<1198250890-25571-7-git-send-email-jlayton@redhat.com>
	<213379F3-BB83-47CB-9488-436D246026B5@oracle.com>
	<20071221120215.03beada0@tleilax.poochiereds.net>
	<58F4CCD3-1E95-4B53-8FF3-F979B984D533@oracle.com>
	<20071221145456.122174d0@tleilax.poochiereds.net>
	<B71A85B3-25A5-4B90-A447-C0C063AD1E66@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Cc: trond.myklebust@fys.uio.no, linux-nfs@vger.kernel.org
To: Chuck Lever <chuck.lever@oracle.com>
In-Reply-To: <B71A85B3-25A5-4B90-A447-C0C063AD1E66@oracle.com>
Sender: linux-nfs-owner@vger.kernel.org

On Fri, 21 Dec 2007 15:25:29 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> On Dec 21, 2007, at 2:54 PM, Jeff Layton wrote:
> > On Fri, 21 Dec 2007 12:51:25 -0500
> > Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> >> You could easily post a message to the kernel log that says "lockd
> >> was signaled to stop, but is waiting for outstanding requests to
> >> complete."
> >
> > Here's a respun patch 6 that contains the warning message suggested
> > by Chuck.
> >
> > Thoughts?
> >
> > ------------[snip]--------------
> >
> > NLM: Add reference counting to lockd
> >
> > ...and only have lockd exit when the last reference is dropped.
> >
> > The problem is this:
> >
> > When a lock that a client is blocking on comes free, lockd does  
> > this in
> > nlmsvc_grant_blocked():
> >
> >     nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG,  
> > &nlmsvc_grant_ops);
> >
> > the callback from this call is nlmsvc_grant_callback(). That
> > function does this at the end to wake up lockd:
> >
> >     svc_wake_up(block->b_daemon);
> >
> > However there is no guarantee that lockd will be up when this
> > happens. If someone shuts down or restarts lockd before the async
> > call completes,
> > then the b_daemon pointer will point to freed memory and the
> > kernel may
> > oops.
> >
> > I first noticed this on older kernels and had mistakenly thought
> > that newer kernels weren't susceptible, but that's not correct.
> > There's a bit
> > of a race to make sure that the nlm_host is bound when the async  
> > call is
> > done, but I can now reproduce this at will on current kernels.
> >
> > This patch is based on Trond's suggestion to add a new reference  
> > counter
> > to lockd, and only allows lockd to go down when it reaches 0. With  
> > this
> > change we can't use kthread_stop here. nlmsvc_unlink_block is  
> > called by
> > lockd and a kthread can't call kthread_stop on itself. So the patch
> > changes lockd to check the refcount itself and to return if it goes
> > to 0. We do the checking and exit while holding the nlmsvc_mutex to
> > make sure that a new lockd is not started until the old one is down.
> >
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/lockd/svc.c              |   52
> > ++++++++++++++++++++++++++++++++ +---------
> >  fs/lockd/svclock.c          |    5 ++++
> >  include/linux/lockd/lockd.h |    1 +
> >  3 files changed, 47 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index d7209ea..71a4f65 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -51,6 +51,7 @@ static DEFINE_MUTEX(nlmsvc_mutex);
> >  static unsigned int		nlmsvc_users;
> >  static struct task_struct	*nlmsvc_task;
> >  static struct svc_serv		*nlmsvc_serv;
> > +atomic_t			nlmsvc_ref = ATOMIC_INIT(0);
> >  int				nlmsvc_grace_period;
> >  unsigned long			nlmsvc_timeout;
> >
> > @@ -134,7 +135,10 @@ lockd(void *vrqstp)
> >
> >  	set_freezable();
> >
> > -	/* Process request with signals blocked, but allow
> > SIGKILL.  */
> > +	/*
> > +	 * Process request with signals blocked, but allow SIGKILL
> > which
> > +	 * signifies that lockd should drop all of its locks.
> > +	 */
> >  	allow_signal(SIGKILL);
> >
> >  	dprintk("NFS locking service started (ver " LOCKD_VERSION
> > ").\n"); @@ -147,15 +151,19 @@ lockd(void *vrqstp)
> >
> >  	/*
> >  	 * The main request loop. We don't terminate until the last
> > -	 * NFS mount or NFS daemon has gone away, and we've been
> > sent a
> > -	 * signal, or else another process has taken over our job.
> > +	 * NFS mount or NFS daemon has gone away, and the
> > nlm_blocked
> > +	 * list is empty. The nlmsvc_mutex ensures that we prevent
> > a
> > +	 * new lockd from being started before the old one is down.
> >  	 */
> > -	while (!kthread_should_stop()) {
> > +	mutex_lock(&nlmsvc_mutex);
> > +	while (atomic_read(&nlmsvc_ref) != 0) {
> >  		long timeout = MAX_SCHEDULE_TIMEOUT;
> >  		char buf[RPC_MAX_ADDRBUFLEN];
> >
> > +		mutex_unlock(&nlmsvc_mutex);
> > +
> >  		if (try_to_freeze())
> > -			continue;
> > +			goto again;
> >
> >  		if (signalled()) {
> >  			flush_signals(current);
> > @@ -182,11 +190,12 @@ lockd(void *vrqstp)
> >  		 */
> >  		err = svc_recv(rqstp, timeout);
> >  		if (err == -EAGAIN || err == -EINTR)
> > -			continue;
> > +			goto again;
> >  		if (err < 0) {
> >  			printk(KERN_WARNING
> >  			       "lockd: terminating on error %d\n",
> >  			       -err);
> > +			mutex_lock(&nlmsvc_mutex);
> >  			break;
> >  		}
> >
> > @@ -194,19 +203,22 @@ lockd(void *vrqstp)
> >  				svc_print_addr(rqstp, buf,
> > sizeof(buf)));
> >
> >  		svc_process(rqstp);
> > +again:
> > +		mutex_lock(&nlmsvc_mutex);
> >  	}
> >
> > -	flush_signals(current);
> > -
> >  	/*
> > -	 * Check whether there's a new lockd process before
> > -	 * shutting down the hosts and clearing the slot.
> > +	 * at this point lockd is committed to going down. We hold
> > the
> > +	 * nlmsvc_mutex until just before exit to prevent a new one
> > +	 * from starting before it's down.
> >  	 */
> > +	flush_signals(current);
> >  	if (nlmsvc_ops)
> >  		nlmsvc_invalidate_all();
> >  	nlm_shutdown_hosts();
> >  	nlmsvc_task = NULL;
> >  	nlmsvc_serv = NULL;
> > +	mutex_unlock(&nlmsvc_mutex);
> >
> >  	/* Exit the RPC thread */
> >  	svc_exit_thread(rqstp);
> > @@ -269,6 +281,10 @@ lockd_up(int proto) /* Maybe add a 'family'  
> > option when IPv6 is supported ?? */
> >  	int		error = 0;
> >
> >  	mutex_lock(&nlmsvc_mutex);
> > +
> > +	if (!nlmsvc_users)
> > +		atomic_inc(&nlmsvc_ref);
> > +
> >  	/*
> >  	 * Check whether we're already up and running.
> >  	 */
> > @@ -328,6 +344,8 @@ lockd_up(int proto) /* Maybe add a 'family'  
> > option when IPv6 is supported ?? */
> >  destroy_and_out:
> >  	svc_destroy(serv);
> >  out:
> > +	if (!nlmsvc_users && error)
> > +		atomic_dec(&nlmsvc_ref);
> >  	if (!error)
> >  		nlmsvc_users++;
> >  	mutex_unlock(&nlmsvc_mutex);
> > @@ -357,7 +375,19 @@ lockd_down(void)
> >  		goto out;
> >  	}
> >  	warned = 0;
> > -	kthread_stop(nlmsvc_task);
> > +	if (atomic_sub_return(1, &nlmsvc_ref) != 0)
> > +		printk(KERN_WARNING "lockd_down: lockd signalled
> > to go down, "
> > +			"but is waiting for outstanding requests
> > to "
> > +			"complete.\n");
> 
> We could quibble about the proper spelling of "signaled".
> 
> "lockd_down: lockd is waiting for outstanding requests to complete  
> before exiting."
> 
> might be less awkward.
> 
> Otherwise, I think this is helpful.
> 

m-w.com says either form is correct, though you're probably right that
your later suggestion is less awkward (and also more correct since
lockd technically hasn't been sent a signal at that point).

I'll plan to let this sit for a day or two and see if anyone else has
comments on the set. If not, then I'll respin with a clarified warning
message.

Thanks for the review,
-- 
Jeff Layton <jlayton@redhat.com>