2008-01-10 18:02:26

by Jeff Layton

Subject: [PATCH 0/5] Intro: convert lockd to kthread and fix use-after-free (try #7)

This is the seventh patchset to fix the use-after-free problem in lockd,
which we originally discussed back in October. Along the way, Christoph
Hellwig mentioned that it would be advantageous to convert lockd to use
the kthread API. This patch set makes that conversion first and then
builds on it to actually fix the use-after-free problem. It also fixes
a couple of minor bugs in the current lockd implementation.

This patchset takes a different approach to fixing the use-after-free
than earlier ones did. Instead of trying to ensure that lockd stays up
until all callbacks complete, it simply makes sure that all RPCs are
canceled when lockd is asked to come down. With this change we no
longer need to have lockd_down signal lockd, we don't need to do any
extra reference counting, and we can use the more conventional kthread
functions to handle lockd shutdown.
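
For those unfamiliar with the API, the start/stop pattern that the
kthread conversion moves lockd to looks roughly like this (a minimal,
generic sketch with made-up names, not the actual lockd code):

	#include <linux/kthread.h>
	#include <linux/err.h>

	static struct task_struct *my_task;

	static int my_thread(void *data)
	{
		/* loop until someone calls kthread_stop() on us */
		while (!kthread_should_stop()) {
			/* ... service requests, sleep, etc ... */
		}
		/* return value is collected by kthread_stop() */
		return 0;
	}

	static int my_up(void)
	{
		my_task = kthread_run(my_thread, NULL, "my_thread");
		if (IS_ERR(my_task))
			return PTR_ERR(my_task);
		return 0;
	}

	static void my_down(void)
	{
		/* blocks until my_thread() actually returns */
		kthread_stop(my_task);
	}

This is why lockd_down no longer needs a completion or waitqueue:
kthread_stop() doesn't return until the thread function has exited.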

I've done some very basic testing and everything seems to work as
expected. I've also tested this against the reproducer that I have for
the use-after-free problem and this does fix it. I've tried to make this
cleanly bisectable, but have only really tested the final result.

Many thanks to Trond Myklebust, Chuck Lever, Neil Brown and Christoph
Hellwig for their guidance on this.

Signed-off-by: Jeff Layton <[email protected]>



2008-01-10 18:02:21

by Jeff Layton

Subject: [PATCH 5/5] NLM: have nlm_shutdown_hosts kill off all NLM RPC tasks

The main problem is this:

When a lock that a client is blocking on comes free, lockd does this in
nlmsvc_grant_blocked():

	nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);

the callback from this call is nlmsvc_grant_callback(). That function
does this at the end to wake up lockd:

	svc_wake_up(block->b_daemon);

However, there is no guarantee that lockd will still be up when this happens.
If someone shuts down or restarts lockd before the async call completes,
then the b_daemon pointer will point to freed memory and the kernel may
oops.
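
To make the ordering concrete, the problem case looks something like
this (annotated with the calls named above; the interleaving shown is
the one that triggers the oops):

	/* lockd, in nlmsvc_grant_blocked(): fire off the async call */
	nlm_async_call(block->b_call, NLMPROC_GRANTED_MSG, &nlmsvc_grant_ops);

	/* ... lockd is shut down or restarted here, and the svc_serv
	 * that block->b_daemon points to is freed ... */

	/* later, when the GRANTED_MSG call completes, the RPC layer
	 * calls nlmsvc_grant_callback(), which ends with: */
	svc_wake_up(block->b_daemon);	/* b_daemon now points to freed memory */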

If we're shutting down all the nlm_hosts anyway, then it doesn't make
sense to allow RPC calls to linger. Allowing them to do so means that
the RPC calls can outlive the currently running lockd, which leads to
the use-after-free above and possibly to other problems.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/host.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/lockd/host.c b/fs/lockd/host.c
index 572601e..8771484 100644
--- a/fs/lockd/host.c
+++ b/fs/lockd/host.c
@@ -377,8 +377,10 @@ nlm_shutdown_hosts(void)
 	/* First, make all hosts eligible for gc */
 	dprintk("lockd: nuking all hosts...\n");
 	for (chain = nlm_hosts; chain < nlm_hosts + NLM_HOST_NRHASH; ++chain) {
-		hlist_for_each_entry(host, pos, chain, h_hash)
+		hlist_for_each_entry(host, pos, chain, h_hash) {
 			host->h_expires = jiffies - 1;
+			rpc_killall_tasks(host->h_rpcclnt);
+		}
 	}
 
 	/* Then, perform a garbage collection pass */
--
1.5.3.7


2008-01-10 18:02:27

by Jeff Layton

Subject: [PATCH 3/5] NLM: Have lockd call try_to_freeze

lockd makes itself freezable, but never calls try_to_freeze(). Have it
call try_to_freeze() within the main loop.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svc.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 82e2192..6ee8bed 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];
 
+		if (try_to_freeze())
+			continue;
+
 		if (signalled()) {
 			flush_signals(current);
 			if (nlmsvc_ops) {
--
1.5.3.7


2008-01-10 18:02:31

by Jeff Layton

Subject: [PATCH 4/5] NLM: Convert lockd to use kthreads

Have lockd_up start lockd using kthread_run. With this change,
lockd_down now blocks until lockd actually exits, so there's no longer
any need for the waitqueue code at the end of lockd_down. This also
means that only one lockd can be running at a time, which simplifies
the code within lockd's main loop.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svc.c | 113 +++++++++++++++++++++----------------------------------
1 files changed, 43 insertions(+), 70 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 6ee8bed..1ecf551 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -25,6 +25,7 @@
 #include <linux/smp.h>
 #include <linux/smp_lock.h>
 #include <linux/mutex.h>
+#include <linux/kthread.h>
 #include <linux/freezer.h>
 
 #include <linux/sunrpc/types.h>
@@ -48,14 +49,11 @@ EXPORT_SYMBOL(nlmsvc_ops);
 
 static DEFINE_MUTEX(nlmsvc_mutex);
 static unsigned int		nlmsvc_users;
-static pid_t			nlmsvc_pid;
+static struct task_struct	*nlmsvc_task;
 static struct svc_serv		*nlmsvc_serv;
 int				nlmsvc_grace_period;
 unsigned long			nlmsvc_timeout;
 
-static DECLARE_COMPLETION(lockd_start_done);
-static DECLARE_WAIT_QUEUE_HEAD(lockd_exit);
-
 /*
  * These can be set at insmod time (useful for NFS as root filesystem),
  * and also changed through the sysctl interface. -- Jamie Lokier, Aug 2003
@@ -111,31 +109,19 @@ static inline void clear_grace_period(void)
 /*
  * This is the lockd kernel thread
  */
-static void
-lockd(struct svc_rqst *rqstp)
+static int
+lockd(void *vrqstp)
 {
 	int err = 0;
+	struct svc_rqst *rqstp = vrqstp;
 	unsigned long grace_period_expire;
 
-	/* Lock module and set up kernel thread */
-	/* lockd_up is waiting for us to startup, so will
-	 * be holding a reference to this module, so it
-	 * is safe to just claim another reference
-	 */
-	__module_get(THIS_MODULE);
+	/* set up kernel thread */
 	lock_kernel();
-
-	/*
-	 * Let our maker know we're running.
-	 */
-	nlmsvc_pid = current->pid;
 	nlmsvc_serv = rqstp->rq_server;
-	complete(&lockd_start_done);
-
-	daemonize("lockd");
 	set_freezable();
 
-	/* Process request with signals blocked, but allow SIGKILL. */
+	/* Allow SIGKILL to tell lockd to drop all of its locks */
 	allow_signal(SIGKILL);
 
 	dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
@@ -148,10 +134,9 @@ lockd(struct svc_rqst *rqstp)
 
 	/*
 	 * The main request loop. We don't terminate until the last
-	 * NFS mount or NFS daemon has gone away, and we've been sent a
-	 * signal, or else another process has taken over our job.
+	 * NFS mount or NFS daemon has gone away.
 	 */
-	while ((nlmsvc_users || !signalled()) && nlmsvc_pid == current->pid) {
+	while (!kthread_should_stop()) {
 		long timeout = MAX_SCHEDULE_TIMEOUT;
 		char buf[RPC_MAX_ADDRBUFLEN];
 
@@ -199,27 +184,18 @@ lockd(struct svc_rqst *rqstp)
 
 	flush_signals(current);
 
-	/*
-	 * Check whether there's a new lockd process before
-	 * shutting down the hosts and clearing the slot.
-	 */
-	if (!nlmsvc_pid || current->pid == nlmsvc_pid) {
-		if (nlmsvc_ops)
-			nlmsvc_invalidate_all();
-		nlm_shutdown_hosts();
-		nlmsvc_pid = 0;
-		nlmsvc_serv = NULL;
-	} else
-		printk(KERN_DEBUG
-			"lockd: new process, skipping host shutdown\n");
-	wake_up(&lockd_exit);
+	if (nlmsvc_ops)
+		nlmsvc_invalidate_all();
+	nlm_shutdown_hosts();
+	nlmsvc_task = NULL;
+	nlmsvc_serv = NULL;
 
 	/* Exit the RPC thread */
 	svc_exit_thread(rqstp);
 
 	/* Release module */
 	unlock_kernel();
-	module_put_and_exit(0);
+	return 0;
 }
 
 
@@ -269,14 +245,15 @@ static int make_socks(struct svc_serv *serv, int proto)
 int
 lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 {
-	struct svc_serv *	serv;
-	int			error = 0;
+	struct svc_serv *serv;
+	struct svc_rqst *rqstp;
+	int		error = 0;
 
 	mutex_lock(&nlmsvc_mutex);
 	/*
 	 * Check whether we're already up and running.
 	 */
-	if (nlmsvc_pid) {
+	if (nlmsvc_task) {
 		if (proto)
 			error = make_socks(nlmsvc_serv, proto);
 		goto out;
@@ -303,13 +280,25 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
 	/*
 	 * Create the kernel thread and wait for it to start.
 	 */
-	error = svc_create_thread(lockd, serv);
-	if (error) {
+	rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
 		printk(KERN_WARNING
-			"lockd_up: create thread failed, error=%d\n", error);
+			"lockd_up: svc_rqst allocation failed, error=%d\n",
+			error);
+		goto destroy_and_out;
+	}
+
+	svc_sock_update_bufs(serv);
+	nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name);
+	if (IS_ERR(nlmsvc_task)) {
+		error = PTR_ERR(nlmsvc_task);
+		nlmsvc_task = NULL;
+		printk(KERN_WARNING
+			"lockd_up: kthread_run failed, error=%d\n", error);
+		svc_exit_thread(rqstp);
 		goto destroy_and_out;
 	}
-	wait_for_completion(&lockd_start_done);
 
 	/*
 	 * Note: svc_serv structures have an initial use count of 1,
@@ -331,37 +320,21 @@ EXPORT_SYMBOL(lockd_up);
 void
 lockd_down(void)
 {
-	static int warned;
-
 	mutex_lock(&nlmsvc_mutex);
 	if (nlmsvc_users) {
 		if (--nlmsvc_users)
 			goto out;
-	} else
-		printk(KERN_WARNING "lockd_down: no users! pid=%d\n", nlmsvc_pid);
-
-	if (!nlmsvc_pid) {
-		if (warned++ == 0)
-			printk(KERN_WARNING "lockd_down: no lockd running.\n");
-		goto out;
+	} else {
+		printk(KERN_ERR "lockd_down: no users! task=%p\n",
+			nlmsvc_task);
+		BUG();
 	}
-	warned = 0;
 
-	kill_proc(nlmsvc_pid, SIGKILL, 1);
-	/*
-	 * Wait for the lockd process to exit, but since we're holding
-	 * the lockd semaphore, we can't wait around forever ...
-	 */
-	clear_thread_flag(TIF_SIGPENDING);
-	interruptible_sleep_on_timeout(&lockd_exit, HZ);
-	if (nlmsvc_pid) {
-		printk(KERN_WARNING
-			"lockd_down: lockd failed to exit, clearing pid\n");
-		nlmsvc_pid = 0;
+	if (!nlmsvc_task) {
+		printk(KERN_ERR "lockd_down: no lockd running.\n");
+		BUG();
 	}
-	spin_lock_irq(&current->sighand->siglock);
-	recalc_sigpending();
-	spin_unlock_irq(&current->sighand->siglock);
+	kthread_stop(nlmsvc_task);
 out:
 	mutex_unlock(&nlmsvc_mutex);
 }
--
1.5.3.7


2008-01-10 18:02:32

by Jeff Layton

Subject: [PATCH 2/5] SUNRPC: export svc_sock_update_bufs

This is needed since the plan is to get rid of the svc_create_thread
helper and have its current users call kthread_run directly.

Signed-off-by: Jeff Layton <[email protected]>
---
net/sunrpc/svcsock.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 057c870..f2bef16 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1407,6 +1407,7 @@ svc_sock_update_bufs(struct svc_serv *serv)
 	}
 	spin_unlock_bh(&serv->sv_lock);
 }
+EXPORT_SYMBOL(svc_sock_update_bufs);
 
 /*
  * Receive the next request on any socket. This code is carefully
--
1.5.3.7


2008-01-10 18:02:32

by Jeff Layton

Subject: [PATCH 1/5] SUNRPC: spin svc_rqst initialization to its own function

Move the initialization in __svc_create_thread that happens prior to
thread creation into a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.

Signed-off-by: Jeff Layton <[email protected]>
---
include/linux/sunrpc/svc.h | 2 +
net/sunrpc/svc.c | 59 +++++++++++++++++++++++++++++++------------
2 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 8531a70..5f07300 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -382,6 +382,8 @@ struct svc_procedure {
  */
 struct svc_serv *  svc_create(struct svc_program *, unsigned int,
			void (*shutdown)(struct svc_serv*));
+struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
+					struct svc_pool *pool);
 int		   svc_create_thread(svc_thread_fn, struct svc_serv *);
 void		   svc_exit_thread(struct svc_rqst *);
 struct svc_serv *  svc_create_pooled(struct svc_program *, unsigned int,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index fca17d0..f9636bf 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -538,31 +538,17 @@ svc_release_buffer(struct svc_rqst *rqstp)
 		put_page(rqstp->rq_pages[i]);
 }
 
-/*
- * Create a thread in the given pool. Caller must hold BKL.
- * On a NUMA or SMP machine, with a multi-pool serv, the thread
- * will be restricted to run on the cpus belonging to the pool.
- */
-static int
-__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
-		    struct svc_pool *pool)
+struct svc_rqst *
+svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool)
 {
 	struct svc_rqst	*rqstp;
-	int		error = -ENOMEM;
-	int		have_oldmask = 0;
-	cpumask_t	oldmask;
 
 	rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
 	if (!rqstp)
-		goto out;
+		goto out_enomem;
 
 	init_waitqueue_head(&rqstp->rq_wait);
 
-	if (!(rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
-	 || !(rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL))
-	 || !svc_init_buffer(rqstp, serv->sv_max_mesg))
-		goto out_thread;
-
 	serv->sv_nrthreads++;
 	spin_lock_bh(&pool->sp_lock);
 	pool->sp_nrthreads++;
@@ -571,6 +557,45 @@ __svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
 	rqstp->rq_server = serv;
 	rqstp->rq_pool = pool;
 
+	rqstp->rq_argp = kmalloc(serv->sv_xdrsize, GFP_KERNEL);
+	if (!rqstp->rq_argp)
+		goto out_thread;
+
+	rqstp->rq_resp = kmalloc(serv->sv_xdrsize, GFP_KERNEL);
+	if (!rqstp->rq_resp)
+		goto out_thread;
+
+	if (!svc_init_buffer(rqstp, serv->sv_max_mesg))
+		goto out_thread;
+
+	return rqstp;
+out_thread:
+	svc_exit_thread(rqstp);
+out_enomem:
+	return ERR_PTR(-ENOMEM);
+}
+EXPORT_SYMBOL(svc_prepare_thread);
+
+/*
+ * Create a thread in the given pool. Caller must hold BKL.
+ * On a NUMA or SMP machine, with a multi-pool serv, the thread
+ * will be restricted to run on the cpus belonging to the pool.
+ */
+static int
+__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
+		    struct svc_pool *pool)
+{
+	struct svc_rqst	*rqstp;
+	int		error = -ENOMEM;
+	int		have_oldmask = 0;
+	cpumask_t	oldmask;
+
+	rqstp = svc_prepare_thread(serv, pool);
+	if (IS_ERR(rqstp)) {
+		error = PTR_ERR(rqstp);
+		goto out;
+	}
+
 	if (serv->sv_nrpools > 1)
 		have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask);
 
--
1.5.3.7


2008-01-11 01:37:32

by NeilBrown

Subject: Re: [PATCH 0/5] Intro: convert lockd to kthread and fix use-after-free (try #7)

On Thursday January 10, [email protected] wrote:
> This is the seventh patchset to fix the use-after-free problem in lockd
....

This patch set looks good now. I'm happy to give it a

Reviewed-by: NeilBrown <[email protected]>

Two remaining issues that I would like to see addressed, but which don't
necessarily need to be part of this set, are:

1/ When the last nfsd thread dies, lockd should drop all locks, even
   if there are active nfs mounts.
   One approach might be:
     export nlmsvc_invalidate_all
     call it from nfsd_last_thread
     worry about how to change grace_period_expire.

2/ get rid of svc_wake_up and ->b_daemon
   Maybe change b_daemon to b_rqstp and just call
     wake_up(&block->b_rqstp->rq_wait)
   (rough sketch below)
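
To illustrate item 2, a rough (untested) sketch -- b_rqstp is the
hypothetical replacement field named above, and this assumes the
svc_rqst waiting on the block is still around to be woken, which is
the part that would need verifying:

	/* in struct nlm_block (include/linux/lockd/lockd.h): */
-	struct svc_serv *	b_daemon;	/* NLM service */
+	struct svc_rqst *	b_rqstp;	/* thread handling the block */

	/* in nlmsvc_grant_callback() (fs/lockd/svclock.c): */
-	svc_wake_up(block->b_daemon);
+	wake_up(&block->b_rqstp->rq_wait);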

Thanks,
NeilBrown



2008-01-13 11:55:00

by Jeff Layton

Subject: Re: [PATCH 3/5] NLM: Have lockd call try_to_freeze

On Thu, 10 Jan 2008 13:01:34 -0500
Jeff Layton <[email protected]> wrote:

> lockd makes itself freezable, but never calls try_to_freeze(). Have it
> call try_to_freeze() within the main loop.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svc.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 82e2192..6ee8bed 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
>  		long timeout = MAX_SCHEDULE_TIMEOUT;
>  		char buf[RPC_MAX_ADDRBUFLEN];
>  
> +		if (try_to_freeze())
> +			continue;
> +
>  		if (signalled()) {
>  			flush_signals(current);
>  			if (nlmsvc_ops) {


I was looking over svc_recv today and noticed that it calls
try_to_freeze a couple of times. Given that, the above patch may be
unnecessary. I don't think it hurts anything though. Should we keep
this patch or drop it?

--
Jeff Layton <[email protected]>

2008-01-13 22:25:00

by NeilBrown

Subject: Re: [PATCH 3/5] NLM: Have lockd call try_to_freeze

On Sunday January 13, [email protected] wrote:
> On Thu, 10 Jan 2008 13:01:34 -0500
> Jeff Layton <[email protected]> wrote:
>
> > lockd makes itself freezable, but never calls try_to_freeze(). Have it
> > call try_to_freeze() within the main loop.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/lockd/svc.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index 82e2192..6ee8bed 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
> >  		long timeout = MAX_SCHEDULE_TIMEOUT;
> >  		char buf[RPC_MAX_ADDRBUFLEN];
> >  
> > +		if (try_to_freeze())
> > +			continue;
> > +
> >  		if (signalled()) {
> >  			flush_signals(current);
> >  			if (nlmsvc_ops) {
>
>
> I was looking over svc_recv today and noticed that it calls
> try_to_freeze a couple of times. Given that, the above patch may be
> unnecessary. I don't think it hurts anything though. Should we keep
> this patch or drop it?

I would suggest dropping it.
Having unnecessary code is likely to be confusing.

NeilBrown

2008-01-13 23:56:58

by Rafael J. Wysocki

Subject: Re: [PATCH 3/5] NLM: Have lockd call try_to_freeze

On Sunday, 13 of January 2008, Neil Brown wrote:
> On Sunday January 13, [email protected] wrote:
> > On Thu, 10 Jan 2008 13:01:34 -0500
> > Jeff Layton <[email protected]> wrote:
> >
> > > lockd makes itself freezable, but never calls try_to_freeze(). Have it
> > > call try_to_freeze() within the main loop.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/lockd/svc.c | 3 +++
> > > 1 files changed, 3 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > > index 82e2192..6ee8bed 100644
> > > --- a/fs/lockd/svc.c
> > > +++ b/fs/lockd/svc.c
> > > @@ -155,6 +155,9 @@ lockd(struct svc_rqst *rqstp)
> > >  		long timeout = MAX_SCHEDULE_TIMEOUT;
> > >  		char buf[RPC_MAX_ADDRBUFLEN];
> > >  
> > > +		if (try_to_freeze())
> > > +			continue;
> > > +
> > >  		if (signalled()) {
> > >  			flush_signals(current);
> > >  			if (nlmsvc_ops) {
> >
> >
> > I was looking over svc_recv today and noticed that it calls
> > try_to_freeze a couple of times. Given that, the above patch may be
> > unnecessary. I don't think it hurts anything though. Should we keep
> > this patch or drop it?
>
> I would suggest dropping it.
> Having unnecessary code is likely to be confusing.

But adding a comment in its place won't hurt, IMHO. :-)

Greetings,
Rafael