2008-06-11 14:03:09

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/3] fix potential races in lockd and nfs4-callback startup/shutdown

As Trond Myklebust pointed out, the current kthread-based lockd and
nfsv4 callback threads have a race condition if they are started and
brought down very rapidly. If this occurs, there is a chance that the
main thread function will never be run, and the cleanup done when the
function exits will not occur. This patchset fixes this by moving the
cleanup into the respective *_down functions.

While Bruce has just taken in a patchset from me to change nfsd to
kthreads, this race doesn't seem to exist there. We aren't using
kthread_stop() to take down nfsd, only signals, so I don't think the
kthread can be stopped before nfsd() actually runs.

The set also includes a patch to remove the BKL from nfs_callback_up
and nfs_callback_down.

Signed-off-by: Jeff Layton <[email protected]>


2008-06-13 17:53:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 3/3] lockd: close potential race with rapid lockd_up/lockd_down cycle

On Wed, Jun 11, 2008 at 10:03:12AM -0400, Jeff Layton wrote:
> If lockd_down is called very rapidly after lockd_up returns, then
> there is a slim chance that lockd() will never be called. kthread()
> will return before calling the function, so we'll end up never
> actually calling the cleanup functions for the thread.

Looks OK to me, thanks. Applied (for 2.6.27, unless someone wants to
argue it's more urgent.)

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svc.c | 33 +++++++++++++--------------------
> 1 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 2169af4..5bd9bf0 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -50,7 +50,7 @@ EXPORT_SYMBOL(nlmsvc_ops);
> static DEFINE_MUTEX(nlmsvc_mutex);
> static unsigned int nlmsvc_users;
> static struct task_struct *nlmsvc_task;
> -static struct svc_serv *nlmsvc_serv;
> +static struct svc_rqst *nlmsvc_rqst;
> int nlmsvc_grace_period;
> unsigned long nlmsvc_timeout;
>
> @@ -194,20 +194,11 @@ lockd(void *vrqstp)
>
> svc_process(rqstp);
> }
> -
> flush_signals(current);
> if (nlmsvc_ops)
> nlmsvc_invalidate_all();
> nlm_shutdown_hosts();
> -
> unlock_kernel();
> -
> - nlmsvc_task = NULL;
> - nlmsvc_serv = NULL;
> -
> - /* Exit the RPC thread */
> - svc_exit_thread(rqstp);
> -
> return 0;
> }
>
> @@ -254,16 +245,15 @@ int
> lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
> {
> struct svc_serv *serv;
> - struct svc_rqst *rqstp;
> int error = 0;
>
> mutex_lock(&nlmsvc_mutex);
> /*
> * Check whether we're already up and running.
> */
> - if (nlmsvc_serv) {
> + if (nlmsvc_rqst) {
> if (proto)
> - error = make_socks(nlmsvc_serv, proto);
> + error = make_socks(nlmsvc_rqst->rq_server, proto);
> goto out;
> }
>
> @@ -288,9 +278,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
> /*
> * Create the kernel thread and wait for it to start.
> */
> - rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]);
> - if (IS_ERR(rqstp)) {
> - error = PTR_ERR(rqstp);
> + nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0]);
> + if (IS_ERR(nlmsvc_rqst)) {
> + error = PTR_ERR(nlmsvc_rqst);
> + nlmsvc_rqst = NULL;
> printk(KERN_WARNING
> "lockd_up: svc_rqst allocation failed, error=%d\n",
> error);
> @@ -298,16 +289,15 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
> }
>
> svc_sock_update_bufs(serv);
> - nlmsvc_serv = rqstp->rq_server;
>
> - nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name);
> + nlmsvc_task = kthread_run(lockd, nlmsvc_rqst, serv->sv_name);
> if (IS_ERR(nlmsvc_task)) {
> error = PTR_ERR(nlmsvc_task);
> + svc_exit_thread(nlmsvc_rqst);
> nlmsvc_task = NULL;
> - nlmsvc_serv = NULL;
> + nlmsvc_rqst = NULL;
> printk(KERN_WARNING
> "lockd_up: kthread_run failed, error=%d\n", error);
> - svc_exit_thread(rqstp);
> goto destroy_and_out;
> }
>
> @@ -346,6 +336,9 @@ lockd_down(void)
> BUG();
> }
> kthread_stop(nlmsvc_task);
> + svc_exit_thread(nlmsvc_rqst);
> + nlmsvc_task = NULL;
> + nlmsvc_rqst = NULL;
> out:
> mutex_unlock(&nlmsvc_mutex);
> }
> --
> 1.5.3.6
>

2008-06-11 14:03:12

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/3] lockd: close potential race with rapid lockd_up/lockd_down cycle

If lockd_down is called very rapidly after lockd_up returns, then
there is a slim chance that lockd() will never be called. kthread()
will return before calling the function, so we'll end up never
actually calling the cleanup functions for the thread.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svc.c | 33 +++++++++++++--------------------
1 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 2169af4..5bd9bf0 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -50,7 +50,7 @@ EXPORT_SYMBOL(nlmsvc_ops);
static DEFINE_MUTEX(nlmsvc_mutex);
static unsigned int nlmsvc_users;
static struct task_struct *nlmsvc_task;
-static struct svc_serv *nlmsvc_serv;
+static struct svc_rqst *nlmsvc_rqst;
int nlmsvc_grace_period;
unsigned long nlmsvc_timeout;

@@ -194,20 +194,11 @@ lockd(void *vrqstp)

svc_process(rqstp);
}
-
flush_signals(current);
if (nlmsvc_ops)
nlmsvc_invalidate_all();
nlm_shutdown_hosts();
-
unlock_kernel();
-
- nlmsvc_task = NULL;
- nlmsvc_serv = NULL;
-
- /* Exit the RPC thread */
- svc_exit_thread(rqstp);
-
return 0;
}

@@ -254,16 +245,15 @@ int
lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
{
struct svc_serv *serv;
- struct svc_rqst *rqstp;
int error = 0;

mutex_lock(&nlmsvc_mutex);
/*
* Check whether we're already up and running.
*/
- if (nlmsvc_serv) {
+ if (nlmsvc_rqst) {
if (proto)
- error = make_socks(nlmsvc_serv, proto);
+ error = make_socks(nlmsvc_rqst->rq_server, proto);
goto out;
}

@@ -288,9 +278,10 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
/*
* Create the kernel thread and wait for it to start.
*/
- rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]);
- if (IS_ERR(rqstp)) {
- error = PTR_ERR(rqstp);
+ nlmsvc_rqst = svc_prepare_thread(serv, &serv->sv_pools[0]);
+ if (IS_ERR(nlmsvc_rqst)) {
+ error = PTR_ERR(nlmsvc_rqst);
+ nlmsvc_rqst = NULL;
printk(KERN_WARNING
"lockd_up: svc_rqst allocation failed, error=%d\n",
error);
@@ -298,16 +289,15 @@ lockd_up(int proto) /* Maybe add a 'family' option when IPv6 is supported ?? */
}

svc_sock_update_bufs(serv);
- nlmsvc_serv = rqstp->rq_server;

- nlmsvc_task = kthread_run(lockd, rqstp, serv->sv_name);
+ nlmsvc_task = kthread_run(lockd, nlmsvc_rqst, serv->sv_name);
if (IS_ERR(nlmsvc_task)) {
error = PTR_ERR(nlmsvc_task);
+ svc_exit_thread(nlmsvc_rqst);
nlmsvc_task = NULL;
- nlmsvc_serv = NULL;
+ nlmsvc_rqst = NULL;
printk(KERN_WARNING
"lockd_up: kthread_run failed, error=%d\n", error);
- svc_exit_thread(rqstp);
goto destroy_and_out;
}

@@ -346,6 +336,9 @@ lockd_down(void)
BUG();
}
kthread_stop(nlmsvc_task);
+ svc_exit_thread(nlmsvc_rqst);
+ nlmsvc_task = NULL;
+ nlmsvc_rqst = NULL;
out:
mutex_unlock(&nlmsvc_mutex);
}
--
1.5.3.6

2008-06-11 14:03:20

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/3] nfs4: fix potential race with rapid nfs_callback_up/down cycle

If the nfsv4 callback thread is rapidly brought up and down, it's
possible that nfs_callback_svc might never get a chance to run. If
this happens, the cleanup at thread exit might never occur, throwing
the refcounting off and nfs_callback_info in an incorrect state.

Move the clean functions into nfs_callback_down. Also change the
nfs_callback_info struct to track the svc_rqst rather than svc_serv
since we need to know that to call svc_exit_thread.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback.c | 30 ++++++++++++++++--------------
1 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 9e713d2..f447f4b 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -27,7 +27,7 @@

struct nfs_callback_data {
unsigned int users;
- struct svc_serv *serv;
+ struct svc_rqst *rqst;
struct task_struct *task;
};

@@ -91,18 +91,15 @@ nfs_callback_svc(void *vrqstp)
svc_process(rqstp);
}
unlock_kernel();
- nfs_callback_info.task = NULL;
- svc_exit_thread(rqstp);
return 0;
}

/*
- * Bring up the server process if it is not already up.
+ * Bring up the callback thread if it is not already up.
*/
int nfs_callback_up(void)
{
struct svc_serv *serv = NULL;
- struct svc_rqst *rqstp;
int ret = 0;

mutex_lock(&nfs_callback_mutex);
@@ -120,22 +117,23 @@ int nfs_callback_up(void)
nfs_callback_tcpport = ret;
dprintk("Callback port = 0x%x\n", nfs_callback_tcpport);

- rqstp = svc_prepare_thread(serv, &serv->sv_pools[0]);
- if (IS_ERR(rqstp)) {
- ret = PTR_ERR(rqstp);
+ nfs_callback_info.rqst = svc_prepare_thread(serv, &serv->sv_pools[0]);
+ if (IS_ERR(nfs_callback_info.rqst)) {
+ ret = PTR_ERR(nfs_callback_info.rqst);
+ nfs_callback_info.rqst = NULL;
goto out_err;
}

svc_sock_update_bufs(serv);
- nfs_callback_info.serv = serv;

- nfs_callback_info.task = kthread_run(nfs_callback_svc, rqstp,
+ nfs_callback_info.task = kthread_run(nfs_callback_svc,
+ nfs_callback_info.rqst,
"nfsv4-svc");
if (IS_ERR(nfs_callback_info.task)) {
ret = PTR_ERR(nfs_callback_info.task);
- nfs_callback_info.serv = NULL;
+ svc_exit_thread(nfs_callback_info.rqst);
+ nfs_callback_info.rqst = NULL;
nfs_callback_info.task = NULL;
- svc_exit_thread(rqstp);
goto out_err;
}
out:
@@ -157,14 +155,18 @@ out_err:
}

/*
- * Kill the server process if it is not already down.
+ * Kill the callback thread if it's no longer being used.
*/
void nfs_callback_down(void)
{
mutex_lock(&nfs_callback_mutex);
nfs_callback_info.users--;
- if (nfs_callback_info.users == 0 && nfs_callback_info.task != NULL)
+ if (nfs_callback_info.users == 0 && nfs_callback_info.task != NULL) {
kthread_stop(nfs_callback_info.task);
+ svc_exit_thread(nfs_callback_info.rqst);
+ nfs_callback_info.rqst = NULL;
+ nfs_callback_info.task = NULL;
+ }
mutex_unlock(&nfs_callback_mutex);
}

--
1.5.3.6


2008-06-11 14:03:20

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/3] nfs4: remove BKL from nfs_callback_up and nfs_callback_down

The nfs_callback_mutex is sufficient protection.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/callback.c | 4 ----
1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index c1e7c83..9e713d2 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -105,7 +105,6 @@ int nfs_callback_up(void)
struct svc_rqst *rqstp;
int ret = 0;

- lock_kernel();
mutex_lock(&nfs_callback_mutex);
if (nfs_callback_info.users++ || nfs_callback_info.task != NULL)
goto out;
@@ -149,7 +148,6 @@ out:
if (serv)
svc_destroy(serv);
mutex_unlock(&nfs_callback_mutex);
- unlock_kernel();
return ret;
out_err:
dprintk("Couldn't create callback socket or server thread; err = %d\n",
@@ -163,13 +161,11 @@ out_err:
*/
void nfs_callback_down(void)
{
- lock_kernel();
mutex_lock(&nfs_callback_mutex);
nfs_callback_info.users--;
if (nfs_callback_info.users == 0 && nfs_callback_info.task != NULL)
kthread_stop(nfs_callback_info.task);
mutex_unlock(&nfs_callback_mutex);
- unlock_kernel();
}

static int nfs_callback_authenticate(struct svc_rqst *rqstp)
--
1.5.3.6