2008-06-04 15:03:12

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/4] Convert knfsd to kthread API and fix startup/shutdown races (try #2)

This patchset is the second attempt to change nfsd to the kthread
API. The main changes from the first patchset are:

- it includes the patch by Neil Brown to take the BKL out of nfsd
startup/shutdown codepath. This should hopefully plug the main
race condition that could lead to oopses when nfsd was signaled
and then rapidly restarted before all of the existing nfsd's had
gone down. I've also added some comments to this patch to try
and clarify the locking for future work.

- It preserves signaling as the main method for taking down nfsd.
While kthread_stop is easier to deal with and less racy than
signals, a lot of distros do the equivalent of "pkill nfsd" to
take nfsd down. Changing that would be painful. Also, using
kthread_stop is slower than using signals, since taking down
each thread would be serialized.

- it removes the special handling where if the last thread caught a
SIGHUP, the kernel would leave the export cache intact

I've tested this by running I/O against the server w/ iozone while
restarting nfs service in a loop. Everything seems to work as expected.

As always, comments and suggestions are appreciated...

Signed-off-by: Jeff Layton <[email protected]>


2008-06-05 23:39:43

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

J. Bruce Fields wrote:
> On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote:
>
>> I notice there's no change to write_leasetime(). That calls
>> nfs4_reset_lease(), which seems to want to only be called when nfsd is
>> not started (according to my reading of the comment preceding the
>> function), and which uses the BKL to protect the variable
>> user_lease_time. Perhaps that path should be changed to use the new
>> nfsd_mutex also?
>>
>
> write_leasetime() just results in setting the user_lease_time, which is
> copied (once, on server startup) to lease_time, which is what is
> actually used by the running server.
Yes, understood.
> So I don't think there's a race
> here (and the existing BKL use in write_leasetime() isn't really
> needed).
>
The race is pretty harmless. The only read of user_lease_time happens
during the critical section currently guarded by the BKL in nfsd_svc();
the only write of the variable is not guarded by that lock. If you call
write_leasetime() many times with different values during nfsd startup,
the actual value used is not predictable and you can't tell from
userspace whether your write() succeeded in changing the behaviour of
the server or not. Also, you can do write_leasetime() after nfsd
startup without useful effect; other parameters which can't be changed
from /proc after the service is running will fail the write() with EBUSY.

--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-06-04 15:03:55

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/4] knfsd: convert knfsd to kthread API

This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 52 ++++++++++++++--------
include/linux/sunrpc/svc.h | 2 +-
net/sunrpc/svc.c | 102 +++++++++++++++-----------------------------
3 files changed, 68 insertions(+), 88 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 582acb1..825936b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -21,6 +21,7 @@
#include <linux/smp_lock.h>
#include <linux/freezer.h>
#include <linux/fs_struct.h>
+#include <linux/kthread.h>

#include <linux/sunrpc/types.h>
#include <linux/sunrpc/stats.h>
@@ -46,7 +47,7 @@
#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT))

extern struct svc_program nfsd_program;
-static void nfsd(struct svc_rqst *rqstp);
+static int nfsd(void *vrqstp);
struct timeval nfssvc_boot;
static atomic_t nfsd_busy;
static unsigned long nfsd_last_call;
@@ -399,18 +400,19 @@ update_thread_usage(int busy_threads)
/*
* This is the NFS server kernel thread
*/
-static void
-nfsd(struct svc_rqst *rqstp)
+static int
+nfsd(void *vrqstp)
{
+ struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
struct fs_struct *fsp;
- int err;
sigset_t shutdown_mask, allowed_mask;
+ int err, preverr = 0;
+ unsigned int signo;

/* Lock module and set up kernel thread */
mutex_lock(&nfsd_mutex);
- daemonize("nfsd");

- /* After daemonize() this kernel thread shares current->fs
+ /* At this point, the thread shares current->fs
* with the init process. We need to create files with a
* umask of 0 instead of init's umask. */
fsp = copy_fs_struct(current->fs);
@@ -425,14 +427,18 @@ nfsd(struct svc_rqst *rqstp)
siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS);
siginitsetinv(&allowed_mask, ALLOWED_SIGS);

+ /*
+ * thread is spawned with all signals set to SIG_IGN, re-enable
+ * the ones that matter
+ */
+ for (signo = 1; signo <= _NSIG; signo++) {
+ if (!sigismember(&shutdown_mask, signo))
+ allow_signal(signo);
+ }

nfsdstats.th_cnt++;
-
- rqstp->rq_task = current;
-
mutex_unlock(&nfsd_mutex);

-
/*
* We want less throttling in balance_dirty_pages() so that nfs to
* localhost doesn't cause nfsd to lock up due to all the client's
@@ -454,15 +460,25 @@ nfsd(struct svc_rqst *rqstp)
*/
while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN)
;
- if (err < 0)
+ if (err == -EINTR)
break;
+ else if (err < 0) {
+ if (err != preverr) {
+ printk(KERN_WARNING "%s: unexpected error "
+ "from svc_recv (%d)\n", __func__, -err);
+ preverr = err;
+ }
+ schedule_timeout_uninterruptible(HZ);
+ continue;
+ }
+
update_thread_usage(atomic_read(&nfsd_busy));
atomic_inc(&nfsd_busy);

/* Lock the export hash tables for reading. */
exp_readlock();

- /* Process request with signals blocked. */
+ /* Process request with signals blocked. */
sigprocmask(SIG_SETMASK, &allowed_mask, NULL);

svc_process(rqstp);
@@ -471,25 +487,23 @@ nfsd(struct svc_rqst *rqstp)
exp_readunlock();
update_thread_usage(atomic_read(&nfsd_busy));
atomic_dec(&nfsd_busy);
- }

- if (err != -EINTR)
- printk(KERN_WARNING "nfsd: terminating on error %d\n", -err);
+ /* reset err in case kthread_stop is called */
+ err = 0;
+ }

/* Clear signals before calling svc_exit_thread() */
flush_signals(current);

mutex_lock(&nfsd_mutex);
-
nfsdstats.th_cnt --;

out:
/* Release the thread */
svc_exit_thread(rqstp);
-
- /* Release module */
mutex_unlock(&nfsd_mutex);
- module_put_and_exit(0);
+
+ return 0;
}

static __be32 map_new_errors(u32 vers, __be32 nfserr)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 4b54c5f..011d6d8 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -22,7 +22,7 @@
/*
* This is the RPC server thread function prototype
*/
-typedef void (*svc_thread_fn)(struct svc_rqst *);
+typedef int (*svc_thread_fn)(void *);

/*
*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 7bffaff..f461da2 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -18,6 +18,7 @@
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/kthread.h>

#include <linux/sunrpc/types.h>
#include <linux/sunrpc/xdr.h>
@@ -291,15 +292,14 @@ svc_pool_map_put(void)


/*
- * Set the current thread's cpus_allowed mask so that it
+ * Set the given thread's cpus_allowed mask so that it
* will only run on cpus in the given pool.
- *
- * Returns 1 and fills in oldmask iff a cpumask was applied.
*/
-static inline int
-svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
+static inline void
+svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx)
{
struct svc_pool_map *m = &svc_pool_map;
+ unsigned int node = m->pool_to[pidx];

/*
* The caller checks for sv_nrpools > 1, which
@@ -307,26 +307,17 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
*/
BUG_ON(m->count == 0);

- switch (m->mode)
- {
- default:
- return 0;
+ switch (m->mode) {
case SVC_POOL_PERCPU:
{
- unsigned int cpu = m->pool_to[pidx];
-
- *oldmask = current->cpus_allowed;
- set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
- return 1;
+ set_cpus_allowed_ptr(task, &cpumask_of_cpu(node));
+ break;
}
case SVC_POOL_PERNODE:
{
- unsigned int node = m->pool_to[pidx];
node_to_cpumask_ptr(nodecpumask, node);
-
- *oldmask = current->cpus_allowed;
- set_cpus_allowed_ptr(current, nodecpumask);
- return 1;
+ set_cpus_allowed_ptr(task, nodecpumask);
+ break;
}
}
}
@@ -579,47 +570,6 @@ out_enomem:
EXPORT_SYMBOL(svc_prepare_thread);

/*
- * Create a thread in the given pool. Caller must hold BKL or another lock to
- * serialize access to the svc_serv struct. On a NUMA or SMP machine, with a
- * multi-pool serv, the thread will be restricted to run on the cpus belonging
- * to the pool.
- */
-static int
-__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
- struct svc_pool *pool)
-{
- struct svc_rqst *rqstp;
- int error = -ENOMEM;
- int have_oldmask = 0;
- cpumask_t uninitialized_var(oldmask);
-
- rqstp = svc_prepare_thread(serv, pool);
- if (IS_ERR(rqstp)) {
- error = PTR_ERR(rqstp);
- goto out;
- }
-
- if (serv->sv_nrpools > 1)
- have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask);
-
- error = kernel_thread((int (*)(void *)) func, rqstp, 0);
-
- if (have_oldmask)
- set_cpus_allowed(current, oldmask);
-
- if (error < 0)
- goto out_thread;
- svc_sock_update_bufs(serv);
- error = 0;
-out:
- return error;
-
-out_thread:
- svc_exit_thread(rqstp);
- goto out;
-}
-
-/*
* Choose a pool in which to create a new thread, for svc_set_num_threads
*/
static inline struct svc_pool *
@@ -688,7 +638,9 @@ found_pool:
int
svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
- struct task_struct *victim;
+ struct svc_rqst *rqstp;
+ struct task_struct *task;
+ struct svc_pool *chosen_pool;
int error = 0;
unsigned int state = serv->sv_nrthreads-1;

@@ -704,18 +656,32 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
/* create new threads */
while (nrservs > 0) {
nrservs--;
- __module_get(serv->sv_module);
- error = __svc_create_thread(serv->sv_function, serv,
- choose_pool(serv, pool, &state));
- if (error < 0) {
- module_put(serv->sv_module);
+ chosen_pool = choose_pool(serv, pool, &state);
+
+ rqstp = svc_prepare_thread(serv, chosen_pool);
+ if (IS_ERR(rqstp)) {
+ error = PTR_ERR(rqstp);
break;
}
+
+ task = kthread_create(serv->sv_function, rqstp, serv->sv_name);
+ if (IS_ERR(task)) {
+ error = PTR_ERR(task);
+ svc_exit_thread(rqstp);
+ break;
+ }
+
+ rqstp->rq_task = task;
+ if (serv->sv_nrpools > 1)
+ svc_pool_map_set_cpumask(task, chosen_pool->sp_id);
+
+ svc_sock_update_bufs(serv);
+ wake_up_process(task);
}
/* destroy old threads */
while (nrservs < 0 &&
- (victim = choose_victim(serv, pool, &state)) != NULL) {
- send_sig(serv->sv_kill_signal, victim, 1);
+ (task = choose_victim(serv, pool, &state)) != NULL) {
+ send_sig(serv->sv_kill_signal, task, 1);
nrservs++;
}

--
1.5.3.6


2008-06-04 15:03:52

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct

Remove the sv_module field from the svc_serv struct since
svc_set_num_threads doesn't bother with module reference
counts anymore.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 3 +--
include/linux/sunrpc/svc.h | 8 ++------
net/sunrpc/svc.c | 3 +--
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 825936b..a87f7f9 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -228,8 +228,7 @@ int nfsd_create_serv(void)

atomic_set(&nfsd_busy, 0);
nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
- nfsd_last_thread, nfsd, SIGINT,
- THIS_MODULE);
+ nfsd_last_thread, nfsd, SIGINT);
if (nfsd_serv == NULL)
err = -ENOMEM;

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 011d6d8..6d69165 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -72,13 +72,9 @@ struct svc_serv {
unsigned int sv_nrpools; /* number of thread pools */
struct svc_pool * sv_pools; /* array of thread pools */

+ /* Callback to use when last thread exits */
void (*sv_shutdown)(struct svc_serv *serv);
- /* Callback to use when last thread
- * exits.
- */

- struct module * sv_module; /* optional module to count when
- * adding threads */
svc_thread_fn sv_function; /* main function for threads */
int sv_kill_signal; /* signal to kill threads */
};
@@ -389,7 +385,7 @@ struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int,
void (*shutdown)(struct svc_serv*),
- svc_thread_fn, int sig, struct module *);
+ svc_thread_fn, int);
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
void svc_destroy(struct svc_serv *);
int svc_process(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index f461da2..9eabb3f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -434,7 +434,7 @@ EXPORT_SYMBOL(svc_create);
struct svc_serv *
svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
void (*shutdown)(struct svc_serv *serv),
- svc_thread_fn func, int sig, struct module *mod)
+ svc_thread_fn func, int sig)
{
struct svc_serv *serv;
unsigned int npools = svc_pool_map_get();
@@ -444,7 +444,6 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
if (serv != NULL) {
serv->sv_function = func;
serv->sv_kill_signal = sig;
- serv->sv_module = mod;
}

return serv;
--
1.5.3.6


2008-06-04 15:03:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/4] knfsd: remove special handling for SIGHUP

The special handling for SIGHUP in knfsd is a holdover from much
earlier versions of Linux where reloading the export table was
more expensive. That facility is not really needed anymore and
to my knowledge, is seldom-used.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 33 ++++++++-------------------------
1 files changed, 8 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 040b3c2..582acb1 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -44,11 +44,6 @@
* when not handling a request. i.e. when waiting
*/
#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT))
-/* if the last thread dies with SIGHUP, then the exports table is
- * left unchanged ( like 2.4-{0-9} ). Any other signal will clear
- * the exports table (like 2.2).
- */
-#define SIG_NOCLEAN SIGHUP

extern struct svc_program nfsd_program;
static void nfsd(struct svc_rqst *rqstp);
@@ -167,7 +162,6 @@ int nfsd_nrthreads(void)
return nfsd_serv->sv_nrthreads;
}

-static int killsig; /* signal that was used to kill last nfsd */
static void nfsd_last_thread(struct svc_serv *serv)
{
/* When last nfsd thread exits we need to do some clean-up */
@@ -178,11 +172,9 @@ static void nfsd_last_thread(struct svc_serv *serv)
nfsd_racache_shutdown();
nfs4_state_shutdown();

- printk(KERN_WARNING "nfsd: last server has exited\n");
- if (killsig != SIG_NOCLEAN) {
- printk(KERN_WARNING "nfsd: unexporting all filesystems\n");
- nfsd_export_flush();
- }
+ printk(KERN_WARNING "nfsd: last server has exited, flushing export "
+ "cache\n");
+ nfsd_export_flush();
}

void nfsd_reset_versions(void)
@@ -234,10 +226,9 @@ int nfsd_create_serv(void)
}

atomic_set(&nfsd_busy, 0);
- nfsd_serv = svc_create_pooled(&nfsd_program,
- nfsd_max_blksize,
- nfsd_last_thread,
- nfsd, SIG_NOCLEAN, THIS_MODULE);
+ nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
+ nfsd_last_thread, nfsd, SIGINT,
+ THIS_MODULE);
if (nfsd_serv == NULL)
err = -ENOMEM;

@@ -482,17 +473,9 @@ nfsd(struct svc_rqst *rqstp)
atomic_dec(&nfsd_busy);
}

- if (err != -EINTR) {
+ if (err != -EINTR)
printk(KERN_WARNING "nfsd: terminating on error %d\n", -err);
- } else {
- unsigned int signo;
-
- for (signo = 1; signo <= _NSIG; signo++)
- if (sigismember(&current->pending.signal, signo) &&
- !sigismember(&current->blocked, signo))
- break;
- killsig = signo;
- }
+
/* Clear signals before calling svc_exit_thread() */
flush_signals(current);

--
1.5.3.6

2008-06-04 15:03:13

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

From: Neil Brown <[email protected]>

This removes the BKL from the RPC service creation codepath. The BKL
really isn't adequate for this job since some of this info needs
protection across sleeps.

Also, add some comments to try and clarify how the locking should work
and to make it clear that the BKL isn't necessary as long as there is
adequate locking between tasks when touching the svc_serv fields.

Signed-off-by: Neil Brown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfsctl.c | 39 +++++++++++++++++++++++++--------------
fs/nfsd/nfssvc.c | 45 ++++++++++++++++++++++++++++++++-------------
net/sunrpc/svc.c | 15 +++++++++------
3 files changed, 66 insertions(+), 33 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 5ac00c4..d601a77 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -441,6 +441,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
return strlen(buf);
}

+extern struct mutex nfsd_mutex;
+
static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
{
/* if size > 0, look for an array of number of threads per node
@@ -450,22 +452,26 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
int i;
int rv;
int len;
- int npools = nfsd_nrpools();
+ int npools;
int *nthreads;

+ mutex_lock(&nfsd_mutex);
+ npools = nfsd_nrpools();
if (npools == 0) {
/*
* NFS is shut down. The admin can start it by
* writing to the threads file but NOT the pool_threads
* file, sorry. Report zero threads.
*/
+ mutex_unlock(&nfsd_mutex);
strcpy(buf, "0\n");
return strlen(buf);
}

nthreads = kcalloc(npools, sizeof(int), GFP_KERNEL);
+ rv = -ENOMEM;
if (nthreads == NULL)
- return -ENOMEM;
+ goto out_free;

if (size > 0) {
for (i = 0; i < npools; i++) {
@@ -496,10 +502,12 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
mesg += len;
}

+ mutex_unlock(&nfsd_mutex);
return (mesg-buf);

out_free:
kfree(nthreads);
+ mutex_unlock(&nfsd_mutex);
return rv;
}

@@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
return len;
}

-static ssize_t write_ports(struct file *file, char *buf, size_t size)
+static ssize_t __write_ports(struct file *file, char *buf, size_t size)
{
if (size == 0) {
int len = 0;
- lock_kernel();
+
if (nfsd_serv)
len = svc_xprt_names(nfsd_serv, buf, 0);
- unlock_kernel();
return len;
}
/* Either a single 'fd' number is written, in which
@@ -603,9 +610,7 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
/* Decrease the count, but don't shutdown the
* the service
*/
- lock_kernel();
nfsd_serv->sv_nrthreads--;
- unlock_kernel();
}
return err < 0 ? err : 0;
}
@@ -614,10 +619,8 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
int len = 0;
if (!toclose)
return -ENOMEM;
- lock_kernel();
if (nfsd_serv)
len = svc_sock_names(buf, nfsd_serv, toclose);
- unlock_kernel();
if (len >= 0)
lockd_down();
kfree(toclose);
@@ -655,7 +658,6 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
if (sscanf(&buf[1], "%15s %4d", transport, &port) == 2) {
if (port == 0)
return -EINVAL;
- lock_kernel();
if (nfsd_serv) {
xprt = svc_find_xprt(nfsd_serv, transport,
AF_UNSPEC, port);
@@ -666,13 +668,22 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
} else
err = -ENOTCONN;
}
- unlock_kernel();
return err < 0 ? err : 0;
}
}
return -EINVAL;
}

+static ssize_t write_ports(struct file *file, char *buf, size_t size)
+{
+ ssize_t rv;
+ mutex_lock(&nfsd_mutex);
+ rv = __write_ports(file, buf, size);
+ mutex_unlock(&nfsd_mutex);
+ return rv;
+}
+
+
int nfsd_max_blksize;

static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
@@ -691,13 +702,13 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
if (bsize > NFSSVC_MAXBLKSIZE)
bsize = NFSSVC_MAXBLKSIZE;
bsize &= ~(1024-1);
- lock_kernel();
+ mutex_lock(&nfsd_mutex);
if (nfsd_serv && nfsd_serv->sv_nrthreads) {
- unlock_kernel();
+ mutex_unlock(&nfsd_mutex);
return -EBUSY;
}
nfsd_max_blksize = bsize;
- unlock_kernel();
+ mutex_unlock(&nfsd_mutex);
}
return sprintf(buf, "%d\n", nfsd_max_blksize);
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 941041f..040b3c2 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -53,11 +53,27 @@
extern struct svc_program nfsd_program;
static void nfsd(struct svc_rqst *rqstp);
struct timeval nfssvc_boot;
- struct svc_serv *nfsd_serv;
static atomic_t nfsd_busy;
static unsigned long nfsd_last_call;
static DEFINE_SPINLOCK(nfsd_call_lock);

+/*
+ * nfsd_mutex protects nfsd_serv -- both the pointer itself and the members
+ * of the svc_serv struct. In particular, ->sv_nrthreads but also to some
+ * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt
+ *
+ * If (out side the lock) nfsd_serv is non-NULL, then it must point to a
+ * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number
+ * of nfsd threads must exist and each must listed in ->sp_all_threads in each
+ * entry of ->sv_pools[].
+ *
+ * Transitions of the thread count between zero and non-zero are of particular
+ * interest since the svc_serv needs to be created and initialized at that
+ * point, or freed.
+ */
+struct svc_serv *nfsd_serv;
+DEFINE_MUTEX(nfsd_mutex);
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static struct svc_stat nfsd_acl_svcstats;
static struct svc_version * nfsd_acl_version[] = {
@@ -190,13 +206,14 @@ void nfsd_reset_versions(void)
}
}

+
int nfsd_create_serv(void)
{
int err = 0;
- lock_kernel();
+
+ WARN_ON(!mutex_is_locked(&nfsd_mutex));
if (nfsd_serv) {
svc_get(nfsd_serv);
- unlock_kernel();
return 0;
}
if (nfsd_max_blksize == 0) {
@@ -223,7 +240,7 @@ int nfsd_create_serv(void)
nfsd, SIG_NOCLEAN, THIS_MODULE);
if (nfsd_serv == NULL)
err = -ENOMEM;
- unlock_kernel();
+
do_gettimeofday(&nfssvc_boot); /* record boot time */
return err;
}
@@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads)
int tot = 0;
int err = 0;

+ WARN_ON(!mutex_is_locked(&nfsd_mutex));
+
if (nfsd_serv == NULL || n <= 0)
return 0;

@@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
nthreads[0] = 1;

/* apply the new numbers */
- lock_kernel();
svc_get(nfsd_serv);
for (i = 0; i < n; i++) {
err = svc_set_num_threads(nfsd_serv, &nfsd_serv->sv_pools[i],
@@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
break;
}
svc_destroy(nfsd_serv);
- unlock_kernel();

return err;
}
@@ -334,8 +351,8 @@ int
nfsd_svc(unsigned short port, int nrservs)
{
int error;
-
- lock_kernel();
+
+ mutex_lock(&nfsd_mutex);
dprintk("nfsd: creating service\n");
error = -EINVAL;
if (nrservs <= 0)
@@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs)
failure:
svc_destroy(nfsd_serv); /* Release server */
out:
- unlock_kernel();
+ mutex_unlock(&nfsd_mutex);
return error;
}

@@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp)
sigset_t shutdown_mask, allowed_mask;

/* Lock module and set up kernel thread */
- lock_kernel();
+ mutex_lock(&nfsd_mutex);
daemonize("nfsd");

/* After daemonize() this kernel thread shares current->fs
@@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp)
siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS);
siginitsetinv(&allowed_mask, ALLOWED_SIGS);

+
nfsdstats.th_cnt++;

rqstp->rq_task = current;

- unlock_kernel();
+ mutex_unlock(&nfsd_mutex);
+

/*
* We want less throttling in balance_dirty_pages() so that nfs to
@@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp)
/* Clear signals before calling svc_exit_thread() */
flush_signals(current);

- lock_kernel();
+ mutex_lock(&nfsd_mutex);

nfsdstats.th_cnt --;

@@ -486,7 +505,7 @@ out:
svc_exit_thread(rqstp);

/* Release module */
- unlock_kernel();
+ mutex_unlock(&nfsd_mutex);
module_put_and_exit(0);
}

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 01c7e31..7bffaff 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
EXPORT_SYMBOL(svc_create_pooled);

/*
- * Destroy an RPC service. Should be called with the BKL held
+ * Destroy an RPC service. Should be called with appropriate locking to
+ * protect the sv_nrthreads, sv_permsocks and sv_tempsocks.
*/
void
svc_destroy(struct svc_serv *serv)
@@ -578,9 +579,10 @@ out_enomem:
EXPORT_SYMBOL(svc_prepare_thread);

/*
- * Create a thread in the given pool. Caller must hold BKL.
- * On a NUMA or SMP machine, with a multi-pool serv, the thread
- * will be restricted to run on the cpus belonging to the pool.
+ * Create a thread in the given pool. Caller must hold BKL or another lock to
+ * serialize access to the svc_serv struct. On a NUMA or SMP machine, with a
+ * multi-pool serv, the thread will be restricted to run on the cpus belonging
+ * to the pool.
*/
static int
__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
@@ -674,7 +676,7 @@ found_pool:
* of threads the given number. If `pool' is non-NULL, applies
* only to threads in that pool, otherwise round-robins between
* all pools. Must be called with a svc_get() reference and
- * the BKL held.
+ * the BKL or another lock to protect access to svc_serv fields.
*
* Destroying threads relies on the service threads filling in
* rqstp->rq_task, which only the nfs ones do. Assumes the serv
@@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
EXPORT_SYMBOL(svc_set_num_threads);

/*
- * Called from a server thread as it's exiting. Caller must hold BKL.
+ * Called from a server thread as it's exiting. Caller must hold the BKL or
+ * the "service mutex", whichever is appropriate for the service.
*/
void
svc_exit_thread(struct svc_rqst *rqstp)
--
1.5.3.6

2008-06-04 21:02:38

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote:
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 5ac00c4..d601a77 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
...
> @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
> return len;
> }
>
> -static ssize_t write_ports(struct file *file, char *buf, size_t size)
> +static ssize_t __write_ports(struct file *file, char *buf, size_t size)
> {
> if (size == 0) {
> int len = 0;
> - lock_kernel();
> +
> if (nfsd_serv)
> len = svc_xprt_names(nfsd_serv, buf, 0);
> - unlock_kernel();

svc_xprt_names() has to be prepared to accept NULL as a first parameter
(since we've got nothing here any longer to guarantee that nfsd_serv
won't change after we've checked it). And, indeed, it does check for
that (with its local copy, which won't change. So that's OK. But then
could we just ditch this redundant check here? It's confusing.

Oops, but: what happens if something like this races with svc_destroy,
so svc_xprt_names() is passed a pointer to freed memory?

--b.

2008-06-04 21:27:52

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Wed, 4 Jun 2008 17:02:35 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote:
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index 5ac00c4..d601a77 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> ...
> > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
> > return len;
> > }
> >
> > -static ssize_t write_ports(struct file *file, char *buf, size_t size)
> > +static ssize_t __write_ports(struct file *file, char *buf, size_t size)
> > {
> > if (size == 0) {
> > int len = 0;
> > - lock_kernel();
> > +
> > if (nfsd_serv)
> > len = svc_xprt_names(nfsd_serv, buf, 0);
> > - unlock_kernel();
>
> svc_xprt_names() has to be prepared to accept NULL as a first parameter
> (since we've got nothing here any longer to guarantee that nfsd_serv
> won't change after we've checked it). And, indeed, it does check for
> that (with its local copy, which won't change. So that's OK. But then
> could we just ditch this redundant check here? It's confusing.
>
> Oops, but: what happens if something like this races with svc_destroy,
> so svc_xprt_names() is passed a pointer to freed memory?
>

We do have a guarantee that nfsd_serv won't change after it's checked
here. The new nfsd_mutex protects it. write_ports has been renamed to
__write_ports, and write_ports has been turned into a wrapper that runs
the entire original function under the nfsd_mutex. We also have nfsd
hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy
should also be called while holding it. That should serialize access
to the nfsd_serv.

I think you're correct that we can get rid of the redundant null
pointer check in __write_ports here though.

Cheers,
--
Jeff Layton <[email protected]>

2008-06-04 21:58:17

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote:
> On Wed, 4 Jun 2008 17:02:35 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote:
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index 5ac00c4..d601a77 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > ...
> > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
> > > return len;
> > > }
> > >
> > > -static ssize_t write_ports(struct file *file, char *buf, size_t size)
> > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size)
> > > {
> > > if (size == 0) {
> > > int len = 0;
> > > - lock_kernel();
> > > +
> > > if (nfsd_serv)
> > > len = svc_xprt_names(nfsd_serv, buf, 0);
> > > - unlock_kernel();
> >
> > svc_xprt_names() has to be prepared to accept NULL as a first parameter
> > (since we've got nothing here any longer to guarantee that nfsd_serv
> > won't change after we've checked it). And, indeed, it does check for
> > that (with its local copy, which won't change. So that's OK. But then
> > could we just ditch this redundant check here? It's confusing.
> >
> > Oops, but: what happens if something like this races with svc_destroy,
> > so svc_xprt_names() is passed a pointer to freed memory?
> >
>
> We do have a guarantee that nfsd_serv won't change after it's checked
> here. The new nfsd_mutex protects it. write_ports has been renamed to
> __write_ports, and write_ports has been turned into a wrapper that runs
> the entire original function under the nfsd_mutex. We also have nfsd
> hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy
> should also be called while holding it. That should serialize access
> to the nfsd_serv.

Of course, you're right; thanks for setting me straight!

>
> I think you're correct that we can get rid of the redundant null
> pointer check in __write_ports here though.

OK.--b.

2008-06-04 22:41:22

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Wed, Jun 04, 2008 at 05:58:15PM -0400, bfields wrote:
> On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote:
> > On Wed, 4 Jun 2008 17:02:35 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote:
> > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > > index 5ac00c4..d601a77 100644
> > > > --- a/fs/nfsd/nfsctl.c
> > > > +++ b/fs/nfsd/nfsctl.c
> > > ...
> > > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
> > > > return len;
> > > > }
> > > >
> > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size)
> > > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size)
> > > > {
> > > > if (size == 0) {
> > > > int len = 0;
> > > > - lock_kernel();
> > > > +
> > > > if (nfsd_serv)
> > > > len = svc_xprt_names(nfsd_serv, buf, 0);
> > > > - unlock_kernel();
> > >
> > > svc_xprt_names() has to be prepared to accept NULL as a first parameter
> > > (since we've got nothing here any longer to guarantee that nfsd_serv
> > > won't change after we've checked it). And, indeed, it does check for
> > > that (with its local copy, which won't change. So that's OK. But then
> > > could we just ditch this redundant check here? It's confusing.
> > >
> > > Oops, but: what happens if something like this races with svc_destroy,
> > > so svc_xprt_names() is passed a pointer to freed memory?
> > >
> >
> > We do have a guarantee that nfsd_serv won't change after it's checked
> > here. The new nfsd_mutex protects it. write_ports has been renamed to
> > __write_ports, and write_ports has been turned into a wrapper that runs
> > the entire original function under the nfsd_mutex. We also have nfsd
> > hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy
> > should also be called while holding it. That should serialize access
> > to the nfsd_serv.
>
> Of course, you're right; thanks for setting me straight!

One more random point of confusion: is write_versions racy? It assigns
to nfsd_versions, which is used in svc_process() to decide whether a
version is supported or not, without doing the adjustment of rq_argp and
rq_resp which a comment in write_versions() says is necessary. And
there's no locking around the nfsd_serv check there. So in theory could
a write_versions() at the wrong time result in an nfsd that accepted nfs
versions that it shouldn't (and hence could overflow some buffer)?

That'd be a preexisting problem, nothing to do with your work--I was
just grepping for uses of nfsd_serv....

--b.

2008-06-05 00:07:23

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Wed, 4 Jun 2008 18:41:20 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Wed, Jun 04, 2008 at 05:58:15PM -0400, bfields wrote:
> > On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote:
> > > On Wed, 4 Jun 2008 17:02:35 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote:
> > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > > > index 5ac00c4..d601a77 100644
> > > > > --- a/fs/nfsd/nfsctl.c
> > > > > +++ b/fs/nfsd/nfsctl.c
> > > > ...
> > > > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
> > > > > return len;
> > > > > }
> > > > >
> > > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size)
> > > > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size)
> > > > > {
> > > > > if (size == 0) {
> > > > > int len = 0;
> > > > > - lock_kernel();
> > > > > +
> > > > > if (nfsd_serv)
> > > > > len = svc_xprt_names(nfsd_serv, buf, 0);
> > > > > - unlock_kernel();
> > > >
> > > > svc_xprt_names() has to be prepared to accept NULL as a first parameter
> > > > (since we've got nothing here any longer to guarantee that nfsd_serv
> > > > won't change after we've checked it). And, indeed, it does check for
> > > > that (with its local copy, which won't change. So that's OK. But then
> > > > could we just ditch this redundant check here? It's confusing.
> > > >
> > > > Oops, but: what happens if something like this races with svc_destroy,
> > > > so svc_xprt_names() is passed a pointer to freed memory?
> > > >
> > >
> > > We do have a guarantee that nfsd_serv won't change after it's checked
> > > here. The new nfsd_mutex protects it. write_ports has been renamed to
> > > __write_ports, and write_ports has been turned into a wrapper that runs
> > > the entire original function under the nfsd_mutex. We also have nfsd
> > > hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy
> > > should also be called while holding it. That should serialize access
> > > to the nfsd_serv.
> >
> > Of course, you're right; thanks for setting me straight!
>
> One more random point of confusion: is write_versions racy? It assigns
> to nfsd_versions, which is used in svc_process() to decide whether a
> version is supported or not, without doing the adjustment of rq_argp and
> rq_resp which a comment in write_versions() says is necessary. And
> there's no locking around the nfsd_serv check there. So in theory could
> a write_versions() at the wrong time result in an nfsd that accepted nfs
> versions that it shouldn't (and hence could overflow some buffer)?
>

Hmm. You may be right, though I'd think the race is pretty unlikely in
normal usage. I guess the comment you're referring to is this one:

if (nfsd_serv)
/* Cannot change versions without updating
* nfsd_serv->sv_xdrsize, and reallocing
* rq_argp and rq_resp
*/
return -EBUSY;

...so the race would have to be:

nfsd is down
write versions is called and gets past
nfsd_serv NULL ptr check

nfsd accepts a call
write versions disables the NFS version that
was in the call

A pretty unlikely race, I think, but might be possible. Holding the
nfsd_mutex over the life of write_versions is probably the right thing
to do here. I'll plan a respin to add that (and I'll check that it
doesn't cause any problems).

> That'd be a preexisting problem, nothing to do with your work--I was
> just grepping for uses of nfsd_serv....
>

This is actually Neil's work...I only added the signed-off-by since I
added and cleaned up some comments. ;-)

--
Jeff Layton <[email protected]>

2008-06-05 00:47:37

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

Jeff Layton wrote:
> From: Neil Brown <[email protected]>
>
>
>
Sorry; I know this was posted before but now that I'm back home I've had
a chance to review it in slightly more depth so I have some more comments.
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 5ac00c4..d601a77 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -441,6 +441,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
> @@ -450,22 +452,26 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
> @@ -496,10 +502,12 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size)
>
These are fine.

> @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
>
I notice there's no change to write_versions(), even though that
function checks nfsd_serv for NULL and implicitly expects the svc_serv
to be created while it's running.
> @@ -603,9 +610,7 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
> @@ -614,10 +619,8 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size
> @@ -655,7 +658,6 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
> @@ -666,13 +668,22 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size)
> @@ -691,13 +702,13 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
>
All good.

I notice there's no change to write_leasetime(). That calls
nfs4_reset_lease(), which seems to want to only be called when nfsd is
not started (according to my reading of the comment preceding the
function), and which uses the BKL to protect the variable
user_lease_time. Perhaps that path should be changed to use the new
nfsd_mutex also?

Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir()
which uses client_mutex to protect user_recovery_dirname[], but that
variable is only used during nfsd startup. Perhaps that path should use
the new nfsd_mutex also?
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 941041f..040b3c2 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -53,11 +53,27 @@
> @@ -190,13 +206,14 @@ void nfsd_reset_versions(void)
> @@ -223,7 +240,7 @@ int nfsd_create_serv(void)
> @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> @@ -334,8 +351,8 @@ int
> @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs)
> @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp)
> @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp)
> @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp)
> @@ -486,7 +505,7 @@ out:
>
All good.
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 01c7e31..7bffaff 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
> @@ -578,9 +579,10 @@ out_enomem:
> @@ -674,7 +676,7 @@ found_pool:
> @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
>

All good.

--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

2008-06-05 01:03:10

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 2/4] knfsd: remove special handling for SIGHUP

Jeff Layton wrote:
> The special handling for SIGHUP in knfsd is a holdover from much
> earlier versions of Linux where reloading the export table was
> more expensive. That facility is not really needed anymore and
> to my knowledge, is seldom-used.
>
> Signed-off-by: Jeff Layton <[email protected]>
>
Acked-by: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/[email protected]>
> @@ -234,10 +226,9 @@ int nfsd_create_serv(void)
> }
>
> atomic_set(&nfsd_busy, 0);
> - nfsd_serv = svc_create_pooled(&nfsd_program,
> - nfsd_max_blksize,
> - nfsd_last_thread,
> - nfsd, SIG_NOCLEAN, THIS_MODULE);
> + nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
> + nfsd_last_thread, nfsd, SIGINT,
> + THIS_MODULE);
>
Now that there's no special interpretation of signals, you could
probably also remove the `sig' argument to svc_create_pooled() and the
svc_serv->sv_kill_signal field.

--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.


2008-06-05 01:28:40

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 3/4] knfsd: convert knfsd to kthread API

Jeff Layton wrote:
> This patch is rather large, but I couldn't figure out a way to break it
> up that would remain bisectable. It does several things:
>
> - change svc_thread_fn typedef to better match what kthread_create expects
> - change svc_pool_map_set_cpumask to be more kthread friendly. Make it
> take a task arg and and get rid of the "oldmask"
> - have svc_set_num_threads call kthread_create directly
> - eliminate __svc_create_thread
>
> Signed-off-by: Jeff Layton <[email protected]>
>
>

Acked-by: Greg Banks <[email protected]>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 582acb1..825936b 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -21,6 +21,7 @@
> @@ -46,7 +47,7 @@
> @@ -399,18 +400,19 @@ update_thread_usage(int busy_threads)
> /*
> * This is the NFS server kernel thread
> */
> -static void
> -nfsd(struct svc_rqst *rqstp)
> +static int
> +nfsd(void *vrqstp)
> {
> + struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
>

Not that it matters, but you don't actually need a cast here.
> @@ -425,14 +427,18 @@ nfsd(struct svc_rqst *rqstp)
> @@ -454,15 +460,25 @@ nfsd(struct svc_rqst *rqstp)
> @@ -471,25 +487,23 @@ nfsd(struct svc_rqst *rqstp)
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -22,7 +22,7 @@
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -18,6 +18,7 @@
> @@ -291,15 +292,14 @@ svc_pool_map_put(void)
> @@ -307,26 +307,17 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
> @@ -579,47 +570,6 @@ out_enomem:
> @@ -688,7 +638,9 @@ found_pool:
> @@ -704,18 +656,32 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
>
All good.

--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

2008-06-05 01:30:21

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct

Jeff Layton wrote:
> Remove the sv_module field from the svc_serv struct since
> svc_set_num_threads doesn't bother with module reference
> counts anymore.
>
> Signed-off-by: Jeff Layton <[email protected]>
>
Acked-by: Greg Banks <[email protected]>

--
Greg Banks, P.Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

2008-06-05 20:03:58

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote:
> I notice there's no change to write_leasetime(). That calls
> nfs4_reset_lease(), which seems to want to only be called when nfsd is
> not started (according to my reading of the comment preceding the
> function), and which uses the BKL to protect the variable
> user_lease_time. Perhaps that path should be changed to use the new
> nfsd_mutex also?

write_leasetime() just results in setting the user_lease_time, which is
copied (once, on server startup) to lease_time, which is what is
actually used by the running server. So I don't think there's a race
here (and the existing BKL use in write_leasetime() isn't really
needed).

--b.

>
> Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir()
> which uses client_mutex to protect user_recovery_dirname[], but that
> variable is only used during nfsd startup. Perhaps that path should use
> the new nfsd_mutex also?
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 941041f..040b3c2 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -53,11 +53,27 @@
> > @@ -190,13 +206,14 @@ void nfsd_reset_versions(void)
> > @@ -223,7 +240,7 @@ int nfsd_create_serv(void)
> > @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > @@ -334,8 +351,8 @@ int
> > @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs)
> > @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp)
> > @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp)
> > @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp)
> > @@ -486,7 +505,7 @@ out:
> >
> All good.
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 01c7e31..7bffaff 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
> > @@ -578,9 +579,10 @@ out_enomem:
> > @@ -674,7 +676,7 @@ found_pool:
> > @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> >
>
> All good.
>
> --
> Greg Banks, P.Engineer, SGI Australian Software Group.
> The cake is *not* a lie.
> I don't speak for SGI.
>
> _______________________________________________
> NFSv4 mailing list
> [email protected]
> http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4

2008-06-05 20:15:43

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking.

On Thu, 5 Jun 2008 16:03:58 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote:
> > I notice there's no change to write_leasetime(). That calls
> > nfs4_reset_lease(), which seems to want to only be called when nfsd is
> > not started (according to my reading of the comment preceding the
> > function), and which uses the BKL to protect the variable
> > user_lease_time. Perhaps that path should be changed to use the new
> > nfsd_mutex also?
>
> write_leasetime() just results in setting the user_lease_time, which is
> copied (once, on server startup) to lease_time, which is what is
> actually used by the running server. So I don't think there's a race
> here (and the existing BKL use in write_leasetime() isn't really
> needed).
>
> --b.
>

I think write_recoverydir might be a similar situation. It just sets
user_recovery_dirname, and that is also only read once when nfsd is
starting. We can probably eliminate the client_mutex locking around
that too.

Presuming that Bruce and I are correct, then I think I only need to
make sure that we add some nfsd_mutex locking to write_versions.

-- Jeff

> >
> > Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir()
> > which uses client_mutex to protect user_recovery_dirname[], but that
> > variable is only used during nfsd startup. Perhaps that path should use
> > the new nfsd_mutex also?


> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index 941041f..040b3c2 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -53,11 +53,27 @@
> > > @@ -190,13 +206,14 @@ void nfsd_reset_versions(void)
> > > @@ -223,7 +240,7 @@ int nfsd_create_serv(void)
> > > @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > > @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > > @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads)
> > > @@ -334,8 +351,8 @@ int
> > > @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs)
> > > @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp)
> > > @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp)
> > > @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp)
> > > @@ -486,7 +505,7 @@ out:
> > >
> > All good.
> > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > > index 01c7e31..7bffaff 100644
> > > --- a/net/sunrpc/svc.c
> > > +++ b/net/sunrpc/svc.c
> > > @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
> > > @@ -578,9 +579,10 @@ out_enomem:
> > > @@ -674,7 +676,7 @@ found_pool:
> > > @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > >
> >
> > All good.
> >
> > --
> > Greg Banks, P.Engineer, SGI Australian Software Group.
> > The cake is *not* a lie.
> > I don't speak for SGI.
> >
> > _______________________________________________
> > NFSv4 mailing list
> > [email protected]
> > http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


--
Jeff Layton <[email protected]>