2008-05-18 02:35:17

by Jeff Layton

Subject: [PATCH 0/3] [RFC] knfsd: convert to kthread API and remove signaling for shutdown

knfsd is the last NFS-related kernel thread that does not use the
kthread API. This patchset represents a first pass at converting it. It
seems to work, but changes the shutdown interface. knfsd currently
allows signals to tell it when to come down.

My main question is...how tied to this shutdown method are we? We can
also take down nfsd by having people run:

# rpc.nfsd 0

...which basically does:

# echo 0 > /proc/fs/nfsd/threads

...so we don't think we *have* to use signals here. Is signaling
something we can reasonably eliminate? In addition to making the code a
bit simpler and cleaner, I think it will also eliminate this race:

http://lkml.org/lkml/2007/8/2/462

If this isn't feasible, then I can add the signaling back in, but am not
sure whether we can eliminate the race without adding more locking.

If we can do this, we may need to provide an alternate way to specify
that we want to take down all nfsd's but not flush the export table.
Currently that's done with a SIGHUP, but the value of this facility is
not clear to me since the kernel can just do another upcall.

Comments and suggestions appreciated...

Signed-off-by: Jeff Layton <[email protected]>



2008-05-18 02:35:17

by Jeff Layton

Subject: [PATCH 1/3] [RFC] knfsd: convert knfsd to kthread API

This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and get rid of the "oldmask"
- have svc_set_num_threads set up threads and start them via
kthread_create directly, and take them down with kthread_stop
- change nfsd() to ignore signals, and to loop forever until kthread_stop
is called on it

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 57 ++++++++----------------
include/linux/sunrpc/svc.h | 2 +-
net/sunrpc/svc.c | 101 +++++++++++++++-----------------------------
3 files changed, 54 insertions(+), 106 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 941041f..b5fdb9b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -21,6 +21,7 @@
#include <linux/smp_lock.h>
#include <linux/freezer.h>
#include <linux/fs_struct.h>
+#include <linux/kthread.h>

#include <linux/sunrpc/types.h>
#include <linux/sunrpc/stats.h>
@@ -51,7 +52,7 @@
#define SIG_NOCLEAN SIGHUP

extern struct svc_program nfsd_program;
-static void nfsd(struct svc_rqst *rqstp);
+static int nfsd(void *vrqstp);
struct timeval nfssvc_boot;
struct svc_serv *nfsd_serv;
static atomic_t nfsd_busy;
@@ -391,18 +392,17 @@ update_thread_usage(int busy_threads)
/*
* This is the NFS server kernel thread
*/
-static void
-nfsd(struct svc_rqst *rqstp)
+static int
+nfsd(void *vrqstp)
{
+ struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
struct fs_struct *fsp;
- int err;
- sigset_t shutdown_mask, allowed_mask;
+ int err, preverr = 0;

/* Lock module and set up kernel thread */
lock_kernel();
- daemonize("nfsd");

- /* After daemonize() this kernel thread shares current->fs
+ /* At this point, the thread shares current->fs
* with the init process. We need to create files with a
* umask of 0 instead of init's umask. */
fsp = copy_fs_struct(current->fs);
@@ -414,13 +414,8 @@ nfsd(struct svc_rqst *rqstp)
current->fs = fsp;
current->fs->umask = 0;

- siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS);
- siginitsetinv(&allowed_mask, ALLOWED_SIGS);
-
nfsdstats.th_cnt++;

- rqstp->rq_task = current;
-
unlock_kernel();

/*
@@ -434,27 +429,29 @@ nfsd(struct svc_rqst *rqstp)
/*
* The main request loop
*/
- for (;;) {
- /* Block all but the shutdown signals */
- sigprocmask(SIG_SETMASK, &shutdown_mask, NULL);
-
+ while (!kthread_should_stop()) {
/*
* Find a socket with data available and call its
* recvfrom routine.
*/
while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN)
;
- if (err < 0)
- break;
+ if (err < 0) {
+ if (err != preverr) {
+ printk(KERN_WARNING "%s: unexpected error "
+ "from svc_recv (%d)\n", __func__, -err);
+ preverr = err;
+ }
+ schedule_timeout_uninterruptible(HZ);
+ continue;
+ }
+
update_thread_usage(atomic_read(&nfsd_busy));
atomic_inc(&nfsd_busy);

/* Lock the export hash tables for reading. */
exp_readlock();

- /* Process request with signals blocked. */
- sigprocmask(SIG_SETMASK, &allowed_mask, NULL);
-
svc_process(rqstp);

/* Unlock export hash tables */
@@ -463,31 +460,15 @@ nfsd(struct svc_rqst *rqstp)
atomic_dec(&nfsd_busy);
}

- if (err != -EINTR) {
- printk(KERN_WARNING "nfsd: terminating on error %d\n", -err);
- } else {
- unsigned int signo;
-
- for (signo = 1; signo <= _NSIG; signo++)
- if (sigismember(&current->pending.signal, signo) &&
- !sigismember(&current->blocked, signo))
- break;
- killsig = signo;
- }
- /* Clear signals before calling svc_exit_thread() */
- flush_signals(current);
-
lock_kernel();
-
nfsdstats.th_cnt --;

out:
/* Release the thread */
svc_exit_thread(rqstp);

- /* Release module */
unlock_kernel();
- module_put_and_exit(0);
+ return 0;
}

static __be32 map_new_errors(u32 vers, __be32 nfserr)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 4b54c5f..011d6d8 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -22,7 +22,7 @@
/*
* This is the RPC server thread function prototype
*/
-typedef void (*svc_thread_fn)(struct svc_rqst *);
+typedef int (*svc_thread_fn)(void *);

/*
*
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 01c7e31..18a9e33 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -18,6 +18,7 @@
#include <linux/mm.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/kthread.h>

#include <linux/sunrpc/types.h>
#include <linux/sunrpc/xdr.h>
@@ -291,15 +292,14 @@ svc_pool_map_put(void)


/*
- * Set the current thread's cpus_allowed mask so that it
+ * Set the given thread's cpus_allowed mask so that it
* will only run on cpus in the given pool.
- *
- * Returns 1 and fills in oldmask iff a cpumask was applied.
*/
-static inline int
-svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
+static inline void
+svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx)
{
struct svc_pool_map *m = &svc_pool_map;
+ unsigned int node = m->pool_to[pidx];

/*
* The caller checks for sv_nrpools > 1, which
@@ -307,26 +307,17 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask)
*/
BUG_ON(m->count == 0);

- switch (m->mode)
- {
- default:
- return 0;
+ switch (m->mode) {
case SVC_POOL_PERCPU:
{
- unsigned int cpu = m->pool_to[pidx];
-
- *oldmask = current->cpus_allowed;
- set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
- return 1;
+ set_cpus_allowed_ptr(task, &cpumask_of_cpu(node));
+ break;
}
case SVC_POOL_PERNODE:
{
- unsigned int node = m->pool_to[pidx];
node_to_cpumask_ptr(nodecpumask, node);
-
- *oldmask = current->cpus_allowed;
- set_cpus_allowed_ptr(current, nodecpumask);
- return 1;
+ set_cpus_allowed_ptr(task, nodecpumask);
+ break;
}
}
}
@@ -578,46 +569,6 @@ out_enomem:
EXPORT_SYMBOL(svc_prepare_thread);

/*
- * Create a thread in the given pool. Caller must hold BKL.
- * On a NUMA or SMP machine, with a multi-pool serv, the thread
- * will be restricted to run on the cpus belonging to the pool.
- */
-static int
-__svc_create_thread(svc_thread_fn func, struct svc_serv *serv,
- struct svc_pool *pool)
-{
- struct svc_rqst *rqstp;
- int error = -ENOMEM;
- int have_oldmask = 0;
- cpumask_t uninitialized_var(oldmask);
-
- rqstp = svc_prepare_thread(serv, pool);
- if (IS_ERR(rqstp)) {
- error = PTR_ERR(rqstp);
- goto out;
- }
-
- if (serv->sv_nrpools > 1)
- have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask);
-
- error = kernel_thread((int (*)(void *)) func, rqstp, 0);
-
- if (have_oldmask)
- set_cpus_allowed(current, oldmask);
-
- if (error < 0)
- goto out_thread;
- svc_sock_update_bufs(serv);
- error = 0;
-out:
- return error;
-
-out_thread:
- svc_exit_thread(rqstp);
- goto out;
-}
-
-/*
* Choose a pool in which to create a new thread, for svc_set_num_threads
*/
static inline struct svc_pool *
@@ -686,7 +637,9 @@ found_pool:
int
svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
- struct task_struct *victim;
+ struct svc_rqst *rqstp;
+ struct task_struct *task;
+ struct svc_pool *chosen_pool;
int error = 0;
unsigned int state = serv->sv_nrthreads-1;

@@ -702,18 +655,32 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
/* create new threads */
while (nrservs > 0) {
nrservs--;
- __module_get(serv->sv_module);
- error = __svc_create_thread(serv->sv_function, serv,
- choose_pool(serv, pool, &state));
- if (error < 0) {
- module_put(serv->sv_module);
+ chosen_pool = choose_pool(serv, pool, &state);
+
+ rqstp = svc_prepare_thread(serv, chosen_pool);
+ if (IS_ERR(rqstp)) {
+ error = PTR_ERR(rqstp);
break;
}
+
+ task = kthread_create(serv->sv_function, rqstp, serv->sv_name);
+ if (IS_ERR(task)) {
+ error = PTR_ERR(task);
+ svc_exit_thread(rqstp);
+ break;
+ }
+
+ rqstp->rq_task = task;
+ if (serv->sv_nrpools > 1)
+ svc_pool_map_set_cpumask(task, chosen_pool->sp_id);
+
+ svc_sock_update_bufs(serv);
+ wake_up_process(task);
}
/* destroy old threads */
while (nrservs < 0 &&
- (victim = choose_victim(serv, pool, &state)) != NULL) {
- send_sig(serv->sv_kill_signal, victim, 1);
+ (task = choose_victim(serv, pool, &state)) != NULL) {
+ kthread_stop(task);
nrservs++;
}

--
1.5.3.6


2008-05-18 02:35:17

by Jeff Layton

Subject: [PATCH 2/3] [RFC] sunrpc: remove unneeded fields from svc_serv struct

Remove 2 fields that are no longer needed in the svc_serv struct,
sv_module and sv_kill_signal. sv_kill_signal doesn't appear to
currently be used for anything, and sv_module is no longer
needed since svc_set_num_threads doesn't bother with module reference
counts anymore. Also, remove the args from svc_create_pooled from
which these fields were populated.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 6 ++----
include/linux/sunrpc/svc.h | 10 ++--------
net/sunrpc/svc.c | 7 ++-----
3 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index b5fdb9b..d4d4be9 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -218,10 +218,8 @@ int nfsd_create_serv(void)
}

atomic_set(&nfsd_busy, 0);
- nfsd_serv = svc_create_pooled(&nfsd_program,
- nfsd_max_blksize,
- nfsd_last_thread,
- nfsd, SIG_NOCLEAN, THIS_MODULE);
+ nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize,
+ nfsd_last_thread, nfsd);
if (nfsd_serv == NULL)
err = -ENOMEM;
unlock_kernel();
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 011d6d8..cd304c4 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -72,15 +72,10 @@ struct svc_serv {
unsigned int sv_nrpools; /* number of thread pools */
struct svc_pool * sv_pools; /* array of thread pools */

+ /* Callback to use when last thread exits */
void (*sv_shutdown)(struct svc_serv *serv);
- /* Callback to use when last thread
- * exits.
- */

- struct module * sv_module; /* optional module to count when
- * adding threads */
svc_thread_fn sv_function; /* main function for threads */
- int sv_kill_signal; /* signal to kill threads */
};

/*
@@ -388,8 +383,7 @@ struct svc_rqst *svc_prepare_thread(struct svc_serv *serv,
struct svc_pool *pool);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int,
- void (*shutdown)(struct svc_serv*),
- svc_thread_fn, int sig, struct module *);
+ void (*shutdown)(struct svc_serv*), svc_thread_fn);
int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
void svc_destroy(struct svc_serv *);
int svc_process(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 18a9e33..da90ce0 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -434,18 +434,15 @@ EXPORT_SYMBOL(svc_create);
struct svc_serv *
svc_create_pooled(struct svc_program *prog, unsigned int bufsize,
void (*shutdown)(struct svc_serv *serv),
- svc_thread_fn func, int sig, struct module *mod)
+ svc_thread_fn func)
{
struct svc_serv *serv;
unsigned int npools = svc_pool_map_get();

serv = __svc_create(prog, bufsize, npools, shutdown);

- if (serv != NULL) {
+ if (serv != NULL)
serv->sv_function = func;
- serv->sv_kill_signal = sig;
- serv->sv_module = mod;
- }

return serv;
}
--
1.5.3.6


2008-05-18 02:35:17

by Jeff Layton

Subject: [PATCH 3/3] [RFC] knfsd: remove signal defines and extraneous variables

There are some defines for signals delivered to knfsd which are no
longer used with the new kthread-based server.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfssvc.c | 21 ++-------------------
1 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d4d4be9..fb70660 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -37,20 +37,6 @@

#define NFSDDBG_FACILITY NFSDDBG_SVC

-/* these signals will be delivered to an nfsd thread
- * when handling a request
- */
-#define ALLOWED_SIGS (sigmask(SIGKILL))
-/* these signals will be delivered to an nfsd thread
- * when not handling a request. i.e. when waiting
- */
-#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT))
-/* if the last thread dies with SIGHUP, then the exports table is
- * left unchanged ( like 2.4-{0-9} ). Any other signal will clear
- * the exports table (like 2.2).
- */
-#define SIG_NOCLEAN SIGHUP
-
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
struct timeval nfssvc_boot;
@@ -152,7 +138,6 @@ int nfsd_nrthreads(void)
return nfsd_serv->sv_nrthreads;
}

-static int killsig; /* signal that was used to kill last nfsd */
static void nfsd_last_thread(struct svc_serv *serv)
{
/* When last nfsd thread exits we need to do some clean-up */
@@ -164,10 +149,8 @@ static void nfsd_last_thread(struct svc_serv *serv)
nfs4_state_shutdown();

printk(KERN_WARNING "nfsd: last server has exited\n");
- if (killsig != SIG_NOCLEAN) {
- printk(KERN_WARNING "nfsd: unexporting all filesystems\n");
- nfsd_export_flush();
- }
+ printk(KERN_WARNING "nfsd: unexporting all filesystems\n");
+ nfsd_export_flush();
}

void nfsd_reset_versions(void)
--
1.5.3.6


2008-05-19 06:07:21

by NeilBrown

Subject: Re: [PATCH 0/3] [RFC] knfsd: convert to kthread API and remove signaling for shutdown

On Saturday May 17, [email protected] wrote:
> knfsd is the last NFS-related kernel thread that does not use the
> kthread API. This patchset represents a first pass at converting it. It
> seems to work, but changes the shutdown interface. knfsd currently
> allows signals to tell it when to come down.
>
> My main question is...how tied to this shutdown method are we? We can
> also take down nfsd by having people run:
>
> # rpc.nfsd 0
>
> ...which basically does:
>
> # echo 0 > /proc/fs/nfsd/threads
>
> ...so we don't think we *have* to use signals here. Is signaling
> something we can reasonably eliminate?

Good question. I suspect lots of distros still use "killall" or some
equivalent to stop nfsd - so at a minimum, some education would be
required.

Most kernel threads are there to provide a service for some other
entity, such as a filesystem or a device etc. So it doesn't make
sense to kill them except when the device/filesystem/whatever goes
away.

With nfsd, the thread *is* the central entity. So killing it does
make sense.
This doesn't argue strongly in favour of a killable nfsd, but does
explain why it might be different from all other kernel threads.

> In addition to making the code a
> bit simpler and cleaner, I think it will also eliminate this race:
>
> http://lkml.org/lkml/2007/8/2/462

I never followed up on this, did I...
The core problem seems to be the principle of
"The last one out turns off the lights"
but once you've turned off the lights, you can't see if someone else
snuck back in so you aren't the last one. You really have to have
only one door and stand in the doorway while switching off the
lights....

If we replace the BKL usage with a simple global semaphore, that
problem might just go away. We should only need to protect
svc_destroy, svc_exit_thread, and svc_set_num_threads from each other.

It's long past time to discard the BKL here anyway.

>
> If this isn't feasible, then I can add the signaling back in, but am not
> sure whether we can eliminate the race without adding more locking.

I think I prefer keeping the signalling and fixing the locking.

>
> If we can do this, we may need to provide an alternate way to specify
> that we want to take down all nfsd's but not flush the export table.
> Currently that's done with a SIGHUP, but the value of this facility is
> not clear to me since the kernel can just do another upcall.

I added this functionality well before the kernel could do upcalls to
refill the export table.

I wanted to be able to change the number of active threads without a
complete shutdown and restart, so I made the /proc/fs/nfsd/threads
file, accessed by "nfsd NN".

Then I wanted
nfsd 0
nfsd 8
to just change the number of threads, not flush the exports table as
well (because at the time, flushing the exports table was more
serious).
So I arranged that "nfsd 0" just reduced the number of threads to 0
and changed no other (externally visible) state.

As you say, it isn't really needed now. I wouldn't have a problem
with removing the SIGHUP feature, but I think keeping SIGKILL and
SIGINT with their old meaning is important.

>
> Comments and suggestions appreciated...
>

Thanks for the efforts!

NeilBrown