2003-05-15 23:01:45

by Olof Johansson

[permalink] [raw]
Subject: [PATCH] Race between rpciod() and rpciod_down() when shutting down

Below patch removes a timing window during which rpciod_down() will pass
the while(rpciod_pid) test, but go to sleep after rpciod() has woken up
sleepers on the rpciod_killer queue.

Unfortunately, there's no wait_for_completion_interruptible(). Instead of
rolling my own in the NFS code, I'll work on getting one added to the
kernel. The uninterruptible sleep is a temporary solution until then, and
I'll make sure to follow up here if/when the function is added to
kernel/sched.c.

Patch is against 2.4.20. Same problem exists in 2.5, I can supply
a separate patch for that if needed.


Thanks,

Olof


--- linux-2.4.20/net/sunrpc/sched.c.orig 2002-11-28 17:53:16.000000000 -0600
+++ linux-2.4.20/net/sunrpc/sched.c 2003-05-15 17:43:45.000000000 -0500
@@ -19,6 +19,7 @@
#include <linux/smp.h>
#include <linux/smp_lock.h>
#include <linux/spinlock.h>
+#include <linux/completion.h>

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/xprt.h>
@@ -63,7 +64,7 @@ static LIST_HEAD(all_tasks);
* rpciod-related stuff
*/
static DECLARE_WAIT_QUEUE_HEAD(rpciod_idle);
-static DECLARE_WAIT_QUEUE_HEAD(rpciod_killer);
+static DECLARE_COMPLETION(rpciod_done);
static DECLARE_MUTEX(rpciod_sema);
static unsigned int rpciod_users;
static pid_t rpciod_pid;
@@ -979,7 +980,7 @@ rpciod_task_pending(void)
static int
rpciod(void *ptr)
{
- wait_queue_head_t *assassin = (wait_queue_head_t*) ptr;
+ struct completion *done = ptr;
int rounds = 0;

MOD_INC_USE_COUNT;
@@ -1027,7 +1028,7 @@ rpciod(void *ptr)
}

rpciod_pid = 0;
- wake_up(assassin);
+ complete(done);

dprintk("RPC: rpciod exiting\n");
MOD_DEC_USE_COUNT;
@@ -1076,7 +1077,7 @@ rpciod_up(void)
/*
* Create the rpciod thread and wait for it to start.
*/
- error = kernel_thread(rpciod, &rpciod_killer, 0);
+ error = kernel_thread(rpciod, &rpciod_done, 0);
if (error < 0) {
printk(KERN_WARNING "rpciod_up: create thread failed, error=%d\n", error);
rpciod_users--;
@@ -1119,14 +1120,23 @@ rpciod_down(void)
/*
* Display a message if we're going to wait longer.
*/
- while (rpciod_pid) {
+ if (rpciod_pid)
dprintk("rpciod_down: waiting for pid %d to exit\n", rpciod_pid);
- if (signalled()) {
- dprintk("rpciod_down: caught signal\n");
- break;
- }
- interruptible_sleep_on(&rpciod_killer);
- }
+/*
+ XXX Unfortunately, there's no wait_for_completion_interruptible()
+ (yet), so we need to bite the bullet and sleep uninterruptible.
+ Once we have the infrastructure for it, we can switch over.
+ -OlofJ
+
+ if (wait_for_completion_interruptible(&rpciod_done))
+ dprintk("rpciod_down: caught signal\n");
+*/
+ /* Even if rpciod_pid is 0, we still need to call wait_for_completion().
+ * Otherwise the "done" count of the completion structure will be off
+ * by one since complete() increases it.
+ */
+ wait_for_completion(&rpciod_done);
+
spin_lock_irqsave(&current->sigmask_lock, flags);
recalc_sigpending(current);
spin_unlock_irqrestore(&current->sigmask_lock, flags);




---
Olof Johansson Office: 4E002/905
pSeries Linux Development IBM Systems Group
Email: [email protected] Phone: 512-838-9858
All opinions are my own and not those of IBM




-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
http://www.enterpriselinuxforum.com

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-05-20 12:07:43

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] Race between rpciod() and rpciod_down() when shutting down

>>>>> " " == olof <[email protected]> writes:

> Below patch removes a timing window during which rpciod_down()
> will pass the while(rpciod_pid) test, but go to sleep after
> rpciod() has woken up sleepers on the rpciod_killer queue.

> Unfortunately, there's no
> wait_for_completion_interruptible(). Instead of rolling my own
> in the NFS code, I'll work on getting one added to the kernel.
> The uninterruptible sleep is a temporary solution until then,
> and I'll make sure to follow up here if/when the function is
> added to kernel/sched.c.

Why not just ensure that the callers are holding the BKL?

Cheers,
Trond


-------------------------------------------------------
This SF.net email is sponsored by: ObjectStore.
If flattening out C++ or Java code to make your application fit in a
relational database is painful, don't do it! Check out ObjectStore.
Now part of Progress Software. http://www.objectstore.net/sourceforge
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs