2001-02-18 22:24:47

by Kenn Humborg

[permalink] [raw]
Subject: kernel_thread() & thread starting


In init/main.c, do_basic_setup() we have:

start_context_thread();
do_initcalls();

start_context_thread() calls kernel_thread() to start the keventd
thread. Then do_initcalls() calls all the init functions and
finishes by calling flush_scheduled_tasks(). This function ends
up calling schedule_task() which checks if keventd is running.

With a very stripped down kernel, it seems possible that do_initcalls()
can complete without context_thread() having had a chance to run (and
set the flag that keventd is running).

Right now, in the Linux/VAX project, I'm working with a very stripped
down kernel and I'm seeing this behaviour. Depending on what I enable
in the .config, I can get schedule_task() to fail with:

schedule_task(): keventd has not started

When starting bdflush and kupdated, bdflush_init() uses a semaphore to
make sure that the threads have run before continuing. Shouldn't
start_context_thread() do something similar?

Or am I missing something?

Thanks,
Kenn


2001-02-18 22:54:08

by Russell King

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

Kenn Humborg writes:
> When starting bdflush and kupdated, bdflush_init() uses a semaphore to
> make sure that the threads have run before continuing. Shouldn't
> start_context_thread() do something similar?

I think this would be a good idea. Here is a patch to try. Please report
back if it works so that it can be forwarded to Linus. Thanks.

--- orig/kernel/context.c Tue Jan 30 13:31:11 2001
+++ linux/kernel/context.c Sun Feb 18 22:51:56 2001
@@ -63,7 +63,7 @@
return ret;
}

-static int context_thread(void *dummy)
+static int context_thread(void *sem)
{
struct task_struct *curtask = current;
DECLARE_WAITQUEUE(wait, curtask);
@@ -79,6 +79,8 @@
recalc_sigpending(curtask);
spin_unlock_irq(&curtask->sigmask_lock);

+ up((struct semaphore *)sem);
+
/* Install a handler so SIGCLD is delivered */
sa.sa.sa_handler = SIG_IGN;
sa.sa.sa_flags = 0;
@@ -148,7 +150,9 @@

int start_context_thread(void)
{
- kernel_thread(context_thread, NULL, CLONE_FS | CLONE_FILES);
+ DECLARE_MUTEX_LOCKED(sem);
+ kernel_thread(context_thread, &sem, CLONE_FS | CLONE_FILES);
+ down(&sem);
return 0;
}



--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-02-19 10:17:29

by Philipp Rumpf

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

On Sun, 18 Feb 2001, Kenn Humborg wrote:
> in the .config, I can get schedule_task() to fail with:
>
> schedule_task(): keventd has not started

This shouldn't be a failure case, just a (bogus) printk.

> When starting bdflush and kupdated, bdflush_init() uses a semaphore to
> make sure that the threads have run before continuing. Shouldn't
> start_context_thread() do something similar?

Why bother ? It looks like a leftover debugging message which doesn't
make a lot of sense once the code is stable (what might make sense is
checking keventd is still around, but that's not what the code is doing).

Proposed patch:

--- linux-2.4/kernel/context.c Fri Jan 12 18:52:41 2001
+++ linux-prumpf/kernel/context.c Mon Feb 19 11:11:37 2001
@@ -26,19 +26,9 @@
static int keventd_running;
static struct task_struct *keventd_task;

-static int need_keventd(const char *who)
-{
- if (keventd_running == 0)
- printk(KERN_ERR "%s(): keventd has not started\n", who);
- return keventd_running;
-}
-
int current_is_keventd(void)
{
- int ret = 0;
- if (need_keventd(__FUNCTION__))
- ret = (current == keventd_task);
- return ret;
+ return current == keventd_task;
}

/**
@@ -57,7 +47,6 @@
int schedule_task(struct tq_struct *task)
{
int ret;
- need_keventd(__FUNCTION__);
ret = queue_task(task, &tq_context);
wake_up(&context_task_wq);
return ret;

dwmw2 ?

2001-02-19 10:33:17

by David Woodhouse

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting


[email protected] said:
> Why bother ? It looks like a leftover debugging message which
> doesn't make a lot of sense once the code is stable (what might make
> sense is checking keventd is still around, but that's not what the
> code is doing).

> Proposed patch:

> dwmw2 ?

Don't look at me. I didn't like the current_is_keventd stuff very much
in the first place. akpm?

--
dwmw2


2001-02-19 12:35:56

by Andrew Morton

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

David Woodhouse wrote:
>
> [email protected] said:
> > Why bother ? It looks like a leftover debugging message which
> > doesn't make a lot of sense once the code is stable (what might make
> > sense is checking keventd is still around, but that's not what the
> > code is doing).

keventd *must* still be around.

And the code obviously isn't completely stable, and this debug
message has found something rather unpleasant.

I don't think we should run the init tasks when keventd may, or
may not be running. Sure, the current code does, by happenstance,
all work correctly when keventd hasn't yet started running, and
when it's starting up. But it's safer, saner and surer just
to crank the damn thing up before proceeding.

> > Proposed patch:
>
> > dwmw2 ?
>
> Don't look at me. I didn't like the current_is_keventd stuff very much
> in the first place. akpm?

Heh. Tell that to wakeup_bdflush().

The all-singing fully-async + fully-sync + async with callback
patch was dropped, and until we can demonstrate that it
fixes a bug, it can stay dropped. I think it _will_ fix
a bug, but the development of userspace hotplug infrastructure
hasn't reached the stage where the need for a kernel fix
has been proven.

I believe the right thing to do here is the RMK approach.

Here's a faintly tested patch.

--- linux-2.4.2-pre4/kernel/context.c Sat Jan 13 04:52:41 2001
+++ lk/kernel/context.c Mon Feb 19 23:33:38 2001
@@ -19,6 +19,7 @@
#include <linux/init.h>
#include <linux/unistd.h>
#include <linux/signal.h>
+#include <asm/semaphore.h>

static DECLARE_TASK_QUEUE(tq_context);
static DECLARE_WAIT_QUEUE_HEAD(context_task_wq);
@@ -26,19 +27,9 @@
static int keventd_running;
static struct task_struct *keventd_task;

-static int need_keventd(const char *who)
-{
- if (keventd_running == 0)
- printk(KERN_ERR "%s(): keventd has not started\n", who);
- return keventd_running;
-}
-
int current_is_keventd(void)
{
- int ret = 0;
- if (need_keventd(__FUNCTION__))
- ret = (current == keventd_task);
- return ret;
+ return (current == keventd_task);
}

/**
@@ -57,13 +48,12 @@
int schedule_task(struct tq_struct *task)
{
int ret;
- need_keventd(__FUNCTION__);
ret = queue_task(task, &tq_context);
wake_up(&context_task_wq);
return ret;
}

-static int context_thread(void *dummy)
+static int context_thread(void *sem)
{
struct task_struct *curtask = current;
DECLARE_WAITQUEUE(wait, curtask);
@@ -85,6 +75,7 @@
siginitset(&sa.sa.sa_mask, sigmask(SIGCHLD));
do_sigaction(SIGCHLD, &sa, (struct k_sigaction *)0);

+ up((struct semaphore *)sem);
/*
* If one of the functions on a task queue re-adds itself
* to the task queue we call schedule() in state TASK_RUNNING
@@ -146,9 +137,11 @@
remove_wait_queue(&context_task_done, &wait);
}

-int start_context_thread(void)
+int __init start_context_thread(void)
{
- kernel_thread(context_thread, NULL, CLONE_FS | CLONE_FILES);
+ DECLARE_MUTEX_LOCKED(sem);
+ kernel_thread(context_thread, &sem, CLONE_FS | CLONE_FILES);
+ down(&sem);
return 0;
}

-

2001-02-19 21:24:10

by Russell King

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

Philipp Rumpf writes:
> That still won't catch keventd oopsing though - which I think might happen
> quite easily in real life.

Maybe we should panic in that case? For example, what happens if kswapd
oopses? kreclaimd? bdflush? kupdate? All these have the same problem,
and should probably have the same "fix" applied, whatever that fix may
be.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-02-19 21:41:33

by Philipp Rumpf

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

On Mon, 19 Feb 2001, Russell King wrote:
> Philipp Rumpf writes:
> > That still won't catch keventd oopsing though - which I think might happen
> > quite easily in real life.
>
> Maybe we should panic in that case? For example, what happens if kswapd
> oopses? kreclaimd? bdflush? kupdate? All these have the same problem,

No. If kswapd oopses it's a bug in kswapd (or related code). If keventd
oopses most likely the broken code is actually the task queue you
scheduled, which belongs to your driver.

2001-02-19 22:26:32

by David Woodhouse

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

On Mon, 19 Feb 2001, Philipp Rumpf wrote:

> No. If kswapd oopses it's a bug in kswapd (or related code). If keventd
> oopses most likely the broken code is actually the task queue you
> scheduled, which belongs to your driver.

If we're going to detect this case, we might as well just restart keventd.

--
dwmw2


2001-02-21 00:52:26

by Kenn Humborg

[permalink] [raw]
Subject: Re: kernel_thread() & thread starting

On Sun, Feb 18, 2001 at 10:53:16PM +0000, Russell King wrote:
> Kenn Humborg writes:
> > When starting bdflush and kupdated, bdflush_init() uses a semaphore to
> > make sure that the threads have run before continuing. Shouldn't
> > start_context_thread() do something similar?
>
> I think this would be a good idea. Here is a patch to try. Please report
> back if it works so that it can be forwarded to Linus. Thanks.

Works perfectly for me.

I'll leave it up to you guys to decide what's the right way to deal with
this and pass a patch to Linus/Alan. Meanwhile, I'll keep Russell's
patch below in our CVS tree.

Thanks,
Kenn

> --- orig/kernel/context.c Tue Jan 30 13:31:11 2001
> +++ linux/kernel/context.c Sun Feb 18 22:51:56 2001
> @@ -63,7 +63,7 @@
> return ret;
> }
>
> -static int context_thread(void *dummy)
> +static int context_thread(void *sem)
> {
> struct task_struct *curtask = current;
> DECLARE_WAITQUEUE(wait, curtask);
> @@ -79,6 +79,8 @@
> recalc_sigpending(curtask);
> spin_unlock_irq(&curtask->sigmask_lock);
>
> + up((struct semaphore *)sem);
> +
> /* Install a handler so SIGCLD is delivered */
> sa.sa.sa_handler = SIG_IGN;
> sa.sa.sa_flags = 0;
> @@ -148,7 +150,9 @@
>
> int start_context_thread(void)
> {
> - kernel_thread(context_thread, NULL, CLONE_FS | CLONE_FILES);
> + DECLARE_MUTEX_LOCKED(sem);
> + kernel_thread(context_thread, &sem, CLONE_FS | CLONE_FILES);
> + down(&sem);
> return 0;
> }
>
>
>
> --
> Russell King ([email protected]) The developer of ARM Linux
> http://www.arm.linux.org.uk/personal/aboutme.html