LinuxLists.cc - [PATCH] Make kthread_stop() not oops when passed a bad pointer

2008-08-05 13:56:38

Subject: [PATCH] Make kthread_stop() not oops when passed a bad pointer

Due to some botched error handling code in a driver I was writing, I
recently called kthread_stop(NULL). It's terribly exciting to discover
that you've just oopsed while holding a mutex that's required in order
to shut down. Make kthread_stop a little more robust against numbskulls
like me.

Signed-off-by: Matthew Wilcox <[email protected]>

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 96cff2f..06c3477 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -201,6 +201,9 @@ int kthread_stop(struct task_struct *k)
{
int ret;

+ if (!k || IS_ERR(k))
+ return -EINVAL;
+
mutex_lock(&kthread_stop_lock);

/* It could exit after stop_info.k set, but before wake_up_process. */
--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-08-05 15:58:28

by Stefan Richter

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

Matthew Wilcox wrote:
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -201,6 +201,9 @@ int kthread_stop(struct task_struct *k)
> {
> int ret;
>
> + if (!k || IS_ERR(k))
> + return -EINVAL;
> +
> mutex_lock(&kthread_stop_lock);
>
> /* It could exit after stop_info.k set, but before wake_up_process. */

Would
if (WARN_ON(!k || IS_ERR(k)))

be in order, or are there valid cases of passing an invalid pointer?
--
Stefan Richter
-=====-==--- =--- --=-=
http://arcgraph.de/sr/

2008-08-05 16:22:57

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

On Tue, 5 Aug 2008 07:55:59 -0600 Matthew Wilcox <[email protected]> wrote:

>
> Due to some botched error handling code in a driver I was writing, I
> recently called kthread_stop(NULL). It's terribly exciting to discover
> that you've just oopsed while holding a mutex that's required in order
> to shut down. Make kthread_stop a little more robust against numbskulls
> like me.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 96cff2f..06c3477 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -201,6 +201,9 @@ int kthread_stop(struct task_struct *k)
> {
> int ret;
>
> + if (!k || IS_ERR(k))
> + return -EINVAL;
> +
> mutex_lock(&kthread_stop_lock);
>

hrm.

a) as this is a programming error, the programmer would like it not
to be hidden. A WARN_ON() would address that.

b) there are many instances of

if (p)
kthread_stop(p);

which could now be simplified to

kthread_stop(p);

c) a) and b) are incompatible.

2008-08-06 05:43:41

by Minchan Kim

[permalink] [raw]

Subject: [PATCH][migration] Trivial cleanup

This is a trivial cleanup.
Anyone doesn't use it any more.

Signed-off-by: MinChan Kim <[email protected]>
---
mm/mempolicy.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 97020c0..36f4257 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -808,7 +808,6 @@ static int migrate_to_node(struct mm_struct *mm, int
source, int dest,
int do_migrate_pages(struct mm_struct *mm,
const nodemask_t *from_nodes, const nodemask_t *to_nodes, int flags)
{
- LIST_HEAD(pagelist);
int busy = 0;
int err = 0;
nodemask_t tmp;
--
1.5.4.3

2008-08-06 06:30:58

by Rusty Russell

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

On Tuesday 05 August 2008 23:55:59 Matthew Wilcox wrote:
> Make kthread_stop a little more robust against numbskulls
> like me.
>
> Signed-off-by: Matthew Wilcox <[email protected]>

Hi Willy,

I really do sympathize with your problem; but the quest is to figure out how
to identify it before the code is run, not to put a non-orthogonal bandaid
here which can hurt other cases.

How about a more ambitious "we've oopsed so break a mutex every 30 seconds of
waiting" patch?

> +?????if (!k || IS_ERR(k))
> +?????????????return -EINVAL;

1) There's no reason that kthread_stop is uniquely difficult to use. Why pick
on that one?

2) I know that kfree() handles NULL, but kthread_create/kthread_run never
return NULL, unlike kmalloc().

3) If we really want to pass a failed kthread_create() through kthread_stop(),
we should return PTR_ERR(k) here. But that should only be done if it made it
harder for the callers to screw up, which I don't think it does.

4) After a successful kthread_run(), kthread_stop() will always return the
value from the threadfn callback. ie. kthread_stop() doesn't ever fail. A
simple semantic, which this patch breaks.

5) Covering up programmer errors is not good policy. I dislike WARN_ON()
because an oops is much harder to miss. Painful for you, but The System
Works.

Sorry,
Rusty.

2008-08-06 12:07:49

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

On Wed, Aug 06, 2008 at 11:22:58AM +1000, Rusty Russell wrote:
> I really do sympathize with your problem; but the quest is to figure out how
> to identify it before the code is run, not to put a non-orthogonal bandaid
> here which can hurt other cases.
>
> How about a more ambitious "we've oopsed so break a mutex every 30 seconds of
> waiting" patch?

I was considering something more along the lines of "we've oopsed so
find every mutex we own and release it".

> > +?????if (!k || IS_ERR(k))
> > +?????????????return -EINVAL;
>
> 1) There's no reason that kthread_stop is uniquely difficult to use. Why pick
> on that one?

It was the one I hit.

> 2) I know that kfree() handles NULL, but kthread_create/kthread_run never
> return NULL, unlike kmalloc().

I'd kzalloc'd the memory structure, then rearranged the order of calls
initialising it without rearranging the destructor.

> 3) If we really want to pass a failed kthread_create() through kthread_stop(),
> we should return PTR_ERR(k) here. But that should only be done if it made it
> harder for the callers to screw up, which I don't think it does.

I'm actually really dubious about kthread_stop() returning a value at
all. To me, returning an error implies that the function failed to do
its job, ie the thread is still running. But that's not true; if it
returns -EINVAL, it means the thread never ran. And why should the
caller care? Only three callers of kthread_stop do anything with the
return value. Two of them just put the value in a debug message, and
the third one goes to the effort of passing the return value through
three layers of function pointer calls only to have all the callers
discard it.

> 4) After a successful kthread_run(), kthread_stop() will always return the
> value from the threadfn callback. ie. kthread_stop() doesn't ever fail. A
> simple semantic, which this patch breaks.

Now I'm confused. kthread_stop isn't failing. It preserves the
invariant that when it returns, the thread is no longer running.

> 5) Covering up programmer errors is not good policy. I dislike WARN_ON()
> because an oops is much harder to miss. Painful for you, but The System
> Works.

I don't understand why we wouldn't want to be more robust here.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2008-08-06 14:26:30

by Christoph Lameter

[permalink] [raw]

Subject: Re: [PATCH][migration] Trivial cleanup

Acked-by: Christoph Lameter <[email protected]>

2008-08-08 06:20:22

by Rusty Russell

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

On Wednesday 06 August 2008 22:07:04 Matthew Wilcox wrote:
> On Wed, Aug 06, 2008 at 11:22:58AM +1000, Rusty Russell wrote:
> > How about a more ambitious "we've oopsed so break a mutex every 30
> > seconds of waiting" patch?
>
> I was considering something more along the lines of "we've oopsed so
> find every mutex we own and release it".

Hmm, I don't think that's possible in general is it?

> > 1) There's no reason that kthread_stop is uniquely difficult to use. Why
> > pick on that one?
>
> It was the one I hit.

Yes, I got that :) But if we're not about to sprinkle "if check_ptr(arg)" all
through the kernel wherever someone can misuse a function.

> > 2) I know that kfree() handles NULL, but kthread_create/kthread_run never
> > return NULL, unlike kmalloc().
>
> I'd kzalloc'd the memory structure, then rearranged the order of calls
> initialising it without rearranging the destructor.

And if you hadn't used kzalloc you'll still blow up. I dislike zeroing allocs
myself because I have dreams of valgrinding the kernel. gcc would warn about
this for a stack var, it'd be nice if it did the same here.

> > 3) If we really want to pass a failed kthread_create() through
> > kthread_stop(), we should return PTR_ERR(k) here. But that should only
> > be done if it made it harder for the callers to screw up, which I don't
> > think it does.
>
> I'm actually really dubious about kthread_stop() returning a value at
> all. To me, returning an error implies that the function failed to do
> its job, ie the thread is still running. But that's not true; if it
> returns -EINVAL, it means the thread never ran.

You mean -EINTR? Yes, it should probably be left undefined: the caller
presumably knows it didn't start the thread.

> And why should the
> caller care? Only three callers of kthread_stop do anything with the
> return value. Two of them just put the value in a debug message, and
> the third one goes to the effort of passing the return value through
> three layers of function pointer calls only to have all the callers
> discard it.

Good point. I assumed passing through the value would be useful, but as it's
not been after a couple of years, we should make the callback return void.
It'd be a painful transition, but I like the simplicity.

> > 4) After a successful kthread_run(), kthread_stop() will always return
> > the value from the threadfn callback. ie. kthread_stop() doesn't ever
> > fail. A simple semantic, which this patch breaks.
>
> Now I'm confused. kthread_stop isn't failing. It preserves the
> invariant that when it returns, the thread is no longer running.

No, all we know is that they passed the wrong thing into kthread_stop(). So
we really don't know if their thread is stopped; maybe it never existed (as
in your case), maybe it's still running.

> > 5) Covering up programmer errors is not good policy. I dislike WARN_ON()
> > because an oops is much harder to miss. Painful for you, but The System
> > Works.
>
> I don't understand why we wouldn't want to be more robust here.

Because the OOPS made you fix the bug the way silently sucking it up wouldn't
have.

Rusty.

2008-08-08 20:51:05

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH] Make kthread_stop() not oops when passed a bad pointer

On Thu, Aug 07, 2008 at 06:48:05AM +1000, Rusty Russell wrote:
> On Wednesday 06 August 2008 22:07:04 Matthew Wilcox wrote:
> > On Wed, Aug 06, 2008 at 11:22:58AM +1000, Rusty Russell wrote:
> > > How about a more ambitious "we've oopsed so break a mutex every 30
> > > seconds of waiting" patch?
> >
> > I was considering something more along the lines of "we've oopsed so
> > find every mutex we own and release it".
>
> Hmm, I don't think that's possible in general is it?

Maybe not. I've not looked carefully. Though I just had a really good
idea that might have the same effect. If you have CONFIG_DEBUG_MUTEXES
on, then there's an owner field. So if we don't free the thread_info
for an oopsed task, we can tell that it's currently owned by a dead
task, and usurp the ownership.

> > > 1) There's no reason that kthread_stop is uniquely difficult to use. Why
> > > pick on that one?
> >
> > It was the one I hit.
>
> Yes, I got that :) But if we're not about to sprinkle "if check_ptr(arg)" all
> through the kernel wherever someone can misuse a function.

We are starting to move checks from the callers into the callees -- at
least as much for code size improvements as anything else. I'm not sure
why the line should be between kthread_stop and kfree rather than on the
other side of kthread_stop.

> > > 2) I know that kfree() handles NULL, but kthread_create/kthread_run never
> > > return NULL, unlike kmalloc().
> >
> > I'd kzalloc'd the memory structure, then rearranged the order of calls
> > initialising it without rearranging the destructor.
>
> And if you hadn't used kzalloc you'll still blow up. I dislike zeroing allocs
> myself because I have dreams of valgrinding the kernel. gcc would warn about
> this for a stack var, it'd be nice if it did the same here.

It's too complex for gcc to be able to see this one, IMO. Valgrind
would catch it. I'm not a huge fan of kzalloc either.

> Good point. I assumed passing through the value would be useful, but as it's
> not been after a couple of years, we should make the callback return void.
> It'd be a painful transition, but I like the simplicity.

Cool.

> > > 4) After a successful kthread_run(), kthread_stop() will always return
> > > the value from the threadfn callback. ie. kthread_stop() doesn't ever
> > > fail. A simple semantic, which this patch breaks.
> >
> > Now I'm confused. kthread_stop isn't failing. It preserves the
> > invariant that when it returns, the thread is no longer running.
>
> No, all we know is that they passed the wrong thing into kthread_stop(). So
> we really don't know if their thread is stopped; maybe it never existed (as
> in your case), maybe it's still running.

Ah, good point, I hadn't considered the possibility of passing the wrong
pointer in.

> > > 5) Covering up programmer errors is not good policy. I dislike WARN_ON()
> > > because an oops is much harder to miss. Painful for you, but The System
> > > Works.
> >
> > I don't understand why we wouldn't want to be more robust here.
>
> Because the OOPS made you fix the bug the way silently sucking it up wouldn't
> have.

I made doing what I did no longer a bug ;-)

But, look, I'm perfectly happy with:

+ BUG_ON(!k || ERR_PTR(k));

as long as it happens before we take the mutex.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."