2002-08-07 09:19:15

by Nikita Danilov

[permalink] [raw]
Subject: kernel thread exit race

Hello,

what is the politically correct way to exit from a kernel thread daemon
without module unload races?

Currently most kernel threads exit with something like

wake_up(&daemon_done_wait_queue);
return 0;

(or complete() in stead of wake_up()). Problem is that thread waiting
for daemon shutdown can start running on another CPU while daemon is
still executing and unload module, in particular unmapping page with
daemon code.

Reiserfs used to do something like

/*
* BKL will be released in do_exit()
*/
lock_kernel();
wake_up(&daemon_done_wait_queue);
return 0;

and wait for daemon completion under BKL so that when waiter resumes,
daemon is definitely not executing module code. This looks like a hack,
though. Is there a better solution?

Nikita.


2002-08-07 09:34:44

by Russell King

[permalink] [raw]
Subject: Re: kernel thread exit race

On Wed, Aug 07, 2002 at 01:22:51PM +0400, Nikita Danilov wrote:
> what is the politically correct way to exit from a kernel thread daemon
> without module unload races?

You need to use the completion stuff (see include/linux/completion.h)
Specifically complete_and_exit(), which is in include/linux/kernel.h
(for some weird reason).

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-08-07 09:55:37

by Alan

[permalink] [raw]
Subject: Re: kernel thread exit race


On Wed, 2002-08-07 at 10:22, Nikita Danilov wrote:
> Hello,
>
> what is the politically correct way to exit from a kernel thread daemon
> without module unload races?

You probably want to use completions. There is a function in the kernel
core "complete_and_exit" which does both the complete() and then the
exit() so that after complete finishes the task will not re-enter
modulespace and risk an unload race


2002-08-07 10:01:48

by Nikita Danilov

[permalink] [raw]
Subject: Re: kernel thread exit race

Alan Cox writes:
>
> On Wed, 2002-08-07 at 10:22, Nikita Danilov wrote:
> > Hello,
> >
> > what is the politically correct way to exit from a kernel thread daemon
> > without module unload races?
>
> You probably want to use completions. There is a function in the kernel
> core "complete_and_exit" which does both the complete() and then the
> exit() so that after complete finishes the task will not re-enter
> modulespace and risk an unload race
>

Ah I see, thank you and Russell. But this depends on no architecture
ever accessing spinlock data after letting waiters to run, otherwise
there still is (tiny) window for race at the end of complete() call,
right?

>

Nikita.

2002-08-07 11:26:48

by Nikita Danilov

[permalink] [raw]
Subject: Re: kernel thread exit race

Alan Cox writes:
> On Wed, 2002-08-07 at 11:05, Nikita Danilov wrote:
> > Ah I see, thank you and Russell. But this depends on no architecture
> > ever accessing spinlock data after letting waiters to run, otherwise
> > there still is (tiny) window for race at the end of complete() call,
> > right?
>
> complete() as opposed to spinlocks/semaphores is defined to be safe to
> free the object once the complete finishes

So, complete() is not-arch dependent because spinlocks are "good" in all
architectures? complete() ends with spin_unlock_irqrestore() so it
cannot be any better than spinlocks, until there is some hidden magic.

Let me clarify this. Suppose there is some obscure architecture that
maintains in spinlocks some additional data for debugging. Then,

complete_and_exit()->complete()->spin_unlock_irqrestore()

"wakes up" thread on another CPU and proceeds to access spin-lock data
(to check/update debugging information, for example), but by this time
woken up thread manages to unload module and to un-map page containing
spin-lock data.

>

Nikita.

2002-08-07 10:48:22

by Alan

[permalink] [raw]
Subject: Re: kernel thread exit race

On Wed, 2002-08-07 at 11:05, Nikita Danilov wrote:
> Ah I see, thank you and Russell. But this depends on no architecture
> ever accessing spinlock data after letting waiters to run, otherwise
> there still is (tiny) window for race at the end of complete() call,
> right?

complete() as opposed to spinlocks/semaphores is defined to be safe to
free the object once the complete finishes

2002-08-07 12:08:54

by Russell King

[permalink] [raw]
Subject: Re: kernel thread exit race

On Wed, Aug 07, 2002 at 03:30:17PM +0400, Nikita Danilov wrote:
> So, complete() is not-arch dependent because spinlocks are "good" in all
> architectures? complete() ends with spin_unlock_irqrestore() so it
> cannot be any better than spinlocks, until there is some hidden magic.

It works like this:

cpu1 cpu2
kill thread (on cpu2)
complete_and_exit()
- takes spinlock
wait_for_completion()
- spins on completion spinlock
- increments x->done
- wakes up anyone waiting on
the completion
- releases spinlock
- checks x->done
- decrements x->done
- releases spinlock

OR:

cpu1 cpu2
kill thread (on cpu2)
wait_for_completion()
- takes spinlock
complete_and_exit()
- spins on spinlock
- checks x->done
- adds to waitqueue
- releases spinlock
- increments x->done
- sleeps
- wakes up anyone waiting on
the completion
- wakes up
- spins on spinlock
- releases spinlock
- decrements x->done
- releases spinlock

OR:

cpu1 cpu2
kill thread (on cpu2)
wait_for_completion()
- takes spinlock
- checks x->done
- adds to waitqueue
- releases spinlock
- sleeps
complete_and_exit()
- takes spinlock
- increments x->done
- wakes up anyone waiting on
the completion
- wakes up
- spins on spinlock
- releases spinlock
- decrements x->done
- releases spinlock

As you can see, wait_for_completion() will never return until complete()
has released the spinlock.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-08-07 11:59:44

by Alan

[permalink] [raw]
Subject: Re: kernel thread exit race

On Wed, 2002-08-07 at 12:30, Nikita Danilov wrote:
> Let me clarify this. Suppose there is some obscure architecture that
> maintains in spinlocks some additional data for debugging. Then,
>
> complete_and_exit()->complete()->spin_unlock_irqrestore()
>
> "wakes up" thread on another CPU and proceeds to access spin-lock data
> (to check/update debugging information, for example), but by this time
> woken up thread manages to unload module and to un-map page containing
> spin-lock data.

At the point complete() sets x->done to indicate completion occurred it
holds the lock, when it drops the lock then the wait_for_completion can
return, and while it may touch the data again, the complete() call will
not.

So in your unload you have got

complete_and_exit()

On return from this we know the thread won't touch the lock data again
and we know the thread if executing is not executing in module space


And
wait_for_completion

in the actual module unload path means the complete has finished (but
not neccessarily the exit) so the thread isnt running module code any
more and won't touch the lock again


This does assume that spin_unlock in the arch code doesn't scribble on
things after it unlocks. So if its doing debugging or pre-emption magic
it must do that before it finally drops the lock proper. Thats what I'd
expect anyway since it has to protect its own internal data too