2006-02-20 14:46:52

by Oleg Nesterov

Subject: [PATCH 3/4] cleanup __exit_signal()

This patch factors out duplicated code under 'if' branches.
Also, BUG_ON() conversions and whitespace cleanups.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.16-rc3/kernel/signal.c~3_ESIG 2006-02-20 02:02:03.000000000 +0300
+++ 2.6.16-rc3/kernel/signal.c 2006-02-20 20:55:50.000000000 +0300
@@ -341,24 +341,20 @@ void __exit_sighand(struct task_struct *
  */
 void __exit_signal(struct task_struct *tsk)
 {
-        struct signal_struct * sig = tsk->signal;
-        struct sighand_struct * sighand;
+        struct signal_struct *sig = tsk->signal;
+        struct sighand_struct *sighand;
+
+        BUG_ON(!sig);
+        BUG_ON(!atomic_read(&sig->count));
 
-        if (!sig)
-                BUG();
-        if (!atomic_read(&sig->count))
-                BUG();
         rcu_read_lock();
         sighand = rcu_dereference(tsk->sighand);
         spin_lock(&sighand->siglock);
+
         posix_cpu_timers_exit(tsk);
-        if (atomic_dec_and_test(&sig->count)) {
+        if (atomic_dec_and_test(&sig->count))
                 posix_cpu_timers_exit_group(tsk);
-                tsk->signal = NULL;
-                __exit_sighand(tsk);
-                spin_unlock(&sighand->siglock);
-                flush_sigqueue(&sig->shared_pending);
-        } else {
+        else {
                 /*
                  * If there is any task waiting for the group exit
                  * then notify it:
@@ -369,7 +365,6 @@ void __exit_signal(struct task_struct *t
                 }
                 if (tsk == sig->curr_target)
                         sig->curr_target = next_thread(tsk);
-                tsk->signal = NULL;
                 /*
                  * Accumulate here the counters for all threads but the
                  * group leader as they die, so they can be added into
@@ -387,14 +382,18 @@ void __exit_signal(struct task_struct *t
                 sig->nvcsw += tsk->nvcsw;
                 sig->nivcsw += tsk->nivcsw;
                 sig->sched_time += tsk->sched_time;
-                __exit_sighand(tsk);
-                spin_unlock(&sighand->siglock);
-                sig = NULL;        /* Marker for below. */
+                sig = NULL;        /* Marker for below. */
         }
+
+        tsk->signal = NULL;
+        __exit_sighand(tsk);
+        spin_unlock(&sighand->siglock);
         rcu_read_unlock();
+
         clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
         flush_sigqueue(&tsk->pending);
         if (sig) {
+                flush_sigqueue(&sig->shared_pending);
                 __cleanup_signal(sig);
         }
 }


2006-02-24 18:01:28

by Paul E. McKenney

Subject: Re: [PATCH 3/4] cleanup __exit_signal()

On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> This patch factors out duplicated code under 'if' branches.
> Also, BUG_ON() conversions and whitespace cleanups.

Passed steamroller. Looks sane to me.

Thanx, Paul

Acked-by: <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.16-rc3/kernel/signal.c~3_ESIG 2006-02-20 02:02:03.000000000 +0300
> +++ 2.6.16-rc3/kernel/signal.c 2006-02-20 20:55:50.000000000 +0300
> @@ -341,24 +341,20 @@ void __exit_sighand(struct task_struct *
>   */
>  void __exit_signal(struct task_struct *tsk)
>  {
> -        struct signal_struct * sig = tsk->signal;
> -        struct sighand_struct * sighand;
> +        struct signal_struct *sig = tsk->signal;
> +        struct sighand_struct *sighand;
> +
> +        BUG_ON(!sig);
> +        BUG_ON(!atomic_read(&sig->count));
>
> -        if (!sig)
> -                BUG();
> -        if (!atomic_read(&sig->count))
> -                BUG();
>          rcu_read_lock();
>          sighand = rcu_dereference(tsk->sighand);
>          spin_lock(&sighand->siglock);
> +
>          posix_cpu_timers_exit(tsk);
> -        if (atomic_dec_and_test(&sig->count)) {
> +        if (atomic_dec_and_test(&sig->count))
>                  posix_cpu_timers_exit_group(tsk);
> -                tsk->signal = NULL;
> -                __exit_sighand(tsk);
> -                spin_unlock(&sighand->siglock);
> -                flush_sigqueue(&sig->shared_pending);
> -        } else {
> +        else {
>                  /*
>                   * If there is any task waiting for the group exit
>                   * then notify it:
> @@ -369,7 +365,6 @@ void __exit_signal(struct task_struct *t
>                  }
>                  if (tsk == sig->curr_target)
>                          sig->curr_target = next_thread(tsk);
> -                tsk->signal = NULL;
>                  /*
>                   * Accumulate here the counters for all threads but the
>                   * group leader as they die, so they can be added into
> @@ -387,14 +382,18 @@ void __exit_signal(struct task_struct *t
>                  sig->nvcsw += tsk->nvcsw;
>                  sig->nivcsw += tsk->nivcsw;
>                  sig->sched_time += tsk->sched_time;
> -                __exit_sighand(tsk);
> -                spin_unlock(&sighand->siglock);
> -                sig = NULL;        /* Marker for below. */
> +                sig = NULL;        /* Marker for below. */
>          }
> +
> +        tsk->signal = NULL;
> +        __exit_sighand(tsk);
> +        spin_unlock(&sighand->siglock);
>          rcu_read_unlock();
> +
>          clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
>          flush_sigqueue(&tsk->pending);
>          if (sig) {
> +                flush_sigqueue(&sig->shared_pending);
>                  __cleanup_signal(sig);
>          }
>  }

2006-02-24 18:16:21

by Oleg Nesterov

Subject: Re: [PATCH 3/4] cleanup __exit_signal()

"Paul E. McKenney" wrote:
>
> On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> > This patch factors out duplicated code under 'if' branches.
> > Also, BUG_ON() conversions and whitespace cleanups.
>
> Passed steamroller. Looks sane to me.

Oh, thanks!

I forgot to mention it, but I had run the steamroller tests too before
I sent the "some tasklist_lock removals" series.

Do you know of any other tests that might be useful?

Oleg.

2006-02-25 00:20:13

by Paul E. McKenney

Subject: Re: [PATCH 3/4] cleanup __exit_signal()

On Fri, Feb 24, 2006 at 09:13:22PM +0300, Oleg Nesterov wrote:
> "Paul E. McKenney" wrote:
> >
> > On Mon, Feb 20, 2006 at 07:04:03PM +0300, Oleg Nesterov wrote:
> > > This patch factors out duplicated code under 'if' branches.
> > > Also, BUG_ON() conversions and whitespace cleanups.
> >
> > Passed steamroller. Looks sane to me.
>
> Oh, thanks!
>
> I forgot to mention it, but I had run the steamroller tests too before
> I sent the "some tasklist_lock removals" series.

Glad to hear it!

> Do you know of any other tests that might be useful?

Matt Wilcox mentioned that a full build of gdb ran some tests that do
a good job of exercising signals. I have not yet tried this myself
(but am giving it a shot).

Also, my guess is that you ran steamroller on x86 (how many CPUs?).
I ran on ppc64.

Thanx, Paul

2006-02-25 01:48:17

by Suzanne Wood

Subject: Re: [PATCH 3/4] cleanup __exit_signal()

The extent of an RCU read-side critical section is determined
by the placement of the corresponding rcu_read_lock() and
rcu_read_unlock() calls. Your recent [PATCH] converting sighand_cache
to use SLAB_DESTROY_BY_RCU brought up a comment that prompts
this request for clarification. (The initial motivation was
seeing the introduction of an rcu_assign_pointer() and looking
for the corresponding rcu_dereference().)

The Jul 13 2004 [PATCH] rmaplock 2/6 SLAB_DESTROY_BY_RCU (consistent
with slab.c in linux-2.6.16-rc3) describes struct slab_rcu:
* struct slab_rcu
*
* slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
* arrange for kmem_freepages to be called via RCU. This is useful if
* we need to approach a kernel structure obliquely, from its address
* obtained without the usual locking. We can lock the structure to
* stabilize it and check it's still at the given address, only if we
* can be sure that the memory has not been meanwhile reused for some
* other kind of object (which our subsystem's lock might corrupt).
*
* rcu_read_lock before reading the address, then rcu_read_unlock after
* taking the spinlock within the structure expected at that address.

Does this mean that the rcu_read_lock() can safely occur just
after the spin_lock(&sighand->siglock)? Since I can't find an
example that follows this interpretation of the comment, what
is the intention? Or, if it can, in what particular context?
It looks like all kernel occurrences of rcu_dereference() with
sighand arguments have, within the function definition,
rcu_read_lock/unlock() pairs enclosing the spin lock/unlock
pairs, except the one in group_send_sig_info(), which carries
a comment about requiring rcu_read_lock or tasklist_lock.

An example is attached in your patch to move __exit_signal().
It appears that the RCU read-side critical section is in place to
provide persistence of the sighand_struct: __exit_sighand() calls
sighand_free(sighand) -- proposed to be renamed cleanup_sighand(tsk),
which calls kmem_cache_free(sighand_cachep, sighand) -- before
spin_unlock(&sighand->siglock) is called in __exit_signal().

Thank you for any suggestions.

> Subject: [PATCH 1/3] move __exit_signal() to kernel/exit.c
> From: Oleg Nesterov
> Date: 2006-02-22 22:32:54
>
> __exit_signal() is private to release_task() now.
> I think it is better to make it static in kernel/exit.c
> and export flush_sigqueue() instead - this function is
> much more simple and straightforward.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
>
> --- 2.6.16-rc3/include/linux/sched.h~1_MOVE 2006-02-20 21:00:09.000000000 +0300
> +++ 2.6.16-rc3/include/linux/sched.h 2006-02-23 00:23:40.000000000 +0300
> @@ -1143,7 +1143,6 @@ extern void exit_thread(void);
>  extern void exit_files(struct task_struct *);
>  extern void __cleanup_signal(struct signal_struct *);
>  extern void cleanup_sighand(struct task_struct *);
> -extern void __exit_signal(struct task_struct *);
>  extern void exit_itimers(struct signal_struct *);
>
>  extern NORET_TYPE void do_group_exit(int);
> --- 2.6.16-rc3/include/linux/signal.h~1_MOVE 2006-01-19 18:13:07.000000000 +0300
> +++ 2.6.16-rc3/include/linux/signal.h 2006-02-23 00:36:27.000000000 +0300
> @@ -249,6 +249,8 @@ static inline void init_sigpending(struc
>          INIT_LIST_HEAD(&sig->list);
>  }
>
> +extern void flush_sigqueue(struct sigpending *queue);
> +
>  /* Test if 'sig' is valid signal. Use this instead of testing _NSIG directly */
>  static inline int valid_signal(unsigned long sig)
>  {
> --- 2.6.16-rc3/kernel/exit.c~1_MOVE 2006-02-17 00:05:25.000000000 +0300
> +++ 2.6.16-rc3/kernel/exit.c 2006-02-23 00:32:46.000000000 +0300
> @@ -29,6 +29,7 @@
>  #include <linux/cpuset.h>
>  #include <linux/syscalls.h>
>  #include <linux/signal.h>
> +#include <linux/posix-timers.h>
>  #include <linux/cn_proc.h>
>  #include <linux/mutex.h>
>
> @@ -60,6 +61,68 @@ static void __unhash_process(struct task
>          remove_parent(p);
>  }
>
> +/*
> + * This function expects the tasklist_lock write-locked.
> + */
> +static void __exit_signal(struct task_struct *tsk)
> +{
> +        struct signal_struct *sig = tsk->signal;
> +        struct sighand_struct *sighand;
> +
> +        BUG_ON(!sig);
> +        BUG_ON(!atomic_read(&sig->count));
> +
> +        rcu_read_lock();
> +        sighand = rcu_dereference(tsk->sighand);
> +        spin_lock(&sighand->siglock);
> +
> +        posix_cpu_timers_exit(tsk);
> +        if (atomic_dec_and_test(&sig->count))
> +                posix_cpu_timers_exit_group(tsk);
> +        else {
> +                /*
> +                 * If there is any task waiting for the group exit
> +                 * then notify it:
> +                 */
> +                if (sig->group_exit_task && atomic_read(&sig->count) == sig->notify_count) {
> +                        wake_up_process(sig->group_exit_task);
> +                        sig->group_exit_task = NULL;
> +                }
> +                if (tsk == sig->curr_target)
> +                        sig->curr_target = next_thread(tsk);
> +                /*
> +                 * Accumulate here the counters for all threads but the
> +                 * group leader as they die, so they can be added into
> +                 * the process-wide totals when those are taken.
> +                 * The group leader stays around as a zombie as long
> +                 * as there are other threads. When it gets reaped,
> +                 * the exit.c code will add its counts into these totals.
> +                 * We won't ever get here for the group leader, since it
> +                 * will have been the last reference on the signal_struct.
> +                 */
> +                sig->utime = cputime_add(sig->utime, tsk->utime);
> +                sig->stime = cputime_add(sig->stime, tsk->stime);
> +                sig->min_flt += tsk->min_flt;
> +                sig->maj_flt += tsk->maj_flt;
> +                sig->nvcsw += tsk->nvcsw;
> +                sig->nivcsw += tsk->nivcsw;
> +                sig->sched_time += tsk->sched_time;
> +                sig = NULL;        /* Marker for below. */
> +        }
> +
> +        tsk->signal = NULL;
> +        cleanup_sighand(tsk);
> +        spin_unlock(&sighand->siglock);
> +        rcu_read_unlock();
> +
> +        clear_tsk_thread_flag(tsk,TIF_SIGPENDING);
> +        flush_sigqueue(&tsk->pending);
> +        if (sig) {
> +                flush_sigqueue(&sig->shared_pending);
> +                __cleanup_signal(sig);
> +        }
> +}

2006-02-25 19:29:27

by Oleg Nesterov

Subject: Re: [PATCH 3/4] cleanup __exit_signal()

Suzanne Wood wrote:
>
> The extent of an RCU read-side critical section is determined
> by the placement of the corresponding rcu_read_lock() and
> rcu_read_unlock() calls. Your recent [PATCH] converting sighand_cache
> to use SLAB_DESTROY_BY_RCU brought up a comment that prompts
> this request for clarification. (The initial motivation was
> seeing the introduction of an rcu_assign_pointer() and looking
> for the corresponding rcu_dereference().)
>
> The Jul 13 2004 [PATCH] rmaplock 2/6 SLAB_DESTROY_BY_RCU (consistent
> with slab.c in linux-2.6.16-rc3) describes struct slab_rcu:
> * struct slab_rcu
> *
> * slab_destroy on a SLAB_DESTROY_BY_RCU cache uses this structure to
> * arrange for kmem_freepages to be called via RCU. This is useful if
> * we need to approach a kernel structure obliquely, from its address
> * obtained without the usual locking. We can lock the structure to
> * stabilize it and check it's still at the given address, only if we
> * can be sure that the memory has not been meanwhile reused for some
> * other kind of object (which our subsystem's lock might corrupt).
> *
> * rcu_read_lock before reading the address, then rcu_read_unlock after
> * taking the spinlock within the structure expected at that address.
>
> Does this mean that the rcu_read_lock() can safely occur just
> after the spin_lock(&sighand->siglock)? Since I can't find an
> example that follows this interpretation of the comment, what
> is the intention? Or, if it can, in what particular context?
> It looks like all kernel occurrences of rcu_dereference() with
> sighand arguments have, within the function definition,
> rcu_read_lock/unlock() pairs enclosing the spin lock/unlock
> pairs, except the one in group_send_sig_info(), which carries
> a comment about requiring rcu_read_lock or tasklist_lock.

Sorry, I can't understand this question. __exit_signal() does
rcu_read_lock() _before_ it takes sighand->siglock. (Btw, this is
not strictly necessary here because tasklist_lock is held; it serves
more as documentation.)

> An example is attached in your patch to move __exit_signal().
> It appears that the RCU read-side critical section is in place to
> provide persistence of the sighand_struct: __exit_sighand() calls
> sighand_free(sighand) -- proposed to be renamed cleanup_sighand(tsk),
> which calls kmem_cache_free(sighand_cachep, sighand) -- before
> spin_unlock(&sighand->siglock) is called in __exit_signal().

This is a very valid question.

Yes, doing spin_unlock(&sighand->siglock) after kmem_cache_free() means
we may be writing to memory which has already been reused on another
CPU.

However, SLAB_DESTROY_BY_RCU guarantees that this memory has not been
released to the system (while we are under rcu_read_lock()), so it is
still a valid sighand_struct, and it is OK to release sighand->siglock.
That is why we initialize ->siglock (unlike ->count) in sighand_ctor(),
but not in copy_sighand().

In other words, after kmem_cache_free() we own nothing in sighand,
except this ->siglock.
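To make the ctor-vs-allocation distinction concrete, here is a minimal
userspace sketch. Everything in it is an illustrative stand-in, not the
kernel API: the one-slot "cache", the obj struct, and the int flag that
plays the role of ->siglock.

```c
#include <assert.h>
#include <stdbool.h>

/* One-slot "slab cache" sketch.  With SLAB_DESTROY_BY_RCU semantics an
 * object can be freed and immediately re-allocated while an RCU reader
 * still holds the lock embedded in it, so the allocation path must NOT
 * touch that lock -- only the constructor, which runs once per object
 * lifetime, may initialize it. */

struct obj {
        int locked;     /* stands in for ->siglock */
        int count;      /* per-allocation state, like ->count */
};

static struct obj slot;                 /* the single "slab" object */
static bool constructed = false;

static struct obj *cache_alloc(void)
{
        if (!constructed) {             /* the "ctor": runs only once */
                slot.locked = 0;
                constructed = true;
        }
        slot.count = 1;                 /* per-allocation init is fine here */
        return &slot;
}

static void cache_free(struct obj *o)
{
        (void)o;        /* memory stays usable for concurrent readers */
}
```

If cache_alloc() reset locked unconditionally (the way copy_sighand()
initializes ->count), a reader holding the lock across the free/realloc
cycle would have it yanked out from under it; keeping the lock
initialization in the constructor avoids exactly that.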

So we are safe even if another CPU tries to lock this sighand right
now (currently this is not possible: copy_process() or de_thread()
must first take tasklist_lock); it will simply block until we release
the lock.
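The slab.c comment quoted earlier ("lock the structure to stabilize it
and check it's still at the given address") can be sketched in userspace
as follows. The names (stabilize_sighand and friends), the flag-based
locks, and the no-op RCU stubs are all illustrative stand-ins, not the
kernel API; only the order of operations is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the lookup pattern for SLAB_DESTROY_BY_RCU objects:
 * enter the RCU read side, read the pointer, lock the object, then
 * re-check that the pointer still refers to the same object. */

struct sighand { int locked; };
struct task { struct sighand *sighand; };

static void rcu_read_lock(void)   { }   /* stub: no-op in this sketch */
static void rcu_read_unlock(void) { }   /* stub */
static void spin_lock(struct sighand *s)   { s->locked = 1; }
static void spin_unlock(struct sighand *s) { s->locked = 0; }

static struct sighand *stabilize_sighand(struct task *tsk)
{
        struct sighand *sighand;

        for (;;) {
                rcu_read_lock();
                sighand = tsk->sighand;         /* rcu_dereference() */
                if (sighand == NULL) {
                        rcu_read_unlock();
                        return NULL;            /* already released */
                }
                spin_lock(sighand);
                if (tsk->sighand == sighand) {
                        /* Still the same object: the lock now pins it. */
                        rcu_read_unlock();
                        return sighand;
                }
                /* Memory was recycled into another sighand: retry. */
                spin_unlock(sighand);
                rcu_read_unlock();
        }
}
```

The re-check after spin_lock() is what SLAB_DESTROY_BY_RCU buys: the
memory is guaranteed to still be *a* sighand while we are in the read
side, so locking it is always safe, and the pointer comparison tells us
whether it is still *our* sighand.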

This patch was made when __exit_signal() had two sighand_free() calls.
Now we can change this:

        void cleanup_sighand(struct sighand_struct *sighand)
        {
                if (atomic_dec_and_test(&sighand->count))
                        kmem_cache_free(sighand_cachep, sighand);
        }

        void __exit_signal(struct task_struct *tsk)
        {
                ...

                tsk->signal = NULL;
                tsk->sighand = NULL;    /* must be cleared before unlocking ->siglock */
                spin_unlock(&sighand->siglock);
                rcu_read_unlock();

                cleanup_sighand(sighand);
                ...
        }

Oleg.