The patch needs an ack, probably I just misunderstood the comments.
Suppose that the main thread blocks the signal. In that case
__group_complete_signal() tries to find another thread starting from
signal->curr_target.
The comment says about load-balancing, but this is not what happens?
Suppose that wants_signal(signal->curr_target) == T. In that case we
always choose the same ->curr_target thread. Isn't it better to try
to "spread" the signals over the thread group?
With this patch we are trying to find another suitable thread starting
from next_thread(signal->curr_target), thus distributing the load over
the whole thread group.
Bad idea? If not, probably we can also remove the "if (wants_signal())"
at the top of __group_complete_signal() ?
Signed-off-by: Oleg Nesterov <[email protected]>
--- 25/kernel/signal.c~4_GCS_BALANCE 2008-03-06 01:07:55.000000000 +0300
+++ 25/kernel/signal.c 2008-03-06 01:34:57.000000000 +0300
@@ -863,13 +863,14 @@ __group_complete_signal(int sig, struct
/*
* Otherwise try to find a suitable thread.
*/
- t = signal->curr_target;
- if (t == NULL)
- /* restart balancing at this thread */
- t = signal->curr_target = p;
+ if (!signal->curr_target)
+ signal->curr_target = p;
- while (!wants_signal(sig, t)) {
+ for (t = signal->curr_target ;; ) {
t = next_thread(t);
+ if (wants_signal(sig, t))
+ break;
+
if (t == signal->curr_target)
/*
* No thread needs to be woken.
> The comment says about load-balancing, but this is not what happens?
> Suppose that wants_signal(signal->curr_target) == T. In that case we
> always choose the same ->curr_target thread. Isn't it better to try
> to "spread" the signals over the thread group?
The "load balancing" stuff was in the old multithreaded signals code (2.5)
from before I rearranged a lot of code to fix the main parts of the MT
semantics. Maybe it was Ingo who originally put that code in? I moved
everything around it to change the deterministic semantics, but I never
really gave any thought to the "performance feature". Perhaps it did
something different to begin with and bit-rot made it into the algorithm we
have that seems not so optimal .
The current behavior hammers all the unblocked signals onto one thread
until it's scheduled out. For getting the signal delivered as quickly as
can be, it makes some sense to choose running threads (task_curr) over
threads blocked without signals already pending. So perhaps the same
thread that just ran a signal handler (maybe is still setting it up) really
is the preferable choice when it's on the CPU--at least in comparison to
another candidate thread that is not on a CPU. But it's not exactly doing
"load balancing". If several threads are running on CPUs, presumably it's
intended to spread several near-simultaneous signals across those CPUs.
Perhaps Ingo has some thoughts on what the original plan is, or on what
desireable performance choices are now.
> With this patch we are trying to find another suitable thread starting
> from next_thread(signal->curr_target), thus distributing the load over
> the whole thread group.
If we're cleaning up, we can start by getting rid of the NULL check.
There's no reason to have it in this hot path. It should never come
up if we make copy_signal initialize sig->curr_target = tsk.
Then just:
t = p->signal->curr_target;
while (!wants_signal(sig, (t = next_thread(t)))) {
if (t == p->signal->curr_target)
/*
* No thread needs to be woken.
* Any eligible threads will see
* the signal in the queue soon.
*/
return;
}
p->signal->curr_target = t;
> Bad idea? If not, probably we can also remove the "if (wants_signal())"
> at the top of __group_complete_signal() ?
I don't think we should change that special case, or at least not soon.
The initial thread always takes the signal if it's not busy. It's now,
and always has been, deterministic that in a slow steady situation with
no races, no signals blocked anywhere and no other signals sent but one
SIGALRM every hour, the handler always runs in the main thread. Now if
you change this, it will round-robin among the threads in the process.
There aren't supposed to be any guarantees about this, any eligible
thread running the handler is kosher. But like the comment there now
says, "least surprising".
Thanks,
Roland
On 03/06, Roland McGrath wrote:
>
> > The comment says about load-balancing, but this is not what happens?
> > Suppose that wants_signal(signal->curr_target) == T. In that case we
> > always choose the same ->curr_target thread. Isn't it better to try
> > to "spread" the signals over the thread group?
>
> The "load balancing" stuff was in the old multithreaded signals code (2.5)
> from before I rearranged a lot of code to fix the main parts of the MT
> semantics. Maybe it was Ingo who originally put that code in? I moved
> everything around it to change the deterministic semantics, but I never
> really gave any thought to the "performance feature". Perhaps it did
> something different to begin with and bit-rot made it into the algorithm we
> have that seems not so optimal .
>
> The current behavior hammers all the unblocked signals onto one thread
> until it's scheduled out. For getting the signal delivered as quickly as
> can be, it makes some sense to choose running threads (task_curr) over
> threads blocked without signals already pending. So perhaps the same
> thread that just ran a signal handler (maybe is still setting it up) really
> is the preferable choice when it's on the CPU--at least in comparison to
> another candidate thread that is not on a CPU. But it's not exactly doing
> "load balancing". If several threads are running on CPUs, presumably it's
> intended to spread several near-simultaneous signals across those CPUs.
>
> Perhaps Ingo has some thoughts on what the original plan is, or on what
> desireable performance choices are now.
OK, thanks, please ignore this patch (it was more the question anyway).
So. currently the meaning of->curr_target is: remember the last thread
we sent a signal, may help to avoid iterating over the thread group when
the next signal is sent.
> If we're cleaning up, we can start by getting rid of the NULL check.
> There's no reason to have it in this hot path. It should never come
> up if we make copy_signal initialize sig->curr_target = tsk.
Can't understand why I didn't realize this while reading the code.
Looks like a reasonable cleanup regardless.
Oleg.