2004-01-20 06:00:52

by Rusty Russell (IBM)

Subject: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> Would it make sense if we defer invoking hotplug script _after_
> the CPU is completely dead (i.e after issuing the CPU_DEAD
> notification)?

The original code wanted to block until the hotplug script
acknowledged the removal before completing it. Greg KH says hotplug
doesn't work this way, so now it could well be delivered after
everything is over. If it's simpler, we can just do it after.

The other issue I wanted to revisit: we currently send SIGPWR to all
processes which we have to undo the CPU affinity for (with a new
si_info field containing the cpu going down).

The main problem is that a process can call sched_setaffinity on
another (unrelated) task, which might not know about it. One option
would be to only deliver the signal if it's not SIG_DFL for that
process. Another would be not to signal, and expect hotplug scripts
to clean up.

Thoughts?
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


2004-01-20 06:33:40

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 04:44:45PM +1100, Rusty Russell wrote:
> The other issue I wanted to revisit: we currently send SIGPWR to all
> processes which we have to undo the CPU affinity for (with a new
> si_info field containing the cpu going down).
>
> The main problem is that a process can call sched_setaffinity on
> another (unrelated) task, which might not know about it. One option
> would be to only deliver the signal if it's not SIG_DFL for that
> process. Another would be not to signal, and expect hotplug scripts
> to clean up.

I had to deal with this in my procstate patch (was against RH 2.4 with O(1)
sched but not 2.6). What I chose to do (and what the people who were
wanting the code wanted) was to move tasks which had no CPU to run upon onto
an unrunnable list. Whenever a CPU's state is changed, scan the list.
Whenever a task's affinity mask is changed, check if it needs to go onto or
come off of the unrunnable_list.

I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
task's current (or most recent) CPU and the task's cpus_allowed and
cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
these unrunnable tasks.

I think the sanest thing for a CPU removal is to migrate everything off the
processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
then notify /sbin/hotplug. The hotplug script can then find and handle the
unrunnable tasks. No SIGPWR grossness needed.

Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
heavily tested and I *think* it is all correct (for that kernel snapshot).

Tim

2004-01-20 06:44:38

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 04:44:45PM +1100, Rusty Russell wrote:
>
>>The other issue I wanted to revisit: we currently send SIGPWR to all
>>processes which we have to undo the CPU affinity for (with a new
>>si_info field containing the cpu going down).
>>
>>The main problem is that a process can call sched_setaffinity on
>>another (unrelated) task, which might not know about it. One option
>>would be to only deliver the signal if it's not SIG_DFL for that
>>process. Another would be not to signal, and expect hotplug scripts
>>to clean up.
>>
>
>I had to deal with this in my procstate patch (was against RH 2.4 with O(1)
>sched but not 2.6). What I chose to do (and what the people who were
>wanting the code wanted) was to move tasks which had no CPU to run upon onto
>an unrunnable list. Whenever a CPU's state is changed, scan the list.
>Whenever a task's affinity mask is changed, check if it needs to go onto or
>come off of the unrunnable_list.
>
>I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
>task's current (or most recent) CPU and the task's cpus_allowed and
>cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
>these unrunnable tasks.
>
>I think the sanest thing for a CPU removal is to migrate everything off the
>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
>then notify /sbin/hotplug. The hotplug script can then find and handle the
>unrunnable tasks. No SIGPWR grossness needed.
>
>Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
>heavily tested and I *think* it is all correct (for that kernel snapshot).
>


Seems less robust and more ad hoc than SIGPWR, however.


2004-01-20 06:52:20

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
> >I think the sanest thing for a CPU removal is to migrate everything off the
> >processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
> >then notify /sbin/hotplug. The hotplug script can then find and handle the
> >unrunnable tasks. No SIGPWR grossness needed.
> >
> >Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
> >heavily tested and I *think* it is all correct (for that kernel snapshot).
>
> Seems less robust and more ad hoc than SIGPWR, however.

Disagree. SIGPWR will kill any process that doesn't catch it. That's
policy. It seems more robust to let the hotplug script decide what to do.
If it wants to kill each unrunnable task with SIGPWR, it can. But if it
wants to let them live, it can.

Tim

2004-01-20 07:30:42

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 06:11:49PM +1100, Nick Piggin wrote:
> I thought hotplug is allowed to fail? Thus you can have a hung system.
> Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
> process needs a guaranteed scheduling latency?

I guess a hotplug script MAY fail. I don't think it's a good idea to make
your CPU hotplug script fail. May and Might are different. It's up to the
implementor whether the script can get into a failure condition.

The hotplug script can only become unrunnable if you yank out all the CPUs
on the system. I'd assume it would have an affinity of 0xffffffff.

What if <which> process needs guaranteed scheduling latency? Do we really
_guarantee_ scheduling latency *anywhere*?

2004-01-20 07:23:22

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
>
>>>I think the sanest thing for a CPU removal is to migrate everything off the
>>>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
>>>then notify /sbin/hotplug. The hotplug script can then find and handle the
>>>unrunnable tasks. No SIGPWR grossness needed.
>>>
>>>Code against 2.4 at http://www.hockin.org/~thockin/procstate - it was
>>>heavily tested and I *think* it is all correct (for that kernel snapshot).
>>>
>>Seems less robust and more ad hoc than SIGPWR, however.
>>
>
>Disagree. SIGPWR will kill any process that doesn't catch it. That's
>policy. It seems more robust to let the hotplug script decide what to do.
>If it wants to kill each unrunnable task with SIGPWR, it can. But if it
>wants to let them live, it can.
>

I thought hotplug is allowed to fail? Thus you can have a hung system.
Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
process needs a guaranteed scheduling latency?

(I dropped [email protected] because it's moderated)


2004-01-20 07:45:43

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 06:11:49PM +1100, Nick Piggin wrote:
>
>>I thought hotplug is allowed to fail? Thus you can have a hung system.
>>Or what if the hotplug script itself becomes TASK_UNRUNNABLE? What if the
>>process needs a guaranteed scheduling latency?
>>
>
>I guess a hotplug script MAY fail. I don't think it's a good idea to make
>your CPU hotplug script fail. May and Might are different. It's up to the
>implementor whether the script can get into a failure condition.
>

Sorry, bad wording. The script may fail to be executed.

>
>The hotplug script can only become unrunnable if you yank out all the CPUs
>on the system. I'd assume it would have an affinity of 0xffffffff.
>

OK, I guess that's not such a valid concern.

>
>What if <which> process needs guaranteed scheduling latency? Do we really
>_guarantee_ scheduling latency *anywhere*?
>
>

We do guarantee that a realtime task won't be blocked waiting for
a hotplug script to fault in and start it up again (which may not
happen). Not sure how important this issue is.


2004-01-20 07:54:24

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 06:45:37PM +1100, Nick Piggin wrote:
> >I guess a hotplug script MAY fail. I don't think it's a good idea to make
> >your CPU hotplug script fail. May and Might are different. It's up to
> >the
> >implementor whether the script can get into a failure condition.
> >
>
> Sorry bad wording. The script may fail to be executed.

Under what conditions? Not arbitrary entropy, surely. If a hotplug script
is present and does not blow up, it should be safe to assume it will be run
upon an event being delivered. If not, we have a WAY bigger problem :)

> >What if <which> process needs guaranteed scheduling latency? Do we really
> >_guarantee_ scheduling latency *anywhere*?
>
> We do guarantee that a realtime task won't be blocked waiting for
> a hotplug script to fault in and start it up again (which may not
> happen). Not sure how important this issue is.

We have a conflict of priority here. If an RT task is affined to CPU A and
CPU A gets yanked out, what do we do?

Obviously the RT task can't keep running as it was. It was affined to A.
Maybe for a good reason. I see we have a few choices here:

* re-affine it automatically, thereby silently undoing the explicit
affinity.
* violate its RT scheduling by not running it until it has been re-affined
  or CPU A returns to the pool.

Sending it a SIGPWR means you have to run it on a different CPU than it was
affined to, which is already a violation.

Basically, RT tasks + CPU affinity + hotplug CPUs do not play nicely
together. I don't see much that can be done to solve that. With the
procstate stuff I did, and with planned CPU unplugs we *do* have time before
the CPU really goes offline in which to act. With unplanned CPU offlining,
we don't.

2004-01-20 08:16:05

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 06:45:37PM +1100, Nick Piggin wrote:
>
>>>I guess a hotplug script MAY fail. I don't think it's a good idea to make
>>>your CPU hotplug script fail. May and Might are different. It's up to
>>>the
>>>implementor whether the script can get into a failure condition.
>>>
>>>
>>Sorry bad wording. The script may fail to be executed.
>>
>
>Under what conditions? Not arbitrary entropy, surely. If a hotplug script
>is present and does not blow up, it should be safe to assume it will be run
>upon an event being delivered. If not, we have a WAY bigger problem :)
>

That assumption is not safe. The main problems are of course process limits
and memory allocation failure.

>
>>>What if <which> process needs guaranteed scheduling latency? Do we really
>>>_guarantee_ scheduling latency *anywhere*?
>>>
>>We do guarantee that a realtime task won't be blocked waiting for
>>a hotplug script to fault in and start it up again (which may not
>>happen). Not sure how important this issue is.
>>
>
>We have a conflict of priority here. If an RT task is affined to CPU A and
>CPU A gets yanked out, what do we do?
>
>Obviously the RT task can't keep running as it was. It was affined to A.
>Maybe for a good reason. I see we have a few choices here:
>
>* re-affine it automatically, thereby silently undoing the explicit
> affinity.
>* violate its RT scheduling by not running it until it has been re-affined
> or CPU A returns to the pool.
>
>Sending it a SIGPWR means you have to run it on a different CPU than it was
>affined to, which is already a violation.
>

At least the task has the option to handle the problem.


2004-01-20 08:22:41

by Rusty Russell

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
> task's current (or most recent) CPU and the task's cpus_allowed and
> cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
> these unrunnable tasks.
>
> I think the sanest thing for a CPU removal is to migrate everything off the
> processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
> then notify /sbin/hotplug. The hotplug script can then find and handle the
> unrunnable tasks. No SIGPWR grossness needed.

Interesting.

The downside is that you now need some script which knows what to do
with the tasks (unless you have something like DBUS, but that's a ways
off). There are no correctness concerns AFAICT with userspace not
being on a particular CPU, just performance.

The SIGPWR solution lets a random process deal appropriately without
having to interface with /sbin/hotplug, if it wants to. And it's a
lot less invasive.

I'll take a look though.
Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-20 08:29:53

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 07:14:12PM +1100, Nick Piggin wrote:
> >Under what conditions? Not arbitrary entropy, surely. If a hotplug script
> >is present and does not blow up, it should be safe to assume it will be run
> >upon an event being delivered. If not, we have a WAY bigger problem :)
> >
>
> That assumption is not safe. The main problems are of course process limits
> and memory allocation failure.

If root has a process limit that makes hotplug scripts fail to run, then
we're hosed in a lot of ways. And if we fail to allocate memory, there
really ought to be some retry or something. It seems to me that a failure
to run a hotplug script is a BAD THING.

> >Sending it a SIGPWR means you have to run it on a different CPU than it was
> >affined to, which is already a violation.
>
> At least the task has the option to handle the problem.

But it is a violation of the affinity. As the kernel we CAN NOT know what
the affinity really means. Maybe there is some way for a task to indicate
it would like to receive SIGPWR in that case. Or some other signal. Can we
invent new signals?

That way a task that KNOWS about the CPU disappearing underneath it can be
wise, while everything else will not just get killed.

2004-01-20 08:44:14

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 07:37:48PM +1100, Nick Piggin wrote:
> (or OOM killed being another that comes to mind)
>
> It is sometimes inevitable. With that knowledge we should be designing
> for graceful failure.

Don't get me started on OOM killer. If the OOM killer is killing hotplug
scripts, there's another problem. What's the chance of hotplug scripts
being the memory hog? :)

That said, I understand what you're saying. It's rough.

> >But it is a violation of the affinity. As the kernel we CAN NOT know what
> >the affinity really means.
>
> Not if the application is designed to handle it. How would hotplug
> scripts make this any different, anyway?

IFF the app is designed to handle it. The existence of a SIGPWR handler
does not necessarily imply that, though. A SIGCPU or something might
correlate 1:1 with this, but SIGPWR doesn't.

Solving it from hotplug scripts means the task's affinity is not
automatically violated. It means the decision to violate the affinity was
made in user-space, probably by the admin, who CAN know what the affinity
means.

> Rusty thought you just wouldn't send it unless the process was handling
> it.

I remembered that after I sent it, sorry. :)

2004-01-20 08:39:59

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Tue, Jan 20, 2004 at 07:14:12PM +1100, Nick Piggin wrote:
>
>>>Under what conditions? Not arbitrary entropy, surely. If a hotplug script
>>>is present and does not blow up, it should be safe to assume it will be run
>>>upon an event being delivered. If not, we have a WAY bigger problem :)
>>>
>>>
>>That assumption is not safe. The main problems are of course process limits
>>and memory allocation failure.
>>
>
>If root has a process limit that makes hotplug scripts fail to run, then
>we're hosed in a lot of ways. And if we fail to allocate memory, there
>really ought to be some retry or something. It seems to me that a failure
>to run a hotplug script is a BAD THING.
>

(or OOM killed being another that comes to mind)

It is sometimes inevitable. With that knowledge we should be designing
for graceful failure.

>
>>>Sending it a SIGPWR means you have to run it on a different CPU than it was
>>>affined to, which is already a violation.
>>>
>>At least the task has the option to handle the problem.
>>
>
>But it is a violation of the affinity. As the kernel we CAN NOT know what
>the affinity really means.
>

Not if the application is designed to handle it. How would hotplug
scripts make this any different, anyway?

> Maybe there is some way for a task to indicate
>it would like to receive SIGPWR in that case. Or some other signal. Can we
>invent new signals?
>
>That way a task that KNOWS about the CPU disappearing underneath it can be
>wise, while everything else will not just get killed.
>

Rusty thought you just wouldn't send it unless the process was handling
it.


2004-01-20 08:37:34

by Stefan Smietanowski

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

Hi.

>> We have a conflict of priority here. If an RT task is affined to CPU
>> A and
>> CPU A gets yanked out, what do we do?
>>
>> Obviously the RT task can't keep running as it was. It was affined to A.
>> Maybe for a good reason. I see we have a few choices here:
>>
>> * re-affine it automatically, thereby silently undoing the explicit
>> affinity.
>> * violate its RT scheduling by not running it until it has been
>> re-affined
>> or CPU A returns to the pool.
>>
>> Sending it a SIGPWR means you have to run it on a different CPU than
>> it was
>> affined to, which is already a violation.
>>
>
> At least the task has the option to handle the problem.

Why not make a flag that handles that choice explicitly?

If the task sets the affinity itself, the default is to
re-affine it if the CPU gets yanked; but if the task wants to
be suspended until the CPU reappears, it can set a flag for
that to happen.

If we have a program that can start another program on a
specific CPU, then that program can dictate how the task
should respond by setting the flag the same way the task
would if it were the one selecting a specific CPU. Doesn't
that fix the problem?

If the default was to re-affine to another CPU then
we can optionally send it a SIGPWR as well to let it
know it was re-affined.

But the SIGPWR is in my eyes optional and the above scenario
should handle the cases imo.

// Stefan

2004-01-20 08:37:21

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 06:45:41PM +1100, Rusty Russell wrote:
> In message <[email protected]> you write:
> > I added a new TASK_UNRUNNABLE state for these tasks, too. By adding the
> > task's current (or most recent) CPU and the task's cpus_allowed and
> > cpus_allowed_mask to /proc/pid/status, we gave simple tools for finding
> > these unrunnable tasks.
> >
> > I think the sanest thing for a CPU removal is to migrate everything off the
> > processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
> > then notify /sbin/hotplug. The hotplug script can then find and handle the
> > unrunnable tasks. No SIGPWR grossness needed.
>
> Interesting.
>
> The downside is that you now need some script which knows what to do
> with the tasks (unless you have something like DBUS, but that's a ways

Well, if we provide a sane example script, the rest is up to the distros or
the people with this hardware to decide.

> off). There are no correctness concerns AFAICT with userspace not
> being on a particular CPU, just performance.

Correctness does matter if an affined task violates that affinity. If we
are going to provide explicit affinity, we need to honor it under all
conditions, or at least provide an option to honor it.

> The SIGPWR solution lets a random process deal appropriately without
> having to interface with /sbin/hotplug, if it wants to. And it's a
> lot less invasive.

I agree about invasiveness. Maybe a combo? Send SIGPWR iff a task is
actually handling it, otherwise mark it TASK_UNRUNNABLE and let hotplug
handle it? A new signal would be much more polite, but SIGPWR can be made
to work. What if a process catches SIGPWR, but does not handle CPU removal?
Do we wait for its signal handler to finish before re-evaluating it for
TASK_UNRUNNABLE? Yuck. If a CPU gets yanked with no warning, where do we
run the signal handler? Violating affinity again.

Tim

2004-01-20 08:50:41

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Stefan Smietanowski wrote:

> Hi.
>
>>> We have a conflict of priority here. If an RT task is affined to
>>> CPU A and
>>> CPU A gets yanked out, what do we do?
>>>
>>> Obviously the RT task can't keep running as it was. It was affined
>>> to A.
>>> Maybe for a good reason. I see we have a few choices here:
>>>
>>> * re-affine it automatically, thereby silently undoing the explicit
>>> affinity.
>>> * violate its RT scheduling by not running it until it has been
>>> re-affined
>>> or CPU A returns to the pool.
>>>
>>> Sending it a SIGPWR means you have to run it on a different CPU than
>>> it was
>>> affined to, which is already a violation.
>>>
>>
>> At least the task has the option to handle the problem.
>
>
> Why not make a flag that handles that choice explicitly?
>
> If the task sets the affinity itself the default is to
> re-affine it if the cpu gets yanked but if the task wants to
> be suspended until the CPU reappears it can set a flag for
> that to happen if the CPU is yanked.
>
> If we have a program that can start another program on a
> specific CPU, then that program can dictate how the task
> should respond by setting the flag the same way the task
> would if it were the one selecting a specific CPU. Doesn't
> that fix the problem?


Well I'll admit it would usually be more flexible if you freeze
the process and run hotplug scripts to handle cpu affinity.

Unfortunately it introduces unfixable robustness and realtime
problems by design.


2004-01-20 09:12:41

by Tim Hockin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 07:49:45PM +1100, Nick Piggin wrote:
> Well I'll admit it would usually be more flexible if you freeze
> the process and run hotplug scripts to handle cpu affinity.
>
> Unfortunately it introduces unfixable robustness and realtime
> problems by design.

And I submit that there is no clean way to handle the RT problem. The
proposed flag gives the task a choice, which is good, but I am not sure that
the choice is worth the effort.

The robustness issues are real, but the same issue applies to all hotplug
activity. The issues are severe corner cases which indicate OTHER faults in
the system.

My main concern is that affinity not be treated as a suggestion or
preference. Affinity is an explicit request. Once granted, we can not
arbitrarily decide to revoke affinity unless we have a sane way to alert
*someone*.

Freezing tasks and sending a hotplug event is a sane way.

Sending SIGPWR is a sane way IFF you can guarantee that a task which
receives SIGPWR will handle a CPU being yanked without violating affinity.
This does not handle the case of tasks which do not handle SIGPWR.

A flag to indicate 'my affinity is a preference' vs. 'my affinity is a
requirement' is a possibly sane way. It still requires all the code to
freeze a task, and forces affinity-aware apps to adapt to this new edge
case.

2004-01-20 09:26:49

by Srivatsa Vaddagiri

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 12:37:01AM -0800, Tim Hockin wrote:
> If a CPU gets yanked with no warning, where do we
> run the signal handler? Violating affinity again.

With the current CPU Hotplug design, I don't think this is allowed.
A CPU has to be offlined first in software before it is yanked out
from hardware.

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2004-01-20 17:50:15

by Andy Lutomirski

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

Tim Hockin wrote:

> On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
>
>>>I think the sanest thing for a CPU removal is to migrate everything off the
>>>processor in question, move unrunnable tasks into TASK_UNRUNNABLE state,
>>>then notify /sbin/hotplug. The hotplug script can then find and handle the
>>>unrunnable tasks. No SIGPWR grossness needed.
>>>
>>
>>Seems less robust and more ad hoc than SIGPWR, however.
>
>
> Disagree. SIGPWR will kill any process that doesn't catch it. That's
> policy. It seems more robust to let the hotplug script decide what to do.
> If it wants to kill each unrunnable task with SIGPWR, it can. But if it
> wants to let them live, it can.

This seems like a problem that a lot of power-management issues have.
(At some point, linux may want to suspend itself after inactivity. Both
RT tasks and some interactive tasks may want to suppress that.) Why not
add a SIGPM signal, which is only sent if handled, and which indicates
that a PM event is happening? Give usermode some method of responding to
it (e.g. handler returns a value, or a new syscall), and let
/sbin/hotplug handle events for tasks that either ignore the signal or
responded that they were uninterested. This seems to be close to optimal
for every case I can think of.

--Andy

2004-01-21 04:01:55

by Srivatsa Vaddagiri

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote:
> IFF the app is designed to handle it. The existence of a SIGPWR handler
> does not necessarily imply that, though. A SIGCPU or something might
> correlate 1:1 with this, but SIGPWR doesn't.

I agree we should have a separate signal for CPU Hotplug. By default the signal
will be ignored, unless a task registers a signal handler for that special
signal.

That way, tasks which "knowingly" change their CPU affinity will be able to
tackle a CPU going down by handling the signal (probably change their CPU
affinity again), while tasks which have their CPU affinity changed "unknowingly"
(by other tasks) will just ignore the signal. The hotplug script interface
allows the admin to go and change the CPU affinity again for the second class
of tasks, if needed.

The only problem with a new signal is conformance to standards (if any).

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2004-01-21 04:19:28

by Nick Piggin

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Srivatsa Vaddagiri wrote:

>On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote:
>
>>IFF the app is designed to handle it. The existence of a SIGPWR handler
>>does not necessarily imply that, though. A SIGCPU or something might
>>correlate 1:1 with this, but SIGPWR doesn't.
>>
>
>I agree we should have a separate signal for CPU Hotplug. By default the signal
>will be ignored, unless a task registers a signal handler for that special
>signal.
>

I'd be happy with that.

>
>That way, tasks which "knowingly" change their CPU affinity will be able to
>tackle a CPU going down by handling the signal (probably change their CPU
>affinity again), while tasks which have their CPU affinity changed "unknowingly"
>(by other tasks) will just ignore the signal. The hotplug script interface
>allows the admin to go and change the CPU affinity again for the second class
>of tasks, if needed.
>

Yes, that is with the cpu-is-down hotplug event, right?

*Before* that happens, tasks that don't handle the signal should just
have their affinity changed to all cpus.

Or doesn't anybody care to think about hotplug scripts failing?
(serious question)


2004-01-21 04:36:00

by Rusty Russell

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> On Tue, Jan 20, 2004 at 05:43:59PM +1100, Nick Piggin wrote:
> > Seems less robust and more ad hoc than SIGPWR, however.
>
> Disagree. SIGPWR will kill any process that doesn't catch it. That's
> policy. It seems more robust to let the hotplug script decide what to do.
> If it wants to kill each unrunnable task with SIGPWR, it can. But if it
> wants to let them live, it can.

The proposal was to send SIGPWR only if they don't have it set to the
default, for this reason.

I think that if your patch goes in, it will complement this solution
nicely.

Thanks!
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 04:38:47

by Rusty Russell

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> Basically, RT tasks + CPU affinity + hotplug CPUs do not play nicely
> together. I don't see much that can be done to solve that. With the
> procstate stuff I did, and with planned CPU unplugs we *do* have time before
> the CPU really goes offline in which to act. With unplanned CPU offlining,
> we don't.

This can't be done with the hotplug scripts. I originally ran hotplug
synchronously before taking the CPU offline, and Greg KH said that
constitutes abuse 8(

Userspace can agree on a protocol *before* initiating the offline, of
course, in which case it's not a kernel problem.

You make an excellent point though: if you need 2 cpus on your system
to meet requirements, and you go down to one cpu, you have a problem.
But I think that's a "don't do that".

Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 04:38:47

by Rusty Russell

Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> > off). There are no correctness concerns AFAICT with userspace not
> > being on a particular CPU, just performance.
>
> Correctness does matter if an affined task violates that affinity. If we
> are going to provide explicit affinity, we need to honor it under all
> conditions, or at least provide an option to honor it.

WHY? Think of an example where this is actually a problem.

"Under all conditions" is not something we can ever implement for
anything.

> I agree about invasiveness. Maybe a combo? Send SIGPWR iff a task is
> actually handling it, otherwise mark it TASK_UNRUNNABLE and let hotplug
> handle it?

Well, I think that violating affinity given that (1) affinity in
userspace is only a performance issue, and (2) we've been explicitly
told to take the CPU down, is a valid solution.

OTOH making tasks unrunnable until hotplug gets around to servicing
them could equally be a disaster. Given that this requires
infrastructure not in Linus' tree and the "simply unbind" solution
doesn't, I'm leaning towards unbinding everything which would become
unrunnable, SIGPWR if they handle it, and hotplug at the end.

Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 04:38:42

by Rusty Russell

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> (At some point, linux may want to suspend itself after inactivity. Both
> RT tasks and some interactive tasks may want to suppress that.) Why not
> add a SIGPM signal, which is only sent if handled, and which indicates
> that a PM event is happening. Give usermode some method of responding to
> it (e.g. handler returns a value, or a new syscall), and let
> /sbin/hotplug handle events for tasks that either ignore the signal or
> responded that they were uninterested. This seems to be close to optimal
> for every case I can think of.

This was my original idea too. AIX has this, but in reality the
control ends up all in userspace for non-trivial uses. ie. some
"workload manager" program consults with all the interested parties
*before* telling the kernel what to do.

The async and non-consultative nature of hotplug is policy for good
reason. Giving someone 30 seconds to respond to a signal can always
fail, and making it configurable is just a bandaid.

I have nothing against SIGRECONFIG (think memory hotplug), but the AIX
guys indicated that, in their experience, non-toy users don't use it
anyway (they have a hotplug-style script system, too).

So: trying to cover every corner case isn't worthwhile in practice, it
seems. I like the signal for RC5 challenge etc, but that's about it.

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 05:04:54

by Srivatsa Vaddagiri

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
> Yes, that is with the cpu-is-down hotplug event, right?

right.


> *Before* that happens, tasks that don't handle the signal should just
> have their affinity changed to all cpus.

Currently, whether the task handles the signal or not, affinity is
changed to all cpus for tasks that are bound only to the dying CPU.

--


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

2004-01-21 05:09:18

by Rusty Russell (IBM)

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> On Tue, Jan 20, 2004 at 12:43:52AM -0800, Tim Hockin wrote:
> > IFF the app is designed to handle it. The existence of a SIGPWR handler
> > does not necessarily imply that, though. a SIGCPU or something might
> > correlate 1:1 with this, but SIGPWR doesn't.
>
> I agree we should have a separe signal for CPU Hotplug.

Can we add signals without breaking userspace?

If we can, SIGRECONFIG makes sense. If not, I'd rather not have a
signal, rely on hotplug, and look at adding a signal in 2.7.

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 07:08:58

by Tim Hockin

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Wed, Jan 21, 2004 at 10:39:33AM +0530, Srivatsa Vaddagiri wrote:
> > *Before* that happens, tasks that don't handle the signal should just
> > have their affinity changed to all cpus.
>
> Currently, whether the task handles the signal or not, affinity is
> changed to all cpus for tasks that are bound only to the dying CPU.

OK, so let's assume this scenario:

process A affined to cpu1
all other processes affined to 0xffffffff
cpu1 goes down
- process A affined to 0xffffffff
hotplug "cpu1 removed" event
cpu1 comes back
hotplug "cpu1 inserted" event

Process A has now discarded potentially VALUABLE information, with no
way to retrieve it. The hotplug scripts do not have enough information to
put things the way they were before. I can't believe that anyone considers
this to be OK.

Userspace gave us EXPLICIT instructions, which we then violate. By granting
affinity, we have made a contract with userspace. Changing affinity without
userspace's direct instruction is wrong.

What about this:

We already cannot handle unexpected CPU removals gracefully, correct? So
we expect some user-provided notification, right?

So force userland to handle it before we give the OK to remove a CPU.

pid_t sys_proc_offline(int cpu)
{
	pid_t p;

	/* flag cpu as not schedulable anymore */
	dont_add_tasks_to(cpu);

	p = find_first_unrunnable(cpu);
	if (p)
		return p;

	take_proc_offline(cpu);
	return 0;
}

The userspace control can then loop on this until it returns 0. Each time
it returns a pid, userspace must try to handle that pid: kill it,
re-affine it, or provide some way to suspend it.

Simpler yet:

int sys_proc_offline(int cpu, int reaffine)
{
	pid_t p;

	/* flag cpu as not schedulable anymore */
	dont_add_tasks_to(cpu);

	while ((p = find_first_unrunnable(cpu)) != 0) {
		if (reaffine)
			reaffine_task(p);
		else
			make_unrunnable(p);
	}

	take_proc_offline(cpu);
	return 0;
}

Less flexible, but workable. I prefer the first. Yes it's racy, but the
worst case is that you receive a pid that you don't need to handle (died or
re-affined already).

Anything that violates affinity without permission is just so WRONG.

2004-01-21 07:09:49

by Tim Hockin

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
> Or doesn't anybody care to think about hotplug scripts failing?
> (serious question)

If hotplug scripts are failing, you're in really deep trouble. I can't find
a single case where a hotplug script failing would not indicate some other
larger failure.

2004-01-21 07:36:20

by Nick Piggin

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR



Tim Hockin wrote:

>On Wed, Jan 21, 2004 at 03:14:03PM +1100, Nick Piggin wrote:
>
>>Or doesn't anybody care to think about hotplug scripts failing?
>>(serious question)
>>
>
>If hotplug scripts are failing, you're in really deep trouble. I can't find
>a single case where a hotplug script failing would not indicate some other
>larger failure.
>

sigh. threads-max, pid_max, ulimit, -ENOMEM, oom.

In my opinion, you can be in fine shape after one of the above happening,
and if limits _are_ in place, it's reasonable to expect they're there because
they might get reached in rare cases.

I'd rather not add something that, by design, can hang any number of
processes including the entire system if a hotplug script fails. That's
just my honest opinion; I know it's rare enough it probably would never
happen to anyone.

Sorry I keep repeating this; it's not my call and it's never going to affect
me, so I'll shut up now ;)


2004-01-21 07:42:47

by Tim Hockin

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

On Wed, Jan 21, 2004 at 06:31:06PM +1100, Nick Piggin wrote:
> >If hotplug scripts are failing, you're in really deep trouble. I can't
> >find
> >a single case where a hotplug script failing would not indicate some other
> >larger failure.
> >
>
> sigh. threads-max, pid_max, ulimit, -ENOMEM, oom.

These affect ALL hotplug scripts. If you can't run a hotplug script because
you've exceeded root's ulimit, or the max # of tasks/threads in the system,
you're in trouble - regardless of what the hotplug event was - SOMETHING is
going to go wrong.

If you get ENOMEM you have a bigger problem.

If you get OOM killed, then the OOM killer has gone haywire (not uncommon,
historically).

> I'd rather not add something that, by design, can hang any number of
> processes including the entire system if a hotplug script fails. That's
> just my honest opinion; I know it's rare enough it probably would never
> happen to anyone.
>
> Sorry I keep repeating this; it's not my call and it's never going to affect
> me, so I'll shut up now ;)

I'd rather not add anything like that either. I'm not saying I advocate
fast-and-loose at all. On the contrary, I think any action taken in
response to a CPU removal needs to be accountable, and wantonly changing
affinity is NOT.

It'll probably not affect me either, nor is it my decision :)

2004-01-21 08:14:16

by Rusty Russell

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> Or doesn't anybody care to think about hotplug scripts failing?
> (serious question)

It seems not. I don't necessarily agree with it, but we'll see how
it goes.

Guarantees are hard: if the script is supposed to fork something and
you're out of memory, what do you do?

Cheers,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

2004-01-21 15:07:53

by Matthias Urlichs

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

Hi, Tim Hockin wrote:

> We already can not handle unexpected CPU removals gracefully, correct? So
> we expect some user-provided notification, right?
>
Well, if the CPU is executing userland (or idling), we conceivably could.
That would kill off one userspace process (which might be able to recover
given a signal and longjmp()), but such is life. ;-)

> So force userland to handle it before we give the OK to remove a CPU.

I like the idea of an "unrunnable" queue, that way you have the option to
fix the problem afterwards -- or just ignore it, if you decide it's OK for
processes to wait a few minutes while you replace the failing CPU fan.

It's like mount(). Usually you unmount cleanly, but sometimes you use -f
and something becomes inaccessible. At least WRT CPUs, the inaccessibility
is (usually) fixable. (I wish it were so WRT NFS mounts.)

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [email protected]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
"Whenever the civil government forbids the practice of things
that God has commanded us to do, or tells us to do things He
has commanded us not to do then we are on solid ground in
disobeying the government and rebelling against it."
[Pat Robertson]

2004-01-22 07:16:26

by Rusty Russell

[permalink] [raw]
Subject: Re: CPU Hotplug: Hotplug Script And SIGPWR

In message <[email protected]> you write:
> Process A has now discarded useful potentially VALUABLE information, with no
> way to retrieve it. The hot plug scripts do not have enough information to
> put things the way they were before. I can't believe that anyone considers
> this to be OK.

We already established that the process which cares has to listen to
hotplug events.

Userland should handle it *before* telling the kernel to remove the
CPU. What we're dealing with here is merely a corner case, IMHO worth
neither hysteria nor a great deal of code.

Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.