2005-04-23 17:35:27

by Dr. David Alan Gilbert

Subject: Hotplug CPU and setaffinity?

Hi,
I got to wondering how Hotplug CPU and sched_setaffinity interact;
if I have a process that has its affinity set to one CPU and some
nasty person comes along and unplugs it, what happens to that process:
does it get scheduled onto another CPU, just not get any time, or
die?
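
(For concreteness, the sort of pinning I mean is roughly this; the
choice of CPU 1 is arbitrary and this is only a sketch:)

#define _GNU_SOURCE
#include <sched.h>      /* sched_setaffinity(), cpu_set_t, CPU_SET() */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(1, &mask);              /* allow CPU 1 only */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
                perror("sched_setaffinity");
                exit(EXIT_FAILURE);
        }

        /* ... do the CPU-specific work here ... */
        return 0;
}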

In particular I was thinking of the cases where a thread has a
functional reason for remaining on one particular CPU (e.g. if you
had calibrated for some feature of that CPU say its time stamp
counter skew/speed). Another case would be a set of threads which
had set their affinity to the same CPU and then made memory
consistency or locking assumptions that wouldn't be valid
if they got rescheduled onto different CPUs.

Dave

-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/


2005-04-23 18:22:36

by Nathan Lynch

Subject: Re: Hotplug CPU and setaffinity?

Dr. David Alan Gilbert wrote:
>
> I got to wondering how Hotplug CPU and sched_setaffinity interact;
> if I have a process that has its affinity set to one CPU and some
> nasty person comes along and unplugs it, what happens to that process:
> does it get scheduled onto another CPU, just not get any time, or
> die?

The affinity of the process is reset to the default and it is migrated
to another cpu, for better or worse. The kernel assumes the admin
knows what he/she is doing.
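
(Rough, untested sketch: a process that cares could at least detect the
reset after the fact by re-reading its mask and counting the CPUs it is
now allowed on:)

#define _GNU_SOURCE
#include <sched.h>      /* sched_getaffinity(), cpu_set_t, CPU_ISSET() */

/* Returns 1 if the mask now covers more than the single CPU we asked
 * for (i.e. it was reset to the default), 0 if we are still pinned,
 * -1 on error. */
static int affinity_was_reset(void)
{
        cpu_set_t mask;
        int cpu, n = 0;

        if (sched_getaffinity(0, sizeof(mask), &mask) == -1)
                return -1;

        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &mask))
                        n++;

        return n > 1;
}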

>
> In particular I was thinking of the cases where a thread has a
> functional reason for remaining on one particular CPU (e.g. if you
> had calibrated for some feature of that CPU say its time stamp
> counter skew/speed). Another case would be a set of threads which
> had set their affinity to the same CPU and then made memory
> consistency or locking assumptions that wouldn't be valid
> if they got rescheduled onto different CPUs.

Yeah. But I don't think this is an issue to be solved in the kernel.
Applications that are this sensitive to cpu hotplugging need to
arrange to be notified before the hotplug occurs, which I think would
be best done with dbus or some other IPC.
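
(Failing that, an application could at least poll the sysfs state of
the CPU it is pinned to; this is only a crude fallback sketch, not the
advance notification I mean:)

#include <stdio.h>

/* Returns 1 if the CPU is online, 0 if it has been taken offline,
 * -1 if it has no "online" file (e.g. it is not hotpluggable). */
static int cpu_is_online(int cpu)
{
        char path[64];
        FILE *f;
        int c;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "r");
        if (!f)
                return -1;
        c = fgetc(f);
        fclose(f);
        return c == '1';
}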


Nathan


2005-04-24 12:35:35

by Dave Gilbert (Home)

Subject: Re: Hotplug CPU and setaffinity?

* Nathan Lynch ([email protected]) wrote:

Hi Nathan,
Thanks for the reply.

> Dr. David Alan Gilbert wrote:
> >
> > I got to wondering how Hotplug CPU and sched_setaffinity interact;
> > if I have a process that has its affinity set to one CPU and some
> > nasty person comes along and unplugs it, what happens to that process:
> > does it get scheduled onto another CPU, just not get any time, or
> > die?
>
> The affinity of the process is reset to the default and it is migrated
> to another cpu, for better or worse. The kernel assumes the admin
> knows what he/she is doing.

Yeah, that's OK. Is there anything that would hot-unplug a CPU
automatically, say on receiving some MCEs, and thus not
give the admin a look-in?

> > In particular I was thinking of the cases where a thread has a
> > functional reason for remaining on one particular CPU (e.g. if you
> > had calibrated for some feature of that CPU say its time stamp
> > counter skew/speed). Another case would be a set of threads which
> > had set their affinity to the same CPU and then made memory
> > consistency or locking assumptions that wouldn't be valid
> > if they got rescheduled onto different CPUs.
>
> Yeah. But I don't think this is an issue to be solved in the kernel.
> Applications that are this sensitive to cpu hotplugging need to
> arrange to be notified before the hotplug occurs, which I think would
> be best done with dbus or some other IPC.

Agreed; since the kernel will reset it to the default affinity, this
involvement must happen before the hotplug. If the kernel were instead
to stop scheduling the threads anywhere, then this is something that
could be fixed up externally to the app, by a hotplug-type helper,
after the hot-unplug happened.
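
(If the kernel did just unschedule them, that external fix-up could be
as small as something like this, run by a hotplug helper that is handed
the victim's pid and a surviving CPU number, much as taskset(1) does;
both arguments are whatever the helper is told:)

#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Re-pin an arbitrary process onto one surviving CPU. */
static int repin(pid_t pid, int new_cpu)
{
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(new_cpu, &mask);
        return sched_setaffinity(pid, sizeof(mask), &mask);
}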

Dave
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2005-04-25 16:04:41

by Joel Schopp

Subject: Re: Hotplug CPU and setaffinity?


>>The affinity of the process is reset to the default and it is migrated
>>to another cpu, for better or worse. The kernel assumes the admin
>>knows what he/she is doing.
>
>
> Yeah, that's OK. Is there anything that would hot-unplug a CPU
> automatically, say on receiving some MCEs, and thus not
> give the admin a look-in?

On ppc64 we have CPU guard, which would remove a processor if it is
failing. Of course, the implications of not removing such a CPU are
pretty terrible.

>
>
>>>In particular I was thinking of the cases where a thread has a
>>> functional reason for remaining on one particular CPU (e.g. if you
>>>had calibrated for some feature of that CPU say its time stamp
>>>counter skew/speed). Another case would be a set of threads which
>>>had set their affinity to the same CPU and then made memory
>>>consistency or locking assumptions that wouldn't be valid
>>>if they got rescheduled onto different CPUs.

This sounds like a theoretical problem. Can you think of any real
examples? The only cases I can think of cause performance hits, but not
functional problems.


2005-04-25 17:33:03

by Dave Gilbert (Home)

Subject: Re: Hotplug CPU and setaffinity?

Joel Schopp wrote:

> On ppc64 we have CPU guard, which would remove a processor if it is
> failing. Of course, the implications of not removing such a CPU are
> pretty terrible.

Indeed.

>>>> In particular I was thinking of the cases where a thread has a
>>>> functional reason for remaining on one particular CPU (e.g. if you
>>>> had calibrated for some feature of that CPU say its time stamp
>>>> counter skew/speed). Another case would be a set of threads which
>>>> had set their affinity to the same CPU and then made memory
>>>> consistency or locking assumptions that wouldn't be valid
>>>> if they got rescheduled onto different CPUs.
>
>
> This sounds like a theoretical problem. Can you think of any real
> examples? The only cases I can think of cause performance hits, but not
> functional problems.

Well, I'm not aware of anything that currently would break with it; but I
was gently wondering whether it would be possible to read cycle
counters (as a faster gettimeofday) even on systems with
unsynchronised counters, if you could lock the thread doing the
reading to a particular physical CPU.
But the current behaviour makes that a bad idea; in that case it
would be nicer if the kernel either just killed my process, just
unscheduled it, or sent it a signal.
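
(The sort of thing I have in mind is just the raw read below, x86-only
and gcc inline asm; the calibration against gettimeofday is left out,
and of course it is only meaningful while the thread really is stuck on
the CPU it calibrated on:)

#include <stdint.h>

static inline uint64_t read_tsc(void)
{
        uint32_t lo, hi;

        /* rdtsc returns the low 32 bits in eax and the high 32 in edx */
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
}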

Dave