2003-01-10 12:36:52

by Erich Focht

Subject: small migration thread fix

Hi,

the small patch fixes a potential problem in the migration thread for
the case that the first CPU in the cpus_allowed mask of a process is
offline. Please consider applying it to your trees.

Thanks!
Regards,
Erich


Attachments:
migration-fix-2.5.55.patch (498.00 B)

2003-01-10 13:02:29

by William Lee Irwin III

Subject: Re: small migration thread fix

On Fri, Jan 10, 2003 at 01:46:03PM +0100, Erich Focht wrote:
> the small patch fixes a potential problem in the migration thread for
> the case that the first CPU in the cpus_allowed mask of a process is
> offline. Please consider applying it to your trees.

I'm not mingo, but I can say this looks sane. My only question is
whether there are more codepaths that need this kind of check, for
instance, what happens if someone does set_cpus_allowed() to a cpumask
with !(task->cpumask & cpu_online_map) ?


Thanks,
Bill

2003-01-10 14:20:33

by Erich Focht

Subject: Re: small migration thread fix

On Friday 10 January 2003 14:11, William Lee Irwin III wrote:
> I'm not mingo, but I can say this looks sane. My only question is
> whether there are more codepaths that need this kind of check, for
> instance, what happens if someone does set_cpus_allowed() to a cpumask
> with !(task->cpumask & cpu_online_map) ?

The piece of code below was intended for that. I agree with Rusty's
comment, BUG() is too strong for that case.

#if 0 /* FIXME: Grab cpu_lock, return error on this case. --RR */
	new_mask &= cpu_online_map;
	if (!new_mask)
		BUG();
#endif

Anyhow, changing the new_mask in this way is BAD, because the masks
are inherited. So when more CPUs come online, they remain excluded
from the mask of the process and its children.

The fix suggested in the comments still has to be done...

Regards,
Erich

2003-01-10 15:03:48

by William Lee Irwin III

Subject: Re: small migration thread fix

On Friday 10 January 2003 14:11, William Lee Irwin III wrote:
>> I'm not mingo, but I can say this looks sane. My only question is
>> whether there are more codepaths that need this kind of check, for
>> instance, what happens if someone does set_cpus_allowed() to a cpumask
>> with !(task->cpumask & cpu_online_map) ?

On Fri, Jan 10, 2003 at 03:29:33PM +0100, Erich Focht wrote:
> The piece of code below was intended for that. I agree with Rusty's
> comment, BUG() is too strong for that case.
> #if 0 /* FIXME: Grab cpu_lock, return error on this case. --RR */
> new_mask &= cpu_online_map;
> if (!new_mask)
> BUG();
> #endif
> Anyhow, changing the new_mask in this way is BAD, because the masks
> are inherited. So when more CPUs come online, they remain excluded
> from the mask of the process and its children.
> The fix suggested in the comments still has to be done...

I don't have much to add but another ack and a "hmm, maybe something
could be done". My prior comments stand. I'd be very much obliged if
you could provide a fix for the set_cpus_allowed() issue. These days I
rely heavily on you for scheduler fixes and optimizations for
large-scale and/or NUMA machines.


Thanks,
Bill