2008-08-08 19:44:19

by Langsdorf, Mark

[permalink] [raw]
Subject: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr

One of my co-workers noticed that the powernow-k8
driver no longer restarts when a CPU core is
hot-disabled and then hot-enabled on AMD quad-core
systems.

The following comands work fine on 2.6.26 and fail
on 2.6.27-rc1:

echo 0 > /sys/devices/system/cpu/cpu3/online
echo 1 > /sys/devices/system/cpu/cpu3/online
find /sys -name cpufreq

For 2.6.26, the find will return a cpufreq
directory for each processor. In 2.6.27-rc1,
the cpu3 directory is missing.

After digging through the code, the following
logic is failing when the core is hot-enabled
at runtime. The code works during the boot
sequence.

cpumask_t = current->cpus_allowed;
set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
if (smp_processor_id() != cpu)
return -ENODEV;

The objective is to move to the specific CPU and
test it's MSRs to see if it is supported. For
some reason, it isn't working. Any suggestions
on how to fix this are appreciated.

-Mark Langdsorf
Operating System Research Center
AMD


2008-08-08 21:00:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr

[Adding CCs]

On Friday, 8 of August 2008, Langsdorf, Mark wrote:
> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following comands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor. In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime. The code works during the boot
> sequence.
>
> cpumask_t = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> if (smp_processor_id() != cpu)
> return -ENODEV;
>
> The objective is to move to the specific CPU and
> test it's MSRs to see if it is supported. For
> some reason, it isn't working. Any suggestions
> on how to fix this are appreciated.
>
> -Mark Langdsorf
> Operating System Research Center
> AMD

2008-08-08 21:31:18

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr

2008/8/8 Rafael J. Wysocki <[email protected]>:
> [Adding CCs]
>
> On Friday, 8 of August 2008, Langsdorf, Mark wrote:
>> One of my co-workers noticed that the powernow-k8
>> driver no longer restarts when a CPU core is
>> hot-disabled and then hot-enabled on AMD quad-core
>> systems.
>>
>> The following comands work fine on 2.6.26 and fail
>> on 2.6.27-rc1:
>>
>> echo 0 > /sys/devices/system/cpu/cpu3/online
>> echo 1 > /sys/devices/system/cpu/cpu3/online
>> find /sys -name cpufreq
>>
>> For 2.6.26, the find will return a cpufreq
>> directory for each processor. In 2.6.27-rc1,
>> the cpu3 directory is missing.
>>
>> After digging through the code, the following
>> logic is failing when the core is hot-enabled
>> at runtime. The code works during the boot
>> sequence.
>>
>> cpumask_t = current->cpus_allowed;
>> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
>> if (smp_processor_id() != cpu)
>> return -ENODEV;
>>


if it gets called from any of the cpu-hotplug handlers, it won't work
now (x86-microcode is another victim).

Please give a try to the following patch: http://lkml.org/lkml/2008/7/30/171

does it help?

(the explanation is also available in this thread).

well, provided we may guarantee that load-balancing has been
initialized (it's ok in our case) by the moment CPU_ONLINE gets
called, this approach is not that bad perhaps... (and it looks like
there is plenty of code that relies on set_cpus_allowed_ptr() being
workable in cpu-hotplug-handlers).

Although, I personally don't like that much this particular use-case
of set_cpus_allowed_ptr() (I posted patches for x86-microcode). btw.,
last time I briefly looked at various places, there seemed to be a few
where

old_mask = p->cpus_allowed;
set_cpus_allowed_ptr(p, target_cpu);
// do something
set_cpus_allowed_ptr(p, old_mask);

is used just wrongly. e.g. it may race with sched_setaffinity() and
negate its effect.


>> -Mark Langdsorf
>> Operating System Research Center
>> AMD

--
Best regards,
Dmitry Adamushko

2008-08-11 12:30:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr


* Dmitry Adamushko <[email protected]> wrote:

> 2008/8/8 Rafael J. Wysocki <[email protected]>:
> > [Adding CCs]
> >
> > On Friday, 8 of August 2008, Langsdorf, Mark wrote:
> >> One of my co-workers noticed that the powernow-k8
> >> driver no longer restarts when a CPU core is
> >> hot-disabled and then hot-enabled on AMD quad-core
> >> systems.
> >>
> >> The following comands work fine on 2.6.26 and fail
> >> on 2.6.27-rc1:
> >>
> >> echo 0 > /sys/devices/system/cpu/cpu3/online
> >> echo 1 > /sys/devices/system/cpu/cpu3/online
> >> find /sys -name cpufreq
> >>
> >> For 2.6.26, the find will return a cpufreq
> >> directory for each processor. In 2.6.27-rc1,
> >> the cpu3 directory is missing.
> >>
> >> After digging through the code, the following
> >> logic is failing when the core is hot-enabled
> >> at runtime. The code works during the boot
> >> sequence.
> >>
> >> cpumask_t = current->cpus_allowed;
> >> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> >> if (smp_processor_id() != cpu)
> >> return -ENODEV;
> >>
>
>
> if it gets called from any of the cpu-hotplug handlers, it won't work
> now (x86-microcode is another victim).
>
> Please give a try to the following patch: http://lkml.org/lkml/2008/7/30/171
>
> does it help?
>
> (the explanation is also available in this thread).

i've queued up the fix below in tip/sched/urgent.

Ingo

------------------------>
>From 3bc8fb8f85b79aa2d6341adb5090799b93d64bc9 Mon Sep 17 00:00:00 2001
From: Dmitry Adamushko <[email protected]>
Date: Wed, 30 Jul 2008 12:34:04 +0200
Subject: [PATCH] sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks

Mark Langsdorf reported:

> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following comands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor. In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime. The code works during the boot
> sequence.
>
> cpumask_t = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> if (smp_processor_id() != cpu)
> return -ENODEV;

a similar problem also affects the microcode driver.

So set the CPU active before calling the CPU_ONLINE notifier chain,
there are a handful of notifiers that use set_cpus_allowed().

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/cpu.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index e202a68..c977c33 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
goto out_notify;
BUG_ON(!cpu_online(cpu));

+ cpu_set(cpu, cpu_active_map);
+
/* Now call notifier in preparation. */
raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);

@@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)

err = _cpu_up(cpu, 0);

- if (cpu_online(cpu))
- cpu_set(cpu, cpu_active_map);
-
out:
cpu_maps_update_done();
return err;

2008-08-11 14:02:11

by Langsdorf, Mark

[permalink] [raw]
Subject: RE: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr


> * Dmitry Adamushko <[email protected]> wrote:
>
> > 2008/8/8 Rafael J. Wysocki <[email protected]>:
> > > [Adding CCs]
> >
> >
> > if it gets called from any of the cpu-hotplug handlers, it
> won't work
> > now (x86-microcode is another victim).
> >
> > Please give a try to the following patch:
> http://lkml.org/lkml/2008/7/30/171
> >
> > does it help?
> >
> > (the explanation is also available in this thread).
>
> i've queued up the fix below in tip/sched/urgent.

Sorry for the delay in response - we had some building power
issues this weekend.

Yes, that fixed the issue.

-Mark Langsdorf
Operating System Research Center
AMD

2008-08-11 14:21:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr


* Langsdorf, Mark <[email protected]> wrote:

> > > does it help?
> > >
> > > (the explanation is also available in this thread).
> >
> > i've queued up the fix below in tip/sched/urgent.
>
> Sorry for the delay in response - we had some building power
> issues this weekend.
>
> Yes, that fixed the issue.

thanks Mark - i've added your Tested-by line to the commit.

Ingo

2008-08-11 14:28:26

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr

2008/8/11 Ingo Molnar <[email protected]>:
>
> * Dmitry Adamushko <[email protected]> wrote:
>
>> 2008/8/8 Rafael J. Wysocki <[email protected]>:
>> > [Adding CCs]
>> >
>> > On Friday, 8 of August 2008, Langsdorf, Mark wrote:
>> >> One of my co-workers noticed that the powernow-k8
>> >> driver no longer restarts when a CPU core is
>> >> hot-disabled and then hot-enabled on AMD quad-core
>> >> systems.
>> >>
>> >> The following comands work fine on 2.6.26 and fail
>> >> on 2.6.27-rc1:
>> >>
>> >> echo 0 > /sys/devices/system/cpu/cpu3/online
>> >> echo 1 > /sys/devices/system/cpu/cpu3/online
>> >> find /sys -name cpufreq
>> >>
>> >> For 2.6.26, the find will return a cpufreq
>> >> directory for each processor. In 2.6.27-rc1,
>> >> the cpu3 directory is missing.
>> >>
>> >> After digging through the code, the following
>> >> logic is failing when the core is hot-enabled
>> >> at runtime. The code works during the boot
>> >> sequence.
>> >>
>> >> cpumask_t = current->cpus_allowed;
>> >> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
>> >> if (smp_processor_id() != cpu)
>> >> return -ENODEV;
>> >>
>>
>>
>> if it gets called from any of the cpu-hotplug handlers, it won't work
>> now (x86-microcode is another victim).
>>
>> Please give a try to the following patch: http://lkml.org/lkml/2008/7/30/171
>>
>> does it help?
>>
>> (the explanation is also available in this thread).
>
> i've queued up the fix below in tip/sched/urgent.
>
> Ingo
>
> ------------------------>
> From 3bc8fb8f85b79aa2d6341adb5090799b93d64bc9 Mon Sep 17 00:00:00 2001
> From: Dmitry Adamushko <[email protected]>
> Date: Wed, 30 Jul 2008 12:34:04 +0200
> Subject: [PATCH] sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks
>
> Mark Langsdorf reported:
>
>> One of my co-workers noticed that the powernow-k8
>> driver no longer restarts when a CPU core is
>> hot-disabled and then hot-enabled on AMD quad-core
>> systems.
>>
>> The following comands work fine on 2.6.26 and fail
>> on 2.6.27-rc1:
>>
>> echo 0 > /sys/devices/system/cpu/cpu3/online
>> echo 1 > /sys/devices/system/cpu/cpu3/online
>> find /sys -name cpufreq
>>
>> For 2.6.26, the find will return a cpufreq
>> directory for each processor. In 2.6.27-rc1,
>> the cpu3 directory is missing.
>>
>> After digging through the code, the following
>> logic is failing when the core is hot-enabled
>> at runtime. The code works during the boot
>> sequence.
>>
>> cpumask_t = current->cpus_allowed;
>> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
>> if (smp_processor_id() != cpu)
>> return -ENODEV;
>
> a similar problem also affects the microcode driver.
>
> So set the CPU active before calling the CPU_ONLINE notifier chain,
> there are a handful of notifiers that use set_cpus_allowed().
>
> Signed-off-by: Ingo Molnar <[email protected]>

Signed-off-by: Dmitry Adamushko <[email protected]>

This fix also solves the problem with x86-microcode. I've sent
alternative patches for microcode, but as this "rely on
set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
thing seems to be more broad than what we thought, perhaps this fix
should be applied.

With this patch we define that by the moment CPU_ONLINE is being sent,
a 'cpu' is online and ready for tasks to be migrated onto it.


> ---
> kernel/cpu.c | 5 ++---
> 1 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index e202a68..c977c33 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
> goto out_notify;
> BUG_ON(!cpu_online(cpu));
>
> + cpu_set(cpu, cpu_active_map);
> +
> /* Now call notifier in preparation. */
> raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);
>
> @@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)
>
> err = _cpu_up(cpu, 0);
>
> - if (cpu_online(cpu))
> - cpu_set(cpu, cpu_active_map);
> -
> out:
> cpu_maps_update_done();
> return err;
>



--
Best regards,
Dmitry Adamushko

2008-08-11 22:04:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr



On Mon, 11 Aug 2008, Ingo Molnar wrote:
>
> i've queued up the fix below in tip/sched/urgent.
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index e202a68..c977c33 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
> goto out_notify;
> BUG_ON(!cpu_online(cpu));
>
> + cpu_set(cpu, cpu_active_map);
> +
> /* Now call notifier in preparation. */
> raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);
>
> @@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)
>
> err = _cpu_up(cpu, 0);
>
> - if (cpu_online(cpu))
> - cpu_set(cpu, cpu_active_map);
> -

Ok, not only does that fix the bug, but it simplifies the code and looks
obviously ok. However, I don't have it in my tree yet, and I'd like to do
an -rc3 that has this fixes (so that along with the PCI MSI thing, we
hopefully have most of the suspend/resume regressions fixed).

And I was hoping to do -rc3 today. Can I please have pull-requests for the
appropriate urgent scheduler/x86 fixes? Or should I just take these as
patches?

Linus

2008-08-11 22:11:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr

* Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 11 Aug 2008, Ingo Molnar wrote:
> >
> > i've queued up the fix below in tip/sched/urgent.
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index e202a68..c977c33 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
> > goto out_notify;
> > BUG_ON(!cpu_online(cpu));
> >
> > + cpu_set(cpu, cpu_active_map);
> > +
> > /* Now call notifier in preparation. */
> > raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);
> >
> > @@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)
> >
> > err = _cpu_up(cpu, 0);
> >
> > - if (cpu_online(cpu))
> > - cpu_set(cpu, cpu_active_map);
> > -
>
> Ok, not only does that fix the bug, but it simplifies the code and
> looks obviously ok. However, I don't have it in my tree yet, and I'd
> like to do an -rc3 that has this fixes (so that along with the PCI MSI
> thing, we hopefully have most of the suspend/resume regressions
> fixed).
>
> And I was hoping to do -rc3 today. Can I please have pull-requests for
> the appropriate urgent scheduler/x86 fixes? Or should I just take
> these as patches?

i'll send pull requests for all pending patches. I wanted to send them
tomorrow originally but will do them now.

Ingo

2008-08-11 22:14:18

by Max Krasnyansky

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr



Linus Torvalds wrote:
>
> On Mon, 11 Aug 2008, Ingo Molnar wrote:
>> i've queued up the fix below in tip/sched/urgent.
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index e202a68..c977c33 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -349,6 +349,8 @@ static int __cpuinit _cpu_up(unsigned int cpu, int tasks_frozen)
>> goto out_notify;
>> BUG_ON(!cpu_online(cpu));
>>
>> + cpu_set(cpu, cpu_active_map);
>> +
>> /* Now call notifier in preparation. */
>> raw_notifier_call_chain(&cpu_chain, CPU_ONLINE | mod, hcpu);
>>
>> @@ -383,9 +385,6 @@ int __cpuinit cpu_up(unsigned int cpu)
>>
>> err = _cpu_up(cpu, 0);
>>
>> - if (cpu_online(cpu))
>> - cpu_set(cpu, cpu_active_map);
>> -
>
> Ok, not only does that fix the bug, but it simplifies the code and looks
> obviously ok. However, I don't have it in my tree yet, and I'd like to do
> an -rc3 that has this fixes (so that along with the PCI MSI thing, we
> hopefully have most of the suspend/resume regressions fixed).
I actually thought it's somewhat against the original idea. It seems that we'd
be setting 'active' be a little too early. ie before all the hotplug handlers
had a chance to realize that cpu is now online.
I don't have a strong objection though.

> And I was hoping to do -rc3 today. Can I please have pull-requests for the
> appropriate urgent scheduler/x86 fixes? Or should I just take these as
> patches?
It'd be nice if -rc3 included my cpuset patch so that we could put circular
locking issues in the cpu hotplug path to the rest.
Ingo, I'm talking about this:
[PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4)

Max


Max

2008-08-11 22:47:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr


* Max Krasnyansky <[email protected]> wrote:

> > And I was hoping to do -rc3 today. Can I please have pull-requests for the
> > appropriate urgent scheduler/x86 fixes? Or should I just take these as
> > patches?
>
> It'd be nice if -rc3 included my cpuset patch so that we could put circular
> locking issues in the cpu hotplug path to the rest.
> Ingo, I'm talking about this:
> [PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4)

the latest (-v4) version of the patch was submitted just half an hour
ago and it's rather large/complex, with a few unrelated changes
(whitespace, etc.) mixed in as well. I'd like to wait for Paul's final
ack for -v4 (he has already agreed with the approach in general), and
wanted to have it tested myself as well, at least minimally.

Ingo

2008-08-11 22:50:04

by Max Krasnyansky

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr



Ingo Molnar wrote:
> * Max Krasnyansky <[email protected]> wrote:
>
>>> And I was hoping to do -rc3 today. Can I please have pull-requests for the
>>> appropriate urgent scheduler/x86 fixes? Or should I just take these as
>>> patches?
>> It'd be nice if -rc3 included my cpuset patch so that we could put circular
>> locking issues in the cpu hotplug path to the rest.
>> Ingo, I'm talking about this:
>> [PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4)
>
> the latest (-v4) version of the patch was submitted just half an hour
> ago and it's rather large/complex, with a few unrelated changes
> (whitespace, etc.) mixed in as well. I'd like to wait for Paul's final
> ack for -v4 (he has already agreed with the approach in general), and
> wanted to have it tested myself as well, at least minimally.

Fair enough.

btw Whitespace and other cosmetic changes were requested by reviewers.

Max

2008-08-11 22:56:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: Regression in 2.6.27-rc1 for set_cpus_allowed_ptr


* Max Krasnyansky <[email protected]> wrote:

>
>
> Ingo Molnar wrote:
> > * Max Krasnyansky <[email protected]> wrote:
> >
> >>> And I was hoping to do -rc3 today. Can I please have pull-requests for the
> >>> appropriate urgent scheduler/x86 fixes? Or should I just take these as
> >>> patches?
> >> It'd be nice if -rc3 included my cpuset patch so that we could put circular
> >> locking issues in the cpu hotplug path to the rest.
> >> Ingo, I'm talking about this:
> >> [PATCH] cpuset: Rework sched domains and CPU hotplug handling (take 4)
> >
> > the latest (-v4) version of the patch was submitted just half an hour
> > ago and it's rather large/complex, with a few unrelated changes
> > (whitespace, etc.) mixed in as well. I'd like to wait for Paul's final
> > ack for -v4 (he has already agreed with the approach in general), and
> > wanted to have it tested myself as well, at least minimally.
>
> Fair enough.
>
> btw Whitespace and other cosmetic changes were requested by reviewers.

yeah - it just makes it a tiny bit harder decision whether to queue up a
patch in the urgent path.

It's better to keep cleanups separate - that way any typos and
unintended bugs in cleanups are more obvious as well. (because later on
a person debugging a breakage does not have to wonder about whether a
change's side-effects were intended or not.)

But your patch certainly looks OK standalone as well, just IMO not as a
very-last-minute patch. (No strong feelings though, your patch should
not break anything in the normal !CPUSETS or the CPUSETS+no-cpuset-used
usecases.)

Ingo