2014-10-27 02:59:12

by Neil Zhang

[permalink] [raw]
Subject: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

The current per-cpu offline info won't be updated when we use
any other method besides sysfs to call cpu_up/down.
Thus the cpu/online can't reflect the real online status.

This patch is going to fix the issue introduced by commit
0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
Use generic offline/online for CPU offline/online)

CC: Rafael J. Wysocki <[email protected]>
Tested-by: Dan Streetman <[email protected]>
Signed-off-by: Neil Zhang <[email protected]>
---
drivers/base/cpu.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 006b1bc..9d61824 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -418,10 +418,35 @@ static void __init cpu_dev_register_generic(void)
#endif
}

+static int device_hotplug_notifier(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+ unsigned int cpu = (unsigned long)hcpu;
+ struct device *dev = get_cpu_device(cpu);
+ int ret;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ dev->offline = false;
+ ret = NOTIFY_OK;
+ break;
+ case CPU_DYING:
+ dev->offline = true;
+ ret = NOTIFY_OK;
+ break;
+ default:
+ ret = NOTIFY_DONE;
+ break;
+ }
+
+ return ret;
+}
+
void __init cpu_dev_init(void)
{
if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
panic("Failed to register CPU subsystem");

cpu_dev_register_generic();
+ cpu_notifier(device_hotplug_notifier, 0);
}
--
1.7.9.5


2014-10-27 21:39:01

by Wysocki, Rafael J

[permalink] [raw]
Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

On 10/27/2014 3:59 AM, Neil Zhang wrote:
> The current per-cpu offline info won't be updated when we use
> any other method besides sysfs to call cpu_up/down.
> Thus the cpu/online can't reflect the real online status.
>
> This patch is going to fix the issue introduced by commit
> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> Use generic offline/online for CPU offline/online)
>
> CC: Rafael J. Wysocki <[email protected]>
> Tested-by: Dan Streetman <[email protected]>
> Signed-off-by: Neil Zhang <[email protected]>

Oh dear, no.

Please first tell me what exactly the problem you're seeing is.

> ---
> drivers/base/cpu.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 006b1bc..9d61824 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -418,10 +418,35 @@ static void __init cpu_dev_register_generic(void)
> #endif
> }
>
> +static int device_hotplug_notifier(struct notifier_block *nfb,
> + unsigned long action, void *hcpu)
> +{
> + unsigned int cpu = (unsigned long)hcpu;
> + struct device *dev = get_cpu_device(cpu);
> + int ret;
> +
> + switch (action & ~CPU_TASKS_FROZEN) {
> + case CPU_ONLINE:
> + dev->offline = false;
> + ret = NOTIFY_OK;
> + break;
> + case CPU_DYING:
> + dev->offline = true;
> + ret = NOTIFY_OK;
> + break;
> + default:
> + ret = NOTIFY_DONE;
> + break;
> + }
> +
> + return ret;
> +}
> +
> void __init cpu_dev_init(void)
> {
> if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
> panic("Failed to register CPU subsystem");
>
> cpu_dev_register_generic();
> + cpu_notifier(device_hotplug_notifier, 0);
> }

2014-10-28 00:46:31

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
<[email protected]> wrote:
> On 10/27/2014 3:59 AM, Neil Zhang wrote:
>>
>> The current per-cpu offline info won't be updated when we use
>> any other method besides sysfs to call cpu_up/down.
>> Thus the cpu/online can't reflect the real online status.
>>
>> This patch is going to fix the issue introduced by commit
>> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
>> Use generic offline/online for CPU offline/online)
>>
>> CC: Rafael J. Wysocki <[email protected]>
>> Tested-by: Dan Streetman <[email protected]>
>> Signed-off-by: Neil Zhang <[email protected]>
>
>
> Oh dear, no.
>
> Please first tell me what exactly the problem you're seeing is.

For some background, here is my last comment on the first email thread on this:
https://lkml.org/lkml/2014/10/27/595

I didn't create this patch, but the problem essentially is that before
your commit the individual cpu online nodes
(/sys/devices/system/cpu/cpuN/online) stayed in sync during
cpu_down/up, because they used the cpu_online_mask; while after the
commit, they are tracked by the cpu's generic dev->offline flag, which
isn't updated during cpu_down/up. So now, any place in the kernel
that brings a cpu up or down must also update the cpu->dev->offline
flag. My interest in the patch was coincidental because I was seeing
the same problem when using dlpar operations to hotplug cpus, which
uses the arch/powerpc/platform/pseries/dlpar.c code; that code brings
a cpu offline when it's hot-removed (and the cpu online when it's
hot-added), but it hasn't been changed to also update the cpu's
dev->offline flag.

As I said in the previous email to the first thread, the ppc dlpar
operation might be changed in the future to fully unregister a cpu
when it's hot-removed, which would remove the entire sysfs cpuN
directory. Alternately and/or until then, it could be updated to
simply update the cpu'd dev->offline flag (that's what I originally
did for my own testing). However, without a central place to update
the cpu's dev->offline field, like this, or possibly in
set_cpu_online(), or elsewhere during cpu_down/up, each place in the
kernel that calls cpu_down() or cpu_up() also needs to update the
dev->offline flag. It's possible that the ppc dlpar code is the only
place in the kernel that has this problem; I haven't searched.

>
>
>> ---
>> drivers/base/cpu.c | 25 +++++++++++++++++++++++++
>> 1 file changed, 25 insertions(+)
>>
>> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
>> index 006b1bc..9d61824 100644
>> --- a/drivers/base/cpu.c
>> +++ b/drivers/base/cpu.c
>> @@ -418,10 +418,35 @@ static void __init cpu_dev_register_generic(void)
>> #endif
>> }
>> +static int device_hotplug_notifier(struct notifier_block *nfb,
>> + unsigned long action, void *hcpu)
>> +{
>> + unsigned int cpu = (unsigned long)hcpu;
>> + struct device *dev = get_cpu_device(cpu);
>> + int ret;
>> +
>> + switch (action & ~CPU_TASKS_FROZEN) {
>> + case CPU_ONLINE:
>> + dev->offline = false;
>> + ret = NOTIFY_OK;
>> + break;
>> + case CPU_DYING:
>> + dev->offline = true;
>> + ret = NOTIFY_OK;
>> + break;
>> + default:
>> + ret = NOTIFY_DONE;
>> + break;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> void __init cpu_dev_init(void)
>> {
>> if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
>> panic("Failed to register CPU subsystem");
>> cpu_dev_register_generic();
>> + cpu_notifier(device_hotplug_notifier, 0);
>> }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-10-29 21:26:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
> On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
> <[email protected]> wrote:
> > On 10/27/2014 3:59 AM, Neil Zhang wrote:
> >>
> >> The current per-cpu offline info won't be updated when we use
> >> any other method besides sysfs to call cpu_up/down.
> >> Thus the cpu/online can't reflect the real online status.
> >>
> >> This patch is going to fix the issue introduced by commit
> >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> >> Use generic offline/online for CPU offline/online)
> >>
> >> CC: Rafael J. Wysocki <[email protected]>
> >> Tested-by: Dan Streetman <[email protected]>
> >> Signed-off-by: Neil Zhang <[email protected]>
> >
> >
> > Oh dear, no.
> >
> > Please first tell me what exactly the problem you're seeing is.
>
> For some background, here is my last comment on the first email thread on this:
> https://lkml.org/lkml/2014/10/27/595
>
> I didn't create this patch, but the problem essentially is that before
> your commit the individual cpu online nodes
> (/sys/devices/system/cpu/cpuN/online) stayed in sync during
> cpu_down/up, because they used the cpu_online_mask; while after the
> commit, they are tracked by the cpu's generic dev->offline flag, which
> isn't updated during cpu_down/up.

Which is not triggered from sysfs.

> So now, any place in the kernel
> that brings a cpu up or down must also update the cpu->dev->offline
> flag.

Not any place. In particular, system suspend-resume doesn't need to
do that, because it takes CPUs offline and then brings them back
online.

If there's a place in the kernel where CPUs are taken offline and
left in that state, then it needs to be updated.

> My interest in the patch was coincidental because I was seeing
> the same problem when using dlpar operations to hotplug cpus, which
> uses the arch/powerpc/platform/pseries/dlpar.c code; that code brings
> a cpu offline when it's hot-removed (and the cpu online when it's
> hot-added), but it hasn't been changed to also update the cpu's
> dev->offline flag.

It should be modified to do that.

> As I said in the previous email to the first thread, the ppc dlpar
> operation might be changed in the future to fully unregister a cpu
> when it's hot-removed, which would remove the entire sysfs cpuN
> directory. Alternately and/or until then, it could be updated to
> simply update the cpu'd dev->offline flag (that's what I originally
> did for my own testing). However, without a central place to update
> the cpu's dev->offline field, like this, or possibly in
> set_cpu_online(), or elsewhere during cpu_down/up, each place in the
> kernel that calls cpu_down() or cpu_up() also needs to update the
> dev->offline flag. It's possible that the ppc dlpar code is the only
> place in the kernel that has this problem; I haven't searched.

It is quite likely to be the only place like that.

While I'm not familiar with the code in question, the most straightforward
way to fix the problem would be to replace cpu_down() in there with
device_offline(get_cpu_device(cpu)), but that needs to be called under
device_hotplug_lock.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-10-29 21:52:35

by Dan Streetman

[permalink] [raw]
Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

On Wed, Oct 29, 2014 at 5:46 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
>> On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
>> <[email protected]> wrote:
>> > On 10/27/2014 3:59 AM, Neil Zhang wrote:
>> >>
>> >> The current per-cpu offline info won't be updated when we use
>> >> any other method besides sysfs to call cpu_up/down.
>> >> Thus the cpu/online can't reflect the real online status.
>> >>
>> >> This patch is going to fix the issue introduced by commit
>> >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
>> >> Use generic offline/online for CPU offline/online)
>> >>
>> >> CC: Rafael J. Wysocki <[email protected]>
>> >> Tested-by: Dan Streetman <[email protected]>
>> >> Signed-off-by: Neil Zhang <[email protected]>
>> >
>> >
>> > Oh dear, no.
>> >
>> > Please first tell me what exactly the problem you're seeing is.
>>
>> For some background, here is my last comment on the first email thread on this:
>> https://lkml.org/lkml/2014/10/27/595
>>
>> I didn't create this patch, but the problem essentially is that before
>> your commit the individual cpu online nodes
>> (/sys/devices/system/cpu/cpuN/online) stayed in sync during
>> cpu_down/up, because they used the cpu_online_mask; while after the
>> commit, they are tracked by the cpu's generic dev->offline flag, which
>> isn't updated during cpu_down/up.
>
> Which is not triggered from sysfs.
>
>> So now, any place in the kernel
>> that brings a cpu up or down must also update the cpu->dev->offline
>> flag.
>
> Not any place. In particular, system suspend-resume doesn't need to
> do that, because it takes CPUs offline and then brings them back
> online.
>
> If there's a place in the kernel where CPUs are taken offline and
> left in that state, then it needs to be updated.

The only place I know of is ppc's dlpar code, as mentioned below.

Neil, as you crafted the original patch, I assume you know of some
other place in the kernel doing cpu_up/down directly, where you're
seeing this problem?

>
>> My interest in the patch was coincidental because I was seeing
>> the same problem when using dlpar operations to hotplug cpus, which
>> uses the arch/powerpc/platform/pseries/dlpar.c code; that code brings
>> a cpu offline when it's hot-removed (and the cpu online when it's
>> hot-added), but it hasn't been changed to also update the cpu's
>> dev->offline flag.
>
> It should be modified to do that.
>
>> As I said in the previous email to the first thread, the ppc dlpar
>> operation might be changed in the future to fully unregister a cpu
>> when it's hot-removed, which would remove the entire sysfs cpuN
>> directory. Alternately and/or until then, it could be updated to
>> simply update the cpu'd dev->offline flag (that's what I originally
>> did for my own testing). However, without a central place to update
>> the cpu's dev->offline field, like this, or possibly in
>> set_cpu_online(), or elsewhere during cpu_down/up, each place in the
>> kernel that calls cpu_down() or cpu_up() also needs to update the
>> dev->offline flag. It's possible that the ppc dlpar code is the only
>> place in the kernel that has this problem; I haven't searched.
>
> It is quite likely to be the only place like that.
>
> While I'm not familiar with the code in question, the most straightforward
> way to fix the problem would be to replace cpu_down() in there with
> device_offline(get_cpu_device(cpu)), but that needs to be called under
> device_hotplug_lock.

Ok, will do.

>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2014-10-30 02:00:21

by Neil Zhang

[permalink] [raw]
Subject: RE: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

Rafael,

> -----Original Message-----
> From: Rafael J. Wysocki [mailto:[email protected]]
> Sent: 2014年10月30日 5:47
> To: Dan Streetman
> Cc: Rafael J. Wysocki; Neil Zhang; linux-kernel; Greg Kroah-Hartman;
> [email protected]
> Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides
> sysfs
>
> On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
> > On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
> > <[email protected]> wrote:
> > > On 10/27/2014 3:59 AM, Neil Zhang wrote:
> > >>
> > >> The current per-cpu offline info won't be updated when we use any
> > >> other method besides sysfs to call cpu_up/down.
> > >> Thus the cpu/online can't reflect the real online status.
> > >>
> > >> This patch is going to fix the issue introduced by commit
> > >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> > >> Use generic offline/online for CPU offline/online)
> > >>
> > >> CC: Rafael J. Wysocki <[email protected]>
> > >> Tested-by: Dan Streetman <[email protected]>
> > >> Signed-off-by: Neil Zhang <[email protected]>
> > >
> > >
> > > Oh dear, no.
> > >
> > > Please first tell me what exactly the problem you're seeing is.
> >
> > For some background, here is my last comment on the first email thread on
> this:
> > https://lkml.org/lkml/2014/10/27/595
> >
> > I didn't create this patch, but the problem essentially is that before
> > your commit the individual cpu online nodes
> > (/sys/devices/system/cpu/cpuN/online) stayed in sync during
> > cpu_down/up, because they used the cpu_online_mask; while after the
> > commit, they are tracked by the cpu's generic dev->offline flag, which
> > isn't updated during cpu_down/up.
>
> Which is not triggered from sysfs.
>
> > So now, any place in the kernel
> > that brings a cpu up or down must also update the cpu->dev->offline
> > flag.
>
> Not any place. In particular, system suspend-resume doesn't need to do that,
> because it takes CPUs offline and then brings them back online.
>
> If there's a place in the kernel where CPUs are taken offline and left in that
> state, then it needs to be updated.

Many ARM SoCs will have an in kernel profiler to determine whether we need
to remove or add a cpu into system for power and performance consideration.
Thus we will call cpu_up / down in kernel directly.

>
> > My interest in the patch was coincidental because I was seeing the
> > same problem when using dlpar operations to hotplug cpus, which uses
> > the arch/powerpc/platform/pseries/dlpar.c code; that code brings a cpu
> > offline when it's hot-removed (and the cpu online when it's
> > hot-added), but it hasn't been changed to also update the cpu's
> > dev->offline flag.
>
> It should be modified to do that.
>
> > As I said in the previous email to the first thread, the ppc dlpar
> > operation might be changed in the future to fully unregister a cpu
> > when it's hot-removed, which would remove the entire sysfs cpuN
> > directory. Alternately and/or until then, it could be updated to
> > simply update the cpu'd dev->offline flag (that's what I originally
> > did for my own testing). However, without a central place to update
> > the cpu's dev->offline field, like this, or possibly in
> > set_cpu_online(), or elsewhere during cpu_down/up, each place in the
> > kernel that calls cpu_down() or cpu_up() also needs to update the
> > dev->offline flag. It's possible that the ppc dlpar code is the only
> > place in the kernel that has this problem; I haven't searched.
>
> It is quite likely to be the only place like that.
>
> While I'm not familiar with the code in question, the most straightforward way
> to fix the problem would be to replace cpu_down() in there with
> device_offline(get_cpu_device(cpu)), but that needs to be called under
> device_hotplug_lock.

Great, we can use this way to fix our problems.
Thanks for the remind!

>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

Best Regards,
Neil Zhang
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-10-30 02:04:17

by Neil Zhang

[permalink] [raw]
Subject: RE: [PATCH V2] Driver cpu: update online when cpu_up/down besides sysfs

Dan,

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Dan
> Streetman
> Sent: 2014年10月30日 5:52
> To: Rafael J. Wysocki
> Cc: Rafael J. Wysocki; Neil Zhang; linux-kernel; Greg Kroah-Hartman;
> [email protected]
> Subject: Re: [PATCH V2] Driver cpu: update online when cpu_up/down besides
> sysfs
>
> On Wed, Oct 29, 2014 at 5:46 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Monday, October 27, 2014 08:46:08 PM Dan Streetman wrote:
> >> On Mon, Oct 27, 2014 at 5:38 PM, Rafael J. Wysocki
> >> <[email protected]> wrote:
> >> > On 10/27/2014 3:59 AM, Neil Zhang wrote:
> >> >>
> >> >> The current per-cpu offline info won't be updated when we use any
> >> >> other method besides sysfs to call cpu_up/down.
> >> >> Thus the cpu/online can't reflect the real online status.
> >> >>
> >> >> This patch is going to fix the issue introduced by commit
> >> >> 0902a9044fa5b7a0456ea4daacec2c2b3189ba8c (Driver core:
> >> >> Use generic offline/online for CPU offline/online)
> >> >>
> >> >> CC: Rafael J. Wysocki <[email protected]>
> >> >> Tested-by: Dan Streetman <[email protected]>
> >> >> Signed-off-by: Neil Zhang <[email protected]>
> >> >
> >> >
> >> > Oh dear, no.
> >> >
> >> > Please first tell me what exactly the problem you're seeing is.
> >>
> >> For some background, here is my last comment on the first email thread on
> this:
> >> https://lkml.org/lkml/2014/10/27/595
> >>
> >> I didn't create this patch, but the problem essentially is that
> >> before your commit the individual cpu online nodes
> >> (/sys/devices/system/cpu/cpuN/online) stayed in sync during
> >> cpu_down/up, because they used the cpu_online_mask; while after the
> >> commit, they are tracked by the cpu's generic dev->offline flag,
> >> which isn't updated during cpu_down/up.
> >
> > Which is not triggered from sysfs.
> >
> >> So now, any place in the kernel
> >> that brings a cpu up or down must also update the cpu->dev->offline
> >> flag.
> >
> > Not any place. In particular, system suspend-resume doesn't need to
> > do that, because it takes CPUs offline and then brings them back
> > online.
> >
> > If there's a place in the kernel where CPUs are taken offline and left
> > in that state, then it needs to be updated.
>
> The only place I know of is ppc's dlpar code, as mentioned below.
>
> Neil, as you crafted the original patch, I assume you know of some other place
> in the kernel doing cpu_up/down directly, where you're seeing this problem?
>

As I replied to Rafael many ARM SoCs will use an in kernel profiler to romove/add
a core via cpu_up /down for power and performance consideration.
But actually we can fix these issues as Rafael suggested.
Thanks for the discussion in these days.

> >
> >> My interest in the patch was coincidental because I was seeing the
> >> same problem when using dlpar operations to hotplug cpus, which uses
> >> the arch/powerpc/platform/pseries/dlpar.c code; that code brings a
> >> cpu offline when it's hot-removed (and the cpu online when it's
> >> hot-added), but it hasn't been changed to also update the cpu's
> >> dev->offline flag.
> >
> > It should be modified to do that.
> >
> >> As I said in the previous email to the first thread, the ppc dlpar
> >> operation might be changed in the future to fully unregister a cpu
> >> when it's hot-removed, which would remove the entire sysfs cpuN
> >> directory. Alternately and/or until then, it could be updated to
> >> simply update the cpu'd dev->offline flag (that's what I originally
> >> did for my own testing). However, without a central place to update
> >> the cpu's dev->offline field, like this, or possibly in
> >> set_cpu_online(), or elsewhere during cpu_down/up, each place in the
> >> kernel that calls cpu_down() or cpu_up() also needs to update the
> >> dev->offline flag. It's possible that the ppc dlpar code is the only
> >> place in the kernel that has this problem; I haven't searched.
> >
> > It is quite likely to be the only place like that.
> >
> > While I'm not familiar with the code in question, the most
> > straightforward way to fix the problem would be to replace cpu_down()
> > in there with device_offline(get_cpu_device(cpu)), but that needs to
> > be called under device_hotplug_lock.
>
> Ok, will do.
>
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

Best Regards,
Neil Zhang
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-10-31 19:41:54

by Dan Streetman

[permalink] [raw]
Subject: [PATCH] powerpc: use device_online/offline() instead of cpu_up/down()

In powerpc pseries platform dlpar operations, Use device_online() and
device_offline() instead of cpu_up() and cpu_down().

Calling cpu_up/down directly does not update the cpu device offline
field, which is used to online/offline a cpu from sysfs. Calling
device_online/offline instead keeps the sysfs cpu online value correct.
The hotplug lock, which is required to be held when calling
device_online/offline, is already held when dlpar_online/offline_cpu
are called, since they are called only from cpu_probe|release_store.

This patch fixes errors on PowerVM systems that have cpu(s) added/removed
using dlpar operations; without this patch, the
/sys/devices/system/cpu/cpuN/online nodes do not correctly show the
online state of added/removed cpus.

Signed-off-by: Dan Streetman <[email protected]>
Cc: Nathan Fontenot <[email protected]>
---

Previous discussion for this:
https://lkml.org/lkml/2014/10/29/839

arch/powerpc/platforms/pseries/dlpar.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 6ad83bd..c22bb1b 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn)
BUG_ON(get_cpu_current_state(cpu)
!= CPU_STATE_OFFLINE);
cpu_maps_update_done();
- rc = cpu_up(cpu);
+ rc = device_online(get_cpu_device(cpu));
if (rc)
goto out;
cpu_maps_update_begin();
@@ -467,7 +467,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
cpu_maps_update_done();
- rc = cpu_down(cpu);
+ rc = device_offline(get_cpu_device(cpu));
if (rc)
goto out;
cpu_maps_update_begin();
--
1.8.3.1

2014-11-02 04:59:05

by Bharata B Rao

[permalink] [raw]
Subject: Re: [PATCH] powerpc: use device_online/offline() instead of cpu_up/down()

On Fri, Oct 31, 2014 at 03:41:34PM -0400, Dan Streetman wrote:
> In powerpc pseries platform dlpar operations, Use device_online() and
> device_offline() instead of cpu_up() and cpu_down().
>
> Calling cpu_up/down directly does not update the cpu device offline
> field, which is used to online/offline a cpu from sysfs. Calling
> device_online/offline instead keeps the sysfs cpu online value correct.
> The hotplug lock, which is required to be held when calling
> device_online/offline, is already held when dlpar_online/offline_cpu
> are called, since they are called only from cpu_probe|release_store.
>
> This patch fixes errors on PowerVM systems that have cpu(s) added/removed
> using dlpar operations; without this patch, the
> /sys/devices/system/cpu/cpuN/online nodes do not correctly show the
> online state of added/removed cpus.

Verified the patch to be working as expected when I online and offline
CPUs of a PowerKVM guest using QEMU (plus my RFC hotplug patchset for
QEMU)

Regards,
Bharata.

2014-11-03 15:45:41

by Nathan Fontenot

[permalink] [raw]
Subject: Re: [PATCH] powerpc: use device_online/offline() instead of cpu_up/down()

On 10/31/2014 02:41 PM, Dan Streetman wrote:
> In powerpc pseries platform dlpar operations, Use device_online() and
> device_offline() instead of cpu_up() and cpu_down().
>
> Calling cpu_up/down directly does not update the cpu device offline
> field, which is used to online/offline a cpu from sysfs. Calling
> device_online/offline instead keeps the sysfs cpu online value correct.
> The hotplug lock, which is required to be held when calling
> device_online/offline, is already held when dlpar_online/offline_cpu
> are called, since they are called only from cpu_probe|release_store.
>
> This patch fixes errors on PowerVM systems that have cpu(s) added/removed
> using dlpar operations; without this patch, the
> /sys/devices/system/cpu/cpuN/online nodes do not correctly show the
> online state of added/removed cpus.
>
> Signed-off-by: Dan Streetman <[email protected]>
> Cc: Nathan Fontenot <[email protected]>

Acked-by: Nathan Fontenot <[email protected]>

> ---
>
> Previous discussion for this:
> https://lkml.org/lkml/2014/10/29/839
>
> arch/powerpc/platforms/pseries/dlpar.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 6ad83bd..c22bb1b 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn)
> BUG_ON(get_cpu_current_state(cpu)
> != CPU_STATE_OFFLINE);
> cpu_maps_update_done();
> - rc = cpu_up(cpu);
> + rc = device_online(get_cpu_device(cpu));
> if (rc)
> goto out;
> cpu_maps_update_begin();
> @@ -467,7 +467,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
> if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) {
> set_preferred_offline_state(cpu, CPU_STATE_OFFLINE);
> cpu_maps_update_done();
> - rc = cpu_down(cpu);
> + rc = device_offline(get_cpu_device(cpu));
> if (rc)
> goto out;
> cpu_maps_update_begin();
>