2009-12-11 10:54:08

by Sachin Sant

Subject: [Next] CPU Hotplug test failures on powerpc

While executing the cpu_hotplug tests (from autotest) against the latest
next on a power6 box, the machine locks up. A soft reset shows
the following trace:

cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
pc: c0000000003433d8: .find_next_bit+0x54/0xc4
lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
sp: c00000000c933650
msr: 8000000000089032
current = 0xc00000000c173840
paca = 0xc000000000bc2600
pid = 2602, comm = hotplug06.top.s
enter ? for help
[link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
[c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
[c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
[c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
[c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
[c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
[c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
[c00000000c933b20] c000000000525940 .store_online+0x54/0x894
[c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
[c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
[c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
[c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
[c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000fff9fa8a8f8
SP (fffe7aef200) is in userspace
0:mon> e
cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
pc: c0000000003433d8: .find_next_bit+0x54/0xc4
lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
sp: c00000000c933650
msr: 8000000000089032
current = 0xc00000000c173840
paca = 0xc000000000bc2600
pid = 2602, comm = hotplug06.top.s

The last few messages from the dmesg log show:

0:mon>
<4>IRQ 17 affinity broken off cpu 0
<4>IRQ 18 affinity broken off cpu 0
<4>IRQ 19 affinity broken off cpu 0
<4>IRQ 264 affinity broken off cpu 0
<4>cpu 0 (hwid 0) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
<4>Processor 0 found.
<4>IRQ 17 affinity broken off cpu 1
<4>IRQ 18 affinity broken off cpu 1
<4>IRQ 19 affinity broken off cpu 1
<4>IRQ 264 affinity broken off cpu 1
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<6>process 2423 (bash) no longer affine to cpu1
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
<4>Processor 1 found.
<4>cpu 1 (hwid 1) Ready to die...
<3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
<3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
0:mon>

After some debugging, a possible suspect seems to be commit
6ad4c18.. : sched: Fix balance vs hotplug race

If I revert this patch, I am able to execute the tests on this
power6 box without any issues.

But at the same time the above patch is required to solve the
CPU hotplug related race on x86_64 (as a side note, this same
x86_64 issue can be recreated against the latest Linus git as well)
that I reported here:

http://marc.info/?l=linux-kernel&m=125802682922299&w=2

I will try a few more iterations with and without the above
patch just to make sure I have the correct results.

If someone has a suggestion, let me know.

Thanks
-Sachin


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


2009-12-14 02:52:31

by Benjamin Herrenschmidt

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Fri, 2009-12-11 at 16:23 +0530, Sachin Sant wrote:
> While executing cpu_hotplug(from autotest) tests against latest
> next on a power6 box, the machine locks up. A soft reset shows
> the following trace

Have you heard anything about that one yet, or is it still to be
debugged? It has probably hit upstream by now.

Cheers,
Ben.

2009-12-14 04:37:40

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Benjamin Herrenschmidt wrote:
> On Fri, 2009-12-11 at 16:23 +0530, Sachin Sant wrote:
>
>> While executing cpu_hotplug(from autotest) tests against latest
>> next on a power6 box, the machine locks up. A soft reset shows
>> the following trace
>>
>
> Have you heard anything about that one yet or it's still to be
> debugged ? It probably hit upstream by now.
>
Haven't received any response yet.

As you mentioned, that patch went upstream, and so did the problem.

thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-14 10:22:15

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Fri, 2009-12-11 at 16:23 +0530, Sachin Sant wrote:
> While executing cpu_hotplug(from autotest) tests against latest
> next on a power6 box, the machine locks up. A soft reset shows
> the following trace
>
> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> sp: c00000000c933650
> msr: 8000000000089032
> current = 0xc00000000c173840
> paca = 0xc000000000bc2600
> pid = 2602, comm = hotplug06.top.s
> enter ? for help
> [link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
> [c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
> [c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
> [c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
> [c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
> [c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
> [c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
> [c00000000c933b20] c000000000525940 .store_online+0x54/0x894
> [c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
> [c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
> [c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
> [c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
> [c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000fff9fa8a8f8
> SP (fffe7aef200) is in userspace
> 0:mon> e
> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> sp: c00000000c933650
> msr: 8000000000089032
> current = 0xc00000000c173840
> paca = 0xc000000000bc2600
> pid = 2602, comm = hotplug06.top.s
>
> Last few messages from the dmesg log shows


> After some debugging a possible suspect seems to be commit
> 6ad4c18.. : sched: Fix balance vs hotplug race


Oh, wonderful :-/

So what is that thing whining about? Not being able to read a cpumask or
something?

Does your .config have cpusets enabled (there's a different
cpuset_cpus_allowed_locked implementation depending on that)?

I know of at least one remaining race and am working on closing that,
but I'm not sure I can explain this crash with that.

2009-12-14 11:12:16

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Peter Zijlstra wrote:
> On Fri, 2009-12-11 at 16:23 +0530, Sachin Sant wrote:
>
>> While executing cpu_hotplug(from autotest) tests against latest
>> next on a power6 box, the machine locks up. A soft reset shows
>> the following trace
>>
>> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
>> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
>> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
>> sp: c00000000c933650
>> msr: 8000000000089032
>> current = 0xc00000000c173840
>> paca = 0xc000000000bc2600
>> pid = 2602, comm = hotplug06.top.s
>> enter ? for help
>> [link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
>> [c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
>> [c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
>> [c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
>> [c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
>> [c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
>> [c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
>> [c00000000c933b20] c000000000525940 .store_online+0x54/0x894
>> [c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
>> [c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
>> [c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
>> [c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
>> [c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
>> --- Exception: c01 (System Call) at 00000fff9fa8a8f8
>> SP (fffe7aef200) is in userspace
>> 0:mon> e
>> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
>> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
>> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
>> sp: c00000000c933650
>> msr: 8000000000089032
>> current = 0xc00000000c173840
>> paca = 0xc000000000bc2600
>> pid = 2602, comm = hotplug06.top.s
>>
>> Last few messages from the dmesg log shows
>>
>
>
>
>> After some debugging a possible suspect seems to be commit
>> 6ad4c18.. : sched: Fix balance vs hotplug race
>>
>
>
> Oh, wonderful :-/
>
> So what is that thing whining about? Not being able to read a cpumask or
> something?
>
> Does your .config have cpusets enabled (there's a different
> cpuset_cpus_allowed_locked implementation depending on that)?
>
Yes, the CPUSETS config option is enabled. I have attached the config.

Thanks
-Sachin

> I know of at least one remaining race and am working on closing that,
> but I'm not sure I can explain this crash with that.
>
>


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


Attachments:
config_cpu_hotplug.gz (19.37 kB)

2009-12-14 12:21:48

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Mon, 2009-12-14 at 16:41 +0530, Sachin Sant wrote:
> Peter Zijlstra wrote:
> > On Fri, 2009-12-11 at 16:23 +0530, Sachin Sant wrote:
> >
> >> While executing cpu_hotplug(from autotest) tests against latest
> >> next on a power6 box, the machine locks up. A soft reset shows
> >> the following trace
> >>
> >> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> >> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> >> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> >> sp: c00000000c933650
> >> msr: 8000000000089032
> >> current = 0xc00000000c173840
> >> paca = 0xc000000000bc2600
> >> pid = 2602, comm = hotplug06.top.s
> >> enter ? for help
> >> [link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
> >> [c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
> >> [c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
> >> [c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
> >> [c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
> >> [c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
> >> [c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
> >> [c00000000c933b20] c000000000525940 .store_online+0x54/0x894
> >> [c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
> >> [c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
> >> [c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
> >> [c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
> >> [c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
> >> --- Exception: c01 (System Call) at 00000fff9fa8a8f8
> >> SP (fffe7aef200) is in userspace
> >> 0:mon> e
> >> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> >> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> >> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> >> sp: c00000000c933650
> >> msr: 8000000000089032
> >> current = 0xc00000000c173840
> >> paca = 0xc000000000bc2600
> >> pid = 2602, comm = hotplug06.top.s
> >>

OK so how do I read that above thing? What's a System Reset? Is that
like the x86 triple fault thing?

From what I can make of it, it's in move_task_off_dead_cpu(), right after
having called cpuset_cpus_allowed_locked(), doing that cpumask_any_and()
call.

static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
	int dest_cpu;
	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));

again:
	/* Look for allowed, online CPU in same node. */
	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
			goto move;

	/* Any allowed, online CPU? */
	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
	if (dest_cpu < nr_cpu_ids)
		goto move;

	/* No more Mr. Nice Guy. */
	if (dest_cpu >= nr_cpu_ids) {
		cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
====>		dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);

		/*
		 * Don't tell them about moving exiting tasks or
		 * kernel threads (both mm NULL), since they never
		 * leave kernel.
		 */
		if (p->mm && printk_ratelimit()) {
			pr_info("process %d (%s) no longer affine to cpu%d\n",
				task_pid_nr(p), p->comm, dead_cpu);
		}
	}

move:
	/* It can have affinity changed while we were choosing. */
	if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
		goto again;
}

Both masks, p->cpus_allowed and cpu_active_mask, are stable: p
won't go away since we hold the tasklist_lock (in migrate_live_tasks),
and cpu_active_mask is static storage, so WTH is it going funny on?

2009-12-14 21:18:52

by Benjamin Herrenschmidt

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Mon, 2009-12-14 at 13:19 +0100, Peter Zijlstra wrote:

> > >> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> > >> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> > >> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> > >> sp: c00000000c933650
> > >> msr: 8000000000089032
> > >> current = 0xc00000000c173840
> > >> paca = 0xc000000000bc2600
> > >> pid = 2602, comm = hotplug06.top.s
> > >> enter ? for help
> > >> [link register ] c000000000342f10 .cpumask_next_and+0x4c/0x94
> > >> [c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74 (unreliable)
> > >> [c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
> > >> [c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
> > >> [c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
> > >> [c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
> > >> [c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
> > >> [c00000000c933b20] c000000000525940 .store_online+0x54/0x894
> > >> [c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
> > >> [c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
> > >> [c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
> > >> [c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
> > >> [c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
> > >> --- Exception: c01 (System Call) at 00000fff9fa8a8f8
> > >> SP (fffe7aef200) is in userspace
> > >> 0:mon> e
> > >> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
> > >> pc: c0000000003433d8: .find_next_bit+0x54/0xc4
> > >> lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
> > >> sp: c00000000c933650
> > >> msr: 8000000000089032
> > >> current = 0xc00000000c173840
> > >> paca = 0xc000000000bc2600
> > >> pid = 2602, comm = hotplug06.top.s
> > >>
>
> OK so how do I read that above thing? What's a System Reset? Is that
> like the x86 triple fault thing?

Nah, it's an NMI that throws you into xmon. Basically, the machine was
hung and Sachin interrupted it with an NMI to see what was going on. The
above is the backtrace. At the moment of the NMI it was inside
find_next_bit(), called from cpumask_next_and(), etc.

> >From what I can make of it, its in move_task_off_dead_cpu(), right after
> having called cpuset_cpus_allowed_locked(), doing that cpumask_any_and()
> call.

Yes, it looks like it.

> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> {
> 	int dest_cpu;
> 	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
>
> again:
> 	/* Look for allowed, online CPU in same node. */
> 	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
> 		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
> 			goto move;
>
> 	/* Any allowed, online CPU? */
> 	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
> 	if (dest_cpu < nr_cpu_ids)
> 		goto move;
>
> 	/* No more Mr. Nice Guy. */
> 	if (dest_cpu >= nr_cpu_ids) {
> 		cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> ====>		dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
>
> 		/*
> 		 * Don't tell them about moving exiting tasks or
> 		 * kernel threads (both mm NULL), since they never
> 		 * leave kernel.
> 		 */
> 		if (p->mm && printk_ratelimit()) {
> 			pr_info("process %d (%s) no longer affine to cpu%d\n",
> 				task_pid_nr(p), p->comm, dead_cpu);
> 		}
> 	}
>
> move:
> 	/* It can have affinity changed while we were choosing. */
> 	if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
> 		goto again;
> }
>
> Both masks, p->cpus_allowed and cpu_active_mask are stable in that p
> won't go away since we hold the tasklist_lock (in migrate_list_tasks),
> and cpu_active_mask is static storage, so WTH is it going funny on?

Sachin, this is 100% reproducible, right? You should be able to
sprinkle it with some xmon_printf() calls rather than printk (just add a
prototype, extern void xmon_printf(const char *fmt,...);, somewhere). This
has the advantage of being fully synchronous and will print out even if
the printk sem is held.

Cheers,
Ben.

2009-12-15 09:44:20

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Benjamin Herrenschmidt wrote:
>> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
>> {
>> 	int dest_cpu;
>> 	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
>>
>> again:
>> 	/* Look for allowed, online CPU in same node. */
>> 	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
>> 		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
>> 			goto move;
>>
>> 	/* Any allowed, online CPU? */
>> 	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
>> 	if (dest_cpu < nr_cpu_ids)
>> 		goto move;
>>
>> 	/* No more Mr. Nice Guy. */
>> 	if (dest_cpu >= nr_cpu_ids) {
>> 		cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
>> ====>	dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
>>
>> 		/*
>> 		 * Don't tell them about moving exiting tasks or
>> 		 * kernel threads (both mm NULL), since they never
>> 		 * leave kernel.
>> 		 */
>> 		if (p->mm && printk_ratelimit()) {
>> 			pr_info("process %d (%s) no longer affine to cpu%d\n",
>> 				task_pid_nr(p), p->comm, dead_cpu);
>> 		}
>> 	}
>>
>> move:
>> 	/* It can have affinity changed while we were choosing. */
>> 	if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
>> 		goto again;
>> }
>>
>> Both masks, p->cpus_allowed and cpu_active_mask are stable in that p
>> won't go away since we hold the tasklist_lock (in migrate_list_tasks),
>> and cpu_active_mask is static storage, so WTH is it going funny on?
>>
I added some debug statements within the above code.
This is a 2 cpu machine.

XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 1024
XMON dest_cpu = 1024 . dead_cpu = 1
XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 1024
XMON dest_cpu = 1024 . dead_cpu = 1
XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 1024
XMON dest_cpu = 1024 . dead_cpu = 1

It seems to me that control is stuck in an infinite loop, and hence the
machine appears to be hung. The dest_cpu value is always 1024
and never changes, which results in the infinite loop.

In the working scenario the output is along the following lines:

XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 0
XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 0
XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
XMON dest_cpu = 0
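
The two traces above can be mimicked with a rough userspace model of the retry loop. This is only a sketch, not kernel code: toy_pick_cpu(), toy_migrate() and toy_move_task() are hypothetical names, the masks are plain integers for a 2 cpu machine, and it assumes (unconfirmed in this thread) that the migration step refuses any destination that is not a valid, active cpu. The max_passes cap is also an addition; the kernel function has no such limit.

```c
#include <assert.h>

#define TOY_NR_CPU_IDS 2u     /* 2 cpu machine, as in the report */
#define TOY_NO_CPU     1024u  /* "nothing found" value seen in the debug output */

/* First cpu set in both masks, or TOY_NO_CPU when they do not intersect. */
static unsigned int toy_pick_cpu(unsigned int allowed, unsigned int active)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPU_IDS; cpu++)
		if (allowed & active & (1u << cpu))
			return cpu;
	return TOY_NO_CPU;
}

/* Assumption: migration only succeeds onto a valid, active cpu. */
static int toy_migrate(unsigned int dest_cpu, unsigned int active)
{
	return dest_cpu < TOY_NR_CPU_IDS && (active & (1u << dest_cpu)) != 0;
}

/*
 * Returns the number of passes needed to place the task, or -1 once
 * max_passes is exhausted (the stand-in for "loops forever").
 */
static int toy_move_task(unsigned int allowed, unsigned int active,
			 int max_passes)
{
	assert(max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		unsigned int dest_cpu = toy_pick_cpu(allowed, active);

		if (dest_cpu >= TOY_NR_CPU_IDS) {
			/* "No more Mr. Nice Guy": widen to every cpu. */
			allowed = (1u << TOY_NR_CPU_IDS) - 1;
			dest_cpu = toy_pick_cpu(allowed, active);
		}

		if (toy_migrate(dest_cpu, active))
			return pass;
		/* otherwise: the equivalent of "goto again" */
	}
	return -1;
}
```

In this model toy_move_task(0x2, 0x1, 1000) places the task on the first pass, because widening the allowed mask finds cpu 0. With an empty active mask, toy_move_task(0x1, 0x0, 1000) never succeeds and runs into the cap.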

Let me know if I should try to record any specific value.

Thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-15 10:43:56

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Tue, 2009-12-15 at 15:14 +0530, Sachin Sant wrote:
> Benjamin Herrenschmidt wrote:
> >> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> >> {
> >> 	int dest_cpu;
> >> 	const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(dead_cpu));
> >>
> >> again:
> >> 	/* Look for allowed, online CPU in same node. */
> >> 	for_each_cpu_and(dest_cpu, nodemask, cpu_active_mask)
> >> 		if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
> >> 			goto move;
> >>
> >> 	/* Any allowed, online CPU? */
> >> 	dest_cpu = cpumask_any_and(&p->cpus_allowed, cpu_active_mask);
> >> 	if (dest_cpu < nr_cpu_ids)
> >> 		goto move;
> >>
> >> 	/* No more Mr. Nice Guy. */
> >> 	if (dest_cpu >= nr_cpu_ids) {
> >> 		cpuset_cpus_allowed_locked(p, &p->cpus_allowed);
> >> ====>	dest_cpu = cpumask_any_and(cpu_active_mask, &p->cpus_allowed);
> >>
> >> 		/*
> >> 		 * Don't tell them about moving exiting tasks or
> >> 		 * kernel threads (both mm NULL), since they never
> >> 		 * leave kernel.
> >> 		 */
> >> 		if (p->mm && printk_ratelimit()) {
> >> 			pr_info("process %d (%s) no longer affine to cpu%d\n",
> >> 				task_pid_nr(p), p->comm, dead_cpu);
> >> 		}
> >> 	}
> >>
> >> move:
> >> 	/* It can have affinity changed while we were choosing. */
> >> 	if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
> >> 		goto again;
> >> }
> >>
> >> Both masks, p->cpus_allowed and cpu_active_mask are stable in that p
> >> won't go away since we hold the tasklist_lock (in migrate_list_tasks),
> >> and cpu_active_mask is static storage, so WTH is it going funny on?
> >>
> I added some debug statements within the above code.
> This is a 2 cpu machine.
>
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 1024
> XMON dest_cpu = 1024 . dead_cpu = 1
>
> Seems to me that the control is stuck in an infinite loop and hence the
> machine appears to be in hung state. The dest_cpu value is always 1024
> and never changes, which result in an infinite loop.
>
> In working scenario the o/p is something on the following lines
>
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
> XMON dest_cpu = 0
>
> Let me know if i should try to record any specific value ?

Could you possibly print the two masks themselves? cpumask_scnprintf()
and friends come in handy for this.

The dest_cpu=1024 thing seems to suggest the intersection between
p->cpus_allowed and cpu_active_mask is empty for some reason, even
though we forcefully reset p->cpus_allowed to the full set using
cpuset_cpus_allowed_locked().

/me goes re-read the cpu_active_map code, this really shouldn't happen.
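
The empty-intersection reading can be illustrated with a small userspace model. This is only a sketch: toy_cpumask, toy_set_cpu() and toy_any_and() are hypothetical stand-ins, not the kernel's cpumask API, and the 1024-bit width is an assumption inferred from the dest_cpu = 1024 value in the debug output. It shows a first-common-bit search returning the bitmap size, a value >= nr_cpu_ids, when the two masks share no bit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_NR_CPUS   1024u  /* assumed from the dest_cpu = 1024 output */
#define TOY_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS     ((TOY_NR_CPUS + TOY_WORD_BITS - 1) / TOY_WORD_BITS)

struct toy_cpumask {
	unsigned long bits[TOY_WORDS];
};

/* Mark one cpu as present in the mask. */
static void toy_set_cpu(struct toy_cpumask *mask, unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	mask->bits[cpu / TOY_WORD_BITS] |= 1UL << (cpu % TOY_WORD_BITS);
}

/* Lowest cpu set in both masks, or TOY_NR_CPUS if there is none. */
static unsigned int toy_any_and(const struct toy_cpumask *a,
				const struct toy_cpumask *b)
{
	for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		size_t word = cpu / TOY_WORD_BITS;
		unsigned long bit = 1UL << (cpu % TOY_WORD_BITS);

		if (a->bits[word] & b->bits[word] & bit)
			return cpu;
	}
	return TOY_NR_CPUS;
}
```

If one of the two masks is all zeroes, the search returns 1024 whatever the other mask contains, so in this model resetting p->cpus_allowed to the full set cannot produce a valid destination on its own.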

2009-12-15 13:47:39

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Peter Zijlstra wrote:
>> I added some debug statements within the above code.
>> This is a 2 cpu machine.
>>
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 1024
>> XMON dest_cpu = 1024 . dead_cpu = 1
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 1024
>> XMON dest_cpu = 1024 . dead_cpu = 1
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 1024
>> XMON dest_cpu = 1024 . dead_cpu = 1
>>
>> Seems to me that the control is stuck in an infinite loop and hence the
>> machine appears to be in hung state. The dest_cpu value is always 1024
>> and never changes, which result in an infinite loop.
>>
>> In working scenario the o/p is something on the following lines
>>
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 0
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 0
>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>> XMON dest_cpu = 0
>>
>> Let me know if i should try to record any specific value ?
>>
>
> Could you possibly print the two masks themselves? cpumask_scnprintf()
> and friend come in handy for this.
>
> The dest_cpu=1024 thing seem to suggest the intersection between
> p->cpus_allowed and cpu_active_mask is empty for some reason, even
> though we forcefully reset p->cpus_allowed to the full set using
> cpuset_cpus_allowed_locked().
>
So here is the data related to the two masks.

cpu_active_mask = 00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000
XMON dest_cpu = 1024

while p->cpus_allowed = 00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000001
XMON dest_cpu = 1024

In working scenario the above data looks like

cpu_active_mask = 00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000002
XMON dest_cpu = 1

while p->cpus_allowed = 00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
00000000,00000000,00000002
XMON dest_cpu = 1
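
As an aid to reading the dumps above, here is a small userspace sketch
that finds the lowest CPU set in such a dump. It assumes the dump is a list
of comma-separated 32-bit hex words with the most significant word first,
which fits the data shown (a final word of 00000002 goes with
dest_cpu = 1). The function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
int sketch_hex_value(char c)
{
        if (c >= '0' && c <= '9')
                return c - '0';
        if (c >= 'a' && c <= 'f')
                return c - 'a' + 10;
        if (c >= 'A' && c <= 'F')
                return c - 'A' + 10;
        return -1;
}

/*
 * Lowest CPU number set in a dumped mask, or -1 if the mask is empty
 * or contains an unexpected character.
 */
int sketch_first_cpu_in_dump(const char *dump)
{
        int base = 0;   /* CPU number of bit 0 of the digit being examined */

        assert(dump != NULL);
        /* Walk from the end: the last digit holds CPUs 0..3. */
        for (size_t i = strlen(dump); i > 0; i--) {
                char c = dump[i - 1];
                int v;

                if (c == ',' || isspace((unsigned char)c))
                        continue;   /* separators and line wraps carry no bits */
                v = sketch_hex_value(c);
                if (v < 0)
                        return -1;
                for (int bit = 0; bit < 4; bit++)
                        if (v & (1 << bit))
                                return base + bit;
                base += 4;
        }
        return -1;
}
```

Read this way, the failing case has an empty cpu_active_mask and only
CPU 0 in p->cpus_allowed, while the working case has CPU 1 in both masks.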


hope i got the data correct.

Thanks
-Sachin


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-15 15:03:34

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc


Could you try the below?

---
init/main.c | 7 +------
1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/init/main.c b/init/main.c
index 4051d75..4be7de2 100644
--- a/init/main.c
+++ b/init/main.c
@@ -369,12 +369,6 @@ static void __init smp_init(void)
{
unsigned int cpu;

- /*
- * Set up the current CPU as possible to migrate to.
- * The other ones will be done by cpu_up/cpu_down()
- */
- set_cpu_active(smp_processor_id(), true);
-
/* FIXME: This should be done in userspace --RR */
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
@@ -486,6 +480,7 @@ static void __init boot_cpu_init(void)
int cpu = smp_processor_id();
/* Mark the boot cpu "present", "online" etc for SMP and UP case */
set_cpu_online(cpu, true);
+ set_cpu_active(cpu, true);
set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);
}

2009-12-16 05:38:21

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Peter Zijlstra wrote:
> Could you try the below?
>
No luck. Still the same issue. The mask values don't change.

Thanks
-Sachin

> ---
> init/main.c | 7 +------
> 1 files changed, 1 insertions(+), 6 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index 4051d75..4be7de2 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -369,12 +369,6 @@ static void __init smp_init(void)
> {
> unsigned int cpu;
>
> - /*
> - * Set up the current CPU as possible to migrate to.
> - * The other ones will be done by cpu_up/cpu_down()
> - */
> - set_cpu_active(smp_processor_id(), true);
> -
> /* FIXME: This should be done in userspace --RR */
> for_each_present_cpu(cpu) {
> if (num_online_cpus() >= setup_max_cpus)
> @@ -486,6 +480,7 @@ static void __init boot_cpu_init(void)
> int cpu = smp_processor_id();
> /* Mark the boot cpu "present", "online" etc for SMP and UP case */
> set_cpu_online(cpu, true);
> + set_cpu_active(cpu, true);
> set_cpu_present(cpu, true);
> set_cpu_possible(cpu, true);
> }
>
>
>


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-16 06:25:22

by Xiaotian Feng

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Fri, Dec 11, 2009 at 6:53 PM, Sachin Sant <[email protected]> wrote:
> While executing cpu_hotplug(from autotest) tests against latest
> next on a power6 box, the machine locks up. A soft reset shows
> the following trace
>
> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
>   pc: c0000000003433d8: .find_next_bit+0x54/0xc4
>   lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
>   sp: c00000000c933650
>  msr: 8000000000089032
>  current = 0xc00000000c173840
>  paca    = 0xc000000000bc2600
>   pid   = 2602, comm = hotplug06.top.s
> enter ? for help
> [link register   ] c000000000342f10 .cpumask_next_and+0x4c/0x94
> [c00000000c933650] c0000000000e9f34 .cpuset_cpus_allowed_locked+0x38/0x74
> (unreliable)
> [c00000000c9336e0] c000000000090074 .move_task_off_dead_cpu+0xc4/0x1ac
> [c00000000c9337a0] c0000000005e4e5c .migration_call+0x304/0x830
> [c00000000c933880] c0000000005e0880 .notifier_call_chain+0x68/0xe0
> [c00000000c933920] c00000000012a92c ._cpu_down+0x210/0x34c
> [c00000000c933a90] c00000000012aad8 .cpu_down+0x70/0xa8
> [c00000000c933b20] c000000000525940 .store_online+0x54/0x894
> [c00000000c933bb0] c000000000463430 .sysdev_store+0x3c/0x50
> [c00000000c933c20] c0000000001f8320 .sysfs_write_file+0x124/0x18c
> [c00000000c933ce0] c00000000017edac .vfs_write+0xd4/0x1fc
> [c00000000c933d80] c00000000017efdc .SyS_write+0x58/0xa0
> [c00000000c933e30] c0000000000085b4 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000fff9fa8a8f8
> SP (fffe7aef200) is in userspace
> 0:mon> e
> cpu 0x0: Vector: 100 (System Reset) at [c00000000c9333d0]
>   pc: c0000000003433d8: .find_next_bit+0x54/0xc4
>   lr: c000000000342f10: .cpumask_next_and+0x4c/0x94
>   sp: c00000000c933650
>  msr: 8000000000089032
>  current = 0xc00000000c173840
>  paca    = 0xc000000000bc2600
>   pid   = 2602, comm = hotplug06.top.s
>

Does this testcase hotplug cpu 0 off?

> Last few messages from the dmesg log shows
>
> 0:mon> <4>IRQ 17 affinity broken off cpu 0
> <4>IRQ 18 affinity broken off cpu 0
> <4>IRQ 19 affinity broken off cpu 0
> <4>IRQ 264 affinity broken off cpu 0
> <4>cpu 0 (hwid 0) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
> <4>Processor 0 found.
> <4>IRQ 17 affinity broken off cpu 1
> <4>IRQ 18 affinity broken off cpu 1
> <4>IRQ 19 affinity broken off cpu 1
> <4>IRQ 264 affinity broken off cpu 1
> <4>cpu 1 (hwid 1) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
> <4>Processor 1 found.
> <4>cpu 1 (hwid 1) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
> <4>Processor 1 found.
> <4>cpu 1 (hwid 1) Ready to die...
> <6>process 2423 (bash) no longer affine to cpu1
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
> <4>Processor 1 found.
> <4>cpu 1 (hwid 1) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
> <4>Processor 1 found.
> <4>cpu 1 (hwid 1) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[1]
> <4>Processor 1 found.
> <4>cpu 1 (hwid 1) Ready to die...
> <3>INFO: RCU detected CPU 0 stall (t=1000 jiffies)
> <3>INFO: RCU detected CPU 0 stall (t=4000 jiffies)
> 0:mon>
>
> After some debugging a possible suspect seems to be commit
> 6ad4c18.. : sched: Fix balance vs hotplug race
>
> If i revert this patch i am able to execute the tests on this
> power6 without any issues.
> But at the same time the above patch is required to solve the
> cpu hotplug related race on x86_64(as a side note this same
> x86_64 issue can be recreated against latest Linus git as well)
> that i reported here :
>
> http://marc.info/?l=linux-kernel&m=125802682922299&w=2
>
> I will try few more iterations with and without the above
> patch just to make sure i have the correct results.
>
> If someone has a suggestion let me know.
>
> Thanks
> -Sachin
>
>
> --
>
> ---------------------------------
> Sachin Sant
> IBM Linux Technology Center
> India Systems and Technology Labs
> Bangalore, India
> ---------------------------------
>

2009-12-16 06:41:59

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Xiaotian Feng wrote:
> Does this testcase hotplug cpu 0 off?
>
No, i don't think so. It skips cpu0 during online/offline
process.

thanks
-Sachin

--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-16 06:45:32

by Xiaotian Feng

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant <[email protected]> wrote:
> Xiaotian Feng wrote:
>>
>> Does this testcase hotplug cpu 0 off?
>>
>
> No, i don't think so. It skips cpu0 during online/offline
> process.

Then how could this happen ? Looks like cpu 0 is offline ....
0:mon> <4>IRQ 17 affinity broken off cpu 0
<4>IRQ 18 affinity broken off cpu 0
<4>IRQ 19 affinity broken off cpu 0
<4>IRQ 264 affinity broken off cpu 0
<4>cpu 0 (hwid 0) Ready to die...
<7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]


>
> thanks
> -Sachin
>
> --
>
> ---------------------------------
> Sachin Sant
> IBM Linux Technology Center
> India Systems and Technology Labs
> Bangalore, India
> ---------------------------------
>
>

2009-12-16 06:54:17

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Xiaotian Feng wrote:
> On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant <[email protected]> wrote:
>
>> Xiaotian Feng wrote:
>>
>>> Does this testcase hotplug cpu 0 off?
>>>
>>>
>> No, i don't think so. It skips cpu0 during online/offline
>> process.
>>
>
> Then how could this happen ? Looks like cpu 0 is offline ....
> 0:mon> <4>IRQ 17 affinity broken off cpu 0
> <4>IRQ 18 affinity broken off cpu 0
> <4>IRQ 19 affinity broken off cpu 0
> <4>IRQ 264 affinity broken off cpu 0
> <4>cpu 0 (hwid 0) Ready to die...
> <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
>
Sorry i was looking at only one script. Looking more closely
at the test there are 6 different sub tests. The rest of the
tests do seem to hotplug CPU 0.

Thanks
-Sachin


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-16 06:56:10

by Xiaotian Feng

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Tue, Dec 15, 2009 at 9:47 PM, Sachin Sant <[email protected]> wrote:
> Peter Zijlstra wrote:
>>>
>>> I added some debug statements within the above code. This is a 2 cpu
>>> machine.
>>>
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 1024
>>> XMON dest_cpu = 1024 . dead_cpu = 1
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 1024
>>> XMON dest_cpu = 1024 . dead_cpu = 1
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 1024
>>> XMON dest_cpu = 1024 . dead_cpu = 1
>>>
>>> Seems to me that the control is stuck in an infinite loop and hence the
>>> machine appears to be in hung state. The dest_cpu value is always 1024
>>> and never changes, which result in an infinite loop.
>>>
>>> In working scenario the o/p is something on the following lines
>>>
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 0
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 0
>>> XMON dest_cpu = 1024 . dead_cpu = 1 . nr_cpu_ids = 2
>>> XMON dest_cpu = 0
>>>
>>> Let me know if i should try to record any specific value ?
>>>
>>
>> Could you possibly print the two masks themselves? cpumask_scnprintf()
>> and friend come in handy for this.
>>
>> The dest_cpu=1024 thing seem to suggest the intersection between
>> p->cpus_allowed and cpu_active_mask is empty for some reason, even
>> though we forcefully reset p->cpus_allowed to the full set using
>> cpuset_cpus_allowed_locked().
>>
>
> So here is the data related to the two masks.
>
> cpu_active_mask = 00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000
> XMON dest_cpu = 1024
>

How about cpu_online_mask? Commit 6ad4c18 switches from cpu_online_mask
to cpu_active_mask.
Is there a mismatch between cpu_online_mask and cpu_active_mask?

> while p->cpus_allowed =  00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000001
> XMON dest_cpu = 1024
>
> In working scenario the above data looks like
>
> cpu_active_mask = 00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000002
> XMON dest_cpu = 1
>
> while p->cpus_allowed =  00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,
> 00000000,00000000,00000002
> XMON dest_cpu = 1
>
>
> hope i got the data correct.
>
> Thanks
> -Sachin
>
>
> --
>
> ---------------------------------
> Sachin Sant
> IBM Linux Technology Center
> India Systems and Technology Labs
> Bangalore, India
> ---------------------------------
>

2009-12-16 07:14:28

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Wed, 2009-12-16 at 11:08 +0530, Sachin Sant wrote:
> Peter Zijlstra wrote:
> > Could you try the below?
> >
> No luck. Still the same issue. The mask values don't change.

Bugger, that patch did solve a similar problem for a patch I'm working
on.

Can you maybe add a print of cpu_active_mask in set_cpu_active()
using WARN() so we can see where it changes the mask, and why it thinks
it's empty?

> > ---
> > init/main.c | 7 +------
> > 1 files changed, 1 insertions(+), 6 deletions(-)
> >
> > diff --git a/init/main.c b/init/main.c
> > index 4051d75..4be7de2 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -369,12 +369,6 @@ static void __init smp_init(void)
> > {
> > unsigned int cpu;
> >
> > - /*
> > - * Set up the current CPU as possible to migrate to.
> > - * The other ones will be done by cpu_up/cpu_down()
> > - */
> > - set_cpu_active(smp_processor_id(), true);
> > -
> > /* FIXME: This should be done in userspace --RR */
> > for_each_present_cpu(cpu) {
> > if (num_online_cpus() >= setup_max_cpus)
> > @@ -486,6 +480,7 @@ static void __init boot_cpu_init(void)
> > int cpu = smp_processor_id();
> > /* Mark the boot cpu "present", "online" etc for SMP and UP case */
> > set_cpu_online(cpu, true);
> > + set_cpu_active(cpu, true);
> > set_cpu_present(cpu, true);
> > set_cpu_possible(cpu, true);
> > }
> >
> >
> >
>
>

2009-12-16 07:18:33

by Peter Zijlstra

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Wed, 2009-12-16 at 12:24 +0530, Sachin Sant wrote:
> Xiaotian Feng wrote:
> > On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant <[email protected]> wrote:
> >
> >> Xiaotian Feng wrote:
> >>
> >>> Does this testcase hotplug cpu 0 off?
> >>>
> >>>
> >> No, i don't think so. It skips cpu0 during online/offline
> >> process.
> >>
> >
> > Then how could this happen ? Looks like cpu 0 is offline ....
> > 0:mon> <4>IRQ 17 affinity broken off cpu 0
> > <4>IRQ 18 affinity broken off cpu 0
> > <4>IRQ 19 affinity broken off cpu 0
> > <4>IRQ 264 affinity broken off cpu 0
> > <4>cpu 0 (hwid 0) Ready to die...
> > <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
> >
> Sorry i was looking at only one script. Looking more closely
> at the test there are 6 different sub tests. The rest of the
> tests do seem to hotplug CPU 0.

Ooh, cute, so you can actually hotplug cpu 0.. no wonder that didn't get
exposed on x86.

Still, the only time cpu_active_mask should not be equal to
cpu_online_mask is when we're in the middle of a hotplug: we clear
active early and set it late, but it's all done under the hotplug mutex,
so the two masks can differ by at most 1 cpu.

Unless of course, I messed up, which appears to be rather likely given
these problems ;-)
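
The invariant described above can be written down as a small check. This
is a userspace sketch under stated assumptions, not kernel code: masks are
plain 64-bit words, and the names sketch_cpu_bit, sketch_weight and
sketch_masks_consistent are hypothetical. Outside a hotplug operation the
two masks are expected to be equal; during one they may differ in at most
one CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit for one CPU in a 64-bit mask (the sketch handles at most 64 CPUs). */
uint64_t sketch_cpu_bit(unsigned int cpu)
{
        assert(cpu < 64);
        return (uint64_t)1 << cpu;
}

/* Number of set bits in the mask. */
unsigned int sketch_weight(uint64_t mask)
{
        unsigned int n = 0;

        while (mask) {
                mask &= mask - 1;   /* clear the lowest set bit */
                n++;
        }
        return n;
}

/* True if the online/active pair respects the invariant stated above. */
bool sketch_masks_consistent(uint64_t online, uint64_t active,
                             bool hotplug_in_progress)
{
        uint64_t diff = online ^ active;

        if (!hotplug_in_progress)
                return diff == 0;
        return sketch_weight(diff) <= 1;
}
```

A state with CPU 0 online but no CPU active fails this check when no
hotplug operation is in flight, even though the masks differ in only one
bit.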

2009-12-16 07:57:59

by Xiaotian Feng

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Wed, Dec 16, 2009 at 3:18 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2009-12-16 at 12:24 +0530, Sachin Sant wrote:
>> Xiaotian Feng wrote:
>> > On Wed, Dec 16, 2009 at 2:41 PM, Sachin Sant <[email protected]> wrote:
>> >
>> >> Xiaotian Feng wrote:
>> >>
>> >>> Does this testcase hotplug cpu 0 off?
>> >>>
>> >>>
>> >> No, i don't think so. It skips cpu0 during online/offline
>> >> process.
>> >>
>> >
>> > Then how could this happen ? Looks like cpu 0 is offline ....
>> > 0:mon> <4>IRQ 17 affinity broken off cpu 0
>> > <4>IRQ 18 affinity broken off cpu 0
>> > <4>IRQ 19 affinity broken off cpu 0
>> > <4>IRQ 264 affinity broken off cpu 0
>> > <4>cpu 0 (hwid 0) Ready to die...
>> > <7>clockevent: decrementer mult[83126e97] shift[32] cpu[0]
>> >
>> Sorry i was looking at only one script. Looking more closely
>> at the test there are 6 different sub tests. The rest of the
>> tests do seem to hotplug CPU 0.
>
> Ooh, cute, so you can actually hotplug cpu 0.. no wonder that didn't get
> exposed on x86.
>
> Still, the only time cpu_active_mask should not be equal to
> cpu_online_mask is when we're in the middle of a hotplug, we clear
> active early and set it late, but its all done under the hotplug mutex,
> so we can at most have 1 cpu differences with online mask.
>

Could the following be possible? We know there's cpu 0 and cpu 1,

offline cpu1 > done
offline cpu0 > fails

Consider this in the cpu_down() code:


int __ref cpu_down(unsigned int cpu)
{
<snip>
        set_cpu_active(cpu, false); // here, we set cpu 0 to inactive

        synchronize_sched();

        err = _cpu_down(cpu, 0);
out:
<snip>
}

Then in the _cpu_down() code:

static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
{
<snip>
        // if we're trying to offline cpu0, num_online_cpus() will be 1
        if (num_online_cpus() == 1)
                // after returning to cpu_down(), cpu 0 is not changed back to active
                return -EBUSY;

        if (!cpu_online(cpu))
                return -EINVAL;

        if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
                return -ENOMEM;
<snip>
}

So cpu 0 is left inactive but online, and then we try to offline cpu1, .......
This is not exposed on x86 because x86 does not have
/sys/devices/system/cpu/cpu0/online.
I guess the following patch fixes this bug.

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 291ac58..21ddace 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -199,14 +199,18 @@ static int __ref _cpu_down(unsigned int cpu, int
tasks_frozen)
.hcpu = hcpu,
};

- if (num_online_cpus() == 1)
+ if (num_online_cpus() == 1) {
+ set_cpu_active(cpu, true);
return -EBUSY;
+ }

if (!cpu_online(cpu))
return -EINVAL;

- if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
+ if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL)) {
+ set_cpu_active(cpu, true);
return -ENOMEM;
+ }

cpu_hotplug_begin();
err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
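
The sequence described in this message can be replayed with a small
userspace model. This is a sketch only, not the kernel code: masks are
64-bit words, the real teardown is reduced to clearing the online bit, the
cpumask allocation failure path is left out, and all sketch_* names are
hypothetical. It models the flow as described here, where the active bit is
cleared before the early-exit checks run.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct sketch_cpus {
        uint64_t online;
        uint64_t active;
};

/* Bit for one CPU in a 64-bit mask. */
uint64_t sketch_bit(unsigned int cpu)
{
        assert(cpu < 64);
        return (uint64_t)1 << cpu;
}

/* Early-exit checks of the inner function, then a trivial "teardown". */
int sketch_inner_down(struct sketch_cpus *s, unsigned int cpu)
{
        if ((s->online & (s->online - 1)) == 0)   /* zero or one CPU online */
                return -EBUSY;
        if (!(s->online & sketch_bit(cpu)))
                return -EINVAL;
        s->online &= ~sketch_bit(cpu);
        return 0;
}

/* Outer function as described: clear active first, never restore it. */
int sketch_cpu_down_unpatched(struct sketch_cpus *s, unsigned int cpu)
{
        s->active &= ~sketch_bit(cpu);
        return sketch_inner_down(s, cpu);
}

/* Bringing a CPU up sets both bits. */
void sketch_cpu_up(struct sketch_cpus *s, unsigned int cpu)
{
        s->online |= sketch_bit(cpu);
        s->active |= sketch_bit(cpu);
}
```

Starting from both CPUs online and active: offlining cpu1 succeeds, the
attempt to offline cpu0 returns -EBUSY but leaves cpu0 inactive, and after
cpu1 is brought back and offlined again the model ends with cpu0 online and
an empty active mask, the same shape as the failing mask dump earlier in
the thread.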


> Unless of course, I messed up, which appears to be rather likely given
> these problems ;-)
>
>

2009-12-16 08:25:16

by Sachin Sant

Subject: Re: [Next] CPU Hotplug test failures on powerpc

Xiaotian Feng wrote:
> Could follow be possible? We know there's cpu 0 and cpu 1,
>
> offline cpu1 > done
> offline cpu0 > false
>
> consider this in cpu_down code,
>
>
> int __ref cpu_down(unsigned int cpu)
> {
> <snip>
> set_cpu_active(cpu, false); // here, we set cpu 0 to inactive
>
> synchronize_sched();
>
> err = _cpu_down(cpu, 0);
> out:
> <snip>
> }
>
> Then in _cpu_down code:
>
> static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
> {
> <snip>
> if (num_online_cpus() == 1) // if we're trying to
> offline cpu0, num_online_cpus will be 1
> return -EBUSY; // after return back
> to cpu_down, we didn't change cpu 0 back to active
>
> if (!cpu_online(cpu))
> return -EINVAL;
>
> if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
> return -ENOMEM;
> <snip>
> }
>
> Then cpu 0 is not active, but online, then we try to offline cpu1, .......
> This can not be exposed because x86 does not have
> /sys/devices/system/cpu0/online.
> I guess following patch fixes this bug.
>
Just tested this one on the POWER box and the test passed.
I did not observe the hang.

Thanks
-Sachin

> ---
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 291ac58..21ddace 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -199,14 +199,18 @@ static int __ref _cpu_down(unsigned int cpu, int
> tasks_frozen)
> .hcpu = hcpu,
> };
>
> - if (num_online_cpus() == 1)
> + if (num_online_cpus() == 1) {
> + set_cpu_active(cpu, true);
> return -EBUSY;
> + }
>
> if (!cpu_online(cpu))
> return -EINVAL;
>
> - if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
> + if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL)) {
> + set_cpu_active(cpu, true);
> return -ENOMEM;
> + }
>
> cpu_hotplug_begin();
> err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
>
>
>
>> Unless of course, I messed up, which appears to be rather likely given
>> these problems ;-)
>>
>>
>>
>
>


--

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

2009-12-16 09:07:33

by Xiaotian Feng

Subject: Re: [Next] CPU Hotplug test failures on powerpc

On Wed, Dec 16, 2009 at 4:24 PM, Sachin Sant <[email protected]> wrote:
> Xiaotian Feng wrote:
>>
>> Could follow be possible?  We know there's cpu 0 and cpu 1,
>>
>> offline cpu1 > done
>> offline cpu0 > false
>>
>> consider this in cpu_down code,
>>
>>
>> int __ref cpu_down(unsigned int cpu)
>> {
>> <snip>
>>        set_cpu_active(cpu, false); // here, we set cpu 0 to inactive
>>
>>        synchronize_sched();
>>
>>        err = _cpu_down(cpu, 0);
>> out:
>> <snip>
>> }
>>
>> Then in _cpu_down code:
>>
>> static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
>> {
>> <snip>
>>        if (num_online_cpus() == 1)        // if we're trying to
>> offline cpu0, num_online_cpus will be 1
>>                return -EBUSY;                    // after return back
>> to cpu_down, we didn't change cpu 0 back to active
>>
>>        if (!cpu_online(cpu))
>>                return -EINVAL;
>>
>>        if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
>>                return -ENOMEM;
>> <snip>
>> }
>>
>> Then cpu 0 is not active, but online, then we try to offline cpu1, .......
>> This can not be exposed because x86 does not have
>> /sys/devices/system/cpu0/online.
>> I guess following patch fixes this bug.
>>
>
> Just tested this one on the POWER box and the test passed.
> I did not observe the hang.

Thanks for confirming, I will send a formatted patch upstream then :-)

>
> Thanks
> -Sachin
>
>> ---
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 291ac58..21ddace 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -199,14 +199,18 @@ static int __ref _cpu_down(unsigned int cpu, int
>> tasks_frozen)
>>                .hcpu = hcpu,
>>        };
>>
>> -       if (num_online_cpus() == 1)
>> +       if (num_online_cpus() == 1) {
>> +               set_cpu_active(cpu, true);
>>                return -EBUSY;
>> +       }
>>
>>        if (!cpu_online(cpu))
>>                return -EINVAL;
>>
>> -       if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL))
>> +       if (!alloc_cpumask_var(&old_allowed, GFP_KERNEL)) {
>> +               set_cpu_active(cpu, true);
>>                return -ENOMEM;
>> +       }
>>
>>        cpu_hotplug_begin();
>>        err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
>>
>>
>>
>>>
>>> Unless of course, I messed up, which appears to be rather likely given
>>> these problems ;-)
>>>
>>>
>>>
>>
>>
>
>
> --
>
> ---------------------------------
> Sachin Sant
> IBM Linux Technology Center
> India Systems and Technology Labs
> Bangalore, India
> ---------------------------------
>
>

2009-12-16 09:16:30

by Xiaotian Feng

Subject: [PATCH] fix cpu hotplug test failures on powerpc

Sachin found cpu hotplug test failures on powerpc, which made the kernel
hang on his POWER box. This was reported in
http://marc.info/?l=linux-kernel&m=126052886204649&w=2

commit 6ad4c18 (sched: Fix balance vs hotplug race) switches to
cpu_active_mask, but in some specific situations the kernel may leave
a cpu inactive but online.

On some powerpc machines, hotplugging cpu0 is allowed. If cpu0 is the
last alive cpu and we try to offline it, cpu_down() marks cpu0 inactive,
then _cpu_down() finds that num_online_cpus() is 1 and returns -EBUSY,
but cpu0 is not changed back to active. So cpu0 ends up inactive but
online.

The fix is to set the cpu inactive only when we're actually going to
bring down the specific cpu in _cpu_down().

Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Heiko Carstens <[email protected]>
---
kernel/cpu.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 291ac58..a1e7165 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
return -ENOMEM;

cpu_hotplug_begin();
+ set_cpu_active(cpu, false);
err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
hcpu, -1, &nr_calls);
if (err == NOTIFY_BAD) {
@@ -280,8 +281,6 @@ int __ref cpu_down(unsigned int cpu)
goto out;
}

- set_cpu_active(cpu, false);
-
/*
* Make sure the all cpus did the reschedule and are not
* using stale version of the cpu_active_mask.
@@ -387,12 +386,6 @@ int disable_nonboot_cpus(void)
*/
cpumask_clear(frozen_cpus);

- for_each_online_cpu(cpu) {
- if (cpu == first_cpu)
- continue;
- set_cpu_active(cpu, false);
- }
-
synchronize_sched();

printk("Disabling non-boot CPUs ...\n");
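
The ordering this patch aims for can be shown with a small userspace model.
It is a sketch under stated assumptions, not the kernel code: masks are
64-bit words, the teardown is reduced to clearing the online bit, and the
sketch_* names are hypothetical. The point is only that the active bit is
cleared after the early exits, so a refused request leaves it untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct sketch_cpus {
        uint64_t online;
        uint64_t active;
};

/* Bit for one CPU in a 64-bit mask. */
uint64_t sketch_bit(unsigned int cpu)
{
        assert(cpu < 64);
        return (uint64_t)1 << cpu;
}

/* Offline path with the active bit cleared only once the checks pass. */
int sketch_cpu_down_reordered(struct sketch_cpus *s, unsigned int cpu)
{
        /* Early exits run first, before the active mask is touched. */
        if ((s->online & (s->online - 1)) == 0)   /* zero or one CPU online */
                return -EBUSY;
        if (!(s->online & sketch_bit(cpu)))
                return -EINVAL;

        s->active &= ~sketch_bit(cpu);   /* now committed to going down */
        s->online &= ~sketch_bit(cpu);
        return 0;
}
```

In this model, trying to offline the last online CPU returns -EBUSY and
both masks stay as they were.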

2009-12-16 10:16:49

by Peter Zijlstra

Subject: Re: [PATCH] fix cpu hotplug test failures on powerpc

On Wed, 2009-12-16 at 17:15 +0800, Xiaotian Feng wrote:
> Sachin found cpu hotplug test failures on powerpc, which made kernel
> hangs on his POWER box. This is addressed in
> http://marc.info/?l=linux-kernel&m=126052886204649&w=2
>
> commit 6ad4c18(sched: Fix balance vs hotplug race), switches to
> cpu_active_mask, but at some specific situation, kernel may cause
> some cpu inactive but online.
>
> In some powerpc machine, hotplug cpu0 is allowed. If cpu0 is the
> last alive cpu, when we tried to offline cpu0, we'll inactive cpu0
> in cpu_down(), after goes into __cpu_down(), kernel found num_online_cpus
> is 1, returned -EBUSY but cpu0 is not changed back to active. So
> cpu0 is inactive but online.
>
> The fix is to set cpu inactive when we're going to bring down the specific
> cpu in _cpu_down().

Good spotting, thanks! Some comments below.

> Reported-by: Sachin Sant <[email protected]>
> Signed-off-by: Xiaotian Feng <[email protected]>
> Tested-by: Sachin Sant <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rusty Russell <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Heiko Carstens <[email protected]>
> ---
> kernel/cpu.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 291ac58..a1e7165 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
> return -ENOMEM;
>
> cpu_hotplug_begin();
> + set_cpu_active(cpu, false);
> err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> hcpu, -1, &nr_calls);
> if (err == NOTIFY_BAD) {
> @@ -280,8 +281,6 @@ int __ref cpu_down(unsigned int cpu)
> goto out;
> }
>
> - set_cpu_active(cpu, false);
> -
> /*
> * Make sure the all cpus did the reschedule and are not
> * using stale version of the cpu_active_mask.

That renders the synchronize_sched() call down there useless, so might
as well remove it then.

> @@ -387,12 +386,6 @@ int disable_nonboot_cpus(void)
> */
> cpumask_clear(frozen_cpus);
>
> - for_each_online_cpu(cpu) {
> - if (cpu == first_cpu)
> - continue;
> - set_cpu_active(cpu, false);
> - }
> -
> synchronize_sched();

And here too.

> printk("Disabling non-boot CPUs ...\n");