2014-06-03 20:30:07

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 05/28/2014 07:01 PM, Vivek Goyal wrote:
> On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote:
>> If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
>> (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
>> get the following messages during boot:
>>
>> [ 0.089866] POWER8 performance monitor hardware support registered
>> [ 0.089985] power8-pmu: PMAO restore workaround active.
>> [ 5.095419] Processor 1 is stuck.
>> [ 10.097933] Processor 2 is stuck.
>> [ 15.100480] Processor 3 is stuck.
>> [ 20.102982] Processor 4 is stuck.
>> [ 25.105489] Processor 5 is stuck.
>> [ 30.108005] Processor 6 is stuck.
>> [ 35.110518] Processor 7 is stuck.
>> [ 40.113369] Processor 9 is stuck.
>> [ 45.115879] Processor 10 is stuck.
>> [ 50.118389] Processor 11 is stuck.
>> [ 55.120904] Processor 12 is stuck.
>> [ 60.123425] Processor 13 is stuck.
>> [ 65.125970] Processor 14 is stuck.
>> [ 70.128495] Processor 15 is stuck.
>> [ 75.131316] Processor 17 is stuck.
>>
>> Note that only the sibling threads are stuck, while the primary threads (0, 8,
>> 16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
>> that kexec tries to wake up (bring online) the sibling threads of all the cores,
>> before performing kexec:
>>
>> [ 9464.131231] Starting new kernel
>> [ 9464.148507] kexec: Waking offline cpu 1.
>> [ 9464.148552] kexec: Waking offline cpu 2.
>> [ 9464.148600] kexec: Waking offline cpu 3.
>> [ 9464.148636] kexec: Waking offline cpu 4.
>> [ 9464.148671] kexec: Waking offline cpu 5.
>> [ 9464.148708] kexec: Waking offline cpu 6.
>> [ 9464.148743] kexec: Waking offline cpu 7.
>> [ 9464.148779] kexec: Waking offline cpu 9.
>> [ 9464.148815] kexec: Waking offline cpu 10.
>> [ 9464.148851] kexec: Waking offline cpu 11.
>> [ 9464.148887] kexec: Waking offline cpu 12.
>> [ 9464.148922] kexec: Waking offline cpu 13.
>> [ 9464.148958] kexec: Waking offline cpu 14.
>> [ 9464.148994] kexec: Waking offline cpu 15.
>> [ 9464.149030] kexec: Waking offline cpu 17.
>>
>> Instrumenting this piece of code revealed that the cpu_up() operation actually
>> fails with -EBUSY. Thus, only the primary threads of all the cores are online
>> during kexec, and hence this is a sure-shot recipe for disaster, as explained
>> in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
>> as well as in the comment above wake_offline_cpus().
>>
>> It turns out that cpu_up() was returning -EBUSY because the variable
>> 'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
>> by migrate_to_reboot_cpu() inside kernel_kexec().
>>
>> Now, migrate_to_reboot_cpu() was originally written with the assumption that
>> any further code will not need to perform CPU hotplug, since we are anyway in
>> the reboot path. However, kexec is clearly not such a case, since we depend on
>> onlining CPUs, at least on powerpc.
>>
>> So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
>> kexec path, to fix this regression in kexec on powerpc.
>>
>> Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
>> can catch such issues more easily in the future.
>>
>> Fixes: c97102ba963 (kexec: migrate to reboot cpu)
>> Cc: [email protected]
>> Signed-off-by: Srivatsa S. Bhat <[email protected]>
>> ---
>>
>> arch/powerpc/kernel/machine_kexec_64.c | 2 +-
>> kernel/kexec.c | 8 ++++++++
>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
>> index 59d229a..879b3aa 100644
>> --- a/arch/powerpc/kernel/machine_kexec_64.c
>> +++ b/arch/powerpc/kernel/machine_kexec_64.c
>> @@ -237,7 +237,7 @@ static void wake_offline_cpus(void)
>> if (!cpu_online(cpu)) {
>> printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
>> cpu);
>> - cpu_up(cpu);
>> + WARN_ON(cpu_up(cpu));
>> }
>> }
>> }
>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>> index c8380ad..28c5706 100644
>> --- a/kernel/kexec.c
>> +++ b/kernel/kexec.c
>> @@ -1683,6 +1683,14 @@ int kernel_kexec(void)
>> kexec_in_progress = true;
>> kernel_restart_prepare(NULL);
>> migrate_to_reboot_cpu();
>> +
>> + /*
>> + * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>> + * no further code needs to use CPU hotplug (which is true in
>> + * the reboot case). However, the kexec path depends on using
>> + * CPU hotplug again; so re-enable it here.
>> + */
>> + cpu_hotplug_enable();
>> printk(KERN_EMERG "Starting new kernel\n");
>> machine_shutdown();
>
> After migrate_to_reboot_cpu(), we are calling machine_shutdown() which
> calls disable_nonboot_cpus() and which in turn calls _cpu_down().
>

Hmm? I see only 'arm' calling disable_nonboot_cpus() from machine_shutdown().
None of the other architectures call it. Is that a leftover in arm?

> So it is kind of odd that we first migrate to boot cpu, and then disable
> all non-boot cpus and after that powerpc goes ahead and onlines all
> cpus.
>
> I think this is not a good idea. For whatever reason if powerpc has to
> online all cpus, then it should happen earlier and not in machine_kexec().
>
> In fact I think generic code expects that all non-boot cpus are disabled
> so that generic code can use all the RAM as it wants to. Now if powerpc
> breaks that assumption, it will lead to various kinds of issues.
>
> So I think we need to go back and see if we can find a way where we
> don't have to online all cpus in first kernel. And second kernel needs
> to have a way to detect it and online things.
>

Yep, that makes sense. But unfortunately I don't have enough insight into
why exactly powerpc has to online the CPUs before doing a kexec. I just
know from the commit log and the comment mentioned above (and from my own
experiments) that the CPUs will get stuck if they were offline. Perhaps
somebody more knowledgeable can explain this in detail and suggest a proper
long-term solution.

Matt, Ben, any thoughts on this?

Regards,
Srivatsa S. Bhat


2014-06-03 22:15:51

by Benjamin Herrenschmidt

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
> Yep, that makes sense. But unfortunately I don't have enough insight into
> why exactly powerpc has to online the CPUs before doing a kexec. I just
> know from the commit log and the comment mentioned above (and from my own
> experiments) that the CPUs will get stuck if they were offline. Perhaps
> somebody more knowledgeable can explain this in detail and suggest a proper
> long-term solution.
>
> Matt, Ben, any thoughts on this?

The problem is with our "soft offline" which we do on some platforms. When we
offline we don't actually send the CPUs back to firmware or anything like that.

We put them into a very low-power loop inside Linux.

The new kernel has no way to extract them from that loop. So we must re-"online"
them before we kexec so they can be passed to the new kernel normally (or returned
to firmware like we do on powernv).

Cheers,
Ben.

2014-06-04 13:41:47

by Vivek Goyal

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On Wed, Jun 04, 2014 at 01:58:40AM +0530, Srivatsa S. Bhat wrote:
> On 05/28/2014 07:01 PM, Vivek Goyal wrote:
> > On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote:
> >> If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
> >> (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
> >> get the following messages during boot:
> >>
[...]
> >> diff --git a/kernel/kexec.c b/kernel/kexec.c
> >> index c8380ad..28c5706 100644
> >> --- a/kernel/kexec.c
> >> +++ b/kernel/kexec.c
> >> @@ -1683,6 +1683,14 @@ int kernel_kexec(void)
> >> kexec_in_progress = true;
> >> kernel_restart_prepare(NULL);
> >> migrate_to_reboot_cpu();
> >> +
> >> + /*
> >> + * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> >> + * no further code needs to use CPU hotplug (which is true in
> >> + * the reboot case). However, the kexec path depends on using
> >> + * CPU hotplug again; so re-enable it here.
> >> + */
> >> + cpu_hotplug_enable();
> >> printk(KERN_EMERG "Starting new kernel\n");
> >> machine_shutdown();
> >
> > After migrate_to_reboot_cpu(), we are calling machine_shutdown() which
> > calls disable_nonboot_cpus() and which in turn calls _cpu_down().
> >
>
> Hmm? I see only 'arm' calling disable_nonboot_cpus() from machine_shutdown().
> None of the other architectures call it. Is that a leftover in arm?

You are right. I did not notice that only arm is doing that. It looks like
it calls into some platform code; I am not sure what exactly arm does to
disable a cpu.

x86 code calls stop_other_cpus() in machine_shutdown(), which sends
REBOOT_VECTOR to the other cpus; each of them then calls stop_this_cpu(),
which in turn does:

	for (;;)
		halt();

IIUC, upon receipt of certain interrupts the cpu will come out of the halt
state. I am not sure how safe that is from the kexec point of view: we will
be replacing the original kernel, which means that if a cpu comes out of the
halt state, it might end up running some random code.

Eric/hpa might know the context here better, and what safeguards us on x86.

So one should not make a cpu spin on some code, as kexec will overwrite that
code. Instead, some other platform-specific mechanism should bring the cpu
into a hlt-like state. In that respect, arm seems to be doing the right thing.

I am not sure what powerpc does to stop cpus.

Thanks
Vivek

2014-06-04 13:47:36

by Vivek Goyal

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
> > Yep, that makes sense. But unfortunately I don't have enough insight into
> > why exactly powerpc has to online the CPUs before doing a kexec. I just
> > know from the commit log and the comment mentioned above (and from my own
> > experiments) that the CPUs will get stuck if they were offline. Perhaps
> > somebody more knowledgeable can explain this in detail and suggest a proper
> > long-term solution.
> >
> > Matt, Ben, any thoughts on this?
>
> The problem is with our "soft offline" which we do on some platforms. When we
> offline we don't actually send the CPUs back to firmware or anything like that.
>
> We put them into a very low low power loop inside Linux.
>
> The new kernel has no way to extract them from that loop. So we must re-"online"
> them before we kexec so they can be passed to the new kernel normally (or returned
> to firmware like we do on powernv).

Srivatsa,

Looks like your patch has been merged.

I don't like the following change in arch-independent code:

/*
* migrate_to_reboot_cpu() disables CPU hotplug assuming that
* no further code needs to use CPU hotplug (which is true in
* the reboot case). However, the kexec path depends on using
* CPU hotplug again; so re-enable it here.
*/
cpu_hotplug_enable();

As it is a very powerpc-specific requirement, can you enable hotplug in
powerpc arch-dependent code as a short-term solution?

Ideally, as a long-term solution, one needs to remove powerpc's requirement
to online all cpus, and then get rid of the hotplug enable call.
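The failure mode and the short-term placement suggested here can be sketched as a small userspace model. This is not kernel code: a single global flag stands in for `cpu_hotplug_disabled`, the stand-in `cpu_up()` only models its `-EBUSY` check, and `arch_kexec_wake_offline_cpus()` is a hypothetical name for a powerpc-only hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Userspace stand-ins for kernel state; not the real kernel symbols. */
static int cpu_hotplug_disabled;
static bool cpu_online_map[NR_CPUS] = { true, false, false, false };

static void cpu_hotplug_disable(void) { cpu_hotplug_disabled = 1; }
static void cpu_hotplug_enable(void) { cpu_hotplug_disabled = 0; }

/* Models the check that made cpu_up() fail with -EBUSY during kexec. */
static int cpu_up(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpu_hotplug_disabled)
		return -EBUSY;
	cpu_online_map[cpu] = true;
	return 0;
}

/* Generic reboot/kexec path: hotplug is left disabled afterwards. */
static void migrate_to_reboot_cpu(void)
{
	cpu_hotplug_disable();
}

/*
 * Hypothetical powerpc-only hook: the arch code, not kernel/kexec.c,
 * re-enables hotplug, because only powerpc needs to online CPUs here.
 * Returns the number of CPUs that could not be onlined.
 */
static int arch_kexec_wake_offline_cpus(void)
{
	int failed = 0;

	cpu_hotplug_enable();
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!cpu_online_map[cpu] && cpu_up(cpu) != 0)
			failed++;
	return failed;
}
```

In this model, calling `migrate_to_reboot_cpu()` and then `cpu_up(1)` returns `-EBUSY`, the symptom from the original report; the arch hook then onlines CPUs 1 to 3 because it re-enables hotplug first, keeping that decision out of generic code.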

Thanks
Vivek

2014-06-06 12:30:43

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/04/2014 03:39 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>> Yep, that makes sense. But unfortunately I don't have enough insight into
>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>> know from the commit log and the comment mentioned above (and from my own
>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>> somebody more knowledgeable can explain this in detail and suggest a proper
>> long-term solution.
>>
>> Matt, Ben, any thoughts on this?
>
> The problem is with our "soft offline" which we do on some platforms. When we
> offline we don't actually send the CPUs back to firmware or anything like that.
>
> We put them into a very low low power loop inside Linux.
>
> The new kernel has no way to extract them from that loop. So we must re-"online"
> them before we kexec so they can be passed to the new kernel normally (or returned
> to firmware like we do on powernv).
>

Thanks a lot for the explanation Ben!

I thought about this and this is what I think: whether the CPU is in the kernel
or in the firmware is a hard boundary. But once we know it is still in the
kernel, whether it is online or offline is a soft boundary, something that
ideally shouldn't make any difference to kexec.

Then I looked at the special state that kexec expects the online CPUs to be
in before performing kexec, and found that this state is entered via
kexec_smp_down().

This means that if we poke the soft-offline CPUs and make them execute
kexec_smp_down(), we should be able to do a successful kexec without having to
actually online them. After all, the core kexec code doesn't mandate that they
be online. So if we satisfy powerpc's requirement that all the CPUs are in a
sane state, that should be good enough. (This would be similar to how the
subcore code wakes up offline CPUs to perform the split-core procedure.)
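A rough way to picture this proposal is as a decision the soft-offline loop makes each time a parked CPU is woken by an IPI. The userspace sketch below is not kernel code: `restart_requested[]` stands in for what `generic_check_cpu_restart()` checks, `kexec_wake_requested` for the flag that the proposed `kexec_wake_prepare` callback would set, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* What a soft-offline CPU does after an IPI wakes it (model only). */
enum offline_action {
	STAY_OFFLINE,     /* spurious wakeup: back to the low-power loop */
	GO_ONLINE,        /* normal hotplug: cpu_up() asked for this CPU */
	ENTER_KEXEC_WAIT, /* proposed: run kexec_smp_down(), never return */
};

static bool restart_requested[NR_CPUS];
static bool kexec_wake_requested;

/* Hypothetical stand-in for the proposed kexec_wake_prepare() callback. */
static void kexec_wake_prepare(void)
{
	kexec_wake_requested = true;
}

/* One pass of the offline loop, after the CPU has been woken. */
static enum offline_action offline_loop_on_wakeup(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	/* The kexec request is checked before the normal restart check. */
	if (kexec_wake_requested)
		return ENTER_KEXEC_WAIT;
	if (restart_requested[cpu])
		return GO_ONLINE;
	return STAY_OFFLINE;
}
```

The STAY_OFFLINE case is the trap Ben describes: a woken CPU with neither request pending simply goes back to its low-power loop inside the old kernel, and CPU hotplug never has to be involved for the kexec case.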

I know, this is all theory for now since I haven't tested it yet, but I think
we can make this work.

Below are the 4 preliminary patches I have so far to implement this.


===============================================================================
Patch 1
===============================================================================

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 16d7e33..2a31b52 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -68,6 +68,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
ppc_save_regs(newregs);
}

+extern bool kexec_cpu_wake(void);
extern void kexec_smp_wait(void); /* get and clear naca physid, wait for
master to copy new code to 0 */
extern int crashing_cpu;
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f92b0b5..39f721d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -255,6 +255,16 @@ struct machdep_calls {
void (*machine_shutdown)(void);

#ifdef CONFIG_KEXEC
+#if (defined CONFIG_PPC64) && (defined CONFIG_PPC_BOOK3S)
+
+ /*
+ * The pseries and powernv book3s platforms have a special requirement
+ * that soft-offline CPUs have to be woken up before kexec, to avoid
+ * CPUs getting stuck. This callback prepares the system for the
+ * impending wakeup of the offline CPUs.
+ */
+ void (*kexec_wake_prepare)(void);
+#endif
void (*kexec_cpu_down)(int crash_shutdown, int secondary);

/* Called to do what every setup is needed on image and the
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 879b3aa..2ef6c58 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,6 +182,14 @@ static void kexec_smp_down(void *arg)
/* NOTREACHED */
}

+bool kexec_cpu_wake(void)
+{
+ kexec_smp_down(NULL);
+
+ /* NOTREACHED */
+ return true;
+}
+
static void kexec_prepare_cpus_wait(int wait_state)
{
int my_cpu, i, notified=-1;
@@ -202,7 +210,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
* these possible-but-not-online-but-should-be CPUs and chaperone them
* into kexec_smp_wait().
*/
- for_each_online_cpu(i) {
+ for_each_present_cpu(i) {
if (i == my_cpu)
continue;

@@ -228,6 +236,8 @@ static void kexec_prepare_cpus_wait(int wait_state)
* threads as offline -- and again, these CPUs will be stuck.
*
* So, we online all CPUs that should be running, including secondary threads.
+ *
+ * TODO: Update this comment
*/
static void wake_offline_cpus(void)
{
@@ -237,7 +247,8 @@ static void wake_offline_cpus(void)
if (!cpu_online(cpu)) {
printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
cpu);
- WARN_ON(cpu_up(cpu));
+ /* This should work even though the cpu is offline */
+ smp_send_reschedule(cpu);
}
}
}
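The switch from `for_each_online_cpu()` to `for_each_present_cpu()` in `kexec_prepare_cpus_wait()` matters most in ST mode. The sketch below uses plain 64-bit masks instead of the kernel cpumask API, and assumes 24 present CPUs with 8 threads per core (consistent with the primary threads 0, 8 and 16 in the logs above); it counts how many CPUs the master would wait for under each iterator.

```c
#include <assert.h>
#include <stdint.h>

#define THREADS_PER_CORE 8

/* Bit i set => CPU i is in the set (stand-in for struct cpumask). */
typedef uint64_t cpumask_bits;

/* Mask of nr_cpus present CPUs: bits 0 .. nr_cpus-1. */
static cpumask_bits present_mask(int nr_cpus)
{
	assert(nr_cpus > 0 && nr_cpus <= 64);
	return nr_cpus == 64 ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);
}

/* In ST mode only the primary thread of each core stays online. */
static cpumask_bits st_mode_online_mask(int nr_cpus)
{
	cpumask_bits online = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu += THREADS_PER_CORE)
		online |= UINT64_C(1) << cpu;
	return online;
}

/* Number of CPUs, other than my_cpu, that the master waits for. */
static int cpus_to_wait_for(cpumask_bits iter_mask, int my_cpu)
{
	int n = 0;

	for (int cpu = 0; cpu < 64; cpu++)
		if (cpu != my_cpu && (iter_mask & (UINT64_C(1) << cpu)))
			n++;
	return n;
}
```

With the online mask, the master (CPU 0) only chaperones CPUs 8 and 16; the 21 soft-offline siblings are never accounted for, which matches the pattern of stuck sibling threads in the original report. With the present mask, all 23 other CPUs are covered.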



===============================================================================
Patch 2
===============================================================================

diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..910081c 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,4 +27,8 @@ extern void pnv_lpc_init(void);

bool cpu_core_split_required(void);

+#ifdef CONFIG_KEXEC
+extern void pnv_kexec_wake_prepare(void);
+#endif
+
#endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..8dbccb7 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -331,6 +331,7 @@ define_machine(powernv) {
.calibrate_decr = generic_calibrate_decr,
.dma_set_mask = pnv_dma_set_mask,
#ifdef CONFIG_KEXEC
+ .kexec_wake_prepare = pnv_kexec_wake_prepare,
.kexec_cpu_down = pnv_kexec_cpu_down,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 0062a43..0b017b0 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -32,6 +32,7 @@
#include <asm/opal.h>
#include <asm/runlatch.h>
#include <asm/code-patching.h>
+#include <asm/kexec.h>

#include "powernv.h"

@@ -140,6 +141,15 @@ static int pnv_smp_cpu_disable(void)
return 0;
}

+#ifdef CONFIG_KEXEC
+static bool kexec_wake_offline_cpus;
+
+void pnv_kexec_wake_prepare(void)
+{
+ kexec_wake_offline_cpus = true;
+}
+#endif
+
static void pnv_smp_cpu_kill_self(void)
{
unsigned int cpu;
@@ -170,6 +180,11 @@ static void pnv_smp_cpu_kill_self(void)
if (cpu_core_split_required())
continue;

+#ifdef CONFIG_KEXEC
+ if (kexec_wake_offline_cpus)
+ kexec_cpu_wake(); /* This function won't return! */
+#endif
+
if (!generic_check_cpu_restart(cpu))
DBG("CPU%d Unexpected exit while offline !\n", cpu);
}




===============================================================================
Patch 3
===============================================================================

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 20d6297..d026028 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -31,6 +31,7 @@
#include <asm/vdso_datapage.h>
#include <asm/xics.h>
#include <asm/plpar_wrappers.h>
+#include <asm/kexec.h>

#include "offline_states.h"

@@ -143,6 +144,13 @@ static void pseries_mach_cpu_die(void)
get_lppaca()->donate_dedicated_cpu = 0;
get_lppaca()->idle = 0;

+#ifdef CONFIG_KEXEC
+ if (get_preferred_offline_state(cpu) == CPU_STATE_KEXEC_WAKE) {
+ /* This function won't return! */
+ kexec_cpu_wake();
+ }
+#endif
+
if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
unregister_slb_shadow(hwcpu);

diff --git a/arch/powerpc/platforms/pseries/kexec.c b/arch/powerpc/platforms/pseries/kexec.c
index 13fa95b3..fc135e6 100644
--- a/arch/powerpc/platforms/pseries/kexec.c
+++ b/arch/powerpc/platforms/pseries/kexec.c
@@ -20,6 +20,17 @@
#include <asm/plpar_wrappers.h>

#include "pseries.h"
+#include "offline_states.h"
+
+void pseries_kexec_wake_prepare(void)
+{
+ unsigned int cpu;
+
+ for_each_present_cpu(cpu) {
+ if (!cpu_online(cpu))
+ set_preferred_offline_state(cpu, CPU_STATE_KEXEC_WAKE);
+ }
+}

static void pseries_kexec_cpu_down(int crash_shutdown, int secondary)
{
diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h
index 08672d9..32fe5e8 100644
--- a/arch/powerpc/platforms/pseries/offline_states.h
+++ b/arch/powerpc/platforms/pseries/offline_states.h
@@ -5,6 +5,9 @@
enum cpu_state_vals {
CPU_STATE_OFFLINE,
CPU_STATE_INACTIVE,
+#ifdef CONFIG_KEXEC
+ CPU_STATE_KEXEC_WAKE,
+#endif
CPU_STATE_ONLINE,
CPU_MAX_OFFLINE_STATES
};
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 361add6..35ecb99 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -38,6 +38,8 @@ static inline void smp_init_pseries_xics(void) { };
#endif

#ifdef CONFIG_KEXEC
+extern void pseries_kexec_wake_prepare(void);
+
extern void setup_kexec_cpu_down_xics(void);
extern void setup_kexec_cpu_down_mpic(void);
#else
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index adc21a0..c1a0722 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -808,6 +808,7 @@ define_machine(pseries) {
.system_reset_exception = pSeries_system_reset_exception,
.machine_check_exception = pSeries_machine_check_exception,
#ifdef CONFIG_KEXEC
+ .kexec_wake_prepare = pseries_kexec_wake_prepare,
.machine_kexec = pSeries_machine_kexec,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
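On pseries, the patch uses the per-CPU preferred offline state instead of a single global flag. A minimal userspace model follows: plain arrays stand in for `cpu_online()` and `get/set_preferred_offline_state()`, every CPU is assumed present, and the enum values are copied from the `offline_states.h` hunk assuming CONFIG_KEXEC is set. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8 /* all CPUs are assumed present in this model */

/* Mirrors the enum in offline_states.h with CONFIG_KEXEC=y. */
enum cpu_state_vals {
	CPU_STATE_OFFLINE,
	CPU_STATE_INACTIVE,
	CPU_STATE_KEXEC_WAKE,
	CPU_STATE_ONLINE,
	CPU_MAX_OFFLINE_STATES
};

/* Model state: which CPUs are online, and each CPU's preferred state. */
static bool online[NR_CPUS];
static enum cpu_state_vals preferred[NR_CPUS];

/* Model of pseries_kexec_wake_prepare(): mark only the offline CPUs. */
static void kexec_wake_prepare_model(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (!online[cpu])
			preferred[cpu] = CPU_STATE_KEXEC_WAKE;
}

/* Would this CPU leave the cpu_die loop for kexec_cpu_wake()? */
static bool diverts_to_kexec(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return preferred[cpu] == CPU_STATE_KEXEC_WAKE;
}
```

Online CPUs keep their state and are handled by the normal kexec_smp_down() path, while every offline CPU, whether it was parked as OFFLINE or INACTIVE, is redirected once woken.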



===============================================================================
Patch 4
===============================================================================

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 28c5706..55a6350 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1684,13 +1684,6 @@ int kernel_kexec(void)
kernel_restart_prepare(NULL);
migrate_to_reboot_cpu();

- /*
- * migrate_to_reboot_cpu() disables CPU hotplug assuming that
- * no further code needs to use CPU hotplug (which is true in
- * the reboot case). However, the kexec path depends on using
- * CPU hotplug again; so re-enable it here.
- */
- cpu_hotplug_enable();
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
}

2014-06-06 12:32:12

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/04/2014 07:16 PM, Vivek Goyal wrote:
> On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>> know from the commit log and the comment mentioned above (and from my own
>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>> long-term solution.
>>>
>>> Matt, Ben, any thoughts on this?
>>
>> The problem is with our "soft offline" which we do on some platforms. When we
>> offline we don't actually send the CPUs back to firmware or anything like that.
>>
>> We put them into a very low low power loop inside Linux.
>>
>> The new kernel has no way to extract them from that loop. So we must re-"online"
>> them before we kexec so they can be passed to the new kernel normally (or returned
>> to firmware like we do on powernv).
>
> Srivatsa,
>
> Looks like your patch has been merged.
>
> I don't like the following change in arch independent code.
>
> /*
> * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> * no further code needs to use CPU hotplug (which is true in
> * the reboot case). However, the kexec path depends on using
> * CPU hotplug again; so re-enable it here.
> */
> cpu_hotplug_enable();
>
> As it is very powerpc specific requirement, can you enable hotplug in powerpc
> arch dependent code as a short term solution.
>

I didn't do that because that would mean that the _disable() would be
performed inside kernel/kexec.c and the corresponding _enable() would
be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
connection between them, which would have made them hard to relate.

> Ideally one needs to fix the requirement of online all cpus in powerpc
> as a long term solution and then get rid of hotplug enable call.
>

Yes, I agree. I'm trying out a solution at the moment (see the 4
preliminary patches I sent in my reply to Ben). If that works, we won't
need the enable call on powerpc.

Regards,
Srivatsa S. Bhat

2014-06-06 12:32:45

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/04/2014 07:11 PM, Vivek Goyal wrote:
> On Wed, Jun 04, 2014 at 01:58:40AM +0530, Srivatsa S. Bhat wrote:
>> On 05/28/2014 07:01 PM, Vivek Goyal wrote:
>>> On Tue, May 27, 2014 at 04:25:34PM +0530, Srivatsa S. Bhat wrote:
>>>> If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
>>>> (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
>>>> get the following messages during boot:
>>>>
[...]
>>>> diff --git a/kernel/kexec.c b/kernel/kexec.c
>>>> index c8380ad..28c5706 100644
>>>> --- a/kernel/kexec.c
>>>> +++ b/kernel/kexec.c
>>>> @@ -1683,6 +1683,14 @@ int kernel_kexec(void)
>>>> kexec_in_progress = true;
>>>> kernel_restart_prepare(NULL);
>>>> migrate_to_reboot_cpu();
>>>> +
>>>> + /*
>>>> + * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>>>> + * no further code needs to use CPU hotplug (which is true in
>>>> + * the reboot case). However, the kexec path depends on using
>>>> + * CPU hotplug again; so re-enable it here.
>>>> + */
>>>> + cpu_hotplug_enable();
>>>> printk(KERN_EMERG "Starting new kernel\n");
>>>> machine_shutdown();
>>>
>>> After migrate_to_reboot_cpu(), we are calling machine_shutdown() which
>>> calls disable_nonboot_cpus() and which in turn calls _cpu_down().
>>>
>>
>> Hmm? I see only 'arm' calling disable_nonboot_cpus() from machine_shutdown().
>> None of the other architectures call it. Is that a leftover in arm?
>
> You are right. I did not notice that only arm is doing that. Looks like
> it is calling into some platform code, I am not sure what exactly arm
> does for disabling cpu.
>
> x86 code calls stop_other_cpus() in machine_shutdown() which sends
> REBOOT_VECTOR to other cpus and calls stop_this_cpu() which in turn
> does.
>
> for (;;)
> halt();
>
> IIUC, upon receipt of certain interrupts cpu will come out of halt state.
> Not sure how safe it is from kexec point of view as we will be replacing
> original kernel that means if cpu comes out of halt state it might be
> running some random code.
>
> Eric/hpa might know better the context here and what safeguards us on x86.
>
> So one should not make cpu spin on some code as kexec will change that
> code. It should be some other platform specific mechanism which brings
> cpu in to hlt like state. So that way arm seems to be doing right thing.
>
> I am not sure what powerpc does to stop cpus.
>

powerpc shepherds all CPUs to a safe state by making them run kexec_smp_down();
eventually those CPUs end up in kexec_wait(), in assembly.

Regards,
Srivatsa S. Bhat

2014-06-06 12:38:47

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/06/2014 05:59 PM, Srivatsa S. Bhat wrote:

> +bool kexec_cpu_wake(void)
> +{
> + kexec_smp_down(NULL);
> +
> + /* NOTREACHED */
> + return true;
> +}
> +

This function doesn't have to return anything, so we can define it as void.
The bool is a remnant of my previous attempt at making this work. (These
patches compile fine as they are, though.)

Regards,
Srivatsa S. Bhat

2014-06-06 18:27:58

by Vivek Goyal

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote:
> On 06/04/2014 07:16 PM, Vivek Goyal wrote:
> > On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
> >> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
> >>> Yep, that makes sense. But unfortunately I don't have enough insight into
> >>> why exactly powerpc has to online the CPUs before doing a kexec. I just
> >>> know from the commit log and the comment mentioned above (and from my own
> >>> experiments) that the CPUs will get stuck if they were offline. Perhaps
> >>> somebody more knowledgeable can explain this in detail and suggest a proper
> >>> long-term solution.
> >>>
> >>> Matt, Ben, any thoughts on this?
> >>
> >> The problem is with our "soft offline" which we do on some platforms. When we
> >> offline we don't actually send the CPUs back to firmware or anything like that.
> >>
> >> We put them into a very low power loop inside Linux.
> >>
> >> The new kernel has no way to extract them from that loop. So we must re-"online"
> >> them before we kexec so they can be passed to the new kernel normally (or returned
> >> to firmware like we do on powernv).
> >
> > Srivatsa,
> >
> > Looks like your patch has been merged.
> >
> > I don't like the following change in arch independent code.
> >
> > /*
> > * migrate_to_reboot_cpu() disables CPU hotplug assuming that
> > * no further code needs to use CPU hotplug (which is true in
> > * the reboot case). However, the kexec path depends on using
> > * CPU hotplug again; so re-enable it here.
> > */
> > cpu_hotplug_enable();
> >
> > As it is a very powerpc-specific requirement, can you enable hotplug in
> > powerpc arch-dependent code as a short-term solution?
> >
>
> I didn't do that because that would mean that the _disable() would be
> performed inside kernel/kexec.c and the corresponding _enable() would
> be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
> connection between them, which would have made them hard to relate.

Which we are doing anyway. The difference is that now we are doing it
for all arches.

If this is a powerpc-specific requirement, then we should limit it to
powerpc only and not let it spill over into generic code.

And putting a big fat comment there should make it easy to figure out
why arch code is overriding the generic code's decision. Putting it in
generic code and enforcing it on all arches does not buy us anything,
IMHO.


>
> > Ideally one needs to fix the requirement of onlining all cpus in powerpc
> > as a long-term solution and then get rid of the hotplug enable call.
> >
>
> Yes, I agree. I'm trying out a solution at the moment (see the 4
> preliminary patches I sent in my reply to Ben). If that works, we won't
> need the enable call on powerpc.

Thanks. This will help.

Thanks
Vivek

2014-06-06 19:01:37

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/06/2014 11:57 PM, Vivek Goyal wrote:
> On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote:
>> On 06/04/2014 07:16 PM, Vivek Goyal wrote:
>>> On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
>>>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>>>> know from the commit log and the comment mentioned above (and from my own
>>>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>>>> long-term solution.
>>>>>
>>>>> Matt, Ben, any thoughts on this?
>>>>
>>>> The problem is with our "soft offline" which we do on some platforms. When we
>>>> offline we don't actually send the CPUs back to firmware or anything like that.
>>>>
>>>> We put them into a very low power loop inside Linux.
>>>>
>>>> The new kernel has no way to extract them from that loop. So we must re-"online"
>>>> them before we kexec so they can be passed to the new kernel normally (or returned
>>>> to firmware like we do on powernv).
>>>
>>> Srivatsa,
>>>
>>> Looks like your patch has been merged.
>>>
>>> I don't like the following change in arch independent code.
>>>
>>> /*
>>> * migrate_to_reboot_cpu() disables CPU hotplug assuming that
>>> * no further code needs to use CPU hotplug (which is true in
>>> * the reboot case). However, the kexec path depends on using
>>> * CPU hotplug again; so re-enable it here.
>>> */
>>> cpu_hotplug_enable();
>>>
>>> As it is a very powerpc-specific requirement, can you enable hotplug in
>>> powerpc arch-dependent code as a short-term solution?
>>>
>>
>> I didn't do that because that would mean that the _disable() would be
>> performed inside kernel/kexec.c and the corresponding _enable() would
>> be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
>> connection between them, which would have made them hard to relate.
>
> Which we are doing anyway. The difference is that now we are doing it
> for all arches.
>
> If this is a powerpc-specific requirement, then we should limit it to
> powerpc only and not let it spill over into generic code.
>
> And putting a big fat comment there should make it easy to figure out
> why arch code is overriding the generic code's decision. Putting it in
> generic code and enforcing it on all arches does not buy us anything,
> IMHO.
>

Yep, I see your point. Sorry about that!

Actually, I originally thought of fixing cpu_hotplug_disable/enable itself:
their true intent is to prevent *userspace* (i.e., from sysfs) from performing
CPU hotplug after a certain quiescent point in the kernel, and not to prevent
the kernel's own cpu hotplug attempts. But currently it prevents _all_ hotplug,
including operations initiated from within the kernel, which is the reason
why kexec was effectively locking itself out on powerpc. I explored options to
fix that (which would in turn fix the powerpc problem automatically, without
having to add any code to kernel/kexec.c or even arch/powerpc code). But it
turned out to be too difficult and ugly given the current CPU hotplug locking
scheme. I'll revisit that once CPU hotplug locking is cleaned up.
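The intent described above can be sketched as a hotplug gate that only fences off sysfs-initiated requests. This is a hypothetical userspace model (enum hotplug_origin and cpu_up_sim() are made up); it deliberately ignores CPU hotplug locking, which is exactly the part that made a real version too difficult.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_SIM 8

/* Hypothetical: who asked for the hotplug operation. */
enum hotplug_origin { HOTPLUG_FROM_USERSPACE, HOTPLUG_FROM_KERNEL };

static bool cpu_hotplug_disabled;	/* set at the "quiescent point" */
static bool cpu_online_sim[NR_CPUS_SIM];

/* Bring @cpu online unless the request comes from userspace while disabled. */
static int cpu_up_sim(unsigned int cpu, enum hotplug_origin origin)
{
	assert(cpu < NR_CPUS_SIM);

	if (cpu_hotplug_disabled && origin == HOTPLUG_FROM_USERSPACE)
		return -EBUSY;

	cpu_online_sim[cpu] = true;
	return 0;
}
```

With such a gate, kexec's own onlining would keep working after migrate_to_reboot_cpu(), while sysfs writes are still refused.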

But anyway, the powerpc kexec fix that I'm working on right now is not only a
much better solution, but it will also restore the original kexec code in
kernel/kexec.c, by removing the _enable() call.

Thank you!

Regards,
Srivatsa S. Bhat

2014-06-06 21:18:13

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

On 06/06/2014 05:59 PM, Srivatsa S. Bhat wrote:
> On 06/04/2014 03:39 AM, Benjamin Herrenschmidt wrote:
>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>> know from the commit log and the comment mentioned above (and from my own
>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>> long-term solution.
>>>
>>> Matt, Ben, any thoughts on this?
>>
>> The problem is with our "soft offline" which we do on some platforms. When we
>> offline we don't actually send the CPUs back to firmware or anything like that.
>>
>> We put them into a very low power loop inside Linux.
>>
>> The new kernel has no way to extract them from that loop. So we must re-"online"
>> them before we kexec so they can be passed to the new kernel normally (or returned
>> to firmware like we do on powernv).
>>
>
> Thanks a lot for the explanation, Ben!
>
> I thought about this and this is what I think: whether the CPU is in the kernel
> or in the firmware is a hard-boundary. But once we know it is still in the
> kernel, whether it is online or offline is a soft-boundary, something that
> ideally shouldn't make any difference to kexec.
>
> Then I looked at what is that special state that kexec expects the online CPUs
> to be in, before performing kexec, and I found that that state is entered via
> kexec_smp_down().
>
> Which means, if we poke the soft-offline CPUs and make them execute
> kexec_smp_down(), we should be able to do a successful kexec without having to
> actually online them. After all, the core kexec code doesn't mandate that they
> should be online. So if we satisfy powerpc's requirement that all the CPUs are
> in a sane state, that should be good enough. (This would be similar to how the
> subcore code wakes up offline CPUs to perform the split-core procedure).
>
> I know, this is all theory for now since I haven't tested it yet, but I think
> we can make this work.
>
> Below are the 4 preliminary patches I have so far to implement this.
>

And with the following hunk added (which I had forgotten earlier), it worked just
fine on powernv :-)


diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 2ef6c58..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -243,6 +243,9 @@ static void wake_offline_cpus(void)
{
int cpu = 0;

+ if (ppc_md.kexec_wake_prepare)
+ ppc_md.kexec_wake_prepare();
+
for_each_present_cpu(cpu) {
if (!cpu_online(cpu)) {
printk(KERN_INFO "kexec: Waking offline cpu %d.\n",

I tried putting the machine into ST mode and, in a separate experiment, I kept
just CPU 0 online in the first kernel; in each case I then issued a kexec. The
second kernel booted successfully with all the CPUs in both cases.

I haven't explored the crashed-kernel case though; it might need some auditing
to check whether the code handles that as well.
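A single-threaded sketch of the approach above, assuming a powernv-style global flag: the prepare step sets the flag, the master pokes each offline CPU (smp_send_reschedule() in the patch), and the woken CPU goes straight to the kexec path instead of being onlined. All names ending in _sim are hypothetical; the KEXEC_STATE_* values mirror the paca fields. A real implementation also has to make the flag visible to the woken CPU before the IPI arrives, which this model does not attempt to show.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SIM 4

/* Values modelled on the powerpc paca kexec_state; toy copy only. */
enum { KEXEC_STATE_NONE = 0, KEXEC_STATE_IRQS_OFF = 1 };

static bool cpu_online_sim[NR_CPUS_SIM] = { true, false, false, false };
static int kexec_state[NR_CPUS_SIM];
static bool kexec_wake_offline_cpus;	/* set by the "prepare" step */

/* Stand-in for kexec_cpu_wake() -> kexec_smp_down(): record the state. */
static void kexec_cpu_wake_sim(unsigned int cpu)
{
	kexec_state[cpu] = KEXEC_STATE_IRQS_OFF;
}

/*
 * One wakeup of an offline CPU, loosely modelled on the loop in
 * pnv_smp_cpu_kill_self(). Returns true if the CPU left its low-power loop.
 */
static bool offline_cpu_wakeup(unsigned int cpu)
{
	assert(!cpu_online_sim[cpu]);

	if (kexec_wake_offline_cpus) {
		kexec_cpu_wake_sim(cpu);
		return true;
	}
	return false;	/* unexpected wakeup: go back to sleep */
}

/* Master side: prepare, then poke every offline CPU. */
static void wake_offline_cpus_sim(void)
{
	kexec_wake_offline_cpus = true;
	for (unsigned int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		if (!cpu_online_sim[cpu])
			offline_cpu_wakeup(cpu);
}
```

After wake_offline_cpus_sim(), every secondary is in KEXEC_STATE_IRQS_OFF while cpu_online_sim[] is unchanged: the CPUs reach a kexec-safe state without ever being onlined.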

Regards,
Srivatsa S. Bhat

2014-06-12 06:40:03

by Joel Stanley

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

Hi Srivatsa,

On Sat, Jun 7, 2014 at 7:16 AM, Srivatsa S. Bhat
<[email protected]> wrote:
> And with the following hunk added (which I had forgotten earlier), it worked just
> fine on powernv :-)

How are the patches coming along?

I just hung a machine here while attempting to kexec. It appears to
have onlined all of the secondary threads, and then hung here:

kexec: Waking offline cpu 1.
kvm: enabling virtualization on CPU1
kexec: Waking offline cpu 2.
kvm: enabling virtualization on CPU2
kexec: Waking offline cpu 3.
kvm: enabling virtualization on CPU3
kexec: Waking offline cpu 5.
kvm: enabling virtualization on CPU5
[...]
kvm: enabling virtualization on CPU63
kexec: waiting for cpu 1 (physical 1) to enter OPAL
kexec: waiting for cpu 2 (physical 2) to enter OPAL
kexec: waiting for cpu 3 (physical 3) to enter OPAL

I'm running benh's next branch as of this morning, and SMT was off.

Could you please post your latest patches as a series? I will test them here.

Cheers,

Joel

2014-06-12 08:18:43

by Srivatsa S. Bhat

Subject: Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

Hi Joel,

On 06/12/2014 12:09 PM, Joel Stanley wrote:
> Hi Srivatsa,
>
> On Sat, Jun 7, 2014 at 7:16 AM, Srivatsa S. Bhat
> <[email protected]> wrote:
>> And with the following hunk added (which I had forgotten earlier), it worked just
>> fine on powernv :-)
>
> How are the patches coming along?
>

I'm still waiting to test this patch series on a PowerVM box, and unfortunately
there are some machine issues to debug first :-( That's why this is taking
time.

> I just hung a machine here while attempting to kexec. It appears to
> have onlined all of the secondary threads, and then hung here:
>
> kexec: Waking offline cpu 1.
> kvm: enabling virtualization on CPU1
> kexec: Waking offline cpu 2.
> kvm: enabling virtualization on CPU2
> kexec: Waking offline cpu 3.
> kvm: enabling virtualization on CPU3
> kexec: Waking offline cpu 5.
> kvm: enabling virtualization on CPU5
> [...]
> kvm: enabling virtualization on CPU63
> kexec: waiting for cpu 1 (physical 1) to enter OPAL
> kexec: waiting for cpu 2 (physical 2) to enter OPAL
> kexec: waiting for cpu 3 (physical 3) to enter OPAL
>
> I'm running benh's next branch as of this morning, and SMT was off.
>

Oh! This looks like a different hang from the one I tried to fix. My patch
("powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode"),
which is already in benh's next branch, was aimed at fixing the "CPU is stuck"
issue observed during the second kernel's boot. If the first kernel itself is
hanging in the down-path, then it looks like a different problem altogether.

> Could you please post your latest patches as a series? I will test them here.
>

The 4 patches that I proposed in this thread are aimed at making the above
solution more elegant, by not having to actually online the secondary threads
while doing kexec. I don't think it will solve the hang that you are seeing.
In any case, I'll provide the consolidated patch below if you want to give it
a try.

By the way, I have a few questions regarding the hang you observed: is it
always reproducible with SMT=off? And if SMT was set to 8 (i.e., all CPUs in the
system were online) and you then did a kexec, do you still see the hang?

Regards,
Srivatsa S. Bhat

---------------------------------------------------------------------------

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 16d7e33..2a31b52 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -68,6 +68,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
ppc_save_regs(newregs);
}

+extern bool kexec_cpu_wake(void);
extern void kexec_smp_wait(void); /* get and clear naca physid, wait for
master to copy new code to 0 */
extern int crashing_cpu;
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f92b0b5..39f721d 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -255,6 +255,16 @@ struct machdep_calls {
void (*machine_shutdown)(void);

#ifdef CONFIG_KEXEC
+#if (defined CONFIG_PPC64) && (defined CONFIG_PPC_BOOK3S)
+
+ /*
+ * The pseries and powernv book3s platforms have a special requirement
+ * that soft-offline CPUs have to be woken up before kexec, to avoid
+ * CPUs getting stuck. This callback prepares the system for the
+ * impending wakeup of the offline CPUs.
+ */
+ void (*kexec_wake_prepare)(void);
+#endif
void (*kexec_cpu_down)(int crash_shutdown, int secondary);

/* Called to do what every setup is needed on image and the
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 879b3aa..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -182,6 +182,14 @@ static void kexec_smp_down(void *arg)
/* NOTREACHED */
}

+bool kexec_cpu_wake(void)
+{
+ kexec_smp_down(NULL);
+
+ /* NOTREACHED */
+ return true;
+}
+
static void kexec_prepare_cpus_wait(int wait_state)
{
int my_cpu, i, notified=-1;
@@ -202,7 +210,7 @@ static void kexec_prepare_cpus_wait(int wait_state)
* these possible-but-not-online-but-should-be CPUs and chaperone them
* into kexec_smp_wait().
*/
- for_each_online_cpu(i) {
+ for_each_present_cpu(i) {
if (i == my_cpu)
continue;

@@ -228,16 +236,22 @@ static void kexec_prepare_cpus_wait(int wait_state)
* threads as offline -- and again, these CPUs will be stuck.
*
* So, we online all CPUs that should be running, including secondary threads.
+ *
+ * TODO: Update this comment
*/
static void wake_offline_cpus(void)
{
int cpu = 0;

+ if (ppc_md.kexec_wake_prepare)
+ ppc_md.kexec_wake_prepare();
+
for_each_present_cpu(cpu) {
if (!cpu_online(cpu)) {
printk(KERN_INFO "kexec: Waking offline cpu %d.\n",
cpu);
- WARN_ON(cpu_up(cpu));
+ /* This should work even though the cpu is offline */
+ smp_send_reschedule(cpu);
}
}
}
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 75501bf..910081c 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -27,4 +27,8 @@ extern void pnv_lpc_init(void);

bool cpu_core_split_required(void);

+#ifdef CONFIG_KEXEC
+extern void pnv_kexec_wake_prepare(void);
+#endif
+
#endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 8c16a5f..8dbccb7 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -331,6 +331,7 @@ define_machine(powernv) {
.calibrate_decr = generic_calibrate_decr,
.dma_set_mask = pnv_dma_set_mask,
#ifdef CONFIG_KEXEC
+ .kexec_wake_prepare = pnv_kexec_wake_prepare,
.kexec_cpu_down = pnv_kexec_cpu_down,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 0062a43..0b017b0 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -32,6 +32,7 @@
#include <asm/opal.h>
#include <asm/runlatch.h>
#include <asm/code-patching.h>
+#include <asm/kexec.h>

#include "powernv.h"

@@ -140,6 +141,15 @@ static int pnv_smp_cpu_disable(void)
return 0;
}

+#ifdef CONFIG_KEXEC
+static bool kexec_wake_offline_cpus;
+
+void pnv_kexec_wake_prepare(void)
+{
+ kexec_wake_offline_cpus = true;
+}
+#endif
+
static void pnv_smp_cpu_kill_self(void)
{
unsigned int cpu;
@@ -170,6 +180,11 @@ static void pnv_smp_cpu_kill_self(void)
if (cpu_core_split_required())
continue;

+#ifdef CONFIG_KEXEC
+ if (kexec_wake_offline_cpus)
+ kexec_cpu_wake(); /* This function won't return! */
+#endif
+
if (!generic_check_cpu_restart(cpu))
DBG("CPU%d Unexpected exit while offline !\n", cpu);
}
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 20d6297..d026028 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -31,6 +31,7 @@
#include <asm/vdso_datapage.h>
#include <asm/xics.h>
#include <asm/plpar_wrappers.h>
+#include <asm/kexec.h>

#include "offline_states.h"

@@ -143,6 +144,13 @@ static void pseries_mach_cpu_die(void)
get_lppaca()->donate_dedicated_cpu = 0;
get_lppaca()->idle = 0;

+#ifdef CONFIG_KEXEC
+ if (get_preferred_offline_state(cpu) == CPU_STATE_KEXEC_WAKE) {
+ /* This function won't return! */
+ kexec_cpu_wake();
+ }
+#endif
+
if (get_preferred_offline_state(cpu) == CPU_STATE_ONLINE) {
unregister_slb_shadow(hwcpu);

diff --git a/arch/powerpc/platforms/pseries/kexec.c b/arch/powerpc/platforms/pseries/kexec.c
index 13fa95b3..fc135e6 100644
--- a/arch/powerpc/platforms/pseries/kexec.c
+++ b/arch/powerpc/platforms/pseries/kexec.c
@@ -20,6 +20,17 @@
#include <asm/plpar_wrappers.h>

#include "pseries.h"
+#include "offline_states.h"
+
+void pseries_kexec_wake_prepare(void)
+{
+ unsigned int cpu;
+
+ for_each_present_cpu(cpu) {
+ if (!cpu_online(cpu))
+ set_preferred_offline_state(cpu, CPU_STATE_KEXEC_WAKE);
+ }
+}

static void pseries_kexec_cpu_down(int crash_shutdown, int secondary)
{
diff --git a/arch/powerpc/platforms/pseries/offline_states.h b/arch/powerpc/platforms/pseries/offline_states.h
index 08672d9..32fe5e8 100644
--- a/arch/powerpc/platforms/pseries/offline_states.h
+++ b/arch/powerpc/platforms/pseries/offline_states.h
@@ -5,6 +5,9 @@
enum cpu_state_vals {
CPU_STATE_OFFLINE,
CPU_STATE_INACTIVE,
+#ifdef CONFIG_KEXEC
+ CPU_STATE_KEXEC_WAKE,
+#endif
CPU_STATE_ONLINE,
CPU_MAX_OFFLINE_STATES
};
diff --git a/arch/powerpc/platforms/pseries/pseries.h b/arch/powerpc/platforms/pseries/pseries.h
index 361add6..35ecb99 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -38,6 +38,8 @@ static inline void smp_init_pseries_xics(void) { };
#endif

#ifdef CONFIG_KEXEC
+extern void pseries_kexec_wake_prepare(void);
+
extern void setup_kexec_cpu_down_xics(void);
extern void setup_kexec_cpu_down_mpic(void);
#else
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index adc21a0..c1a0722 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -808,6 +808,7 @@ define_machine(pseries) {
.system_reset_exception = pSeries_system_reset_exception,
.machine_check_exception = pSeries_machine_check_exception,
#ifdef CONFIG_KEXEC
+ .kexec_wake_prepare = pseries_kexec_wake_prepare,
.machine_kexec = pSeries_machine_kexec,
#endif
#ifdef CONFIG_MEMORY_HOTPLUG_SPARSE
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 28c5706..55a6350 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1684,13 +1684,6 @@ int kernel_kexec(void)
kernel_restart_prepare(NULL);
migrate_to_reboot_cpu();

- /*
- * migrate_to_reboot_cpu() disables CPU hotplug assuming that
- * no further code needs to use CPU hotplug (which is true in
- * the reboot case). However, the kexec path depends on using
- * CPU hotplug again; so re-enable it here.
- */
- cpu_hotplug_enable();
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
}
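For readers skimming the diff: the prepare step is an optional ppc_md hook, so the generic wake_offline_cpus() stays platform-agnostic. Below is a hypothetical userspace model of how the two platforms fill it in (struct machdep_calls_sim and the *_sim functions are illustrative, not kernel API).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SIM 4

/* Toy subset of struct machdep_calls: only the hook added by the patch. */
struct machdep_calls_sim {
	void (*kexec_wake_prepare)(void);
};

/* Mirrors the pseries enum with the new KEXEC_WAKE state. */
enum cpu_state_vals_sim {
	CPU_STATE_OFFLINE,
	CPU_STATE_INACTIVE,
	CPU_STATE_KEXEC_WAKE,
	CPU_STATE_ONLINE,
};

static bool cpu_online_sim[NR_CPUS_SIM] = { true, false, true, false };
static enum cpu_state_vals_sim preferred_offline_state[NR_CPUS_SIM];
static bool pnv_kexec_wake_offline_cpus;

/* powernv flavour: a single global flag checked by every offline CPU. */
static void pnv_kexec_wake_prepare_sim(void)
{
	pnv_kexec_wake_offline_cpus = true;
}

/* pseries flavour: retarget each offline CPU's preferred offline state. */
static void pseries_kexec_wake_prepare_sim(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		if (!cpu_online_sim[cpu])
			preferred_offline_state[cpu] = CPU_STATE_KEXEC_WAKE;
}

/* Generic caller: the hook is optional, so a platform may leave it NULL. */
static void call_kexec_wake_prepare(const struct machdep_calls_sim *md)
{
	assert(md != NULL);
	if (md->kexec_wake_prepare)
		md->kexec_wake_prepare();
}
```

powernv can get away with one global flag because its offline loop re-checks state after every wakeup, while pseries reuses the per-CPU preferred offline state that pseries_mach_cpu_die() already consults.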