A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs (it happens more
often if the host is over-committed).
The hang happens because the master CPU times out while
waiting for the AP to boot and 'cancels' the CPU online
operation, assuming the AP is not functional. The AP,
however, may continue to run wild later, causing various
hangs or panics in the running kernel, which assumes that
the AP is offline.
This is an alternative approach that, instead of canceling
the in-progress AP bringup (https://lkml.org/lkml/2014/3/6/257),
removes timeouts so that AP bringup won't be affected by
poor timing, and syncs the AP with the master CPU at early
startup, making sure that the AP won't run wild if the master
CPU doesn't expect it to come online.
The series also fixes 3 bugs found while testing the CPU
bringup failure case.
--
Below is a detailed description of the more common hang:
---
The master CPU may time out before cpu_callin_mask is set and cancel
the CPU bringup, but the CPU being onlined still continues to boot, sets
cpu_active_mask (CPU_STARTING notifiers) and spins in
check_tsc_sync_target() waiting for the master CPU to arrive. A subsequent
attempt to online another CPU hangs in stop_machine, initiated from here:
smp_callin ->
smp_store_cpu_info ->
identify_secondary_cpu ->
mtrr_ap_init -> set_mtrr_from_inactive_cpu
stop_machine waits for stop_work to complete on all CPUs in
cpu_active_mask, including the failed CPU that spins in check_tsc_sync_target().
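As a rough illustration of why the second online attempt never finishes, here is a userspace model, not kernel code: stop_machine() can only complete once every CPU in cpu_active_mask has run its stop_work, so a CPU that is marked active but is stuck spinning in check_tsc_sync_target() leaves a permanently pending bit. The function name and the 64-bit mask representation are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: bit N of a mask stands for CPU N.
 * Returns the CPUs stop_machine() would still be waiting for: those that
 * are in the active mask but have not run their stop_work. A CPU spinning
 * in check_tsc_sync_target() never runs it, so its bit stays set forever.
 */
uint64_t stop_machine_pending(uint64_t active_mask, uint64_t ran_stop_work)
{
	return active_mask & ~ran_stop_work;
}
```

For example, with CPUs 0-3 active and CPU 3 being the failed one, stop_machine_pending(0xF, 0x7) is 0x8, so the rendezvous never completes.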
Igor Mammedov (5):
x86: initialize secondary CPU only if master CPU will wait for it
x86: log error on secondary CPU wakeup failure at ERR level
x86: fix list corruption on CPU hotplug
x86: fix memory corruption in acpi_unmap_lsapic()
acpi_processor: do not mark present at boot but not onlined CPU as
onlined
arch/x86/kernel/cpu/common.c | 28 +++++++----
arch/x86/kernel/smpboot.c | 103 ++++++++++++----------------------------
drivers/acpi/acpi_processor.c | 3 -
3 files changed, 48 insertions(+), 86 deletions(-)
acpi_processor_add() assumes that CPUs present at boot
are always onlined, which is not the case if a CPU failed
to come online. As a result, acpi_processor_add() marks
such a CPU device as onlined in sysfs, and subsequent
attempts to online/offline it using the
/sys/devices/system/cpu/cpuX/online attribute fail.
Do not poke into device internals in acpi_processor_add()
by touching the "struct device { .offline }" attribute,
since for CPUs onlined at boot it is set by:
topology_init() -> arch_register_cpu() -> register_cpu()
before the ACPI device tree is parsed, and for hotplugged
CPUs it is set when userspace onlines the CPU via sysfs.
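The following userspace model sketches the state mix-up for a CPU that was present at boot but failed to come online. It assumes that register_cpu() initializes the device's offline flag from the CPU's real online state, as the text above implies; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified view of one CPU device. */
struct cpu_dev_model {
	bool online;            /* did the CPU actually come up? */
	bool need_hotplug_init; /* false for CPUs present at boot */
	bool dev_offline;       /* state exported via .../cpuX/online */
};

/* Assumption: register_cpu() derives the flag from the real online state. */
void model_register_cpu(struct cpu_dev_model *c)
{
	c->dev_offline = !c->online;
}

/* The assignment removed from acpi_processor_add() by this patch. */
void model_acpi_processor_add_old(struct cpu_dev_model *c)
{
	c->dev_offline = c->need_hotplug_init;
}
```

With the old assignment, the correct flag set at registration is overwritten from need_hotplug_init (false for boot-time CPUs), so sysfs reports the CPU as online even though it never came up.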
Signed-off-by: Igor Mammedov <[email protected]>
---
drivers/acpi/acpi_processor.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index c29c2c3..d56e4b4 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -403,9 +403,6 @@ static int acpi_processor_add(struct acpi_device *device,
if (result)
goto err;
- pr->dev = dev;
- dev->offline = pr->flags.need_hotplug_init;
-
/* Trigger the processor driver's .probe() if present. */
if (device_attach(dev) >= 0)
return 1;
--
1.7.1
If, during CPU hotplug, the master CPU fails to wake up an AP,
it sets the per-CPU x86_cpu_to_apicid of that AP to
BAD_APICID (0xFFFF).
However, a subsequent attempt to unplug that CPU leads to an
out-of-bounds write to __apicid_to_node[], which is
32768 items long on an x86_64 kernel.
So drop setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu()
and allow acpi_processor_remove() -> acpi_unmap_lsapic() to cleanly
remove the CPU.
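A quick check of the arithmetic above, as a hypothetical userspace sketch: BAD_APICID is 0xFFFF (65535), while the last valid index of a 32768-entry table is 32767, so any unchecked store indexed by BAD_APICID lands well outside the array. The bounds-checked setter below is illustrative only; the patch itself fixes the problem by not storing BAD_APICID in the first place.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_LOCAL_APIC 32768   /* size of __apicid_to_node[] per the text */
#define MODEL_BAD_APICID     0xFFFFu /* value do_boot_cpu() used to store */
#define MODEL_NUMA_NO_NODE   (-1)

int apicid_to_node_model[MODEL_MAX_LOCAL_APIC];

/*
 * Hypothetical bounds-checked store. An unchecked
 * "table[apicid] = node" with apicid == 0xFFFF would write to index 65535,
 * far past the last valid index (32767).
 */
bool set_apicid_to_node_checked(unsigned int apicid, int node)
{
	if (apicid >= MODEL_MAX_LOCAL_APIC)
		return false;          /* reject instead of corrupting memory */
	apicid_to_node_model[apicid] = node;
	return true;
}
```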
Signed-off-by: Igor Mammedov <[email protected]>
---
arch/x86/kernel/smpboot.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e7c15d7..44903ad 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -821,8 +821,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
udelay(100);
schedule();
}
- } else {
- per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}
/* mark "stuck" area as not stuck */
--
1.7.1
Currently, if AP wakeup fails, the master CPU marks the AP as not present
in do_boot_cpu() by calling set_cpu_present(cpu, false).
That leads to the following list corruption on the next physical CPU
hotplug:
[ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
[ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
[ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
[ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
[ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
[ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
[ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
[ 418.178445] Call Trace:
[ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c
[ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
[ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
[ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
[ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0
[ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
[ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
[ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70
[ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550
[ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
[ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30
[ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130
[ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
[ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
...
This is caused by the fact that generic_processor_info() allocates
the logical CPU id by calling:
cpu = cpumask_next_zero(-1, cpu_present_mask);
which returns the id of the CPU that previously failed to wake up, since
its bit was cleared by do_boot_cpu(). As a result, register_cpu() tries to
register another CPU with the same id as the already present, but not
onlined, CPU.
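The allocation behaviour can be modelled with a plain bitmask. This is a simplified, hypothetical stand-in for cpumask_next_zero(), which returns the first cleared bit after the given index; with CPU 2 cleared from the present mask after its failed wakeup, the next hotplugged CPU is handed id 2 again.

```c
#include <assert.h>

/*
 * Hypothetical model of cpumask_next_zero(n, mask) for at most 64 CPUs:
 * returns the first cleared bit strictly after n (pass -1 to start at
 * bit 0), or nbits if every bit is set.
 */
int model_next_zero(int n, unsigned long long mask, int nbits)
{
	for (int i = n + 1; i < nbits; i++)
		if (!(mask & (1ULL << i)))
			return i;
	return nbits;
}
```

Before the fix, CPUs 0, 1 and 3 are present and CPU 2's bit was cleared (mask 0xB), so id 2, which still has a registered device, is handed out again; with the bit left set (mask 0xF), the next free id is 4.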
Taking into account that the AP will not do anything if the master CPU
failed to wake it up, there is no reason to mark that AP as not present
and break subsequent CPU hotplug attempts. As a side effect of not
marking the AP as not present, the user will be allowed to online it
again later.
Signed-off-by: Igor Mammedov <[email protected]>
---
arch/x86/kernel/smpboot.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 853473d..e7c15d7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -822,7 +822,6 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
schedule();
}
} else {
- set_cpu_present(cpu, false);
per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}
--
1.7.1
A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs (it is reproducible
more often if the host is over-committed).
It happens because the master CPU gives up waiting on the
secondary CPU and allows it to run wild. As a result, the
AP causes the system to lock up or crash, for example
as described here: https://lkml.org/lkml/2014/3/6/257
If the master CPU has sent the STARTUP IPI successfully,
and the AP has signalled to the master CPU that it's ready
to start initialization, make the master CPU wait
indefinitely until the AP is onlined.
To ensure that the AP won't ever run wild, make it
wait at early startup until the master CPU confirms its
intention to wait for the AP.
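The handshake described above can be sketched in userspace with two threads, one playing the master CPU and one the AP. This is a model under stated assumptions, not the kernel code: per-CPU mask bits are replaced by three atomic flags for a single AP, sched_yield() stands in for cpu_relax()/schedule(), and the 10-second bound mirrors the timeout that v3 keeps in do_boot_cpu().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Single-AP stand-ins for the kernel's per-CPU mask bits (hypothetical). */
atomic_bool initialized; /* AP -> master: "I'm alive" (cpu_initialized_mask) */
atomic_bool callout;     /* master -> AP: "go ahead" (cpu_callout_mask) */
atomic_bool callin;      /* AP -> master: "early init done" (cpu_callin_mask) */

static void *ap_thread(void *arg)
{
	(void)arg;
	atomic_store(&initialized, true);  /* cpu_init(): announce ourselves */
	while (!atomic_load(&callout))     /* park until the master commits */
		sched_yield();             /* cpu_relax() in the kernel */
	atomic_store(&callin, true);       /* smp_callin() */
	return NULL;
}

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* do_boot_cpu(): bounded wait for the AP to check in, unbounded afterwards. */
int master_boot_ap(double timeout_sec)
{
	double deadline = now_sec() + timeout_sec;

	while (!atomic_load(&initialized)) {
		if (now_sec() >= deadline)
			return -1; /* never ACKed: a late AP just stays parked */
		sched_yield();
	}
	atomic_store(&callout, true);      /* commit: tell the AP to proceed */
	while (!atomic_load(&callin))      /* ...and wait for it indefinitely */
		sched_yield();
	return 0;
}

int run_handshake(void)
{
	pthread_t ap;
	int err;

	atomic_store(&initialized, false);
	atomic_store(&callout, false);
	atomic_store(&callin, false);
	if (pthread_create(&ap, NULL, ap_thread, NULL) != 0)
		return -2;
	err = master_boot_ap(10.0);
	if (err == 0)
		pthread_join(ap, NULL);
	return err;
}
```

The key property is that the AP cannot get past its first wait unless the master has committed to waiting for it; if the master times out first, a late AP stays parked and touches no shared state.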
Signed-off-by: Igor Mammedov <[email protected]>
---
v2:
- amend comment in cpu_init()
v3:
- leave timeouts in do_boot_cpu(), so that master CPU
won't hang if AP doesn't respond, use cpu_initialized_mask
as a way for AP to signal to master CPU that it's ready
to start initialization.
---
arch/x86/kernel/cpu/common.c | 28 +++++++-----
arch/x86/kernel/smpboot.c | 100 +++++++++++++-----------------------------
2 files changed, 48 insertions(+), 80 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a135239..6650110 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1236,16 +1236,23 @@ void cpu_init(void)
struct task_struct *me;
struct tss_struct *t;
unsigned long v;
- int cpu;
+ int cpu = stack_smp_processor_id();
int i;
/*
+ * wait for ACK from master CPU before continuing
+ * with AP initialization
+ */
+ cpumask_set_cpu(cpu, cpu_initialized_mask);
+ while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+ cpu_relax();
+
+ /*
* Load microcode on this cpu if a valid microcode is available.
* This is early microcode loading procedure.
*/
load_ucode_ap();
- cpu = stack_smp_processor_id();
t = &per_cpu(init_tss, cpu);
oist = &per_cpu(orig_ist, cpu);
@@ -1257,9 +1264,6 @@ void cpu_init(void)
me = current;
- if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
- panic("CPU#%d already initialized!\n", cpu);
-
pr_debug("Initializing CPU#%d\n", cpu);
clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
@@ -1336,13 +1340,15 @@ void cpu_init(void)
struct tss_struct *t = &per_cpu(init_tss, cpu);
struct thread_struct *thread = &curr->thread;
- show_ucode_info_early();
+ /*
+ * wait for ACK from master CPU before continuing
+ * with AP initialization
+ */
+ cpumask_set_cpu(cpu, cpu_initialized_mask);
+ while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+ cpu_relax();
- if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
- printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
- for (;;)
- local_irq_enable();
- }
+ show_ucode_info_early();
printk(KERN_INFO "Initializing CPU#%d\n", cpu);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3482693..5e57a0a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -111,7 +111,6 @@ atomic_t init_deasserted;
static void smp_callin(void)
{
int cpuid, phys_id;
- unsigned long timeout;
/*
* If waken up by an INIT in an 82489DX configuration
@@ -130,37 +129,6 @@ static void smp_callin(void)
* (This works even if the APIC is not enabled.)
*/
phys_id = read_apic_id();
- if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
- panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
- phys_id, cpuid);
- }
- pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
-
- /*
- * STARTUP IPIs are fragile beasts as they might sometimes
- * trigger some glue motherboard logic. Complete APIC bus
- * silence for 1 second, this overestimates the time the
- * boot CPU is spending to send the up to 2 STARTUP IPIs
- * by a factor of two. This should be enough.
- */
-
- /*
- * Waiting 2s total for startup (udelay is not yet working)
- */
- timeout = jiffies + 2*HZ;
- while (time_before(jiffies, timeout)) {
- /*
- * Has the boot CPU finished it's STARTUP sequence?
- */
- if (cpumask_test_cpu(cpuid, cpu_callout_mask))
- break;
- cpu_relax();
- }
-
- if (!time_before(jiffies, timeout)) {
- panic("%s: CPU%d started up but did not get a callout!\n",
- __func__, cpuid);
- }
/*
* the boot CPU has finished the init stage and is spinning
@@ -750,8 +718,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
unsigned long start_ip = real_mode_header->trampoline_start;
unsigned long boot_error = 0;
- int timeout;
int cpu0_nmi_registered = 0;
+ unsigned long timeout;
/* Just in case we booted with a single CPU. */
alternatives_enable_smp();
@@ -799,6 +767,14 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
}
/*
+ * AP might wait on cpu_callout_mask in cpu_init() with
+ * cpu_initialized_mask set if previous attempt to online
+ * it timed-out. Clear cpu_initialized_mask so that after
+ * INIT/SIPI it could start with a clean state.
+ */
+ cpumask_clear_cpu(cpu, cpu_initialized_mask);
+
+ /*
* Wake up a CPU in difference cases:
* - Use the method in the APIC driver if it's defined
* Otherwise,
@@ -810,56 +786,42 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
&cpu0_nmi_registered);
+
if (!boot_error) {
/*
- * allow APs to start initializing.
+ * Wait 10s total for a response from AP
*/
- pr_debug("Before Callout %d\n", cpu);
- cpumask_set_cpu(cpu, cpu_callout_mask);
- pr_debug("After Callout %d\n", cpu);
+ boot_error = -1;
+ timeout = jiffies + 10*HZ;
+ while (time_before(jiffies, timeout)) {
+ if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
+ /*
+ * Tell AP to proceed with initialization
+ */
+ cpumask_set_cpu(cpu, cpu_callout_mask);
+ boot_error = 0;
+ break;
+ }
+ udelay(100);
+ schedule();
+ }
+ }
+ if (!boot_error) {
/*
- * Wait 5s total for a response
+ * Wait till AP completes initial initialization
*/
- for (timeout = 0; timeout < 50000; timeout++) {
- if (cpumask_test_cpu(cpu, cpu_callin_mask))
- break; /* It has booted */
- udelay(100);
+ while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
/*
* Allow other tasks to run while we wait for the
* AP to come online. This also gives a chance
* for the MTRR work(triggered by the AP coming online)
* to be completed in the stop machine context.
*/
+ udelay(100);
schedule();
}
-
- if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
- print_cpu_msr(&cpu_data(cpu));
- pr_debug("CPU%d: has booted.\n", cpu);
- } else {
- boot_error = 1;
- if (*trampoline_status == 0xA5A5A5A5)
- /* trampoline started but...? */
- pr_err("CPU%d: Stuck ??\n", cpu);
- else
- /* trampoline code not run */
- pr_err("CPU%d: Not responding\n", cpu);
- if (apic->inquire_remote_apic)
- apic->inquire_remote_apic(apicid);
- }
- }
-
- if (boot_error) {
- /* Try to put things back the way they were before ... */
- numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-
- /* was set by do_boot_cpu() */
- cpumask_clear_cpu(cpu, cpu_callout_mask);
-
- /* was set by cpu_init() */
- cpumask_clear_cpu(cpu, cpu_initialized_mask);
-
+ } else {
set_cpu_present(cpu, false);
per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
}
--
1.7.1
If the system is running without debug-level logging,
it will not log an error when do_boot_cpu() fails to
wake up an AP. This may lead to silent AP bringup
failures at boot time.
Change the message level to KERN_ERR to make the error
visible to the user, as is done on other architectures.
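A simplified model of why the old message was easy to miss: a printk reaches the console only if its level is numerically lower than console_loglevel (KERN_ERR is 3, KERN_DEBUG is 7). On top of that, pr_debug() is normally compiled out unless DEBUG or dynamic debug is enabled, which this sketch does not model. The names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric printk levels (lower = more severe). */
enum { MODEL_KERN_ERR = 3, MODEL_KERN_DEBUG = 7 };

/* Simplified console filter: shown only if level < console_loglevel. */
bool model_reaches_console(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With a quiet console (a console_loglevel of 4, as typically set by the `quiet` boot option), the error is shown while a debug-level message is not; even at console_loglevel 7, a level-7 message is filtered.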
Signed-off-by: Igor Mammedov <[email protected]>
---
arch/x86/kernel/smpboot.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5e57a0a..853473d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -883,7 +883,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
err = do_boot_cpu(apicid, cpu, tidle);
if (err) {
- pr_debug("do_boot_cpu failed %d\n", err);
+ pr_err("do_boot_cpu failed(%d) to wakeup CPU#%u\n", err, cpu);
return -EIO;
}
--
1.7.1
* Igor Mammedov <[email protected]> wrote:
> /*
> + * wait for ACK from master CPU before continuing
> + * with AP initialization
> + */
> + cpumask_set_cpu(cpu, cpu_initialized_mask);
> + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> + cpu_relax();
> + /*
> + * wait for ACK from master CPU before continuing
> + * with AP initialization
> + */
> + cpumask_set_cpu(cpu, cpu_initialized_mask);
> + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> + cpu_relax();
That repetitive pattern could be stuck into a properly named helper
inline function.
(Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
if the bit is already set.)
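One way the suggested helper could look, sketched in userspace with hypothetical names: a single atomic fetch-or both sets this CPU's bit and reports whether it was already set, which is where a WARN_ON() would go, and the AP then spins until the master sets its bit in the callout mask. A counter stands in for WARN_ON().

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins for the two cpumasks (<= 64 CPUs). */
atomic_ullong initialized_mask;
atomic_ullong callout_mask;
int warn_count; /* counts what WARN_ON() would have reported */

/*
 * Sketch of the suggested helper: announce this CPU, warn if the bit was
 * already set (the master should have cleared it), then spin until the
 * master CPU ACKs via callout_mask.
 */
static inline void wait_for_master_cpu_model(int cpu)
{
	unsigned long long bit = 1ULL << cpu;

	if (atomic_fetch_or(&initialized_mask, bit) & bit)
		warn_count++;   /* WARN_ON(cpumask_test_and_set_cpu(...)) */
	while (!(atomic_load(&callout_mask) & bit))
		sched_yield();  /* cpu_relax() in the kernel */
}
```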
Thanks,
Ingo
* Igor Mammedov <[email protected]> wrote:
> currently if AP wake up is failed, master CPU marks AP as not present
> in do_boot_cpu() by calling set_cpu_present(cpu, false).
> That leads to following list corruption on the next physical CPU
> hotplug:
Shouldn't this fix precede the main change to the smp bootup logic?
Can this bug trigger with current upstream kernels?
Thanks,
Ingo
* Igor Mammedov <[email protected]> wrote:
> if during CPU hotplug master CPU failed to wake up AP
> it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.
>
> However following attempt to unplug that CPU will lead to
> out of bound write access to __apicid_to_node[] which is
> 32768 items long on x86_64 kernel.
>
> So drop setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu()
> and allow acpi_processor_remove()->acpi_unmap_lsapic() cleanly
> remove CPU.
Same suggestion as for the other fix patch: the fix should precede the
patch that exposes it.
Thanks,
Ingo
* Igor Mammedov <[email protected]> wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined. As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/device/system/cpu/cpuX/online
> attribute will fail.
>
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
> topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
>
> Signed-off-by: Igor Mammedov <[email protected]>
> ---
> drivers/acpi/acpi_processor.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
Can this fix be moved first too, or does it have undesirable side
effects on unmodified kernels?
Thanks,
Ingo
On Mon, 14 Apr 2014 11:16:00 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > /*
> > + * wait for ACK from master CPU before continuing
> > + * with AP initialization
> > + */
> > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > + cpu_relax();
>
> > + /*
> > + * wait for ACK from master CPU before continuing
> > + * with AP initialization
> > + */
> > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > + cpu_relax();
>
> That repetitive pattern could be stuck into a properly named helper
> inline function.
sure
> (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> if the bit is already set.)
The reason there is no WARN_ON or the like is that printk is quite
complicated, takes locks and so on. So it's not safe at this point, since
the CPU could be shot down at any time by INIT/SIPI until it's out of the
cpu_callout_mask loop.
That said, it's possible to add a WARN_ON in do_boot_cpu() before
cpu_initialized_mask is cleared to achieve the same effect,
so I'll put it there.
>
> Thanks,
>
> Ingo
--
Regards,
Igor
On Mon, 14 Apr 2014 11:19:54 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > currently if AP wake up is failed, master CPU marks AP as not present
> > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > That leads to following list corruption on the next physical CPU
> > hotplug:
>
> Shouldn't this fix precede the main change to the smp bootup logic?
>
> Can this bug trigger with current upstream kernels?
That's not impossible; tests showed that with the current kernel there
will be other problems due to the wild AP running around.
I'll reorder the patches anyway.
>
> Thanks,
>
> Ingo
--
Regards,
Igor
* Igor Mammedov <[email protected]> wrote:
> On Mon, 14 Apr 2014 11:16:00 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Igor Mammedov <[email protected]> wrote:
> >
> > > /*
> > > + * wait for ACK from master CPU before continuing
> > > + * with AP initialization
> > > + */
> > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > + cpu_relax();
> >
> > > + /*
> > > + * wait for ACK from master CPU before continuing
> > > + * with AP initialization
> > > + */
> > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > + cpu_relax();
> >
> > That repetitive pattern could be stuck into a properly named helper
> > inline function.
> sure
>
> > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> > if the bit is already set.)
> The reason why there is no any WARN_ON or likes is that printk is quite
> complicated, takes looks and so on. [...]
[ Yeah, I too heard that printk(), like a pretty girl, is complicated
and makes people look twice. ]
> [...] So it's not safe at this point since
> CPU could be shot down by any time by INIT/SIPI until it's out of
> cpu_callout_mask loop.
Not sure where you got that from, but it's not a valid concern really:
the only place where we don't want to do a printk() is in printk code
itself.
Debug warnings, by definition, should never trigger. If they trigger
then they will very likely not cause lockups, but will cause the bug
to be fixed.
Thanks,
Ingo
* Igor Mammedov <[email protected]> wrote:
> On Mon, 14 Apr 2014 11:19:54 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Igor Mammedov <[email protected]> wrote:
> >
> > > currently if AP wake up is failed, master CPU marks AP as not present
> > > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > > That leads to following list corruption on the next physical CPU
> > > hotplug:
> >
> > Shouldn't this fix precede the main change to the smp bootup logic?
> >
> > Can this bug trigger with current upstream kernels?
> That's not impossible, tests showed that with current kernel there will
> be other problems due wild AP running around.
>
> I'll reorder patch anyway.
So, could you please first make sure that with only the fixes applied
there are no problems left?
Only then should we apply the patch that adds/tweaks the timeout/etc.
Thanks,
Ingo
On Mon, 14 Apr 2014 12:03:35 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > On Mon, 14 Apr 2014 11:16:00 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Igor Mammedov <[email protected]> wrote:
> > >
> > > > /*
> > > > + * wait for ACK from master CPU before continuing
> > > > + * with AP initialization
> > > > + */
> > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > + cpu_relax();
> > >
> > > > + /*
> > > > + * wait for ACK from master CPU before continuing
> > > > + * with AP initialization
> > > > + */
> > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > + cpu_relax();
> > >
> > > That repetitive pattern could be stuck into a properly named helper
> > > inline function.
> > sure
> >
> > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> > > if the bit is already set.)
> > The reason why there is no any WARN_ON or likes is that printk is quite
> > complicated, takes looks and so on. [...]
>
> [ Yeah, I too heard that printk(), like a pretty girl, is complicated
> and makes people look twice. ]
>
> > [...] So it's not safe at this point since
> > CPU could be shot down by any time by INIT/SIPI until it's out of
> > cpu_callout_mask loop.
>
> Not sure where you got that from, but it's not a valid concern really:
> the only place where we don't want to do a printk() is in printk code
> itself.
>
> Debug warnings, by definition, should never trigger. If they trigger
> then they will very likely not cause lockups, but will cause the bug
> to be fixed.
ok, I'll add WARN_ON in cpu_init() as you've suggested.
> Thanks,
>
> Ingo
--
Regards,
Igor
On Mon, 14 Apr 2014 12:04:57 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > On Mon, 14 Apr 2014 11:19:54 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Igor Mammedov <[email protected]> wrote:
> > >
> > > > currently if AP wake up is failed, master CPU marks AP as not present
> > > > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > > > That leads to following list corruption on the next physical CPU
> > > > hotplug:
> > >
> > > Shouldn't this fix precede the main change to the smp bootup logic?
> > >
> > > Can this bug trigger with current upstream kernels?
> > That's not impossible, tests showed that with current kernel there will
> > be other problems due wild AP running around.
> >
> > I'll reorder patch anyway.
>
> So, could you please first make sure that with only the fixes applied
> there's no problems left?
Sure, I'll retest reordered series.
>
> Only then should we apply the patch that adds/tweaks the timeout/etc.
>
> Thanks,
>
> Ingo
--
Regards,
Igor
* Igor Mammedov <[email protected]> wrote:
> On Mon, 14 Apr 2014 12:04:57 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Igor Mammedov <[email protected]> wrote:
> >
> > > On Mon, 14 Apr 2014 11:19:54 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Igor Mammedov <[email protected]> wrote:
> > > >
> > > > > currently if AP wake up is failed, master CPU marks AP as not present
> > > > > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > > > > That leads to following list corruption on the next physical CPU
> > > > > hotplug:
> > > >
> > > > Shouldn't this fix precede the main change to the smp bootup logic?
> > > >
> > > > Can this bug trigger with current upstream kernels?
> > > That's not impossible, tests showed that with current kernel there will
> > > be other problems due wild AP running around.
> > >
> > > I'll reorder patch anyway.
> >
> > So, could you please first make sure that with only the fixes applied
> > there's no problems left?
>
> Sure, I'll retest reordered series.
Please don't just test a reordered series, but a 'fixes only' series,
which does not include patch #1.
We will apply that patch too, to improve the bootup of virtualized
environments, but we first want to know whether the 'baseline' is OK
and fixed 100%.
Thanks,
Ingo
On Mon, 14 Apr 2014 12:34:13 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > On Mon, 14 Apr 2014 12:04:57 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Igor Mammedov <[email protected]> wrote:
> > >
> > > > On Mon, 14 Apr 2014 11:19:54 +0200
> > > > Ingo Molnar <[email protected]> wrote:
> > > >
> > > > >
> > > > > * Igor Mammedov <[email protected]> wrote:
> > > > >
> > > > > > currently if AP wake up is failed, master CPU marks AP as not present
> > > > > > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > > > > > That leads to following list corruption on the next physical CPU
> > > > > > hotplug:
> > > > >
> > > > > Shouldn't this fix precede the main change to the smp bootup logic?
> > > > >
> > > > > Can this bug trigger with current upstream kernels?
> > > > That's not impossible, tests showed that with current kernel there will
> > > > be other problems due wild AP running around.
> > > >
> > > > I'll reorder patch anyway.
> > >
> > > So, could you please first make sure that with only the fixes applied
> > > there's no problems left?
> >
> > Sure, I'll retest reordered series.
>
> Please don't jus test a reodered series, but a 'fixes only' series,
> which does not include patch #1.
>
Yep, that's ^^^ what I meant to do; sorry for not being clear enough.
I'll check that the bugs these patches fix are actually fixed and that
they don't break anything else, apart from the issues that patch #1
fixes, of course.
> We will apply that patch too, to improve the bootup of virtualized
> environments, but we first want to know whether the 'baseline' is OK
> and fixed 100%.
>
> Thanks,
>
> Ingo
--
Regards,
Igor
* Igor Mammedov <[email protected]> wrote:
> On Mon, 14 Apr 2014 12:34:13 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Igor Mammedov <[email protected]> wrote:
> >
> > > On Mon, 14 Apr 2014 12:04:57 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Igor Mammedov <[email protected]> wrote:
> > > >
> > > > > On Mon, 14 Apr 2014 11:19:54 +0200
> > > > > Ingo Molnar <[email protected]> wrote:
> > > > >
> > > > > >
> > > > > > * Igor Mammedov <[email protected]> wrote:
> > > > > >
> > > > > > > currently if AP wake up is failed, master CPU marks AP as not present
> > > > > > > in do_boot_cpu() by calling set_cpu_present(cpu, false).
> > > > > > > That leads to following list corruption on the next physical CPU
> > > > > > > hotplug:
> > > > > >
> > > > > > Shouldn't this fix precede the main change to the smp bootup logic?
> > > > > >
> > > > > > Can this bug trigger with current upstream kernels?
> > > > > That's not impossible, tests showed that with current kernel there will
> > > > > be other problems due wild AP running around.
> > > > >
> > > > > I'll reorder patch anyway.
> > > >
> > > > So, could you please first make sure that with only the fixes applied
> > > > there's no problems left?
> > >
> > > Sure, I'll retest reordered series.
> >
> > Please don't jus test a reodered series, but a 'fixes only' series,
> > which does not include patch #1.
> >
>
> Yep, that's ^^^ what I've meant to do, I'm sorry for not being clear
> enough. I'll check that bugs, that patches fix, are fixed and they
> don't break something else except of issues #1 fixes of cause.
Great, thanks!
Ingo
On Mon, 14 Apr 2014 12:03:35 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > On Mon, 14 Apr 2014 11:16:00 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Igor Mammedov <[email protected]> wrote:
> > >
> > > > /*
> > > > + * wait for ACK from master CPU before continuing
> > > > + * with AP initialization
> > > > + */
> > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > + cpu_relax();
> > >
> > > > + /*
> > > > + * wait for ACK from master CPU before continuing
> > > > + * with AP initialization
> > > > + */
> > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > + cpu_relax();
> > >
> > > That repetitive pattern could be stuck into a properly named helper
> > > inline function.
> > sure
> >
> > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> > > if the bit is already set.)
WARN_ON will never be triggered here, since the bit is always cleared by
the master CPU before the AP gets here. There is no harm in keeping the
WARN_ON though; do you still want it here?
It could be useful to put a WARN_ON in do_boot_cpu() before the bit is
cleared, so that the user would see they are trying to online an AP that
failed the previous time. It's not really necessary, since a failed online
attempt is now reported in the logs at ERR level, see patch 2/5.
Thanks,
Igor
* Igor Mammedov <[email protected]> wrote:
> On Mon, 14 Apr 2014 12:03:35 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Igor Mammedov <[email protected]> wrote:
> >
> > > On Mon, 14 Apr 2014 11:16:00 +0200
> > > Ingo Molnar <[email protected]> wrote:
> > >
> > > >
> > > > * Igor Mammedov <[email protected]> wrote:
> > > >
> > > > > /*
> > > > > + * wait for ACK from master CPU before continuing
> > > > > + * with AP initialization
> > > > > + */
> > > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > + cpu_relax();
> > > >
> > > > > + /*
> > > > > + * wait for ACK from master CPU before continuing
> > > > > + * with AP initialization
> > > > > + */
> > > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > + cpu_relax();
> > > >
> > > > That repetitive pattern could be stuck into a properly named helper
> > > > inline function.
> > > sure
> > >
> > > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> > > > if the bit is already set.)
>
> WARN_ON will never be triggered here since bit is always cleared by
> master CPU before AP gets here. There is no harm keeping WARN_ON
> though, do you still want it be here?
The previous code panic()ed on this condition - so it makes sense to
at least keep a WARN_ON(). That it won't ever trigger is good:
> It could be useful to put WARN_ON in do_boot_cpu() before bit is
> cleared, so that user would see that he tries to online AP which has
> failed previous time. It's not really necessary since failed to
> online attempt reported in logs at ERR level now, see patch 2/5.
WARN_ON()s are not used to communicate with users, they are used to
show developers that there's a _bug_ in the code!
So a WARN_ON() not triggering, ever, is a good thing.
Thanks,
Ingo
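Ingo's two requests above (fold the repeated announce-and-wait sequence into a named inline helper, and WARN_ON() if the AP finds its bit already set) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code. The names wait_for_master_cpu(), initialized_mask, callout_mask, warn_on(), mask_test_and_set() and simulate_bringup() are hypothetical stand-ins. C11 atomics replace the kernel cpumask API (in-kernel, cpumask_test_and_set_cpu() would play the role of mask_test_and_set()), sched_yield() replaces cpu_relax(), and a thread plays the AP.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins for cpu_initialized_mask and cpu_callout_mask:
 * one bit per CPU in an atomic word (assumes cpu < bits in unsigned long).
 */
static atomic_ulong initialized_mask;
static atomic_ulong callout_mask;
static atomic_int warnings;

/* Stand-in for WARN_ON(): log and count, but keep running. */
static int warn_on(int cond, const char *what)
{
	if (cond) {
		atomic_fetch_add(&warnings, 1);
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Atomically set a CPU's bit; return whether it was already set. */
static int mask_test_and_set(atomic_ulong *mask, int cpu)
{
	unsigned long bit = 1UL << cpu;

	return (atomic_fetch_or(mask, bit) & bit) != 0;
}

static int mask_test(atomic_ulong *mask, int cpu)
{
	return (atomic_load(mask) & (1UL << cpu)) != 0;
}

/*
 * Hypothetical helper for the repeated pattern: the AP announces itself,
 * then spins until the master CPU acknowledges it. The master clears the
 * bit before waking the AP, so finding it already set indicates a bug.
 */
static inline void wait_for_master_cpu(int cpu)
{
	warn_on(mask_test_and_set(&initialized_mask, cpu),
		"AP found its initialized bit already set");
	while (!mask_test(&callout_mask, cpu))
		sched_yield(); /* stands in for cpu_relax() */
}

static void *ap_thread(void *arg)
{
	wait_for_master_cpu(*(int *)arg);
	/* ...AP initialization would continue here... */
	return NULL;
}

/*
 * Master side of the model: clear stale bits, start the AP, wait until it
 * announces itself, then ACK it. Returns the cumulative warning count,
 * or -1 if the AP thread could not be started.
 */
int simulate_bringup(int cpu)
{
	pthread_t ap;

	atomic_fetch_and(&initialized_mask, ~(1UL << cpu));
	atomic_fetch_and(&callout_mask, ~(1UL << cpu));
	if (pthread_create(&ap, NULL, ap_thread, &cpu) != 0)
		return -1;
	while (!mask_test(&initialized_mask, cpu))
		sched_yield();
	atomic_fetch_or(&callout_mask, 1UL << cpu); /* ACK the AP */
	pthread_join(ap, NULL);
	return atomic_load(&warnings);
}
```

Using a test-and-set lets the check and the announcement happen in one atomic operation rather than a separate read followed by a write. As Ingo points out, such a WARN_ON() is expected never to fire: it flags a bookkeeping bug for developers rather than reporting a user-visible condition.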
On Mon, 14 Apr 2014 16:51:19 +0200
Ingo Molnar <[email protected]> wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > On Mon, 14 Apr 2014 12:03:35 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > >
> > > * Igor Mammedov <[email protected]> wrote:
> > >
> > > > On Mon, 14 Apr 2014 11:16:00 +0200
> > > > Ingo Molnar <[email protected]> wrote:
> > > >
> > > > >
> > > > > * Igor Mammedov <[email protected]> wrote:
> > > > >
> > > > > > /*
> > > > > > + * wait for ACK from master CPU before continuing
> > > > > > + * with AP initialization
> > > > > > + */
> > > > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > > + cpu_relax();
> > > > >
> > > > > > + /*
> > > > > > + * wait for ACK from master CPU before continuing
> > > > > > + * with AP initialization
> > > > > > + */
> > > > > > + cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > > + while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > > + cpu_relax();
> > > > >
> > > > > That repetitive pattern could be stuck into a properly named helper
> > > > > inline function.
> > > > sure
> > > >
> > > > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON()
> > > > > if the bit is already set.)
> >
> > WARN_ON will never be triggered here since bit is always cleared by
> > master CPU before AP gets here. There is no harm keeping WARN_ON
> > though, do you still want it be here?
>
> The previous code panic()ed on this condition - so it makes sense to
> at least keep a WARN_ON(). That it won't ever trigger is good:
>
> > It could be useful to put WARN_ON in do_boot_cpu() before bit is
> > cleared, so that user would see that he tries to online AP which has
> > failed previous time. It's not really necessary since failed to
> > online attempt reported in logs at ERR level now, see patch 2/5.
>
> WARN_ON()s are not used to communicate with users, they are used to
> show developers that there's a _bug_ in the code!
>
> So a WARN_ON() not triggering, ever, is a good thing.
Thanks for your patience.
I'll repost the fixed and tested series in a minute.
>
> Thanks,
>
> Ingo
On Monday, April 14, 2014 11:21:47 AM Ingo Molnar wrote:
>
> * Igor Mammedov <[email protected]> wrote:
>
> > acpi_processor_add() assumes that present at boot CPUs
> > are always onlined, it is not so if a CPU failed to become
> > onlined. As result acpi_processor_add() will mark such CPU
> > device as onlined in sysfs and following attempts to
> > online/offline it using /sys/devices/system/cpu/cpuX/online
> > attribute will fail.
> >
> > Do not poke into device internals in acpi_processor_add()
> > and touch "struct device { .offline }" attribute, since
> > for CPUs onlined at boot it's set by:
> > topology_init() -> arch_register_cpu() -> register_cpu()
> > before ACPI device tree is parsed, and for hotplugged
> > CPUs it's set when userspace onlines CPU via sysfs.
> >
> > Signed-off-by: Igor Mammedov <[email protected]>
> > ---
> > drivers/acpi/acpi_processor.c | 3 ---
> > 1 files changed, 0 insertions(+), 3 deletions(-)
>
> Can this fix be moved first too, or does it have undesirable side
> effects on unmodified kernels?
This patch is not correct.
Rafael
On Thursday, April 10, 2014 07:14:21 PM Igor Mammedov wrote:
> acpi_processor_add() assumes that present at boot CPUs
> are always onlined, it is not so if a CPU failed to become
> onlined.
What do you mean by that? What *exactly* is the failure scenario?
> As result acpi_processor_add() will mark such CPU
> device as onlined in sysfs and following attempts to
> online/offline it using /sys/devices/system/cpu/cpuX/online
> attribute will fail.
>
> Do not poke into device internals in acpi_processor_add()
> and touch "struct device { .offline }" attribute, since
> for CPUs onlined at boot it's set by:
> topology_init() -> arch_register_cpu() -> register_cpu()
> before ACPI device tree is parsed, and for hotplugged
> CPUs it's set when userspace onlines CPU via sysfs.
>
> Signed-off-by: Igor Mammedov <[email protected]>
> ---
> drivers/acpi/acpi_processor.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> index c29c2c3..d56e4b4 100644
> --- a/drivers/acpi/acpi_processor.c
> +++ b/drivers/acpi/acpi_processor.c
> @@ -403,9 +403,6 @@ static int acpi_processor_add(struct acpi_device *device,
> if (result)
> goto err;
>
> - pr->dev = dev;
The line above has to stay as is.
> - dev->offline = pr->flags.need_hotplug_init;
> -
> /* Trigger the processor driver's .probe() if present. */
> if (device_attach(dev) >= 0)
> return 1;
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.