2005-12-01 02:24:03

by Shaohua Li

Subject: [PATCH]nmi VS cpu hotplug

Hi,
With CPU hotplug enabled, the NMI watchdog stopped working. The culprit
appears to be the cpu_online check in the NMI handler. The local APIC
based NMI watchdog is initialized before we set the CPU online for APs,
so it's quite possible for an NMI to fire before the CPU is marked
online, and that's what happens here.
Zwane, I suppose you saw NMIs on an offline CPU, so you added this
check. Several days ago I sent a patch titled 'disable LAPIC
completely for offline CPU', which I guess will make them go away. Can
you try it?
So the solution is either to initialize the NMI watchdog later or to
delete the cpu_online check. I just did what x86_64 does and deleted
the check.


Signed-off-by: Shaohua Li <[email protected]>
---

linux-2.6.14-root/arch/i386/kernel/traps.c | 7 -------
1 files changed, 7 deletions(-)

diff -puN arch/i386/kernel/traps.c~nmi-cpuhotplug arch/i386/kernel/traps.c
--- linux-2.6.14/arch/i386/kernel/traps.c~nmi-cpuhotplug 2005-12-01 01:22:00.000000000 -0800
+++ linux-2.6.14-root/arch/i386/kernel/traps.c 2005-12-01 01:22:22.000000000 -0800
@@ -650,13 +650,6 @@ fastcall void do_nmi(struct pt_regs * re

 	cpu = smp_processor_id();
 
-#ifdef CONFIG_HOTPLUG_CPU
-	if (!cpu_online(cpu)) {
-		nmi_exit();
-		return;
-	}
-#endif
-
 	++nmi_count(cpu);
 
 	if (!rcu_dereference(nmi_callback)(regs, cpu))
_



2005-12-01 02:48:41

by Zwane Mwaikambo

Subject: Re: [PATCH]nmi VS cpu hotplug

On Thu, 1 Dec 2005, Shaohua Li wrote:

> Hi,
> With CPU hotplug enabled, the NMI watchdog stopped working. The culprit
> appears to be the cpu_online check in the NMI handler. The local APIC
> based NMI watchdog is initialized before we set the CPU online for APs,
> so it's quite possible for an NMI to fire before the CPU is marked
> online, and that's what happens here.
> Zwane, I suppose you saw NMIs on an offline CPU, so you added this
> check. Several days ago I sent a patch titled 'disable LAPIC
> completely for offline CPU', which I guess will make them go away. Can
> you try it?
> So the solution is either to initialize the NMI watchdog later or to
> delete the cpu_online check. I just did what x86_64 does and deleted
> the check.
>
>
> Signed-off-by: Shaohua Li <[email protected]>

Signed-off-by: Zwane Mwaikambo <[email protected]>

> ---
>
> linux-2.6.14-root/arch/i386/kernel/traps.c | 7 -------
> 1 files changed, 7 deletions(-)
>
> diff -puN arch/i386/kernel/traps.c~nmi-cpuhotplug arch/i386/kernel/traps.c
> --- linux-2.6.14/arch/i386/kernel/traps.c~nmi-cpuhotplug 2005-12-01 01:22:00.000000000 -0800
> +++ linux-2.6.14-root/arch/i386/kernel/traps.c 2005-12-01 01:22:22.000000000 -0800
> @@ -650,13 +650,6 @@ fastcall void do_nmi(struct pt_regs * re
>
>  	cpu = smp_processor_id();
>
> -#ifdef CONFIG_HOTPLUG_CPU
> -	if (!cpu_online(cpu)) {
> -		nmi_exit();
> -		return;
> -	}
> -#endif

Nice catch. That's really old debug code for the 'toy' i386 hotplug
code; I'm fine with deleting it.

Thanks,
Zwane