2007-05-19 14:17:17

by Thomas Gleixner

Subject: [PATCH] Ignore bogus ACPI info for offline CPUs

Booting an SMP kernel with maxcpus=1 on an SMP system leads to a hard
hang, because ACPI ignores the maxcpus setting and sends timer broadcast
information for the offline CPUs. This results in a call to
smp_call_function_single() targeting an offline CPU that never returns.

Ignore the bogus information and print a kernel error message to remind
the ACPI folks to fix it.

Affects 2.6.21 / 2.6.22-rc

Signed-off-by: Thomas Gleixner <[email protected]>

Index: linux-2.6.22-rc/kernel/time/tick-broadcast.c
===================================================================
--- linux-2.6.22-rc.orig/kernel/time/tick-broadcast.c
+++ linux-2.6.22-rc/kernel/time/tick-broadcast.c
@@ -244,11 +244,18 @@ void tick_broadcast_on_off(unsigned long
 {
 	int cpu = get_cpu();
 
-	if (cpu == *oncpu)
-		tick_do_broadcast_on_off(&reason);
-	else
-		smp_call_function_single(*oncpu, tick_do_broadcast_on_off,
-					 &reason, 1, 1);
+	if (!cpu_isset(*oncpu, cpu_online_map)) {
+		printk(KERN_ERR "tick-broadcast: ignoring broadcast for "
+		       "offline CPU #%d\n", *oncpu);
+	} else {
+
+		if (cpu == *oncpu)
+			tick_do_broadcast_on_off(&reason);
+		else
+			smp_call_function_single(*oncpu,
+						 tick_do_broadcast_on_off,
+						 &reason, 1, 1);
+	}
 	put_cpu();
 }




2007-05-21 16:48:21

by Darren Hart

Subject: Re: [PATCH] Ignore bogus ACPI info for offline CPUs

On Saturday 19 May 2007 07:22:50 Thomas Gleixner wrote:
> Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard
> hang, because ACPI ignores the maxcpus setting and sends timer broadcast
> info for the offline CPUs. This results in a stuck for ever call to
> smp_call_function_single() on an offline CPU.
>
> Ignore the bogus information and print a kernel error to remind ACPI
> folks to fix it.
>
> Affects 2.6.21 / 2.6.22-rc
>
> Signed-off-by: Thomas Gleixner <[email protected]>

When I first booted with this patch I received the following in a loop:

irq 9: nobody cared (try booting with the "irqpoll" option)

Call Trace:
[<ffffffff8106d5a4>] dump_trace+0xaa/0x32a
[<ffffffff8106d865>] show_trace+0x41/0x5c
[<ffffffff8106d895>] dump_stack+0x15/0x17
[<ffffffff810c50b8>] __report_bad_irq+0x38/0x87
[<ffffffff810c52cb>] note_interrupt+0x1c4/0x1fc
[<ffffffff810c458d>] thread_simple_irq+0x6c/0x7e
[<ffffffff810c4dc3>] do_irqd+0x14a/0x3e4
[<ffffffff81033d3a>] kthread+0xf5/0x128
[<ffffffff8105ff68>] child_rip+0xa/0x12

handlers:
[<ffffffff8117736e>] (acpi_irq+0x0/0x1b)

I then booted with maxcpus=1 and acpi=noirq and got all the way to a
login prompt. Since we have seen these "nobody cared" and child_rip dumps
before, I think they are independent issues that should be tracked
separately.

Thanks,

Darren




--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

2007-05-21 17:12:19

by Chris Wright

Subject: Re: [stable] [PATCH] Ignore bogus ACPI info for offline CPUs

* Darren Hart ([email protected]) wrote:
> On Saturday 19 May 2007 07:22:50 Thomas Gleixner wrote:
> > Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard
> > hang, because ACPI ignores the maxcpus setting and sends timer broadcast
> > info for the offline CPUs. This results in a stuck for ever call to
> > smp_call_function_single() on an offline CPU.
> >
> > Ignore the bogus information and print a kernel error to remind ACPI
> > folks to fix it.
> >
> > Affects 2.6.21 / 2.6.22-rc
> >
> > Signed-off-by: Thomas Gleixner <[email protected]>
>
> When I first booted with this patch I received the following in a loop:
>
> irq 9: nobody cared (try booting with the "irqpoll" option)

What happens when booting without this patch? I don't want to add a known
regression to -stable.

thanks,
-chris

2007-05-21 17:25:30

by Thomas Gleixner

Subject: Re: [stable] [PATCH] Ignore bogus ACPI info for offline CPUs

On Mon, 2007-05-21 at 10:08 -0700, Chris Wright wrote:
> * Darren Hart ([email protected]) wrote:
> > On Saturday 19 May 2007 07:22:50 Thomas Gleixner wrote:
> > > Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard
> > > hang, because ACPI ignores the maxcpus setting and sends timer broadcast
> > > info for the offline CPUs. This results in a stuck for ever call to
> > > smp_call_function_single() on an offline CPU.
> > >
> > > Ignore the bogus information and print a kernel error to remind ACPI
> > > folks to fix it.
> > >
> > > Affects 2.6.21 / 2.6.22-rc
> > >
> > > Signed-off-by: Thomas Gleixner <[email protected]>
> >
> > When I first booted with this patch I received the following in a loop:
> >
> > irq 9: nobody cared (try booting with the "irqpoll" option)
>
> What happens when booting w/out this patch? Don't want to add known
> regression to -stable.

See commit log:

Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard
hang, because ACPI ignores the maxcpus setting and sends timer broadcast
info for the offline CPUs. This results in a stuck for ever call to
smp_call_function_single() on an offline CPU.

The irq 9 issue is a separate problem that only surfaces on some boxen.
It is not caused by this patch, although it is also triggered by maxcpus=1.

tglx


2007-05-21 17:58:46

by Darren Hart

Subject: Re: [stable] [PATCH] Ignore bogus ACPI info for offline CPUs

On Monday 21 May 2007 10:08:15 Chris Wright wrote:
> * Darren Hart ([email protected]) wrote:
> > On Saturday 19 May 2007 07:22:50 Thomas Gleixner wrote:
> > > Booting a SMP kernel with maxcpus=1 on a SMP system leads to a hard
> > > hang, because ACPI ignores the maxcpus setting and sends timer
> > > broadcast info for the offline CPUs. This results in a stuck for ever
> > > call to smp_call_function_single() on an offline CPU.
> > >
> > > Ignore the bogus information and print a kernel error to remind ACPI
> > > folks to fix it.
> > >
> > > Affects 2.6.21 / 2.6.22-rc
> > >
> > > Signed-off-by: Thomas Gleixner <[email protected]>
> >
> > When I first booted with this patch I received the following in a loop:
> >
> > irq 9: nobody cared (try booting with the "irqpoll" option)
>
> What happens when booting w/out this patch? Don't want to add known
> regression to -stable.

If maxcpus is not specified, the system boots without acpi=noirq both with
and without this patch. If maxcpus is specified without the patch, the
system locks up as Thomas described. If maxcpus is specified with the
patch, acpi=noirq is required, but the system does boot. IMO this does not
introduce a regression.

--Darren

>
> thanks,
> -chris


2007-05-21 18:00:24

by Chris Wright

Subject: Re: [stable] [PATCH] Ignore bogus ACPI info for offline CPUs

* Thomas Gleixner ([email protected]) wrote:
> The irq 9 issue is a separate problem and only surfaces on some boxen,
> but it's not related to this patch. It's related to maxcpus=1 as well.

Thanks for confirming; I was just double-checking that the two issues were
indeed separate.

thanks,
-chris