/* pending-IPI.c - Panic a kernel with IPI pending */
#include <linux/config.h>
#include <linux/module.h>
#include <linux/smp.h>
int init_module(void)
{
local_irq_disable();
apic_wait_icr_idle();
apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(1));
apic_write_around(APIC_ICR, (APIC_DM_FIXED
| APIC_DEST_LOGICAL | 0xfb));
panic("pending-IPI: Bye-bye.");
return 0;
}
MODULE_LICENSE("GPL");
MODULE_AUTHOR("OBATA Noboru <[email protected]>");
MODULE_DESCRIPTION("Panic a kernel with IPI pending");
OBATA Noboru <[email protected]> writes:
> Hi all,
>
> I'm having a problem in kdump on a Pentium 4 box when using SMP
> kernel as the second (dump capturing) kernel. Although the use
> of SMP kernel as the second kernel is discouraged in
> Documentation/kdump/kdump.txt, I hope discussing the problem
> here will benefit users in the future.
The problem with SMP is simply the additional cpus. maxcpus=1
is good enough.
Are you using the version in the -mm tree?
We have recently put in code changes so the new kernel comes
up in apic mode and there is less work done by the crashing kernel.
I don't know if this will affect your problem but it is a good place to start.
> I tracked down the problem to write a small module (attached)
> that can reproduce the problem.
>
> How to reproduce:
>
> - Configure both the first and the second kernels to be
> CONFIG_SMP=y
> - Boot the first kernel with "maxcpus=1"
> - Run "insmod pending-IPI.ko" to oops the second kernel
>
> The module makes IPI (Inter-Processor Interrupt) pending upon a
> crash. This is done by:
>
> - Disable local interrupts.
> - Send IPI to CPU#0.
> - Call panic().
>
> What happens in the second kernel is that it receives the
> pending IPI at the first local_irq_enable() in init/main.c, and
> the IPI handler calls the uninitialized function pointer
> call_data->func in arch/i386/kernel/smp.c.
>
> One way to solve this problem is to implement some blocking
> mechanism in the IPI handler so that it prevents itself from
> calling the uninitialized pointer.
Or simply initializing the pointer to NULL and testing to
see if the IPI handler has been initialized. Weird but
doable.
...
Looking at the code it appears this problem is specific to
SMP kernels (the IPI code only appears there), call_data
is already initialized to NULL before use. So it should
just be a matter of returning acking the irq and returning
if call_data is not set.
> But pending interrupts on other vectors may cause another
> problem. They would cause misrouted IRQs, which are now
> addressed by "irqpoll" kernel parameter. But I'm not sure this
> solves all the problems.
The irqs should not be misrouted. They simply come at an unexpected
time.
> So another way to solve this problem is to clear all such
> pending interrupts before booting the second kernel.
No. The second kernel gets to cope, because we cannot
do anything reliable in the crashed kernel.
> This is
> more challenging because the pending status (IRR bits ON) in a
> local APIC can only be cleared by the acceptance of the
> interrupt by CPU core (See section 8.8.4 in "IA-32 Intel(R)
> Architecture Software Developer's Manual Vol.3"). That is, we
> need to enable the interrupt somewhere in a crashed kernel to
> clear the pending status.
No. There is no benefit to doing that in the crashed kernel over
doing it in the new kernel. We just need to figure out how to
make the kernel as robust as possible.
> Any other ideas to deal with pending interrupts?
Interrupts are hard but it may make sense to disable bus masters,
and interrupts and reset as many busses as possible when the
system is coming up.
> OBATA Noboru ([email protected])
>
> /* pending-IPI.c - Panic a kernel with IPI pending */
>
> #include <linux/config.h>
> #include <linux/module.h>
> #include <linux/smp.h>
>
> int init_module(void)
> {
> local_irq_disable();
>
> apic_wait_icr_idle();
> apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(1));
> apic_write_around(APIC_ICR, (APIC_DM_FIXED
> | APIC_DEST_LOGICAL | 0xfb));
>
> panic("pending-IPI: Bye-bye.");
>
> return 0;
> }
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("OBATA Noboru <[email protected]>");
> MODULE_DESCRIPTION("Panic a kernel with IPI pending");
This looks like a good test case.
Eric
Being lazy does this patch fix the problem for you?
It looks like this is enough to keep the kernel going
in the face of unexpected IPIs ...
Eric
---
arch/i386/kernel/smp.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
applies-to: e6a6c8ed12ba1ef7fa376fa3993e3c329e9f294a
195ab3620ba410697ad78957226c5493d55dd2ee
diff --git a/arch/i386/kernel/smp.c b/arch/i386/kernel/smp.c
index 218d725..3d33125 100644
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -560,6 +560,7 @@ int smp_call_function (void (*func) (voi
if (wait)
while (atomic_read(&data.finished) != cpus)
cpu_relax();
+ call_data = NULL;
spin_unlock(&call_lock);
return 0;
@@ -609,6 +610,14 @@ fastcall void smp_call_function_interrup
int wait = call_data->wait;
ack_APIC_irq();
+
+ /* Ignore spurious IPIs */
+ if (!call_data)
+ return;
+
+ func = call_data->func;
+ info = call_data->info;
+ wait = call_data->wait;
/*
* Notify initiating CPU that I've grabbed the data and am
* about to execute the function
seems to bad patch. you dereference pointer (1) before check to NULL(2).
> ---
>
> arch/i386/kernel/smp.c | 9 +++++++++
> 1 files changed, 9 insertions(+), 0 deletions(-)
>
> applies-to: e6a6c8ed12ba1ef7fa376fa3993e3c329e9f294a
> 195ab3620ba410697ad78957226c5493d55dd2ee
> diff --git a/arch/i386/kernel/smp.c b/arch/i386/kernel/smp.c
> index 218d725..3d33125 100644
> --- a/arch/i386/kernel/smp.c
> +++ b/arch/i386/kernel/smp.c
> @@ -560,6 +560,7 @@ int smp_call_function (void (*func) (voi
> if (wait)
> while (atomic_read(&data.finished) != cpus)
> cpu_relax();
> + call_data = NULL;
> spin_unlock(&call_lock);
>
> return 0;
> @@ -609,6 +610,14 @@ fastcall void smp_call_function_interrup
> int wait = call_data->wait;
(1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> ack_APIC_irq();
> +
> + /* Ignore spurious IPIs */
> + if (!call_data)
> + return;
(2) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Alex Lyashkov <[email protected]> writes:
> seems to bad patch. you dereference pointer (1) before check to NULL(2).
Duh. I forgot to delete the earlier references.
That should have been...
---
arch/i386/kernel/smp.c | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)
applies-to: e6a6c8ed12ba1ef7fa376fa3993e3c329e9f294a
50119a3947498cd8112455e8111f0579f5b8a232
diff --git a/arch/i386/kernel/smp.c b/arch/i386/kernel/smp.c
index 218d725..b0fb524 100644
--- a/arch/i386/kernel/smp.c
+++ b/arch/i386/kernel/smp.c
@@ -560,6 +560,7 @@ int smp_call_function (void (*func) (voi
if (wait)
while (atomic_read(&data.finished) != cpus)
cpu_relax();
+ call_data = NULL;
spin_unlock(&call_lock);
return 0;
@@ -604,11 +605,19 @@ fastcall void smp_reschedule_interrupt(s
fastcall void smp_call_function_interrupt(struct pt_regs *regs)
{
- void (*func) (void *info) = call_data->func;
- void *info = call_data->info;
- int wait = call_data->wait;
+ void (*func) (void *info);
+ void *info;
+ int wait;
ack_APIC_irq();
+
+ /* Ignore spurious IPIs */
+ if (!call_data)
+ return;
+
+ func = call_data->func;
+ info = call_data->info;
+ wait = call_data->wait;
/*
* Notify initiating CPU that I've grabbed the data and am
* about to execute the function
---
0.99.8.GIT
On Thu, 27 Oct 2005, Eric W. Biederman wrote:
>
> Alex Lyashkov <[email protected]> writes:
>
> > seems to bad patch. you dereference pointer (1) before check to NULL(2).
>
> Duh. I forgot to delete the earlier references.
> That should have been...
Thank you Eric, the patch you resent has fixed the problem,
which originally happened on vanilla 2.6.13.
But, on such an unusual case, I prefer to mark it unlikely() and
output some warning message.
> + /* Ignore spurious IPIs */
> + if (!call_data)
> + return;
if (unlikely(!call_data)) {
printk(KERN_WARNING "spurious IPI on CPU#%d, ignored\n",
smp_processor_id());
return;
}
Please take this if you like it, Eric. I have also tested this
printk-added code.
Regards,
--
OBATA Noboru ([email protected])
On Thu, 27 Oct 2005, Eric W. Biederman wrote:
>
> OBATA Noboru <[email protected]> writes:
>
> > What happens in the second kernel is that it receives the
> > pending IPI at the first local_irq_enable() in init/main.c, and
> > the IPI handler calls the uninitialized function pointer
> > call_data->func in arch/i386/kernel/smp.c.
> >
> > One way to solve this problem is to implement some blocking
> > mechanism in the IPI handler so that it prevents itself from
> > calling the uninitialized pointer.
>
> Or simply initializing the pointer to NULL and testing to
> see if the IPI handler has been initialized. Weird but
> doable.
As I replied in the other thread, your patch has solved the
problem. Thank you.
> > But pending interrupts on other vectors may cause another
> > problem. They would cause misrouted IRQs, which are now
> > addressed by "irqpoll" kernel parameter. But I'm not sure this
> > solves all the problems.
>
> The irqs should not be misrouted. They simply come at an unexpected
> time.
Okay, I guess the irqs are not misrouted because kdump does not
touch the IO-APIC routing table, and the second kernel will
build the same one.
> > So another way to solve this problem is to clear all such
> > pending interrupts before booting the second kernel.
>
> No. The second kernel gets to cope, because we cannot
> do anything reliable in the crashed kernel.
Well, I first thought that restoring the hardware status back to
normal before booting the second kernel is better because it may
require fewer changes on the kernel core code.
But now I'm getting the idea what kdump is trying to do. Kdump
wants to do less in the crashed kernel, and solve problems in
the second kernel.
> > /* pending-IPI.c - Panic a kernel with IPI pending */
> >
> > #include <linux/config.h>
> > #include <linux/module.h>
> > #include <linux/smp.h>
> >
> > int init_module(void)
> > {
> > local_irq_disable();
> >
> > apic_wait_icr_idle();
> > apic_write_around(APIC_ICR2, SET_APIC_DEST_FIELD(1));
> > apic_write_around(APIC_ICR, (APIC_DM_FIXED
> > | APIC_DEST_LOGICAL | 0xfb));
> >
> > panic("pending-IPI: Bye-bye.");
> >
> > return 0;
> > }
> >
> > MODULE_LICENSE("GPL");
> > MODULE_AUTHOR("OBATA Noboru <[email protected]>");
> > MODULE_DESCRIPTION("Panic a kernel with IPI pending");
>
>
> This looks like a good test case.
I think we need more test cases, especially the cases that focus
on "status" of hardware, to make kdump more reliable. Kdump
should recover from all possible status of supported hardware.
Is anyone working on developing such test cases for kdump?
Regards,
--
OBATA Noboru ([email protected])
OBATA Noboru <[email protected]> writes:
> On Thu, 27 Oct 2005, Eric W. Biederman wrote:
>>
>> > But pending interrupts on other vectors may cause another
>> > problem. They would cause misrouted IRQs, which are now
>> > addressed by "irqpoll" kernel parameter. But I'm not sure this
>> > solves all the problems.
>>
>> The irqs should not be misrouted. They simply come at an unexpected
>> time.
>
> Okay, I guess the irqs are not misrouted because kdump does not
> touch the IO-APIC routing table, and the second kernel will
> build the same one.
Right the problem irqpoll addresses is irqs that get stuck on.
The kernel will disable them at the interrupt controller and
we need a way to continue. Mostly that only happens if the
irq is shared with something else, that we don't load a driver for,
or if an irq comes in before the driver initializes.
>> > So another way to solve this problem is to clear all such
>> > pending interrupts before booting the second kernel.
>>
>> No. The second kernel gets to cope, because we cannot
>> do anything reliable in the crashed kernel.
>
> Well, I first thought that restoring the hardware status back to
> normal before booting the second kernel is better because it may
> require fewer changes on the kernel core code.
>
> But now I'm getting the idea what kdump is trying to do. Kdump
> wants to do less in the crashed kernel, and solve problems in
> the second kernel.
Right and the result is a more robust kernel in general. In most
cases the failure mode is a second kernel that doesn't boot,
or it doesn't successfully initialize the drivers. Which is
a much better failure than potentially scribbling random
data all over you disk, which is what using drivers in a broken
kernel can do.
> I think we need more test cases, especially the cases that focus
> on "status" of hardware, to make kdump more reliable. Kdump
> should recover from all possible status of supported hardware.
Given that part of all possible status is broken hardware,
that isn't necessarily possible. Still attempting to recover
from all possible status is a sound plan.
> Is anyone working on developing such test cases for kdump?
Not to my knowledge. The big push until just lately has simply
been to get the core working. Vivek Goyal would be the most
likely suspect. But feel free to work on pathological scenarios.
I'm still not quite convinced that crashdumps are interesting yet :)
Eric
On Tue, Nov 01, 2005 at 04:34:49AM -0700, Eric W. Biederman wrote:
> OBATA Noboru <[email protected]> writes:
>
[..]
> > I think we need more test cases, especially the cases that focus
> > on "status" of hardware, to make kdump more reliable. Kdump
> > should recover from all possible status of supported hardware.
>
> Given that part of all possible status is broken hardware,
> that isn't necessarily possible. Still attempting to recover
> from all possible status is a sound plan.
>
> > Is anyone working on developing such test cases for kdump?
>
> Not to my knowledge. The big push until just lately has simply
> been to get the core working. Vivek Goyal would be the most
> likely suspect. But feel free to work on pathological scenarios.
>
Currently I am trying to focus on fixing the already reported issues and
have not looked into special scenarios where kdump might fail. But your
effort in this direction is very much appreciated. We need to dig up such
corner cases to make kdump more reliable and robust.
Thanks
Vivek