LinuxLists.cc - [PATCH] kdump: Fix for boot problems on SMP

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hari,

Tested the patch on my 4-way P-III (where it was hanging earlier)
and it works fine for me.

Thanks,
Badari

Hariprasad Nellitheertha wrote:
> Hi Andrew,
>
> There was a buggy (and unnecessary) reserve_bootmem call in the kdump
> code which was causing hangs during early boot on some SMP machines. The
> attached patch removes that.
>
> Kindly include this patch into the -mm tree.
>
> Thanks and Regards, Hari
>
>
> ------------------------------------------------------------------------
>
>
>
> Signed-off-by: Hariprasad Nellitheertha <[email protected]>
> ---
>
> linux-2.6.10-rc2-hari/include/asm-i386/crash_dump.h | 1 -
> 1 files changed, 1 deletion(-)
>
> diff -puN include/asm-i386/crash_dump.h~kdump-reserve-bootmem-fix include/asm-i386/crash_dump.h
> --- linux-2.6.10-rc2/include/asm-i386/crash_dump.h~kdump-reserve-bootmem-fix 2004-11-18 19:20:47.000000000 +0530
> +++ linux-2.6.10-rc2-hari/include/asm-i386/crash_dump.h 2004-11-18 19:21:03.000000000 +0530
> @@ -37,7 +37,6 @@ static inline void set_saved_max_pfn(voi
> static inline void crash_reserve_bootmem(void)
> {
> if (!dump_enabled) {
> - reserve_bootmem(0, CRASH_RELOCATE_SIZE);
> reserve_bootmem(CRASH_BACKUP_BASE,
> CRASH_BACKUP_SIZE + CRASH_RELOCATE_SIZE + PAGE_SIZE);
> }
> _

2004-11-19 17:55:04

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:

> There was a buggy (and unnecessary) reserve_bootmem call in the kdump
> code which was causing hangs during early boot on some SMP machines. The
> attached patch removes that.

Thanks! I also had the same problem.

BTW, if the first kernel has CONFIG_DISCONTIGMEM enabled, the second kernel
cannot boot, since crash_reserve_bootmem() is never called on that path.

--- 2.6-mm/arch/i386/mm/discontig.c.orig 2004-11-20 00:14:42.000000000 +0900
+++ 2.6-mm/arch/i386/mm/discontig.c 2004-11-20 00:39:38.000000000 +0900
@@ -32,6 +32,7 @@
#include <asm/e820.h>
#include <asm/setup.h>
#include <asm/mmzone.h>
+#include <asm/crash_dump.h>
#include <bios_ebda.h>

struct pglist_data *node_data[MAX_NUMNODES];
@@ -363,6 +364,9 @@ unsigned long __init setup_memory(void)
}
}
#endif
+
+ crash_reserve_bootmem();
+
return system_max_low_pfn;
}

2004-11-19 23:31:32

by Andrew Morton

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Akinobu Mita <[email protected]> wrote:
>
> On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:
>
> > There was a buggy (and unnecessary) reserve_bootmem call in the kdump
> > code which was causing hangs during early boot on some SMP machines. The
> > attached patch removes that.
>
> Thanks! I also had the same problem.

So.. How is the crashdump code working now? I haven't heard from anyone
who is using it and I haven't gotten onto testing it myself.

Do we have any feeling for its success rate on various machines, and on its
ease of use?

2004-11-19 23:49:53

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hi Andrew,

I haven't tested it yet on any of my machines (due to the hang).
I am about to give it a try. But my understanding (please update
me if I am wrong) is,

1) DISCONTIG_MEM support is not working yet, so I can't use any
of my NUMA boxes.

2) AMD64 is not supported, so I can't use my Opteron machine.

3) ppc is not supported, so I can't use my Power3 and Power4 machines.

So, I can only try it on non-NUMA i386 SMP boxes. I have a few of
those to try. I will give an update next week on my testing.

Thanks,
Badari

On Fri, 2004-11-19 at 15:30, Andrew Morton wrote:
> Akinobu Mita <[email protected]> wrote:
> >
> > On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:
> >
> > > There was a buggy (and unnecessary) reserve_bootmem call in the kdump
> > > code which was causing hangs during early boot on some SMP machines. The
> > > attached patch removes that.
> >
> > Thanks! I also had the same problem.
>
> So.. How is the crashdump code working now? I haven't heard from anyone
> who is using it and I haven't gotten onto testing it myself.
>
> Do we have any feeling for its success rate on various machines, and on its
> ease of use?
>
>

2004-11-20 01:45:25

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Well. I tried to use kdump.

I think the documentation needs an update. It says:

..

4) Load the second kernel to be booted using

kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"

But kexec doesn't seem to like the "-p" option.
Even when I removed "-p", it complains about "--args-linux":

# ./kexec --args-linux --append="root=/dev/sda2 dump init 1
memmap=exactmap memmap=640k@0 memmap=32M@16M" /boot/kexec2

./kexec: unrecognized option `--args-linux'
kexec 1.98 released 15 September 2004
Usage: kexec [OPTION]... [kernel]
Directly reboot into a new kernel

-h, --help Print this help.
-v, --version Print the version of kexec.
-f, --force Force an immediate kexec, don't call shutdown.
-x, --no-ifdown Don't bring down network interfaces.
(if used, must be last option specified)
-l, --load Load the new kernel into the current kernel.
-u, --unload Unload the current kexec target kernel.
-e, --exec Execute a currently loaded kernel.
-t, --type=TYPE Specify the new kernel is of this type.

Supported kernel file types and options:
elf32-x86
--command-line=STRING Set the kernel command line to STRING
--append=STRING Set the kernel command line to STRING
--initrd=FILE Use FILE as the kernel's initial ramdisk.
--ramdisk=FILE Use FILE as the kernel's initial ramdisk.
--args-linux Pass linux kernel style options
--args-elf Pass elf boot notes
bzImage
-d, --debug Enable debugging to help spot a failure.
--real-mode Use the kernels real mode entry point.
--command-line=STRING Set the kernel command line to STRING.
--append=STRING Set the kernel command line to STRING.
--initrd=FILE Use FILE as the kernel's initial ramdisk.
--ramdisk=FILE Use FILE as the kernel's initial ramdisk.

Cannot load /boot/kexec2

Thanks,
Badari

On Fri, 2004-11-19 at 15:30, Andrew Morton wrote:
> Akinobu Mita <[email protected]> wrote:
> >
> > On Thursday 18 November 2004 23:08, Hariprasad Nellitheertha wrote:
> >
> > > There was a buggy (and unnecessary) reserve_bootmem call in the kdump
> > > code which was causing hangs during early boot on some SMP machines. The
> > > attached patch removes that.
> >
> > Thanks! I also had the same problem.
>
> So.. How is the crashdump code working now? I haven't heard from anyone
> who is using it and I haven't gotten onto testing it myself.
>
> Do we have any feeling for its success rate on various machines, and on its
> ease of use?
>
>

2004-11-20 03:05:07

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

I've forgotten CC-ing.

On Saturday 20 November 2004 10:05, Badari Pulavarty wrote:

> 4) Load the second kernel to be booted using
>
> kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
> init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"
>
> But kexec doesn't seem to like the "-p" option.
> Even when I removed "-p", it complains about "--args-linux":

My kexec also lacks the "-p" option.
Instead of "-p", I use the "-l" option after patching kexec
as follows.

--- kexec-tools-1.98/kexec/kexec.c.orig 2004-10-31 19:42:34.000000000 +0900
+++ kexec-tools-1.98/kexec/kexec.c 2004-10-31 19:43:01.000000000 +0900
@@ -243,7 +243,7 @@ static int my_load(const char *type, int
if (sort_segments(segments, nr_segments) < 0) {
return -1;
}
- result = kexec_load(entry, nr_segments, segments, 0);
+ result = kexec_load(entry, nr_segments, segments, 1);
if (result != 0) {
/* The load failed, print some debugging information */
fprintf(stderr, "kexec_load failed: %s\n",

2004-11-20 03:44:00

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

On Saturday 20 November 2004 08:30, Andrew Morton wrote:
> So.. How is the crashdump code working now? I haven't heard from anyone
> who is using it and I haven't gotten onto testing it myself.
>
> Do we have any feeling for its success rate on various machines, and on its
> ease of use?

I have only tried intentional panics on a normal UP box
(enable panic_on_oops, then trigger a kernel NULL pointer dereference),
but it always boots the second kernel successfully.

# gdb <first-kernel> -c /proc/vmcore
...

gdb can walk the frames with "up" or "down", and it displays the correct
local/global values with "print".

2004-11-22 16:36:01

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Signed-off-by: Hariprasad Nellitheertha <[email protected]>

---

kexec-tools-1.95-hari/kexec/kexec.c | 10 +++++++++-
kexec-tools-1.95-hari/kexec/kexec.h | 6 ++++--
2 files changed, 13 insertions(+), 3 deletions(-)

diff -puN kexec/kexec.c~kexec-tools-panic kexec/kexec.c
--- kexec-tools-1.95/kexec/kexec.c~kexec-tools-panic 2004-10-18 14:27:27.000000000 +0530
+++ kexec-tools-1.95-hari/kexec/kexec.c 2004-10-19 21:00:23.000000000 +0530
@@ -30,6 +30,7 @@
/* local variables */
static struct memory_range *memory_range;
static int memory_ranges;
+static unsigned long load_flags;

int valid_memory_range(struct kexec_segment *segment)
{
@@ -243,7 +244,7 @@ static int my_load(const char *type, int
if (sort_segments(segments, nr_segments) < 0) {
return -1;
}
- result = kexec_load(entry, nr_segments, segments, 0);
+ result = kexec_load(entry, nr_segments, segments, load_flags);
if (result != 0) {
/* The load failed, print some debugging information */
fprintf(stderr, "kexec_load failed: %s\n",
@@ -325,6 +326,7 @@ void usage(void)
" -u, --unload Unload the current kexec target kernel.\n"
" -e, --exec Execute a currently loaded kernel.\n"
" -t, --type=TYPE Specify the new kernel is of this type.\n"
+ " -p, --load-panic Load kernel for the reboot on panic case.\n"
"\n"
"Supported kernel file types and options: \n"
);
@@ -393,6 +395,12 @@ int main(int argc, char *argv[])
case OPT_TYPE:
type = optarg;
break;
+ case OPT_PANIC:
+ do_load = 1;
+ do_exec = 0;
+ do_shutdown = 0;
+ load_flags = 1;
+ break;
default:
break;
}
diff -puN kexec/kexec.h~kexec-tools-panic kexec/kexec.h
--- kexec-tools-1.95/kexec/kexec.h~kexec-tools-panic 2004-10-18 14:36:23.000000000 +0530
+++ kexec-tools-1.95-hari/kexec/kexec.h 2004-10-20 14:09:46.000000000 +0530
@@ -45,6 +45,7 @@ extern int file_types;
#define OPT_LOAD 'l'
#define OPT_UNLOAD 'u'
#define OPT_TYPE 't'
+#define OPT_PANIC 'p'
#define OPT_MAX 256
#define KEXEC_OPTIONS \
{ "help", 0, 0, OPT_HELP }, \
@@ -54,7 +55,8 @@ extern int file_types;
{ "load", 0, 0, OPT_LOAD }, \
{ "unload", 0, 0, OPT_UNLOAD }, \
{ "exec", 0, 0, OPT_EXEC }, \
- { "type", 1, 0, OPT_TYPE },
-#define KEXEC_OPT_STR "hvdfxluet:"
+ { "type", 1, 0, OPT_TYPE }, \
+ { "panic", 0, 0, OPT_PANIC },
+#define KEXEC_OPT_STR "hvdfxluet:p"

#endif /* KEXEC_H */

_

Attachments:

kexec-tools-panic.patch (2.35 kB)
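
For readers following the "-p" discussion, here is a minimal, self-contained sketch of the option handling the patch above adds. It is not the kexec-tools source: `struct load_request`, `parse_load_options()` and `LOAD_FLAG_ON_PANIC` are names made up for this illustration, and only "-l" and "-p" are handled. The value 1 is taken from the patch, which passes it to kexec_load() as the flags argument. Note that in the patch as posted the usage text advertises "--load-panic" while the long-option table registers "panic"; the sketch follows the table.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Assumed name; the patch itself simply stores the literal 1 in load_flags. */
#define LOAD_FLAG_ON_PANIC 1UL

/* Hypothetical container for the variables the patch sets in main(). */
struct load_request {
    int do_load;
    int do_exec;
    int do_shutdown;
    unsigned long load_flags;   /* would be passed to kexec_load() */
};

/* Parse only -l/--load and -p/--panic; everything else is ignored. */
static struct load_request parse_load_options(int argc, char *argv[])
{
    static const struct option long_opts[] = {
        { "load",  no_argument, NULL, 'l' },
        { "panic", no_argument, NULL, 'p' },
        { NULL, 0, NULL, 0 },
    };
    struct load_request req = { 0, 0, 0, 0 };
    int opt;

    assert(argv != NULL);
    optind = 1;                 /* restart scanning on every call */
    while ((opt = getopt_long(argc, argv, "lp", long_opts, NULL)) != -1) {
        switch (opt) {
        case 'l':               /* ordinary load: flags stay 0 */
            req.do_load = 1;
            break;
        case 'p':               /* same assignments as the OPT_PANIC case */
            req.do_load = 1;
            req.do_exec = 0;
            req.do_shutdown = 0;
            req.load_flags = LOAD_FLAG_ON_PANIC;
            break;
        default:
            break;
        }
    }
    return req;
}
```

With this sketch, an argument vector of "kexec -p vmlinux" yields load_flags of 1, while "kexec -l vmlinux" leaves it at 0, which matches the difference between the one-line hack earlier in the thread and the stock tool.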

2004-11-22 22:58:22

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hari,

Thanks for the patch; I tried it.

I hacked "sysrq-b" to call panic() to test this.
So far, my success is limited.

These may already be known and being worked on.
In the few times I tried, I ran into the following.

1) When I panic the system, I get
Badness in smp_call_function() in arch/i386/kernel/smp.c: 552
and the system hangs.

2) The machine boots to single-user mode with only 1 CPU.
I get the following messages while booting the second kernel.

..

Booting processor 1/1 eip 2000
Stuck ??
Inquiring remote APIC #1...
... APIC #1 ID: 01000000
... APIC #1 VERSION: 00040011
... APIC #1 SPIV: 000000ff
CPU #1 not responding - cannot use it.
Booting processor 1/2 eip 2000
Stuck ??
Inquiring remote APIC #2...
... APIC #2 ID: 02000000
... APIC #2 VERSION: 00040011
... APIC #2 SPIV: 000000ff
CPU #2 not responding - cannot use it.
Booting processor 1/3 eip 2000
Stuck ??
Inquiring remote APIC #3...
... APIC #3 ID: 03000000
... APIC #3 VERSION: 00040011
...

3) When I tried to run gdb on the core file,
gdb gets killed since there is not enough memory.
(this is on the second kernel - so this could be okay).

#gdb vmlinux.kexec1 ../core/vmcore.1
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 4, high 12, batch 2
cpu 0 cold: low 0, high 4, batch 2
HighMem per-cpu: empty

Free pages: 1116kB (0kB HighMem)
Active:2222 inactive:3280 dirty:0 writeback:0 unstable:0 free:279 slab:804 mapped:2275 pagetables:23
DMA free:292kB min:292kB low:364kB high:436kB active:108kB inactive:128kB present:16384kB pages_scanned:544 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:824kB min:588kB low:732kB high:880kB active:8780kB inactive:12992kB present:32768kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 292kB
Normal: 44*4kB 7*8kB 1*16kB 0*32kB 3*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 824kB
HighMem: empty
Swap cache: add 23125, delete 19925, find 8355/9281, race 2+1
Out of Memory: Killed process 4290 (gdb).
Terminated
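
As a rough cross-check of why gdb runs out of memory here: the second kernel was loaded with "memmap=exactmap memmap=640k@0 memmap=32M@16M", so it only sees the regions listed on its command line. The sketch below adds them up. It is an illustration only, not the kernel's parser: `parse_size()`, `parse_memmap()` and `usable_bytes()` are made-up helper names, and only the "size@start" form with K/M/G suffixes is handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse a number with an optional K/M/G suffix; *end is left after it. */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    if (*end == s)
        return 0;                       /* no digits consumed */
    switch (**end) {
    case 'G': case 'g': v <<= 10;       /* fall through */
    case 'M': case 'm': v <<= 10;       /* fall through */
    case 'K': case 'k': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/* Returns 1 and fills size/start for "memmap=<size>@<start>", else 0. */
static int parse_memmap(const char *arg, unsigned long long *size,
                        unsigned long long *start)
{
    static const char prefix[] = "memmap=";
    char *end;

    assert(arg && size && start);
    if (strncmp(arg, prefix, sizeof(prefix) - 1) != 0)
        return 0;
    arg += sizeof(prefix) - 1;
    *size = parse_size(arg, &end);
    if (end == arg || *end != '@')
        return 0;                       /* e.g. "memmap=exactmap" */
    arg = end + 1;
    *start = parse_size(arg, &end);
    return end != arg && *end == '\0';
}

/* Sum the sizes of all "size@start" regions in a list of arguments. */
static unsigned long long usable_bytes(const char *const args[], size_t n)
{
    unsigned long long total = 0, size, start;
    size_t i;

    for (i = 0; i < n; i++)
        if (parse_memmap(args[i], &size, &start))
            total += size;
    return total;
}
```

For the three arguments above this gives 640 KiB + 32 MiB = 33408 KiB in total, which is consistent with the 32768kB "present" in the Normal zone and leaves little room for gdb to load a vmlinux with debug information.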

FYI.

Thanks,
Badari

On Mon, 2004-11-22 at 08:03, Hariprasad Nellitheertha wrote:
> Akinobu Mita wrote:
> > I've forgotten CC-ing.
> >
> > On Saturday 20 November 2004 10:05, Badari Pulavarty wrote:
> >
> >
> >>4) Load the second kernel to be booted using
> >>
> >> kexec -p <second-kernel> --args-linux --append="root=<root-dev> dump
> >> init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M"
> >>
> >>But kexec doesn't seem to like the "-p" option.
> >>Even when I removed "-p", it complains about "--args-linux":
>
>
> There is a kexec-tools patch that is required to get the "-p" option
> working. I had sent it out only to the fastboot mailing list without
> updating kdump documentation. I will send out an updated documentation
> patch indicating this requirement (I will host the patch on some site
> and point to it in the document).
>
> Meanwhile, I am attaching the patch with this note. Kindly try kdump
> with this. Thanks!
>
> Regards, Hari
>

2004-11-23 01:05:55

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

More info testing results...

gdb is not showing the stack info properly on my saved vmcore.
I thought vmlinux might not match the vmcore, so I verified that
vmcore and vmlinux match up. But still no luck...

# gdb ../linux-2.6.9/vmlinux vmcore.2
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0 default_idle () at arch/i386/kernel/process.c:108
108 }
(gdb) bt
#0 default_idle () at arch/i386/kernel/process.c:108
#1 0xc04cdff8 in init_thread_union ()
#2 0xc0101b86 in cpu_idle () at arch/i386/kernel/process.c:196
#3 0xc04cea20 in start_kernel () at init/main.c:523
#4 0xc0100211 in L6 () at /tmp/cch2z2jk.s:2054
Cannot access memory at address 0x550007

Thanks,
Badari

2004-11-23 18:19:59

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hi Badari,

Badari Pulavarty wrote:
> More info testing results...
>
> gdb is not showing the stack info properly on my saved vmcore.
> I thought vmlinux might not match the vmcore, so I verified that
> vmcore and vmlinux match up. But still no luck...

I will try to recreate this using the 'sysrq' method you described in
the earlier mail. Will let you know my findings asap.

Thanks very much for trying kdump!

Regards, Hari

2004-11-24 22:02:21

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hari,

I have a success case and a failure case to report.

1) Success first. I was able to save /proc/vmcore when my machine
panicked (not through sysrq) and gdb showed the stack correctly :)

For some reason, gdb failed to show the stack correctly when I
ran it on /proc/vmcore directly while in the kexec kernel :(

# gdb ../l*9/vmlinux vmcore.3
...
Core was generated by `root=/dev/sda2 dump init 1 memmap=exactmap memmap=640k@0 memmap=32M@16M console='.
#0 crash_get_current_regs (regs=0xc050b000)
at arch/i386/kernel/crash_dump.c:98
98 }
(gdb) bt
#0 crash_get_current_regs (regs=0xc050b000)
at arch/i386/kernel/crash_dump.c:98
#1 0xc0139986 in __crash_machine_kexec () at kernel/crash.c:83
#2 0xc011b2aa in panic (fmt=0xc050b000 "")
at include/linux/crash_dump.h:21
#3 0xc0104ed5 in die (str=0x0, regs=0x1, err=2)
at arch/i386/kernel/traps.c:392
#4 0xc0113ad2 in do_page_fault (regs=0xd4937edc, error_code=2)
at arch/i386/mm/fault.c:480
#5 0xc0104707 in error_code () at /tmp/ccK5IM1b.s:2135
#6 0xc017a55e in aio_put_req (req=0x0) at fs/aio.c:529
#7 0xc017ba0d in io_submit_one (ctx=0xd46fddc0, user_iocb=0xbfffecb0,
iocb=0xf75af124) at fs/aio.c:1551
#8 0xc017baf1 in sys_io_submit (ctx_id=3226513408, nr=32, iocbpp=0xbfffec30)
at fs/aio.c:1609
#9 0xc0103c63 in syscall_call () at /tmp/ccK5IM1b.s:1946
#10 0xc0407220 in default_exec_domain ()
(gdb) q

2) Failure case:

When I recreated the panic, it tried to run kexec(), ran into an
exception in the kexec() code, and the machine hung.

Here is the console output:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
printing eip:
c128c044
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c128c044>] Not tainted VLI
EFLAGS: 00010086 (2.6.10-rc2-mm2kexec)
EIP is at _spin_lock_irq+0x4/0x20 <<<<<<<<<**** my original panic
eax: 00000020 ebx: c2dd77e0 ecx: c2821bb0 edx: c2821b80
esi: 00000020 edi: 00000000 ebp: c1dd9f10 esp: c1dd9f10
ds: 007b es: 007b ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c1dd9f2c c107a56e c1dd9f18 c1dd9f18 c2821ba0 c2dd77e0 c1dd9f70 c1dd9f54
c107ba1d c2821b80 00000000 00000000 bfffecb0 c2821b80 c2821b80 00000000
bfffec30 c1dd9fbc c107bb01 c1dd9f70 bfffecb0 00000040 bfffecb0 00000000
Call Trace:
[<c1004aaf>] show_stack+0x7f/0xa0
[<c1004c5e>] show_registers+0x15e/0x1c0
[<c1004e62>] die+0xf2/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c107a56e>] aio_put_req+0x1e/0x90
[<c107ba1d>] io_submit_one+0x20d/0x250
[<c107bb01>] sys_io_submit+0xa1/0x110
[<c1003c63>] syscall_call+0x7/0xb
Code: fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa eb e9 5d c3 90 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 fa <f0> fe 08 79 09 f3 90 80 38 00 7e f9 eb f2 5d c3 8d b6 00 00 00
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
<0>kexec: opening parachute <<<<<<<<<<*** trying to kexec ?
Unable to handle kernel paging request at virtual address c30a0000
printing eip:
c1039956
*pde = 00000000
Oops: 0002 [#2]
SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c1039956>] Not tainted VLI
EFLAGS: 00010206 (2.6.10-rc2-mm2kexec)
EIP is at __crash_machine_kexec+0x66/0x110 <<<<<<** panic in kexec
eax: 00005400 ebx: c2003180 ecx: 000001e0 edx: 00000001
esi: c140b000 edi: c30a0000 ebp: c1dd9d98 esp: c1dd9d80
ds: 007b es: 007b ss: 0068
Process aio_tio (pid: 8084, threadinfo=c1dd8000 task=c2110570)
Stack: c140b000 c1dd9d94 c1dd9d98 c1dd8000 c1dd9edc c12a01d5 c1dd9db4 c101b2aa
00000000 c140c380 c129e8dd c1dd9dc0 c1dd8000 c1dd9df8 c1004ed5 c129e8ce
00000001 c1dd9dcc 00000001 c1dd9edc c12a01d5 00000002 000000ff 0000000b
Call Trace:
[<c1004aaf>] show_stack+0x7f/0xa0
[<c1004c5e>] show_registers+0x15e/0x1c0
[<c1004e62>] die+0xf2/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c101b2aa>] panic+0x5a/0x120
[<c1004ed5>] die+0x165/0x180
[<c1013ad2>] do_page_fault+0x3b2/0x710
[<c1004707>] error_code+0x2b/0x30
[<c107a56e>] aio_put_req+0x1e/0x90
[<c107ba1d>] io_submit_one+0x20d/0x250
[<c107bb01>] sys_io_submit+0xa1/0x110
[<c1003c63>] syscall_call+0x7/0xb
Code: 2a c1 be 01 00 00 00 89 35 a4 c7 40 c1 e8 03 22 fe ff 8b 0d a4 c7 40 c1 85 c9 75 6c bf 00 00 0a c3 be 00 b0 40 c1 b9 e0 01 00 00 <f3> a5 c7 04 24 80 07 0a c3 c7 44 24 04 80 b7 40 c1 c7 44 24 08
<0>Fatal exception: panic in 5 seconds

Thanks,
Badari

On Tue, 2004-11-23 at 10:15, Hariprasad Nellitheertha wrote:
> Hi Badari,
>
> Badari Pulavarty wrote:
> > More info testing results...
> >
> > gdb is not showing the stack info properly on my saved vmcore.
> > I thought vmlinux might not match the vmcore, so I verified that
> > vmcore and vmlinux match up. But still no luck...
>
> I will try to recreate this using the 'sysrq' method you described in
> the earlier mail. Will let you know my findings asap.
>
> Thanks very much for trying kdump!
>
> Regards, Hari
>

2004-11-26 21:56:23

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

On Tuesday 23 November 2004 09:43, Badari Pulavarty wrote:
> gdb is not showing the stack info properly on my saved vmcore.
> I thought vmlinux might not match the vmcore, so I verified that
> vmcore and vmlinux match up. But still no luck...
>
> # gdb ../linux-2.6.9/vmlinux vmcore.2

[...]

> (gdb) bt
> #0 default_idle () at arch/i386/kernel/process.c:108
> #1 0xc04cdff8 in init_thread_union ()
> #2 0xc0101b86 in cpu_idle () at arch/i386/kernel/process.c:196
> #3 0xc04cea20 in start_kernel () at init/main.c:523
> #4 0xc0100211 in L6 () at /tmp/cch2z2jk.s:2054
> Cannot access memory at address 0x550007

I think the panic happened on a CPU other than CPU#0.

Currently vmcore contains only CPU#0's register contents.
Therefore, GDB always shows backtrace of CPU#0.

fs/proc/vmcore.c:

static void elf_vmcore_store_hdr(char *bufp, int nphdr, int dataoff)
{
...
/* 1 - Get the registers from the reserved memory area */
reg_ppos = BACKUP_END + CRASH_RELOCATE_SIZE;
read_from_oldmem(reg_buf, REG_SIZE, &reg_ppos, 0);
elf_core_copy_regs(&prstatus.pr_reg, (struct pt_regs *)reg_buf);
buf = storenote(&notes[0], buf);

Here, "reg_ppos" points to the relocated copy of crash_smp_regs[0].
kdump should instead save crash_smp_regs[panic_cpu].

Better still, it could save every entry of crash_smp_regs[NR_CPUS].
In other words:

# readelf --note /proc/vmcore

Notes at offset 0x00000074 with length 0x0000069c:
Owner Data size Description
CORE 0x00000090 NT_PRSTATUS (prstatus structure)
CORE 0x0000007c NT_PRPSINFO (prpsinfo structure)
CORE 0x00000560 NT_TASKSTRUCT (task structure)
:
:
:
...(repeat NR_CPU times)
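
To make the suggestion concrete, the position of a given CPU's saved registers could be computed along these lines. This is only a sketch under an assumed layout: it supposes the crash_smp_regs[] entries are copied back to back into the reserved area, which the thread does not confirm, and `crash_regs_pos()` is a hypothetical helper, not a function in the kdump patches.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: position of CPU `cpu`'s register record, assuming
 * the records are stored contiguously starting at `base`, each `reg_size`
 * bytes long. `base` plays the role of reg_ppos in the excerpt above.
 */
static unsigned long long crash_regs_pos(unsigned long long base,
                                         size_t reg_size,
                                         unsigned int cpu,
                                         unsigned int nr_cpus)
{
    assert(cpu < nr_cpus);
    return base + (unsigned long long)cpu * reg_size;
}
```

Reading from crash_regs_pos(base, size, panic_cpu, nr_cpus) would give the first alternative above; looping cpu from 0 to nr_cpus - 1 and emitting one note per record would give the second.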

2004-11-27 03:56:42

Subject: Re: [PATCH] kdump: Fix for boot problems on SMP

Hi Badari,

Badari Pulavarty wrote:
> Hari,
>
>
> I have a success case and a failure case to report.
>
> 1) Success first. I was able to save /proc/vmcore when my machine
> panicked (not through sysrq) and gdb showed the stack correctly :)

Thanks for this news! Reassures us that we are on the right track on
making kdump useful for real-life problems.

>
> For some reason, gdb failed to show the stack correctly when I
> ran it on /proc/vmcore directly while in the kexec kernel :(

Does it throw up wrong entries or does it completely fail?

>
> # gdb ../l*9/vmlinux vmcore.3
> ...
.
.
.
> <0>kexec: opening parachute <<<<<<<<<<*** trying to kexec ?

Yes, this is the kexec call from the crash dump code.

> Unable to handle kernel paging request at virtual address c30a0000

This is the page reserved for storing the register values. It's really
strange that it faults here. The page is already reserved during early
boot.

> printing eip:
> c1039956
> *pde = 00000000
> Oops: 0002 [#2]
> SMP
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c1039956>] Not tainted VLI
> EFLAGS: 00010206 (2.6.10-rc2-mm2kexec)
> EIP is at __crash_machine_kexec+0x66/0x110 <<<<<<** panic in kexec

The panic is in crash_dump_save_registers() while doing a memcpy. As I
mentioned above, it faults on the page reserved to save the registers.
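
For anyone wanting to place the faulting address physically, the arithmetic can be sketched as follows. The oops shows edi: c30a0000 and ecx: 000001e0 with the fault on the bytes "<f3> a5" (a rep movs), which fits a memcpy into that page. The sketch assumes the default i386 PAGE_OFFSET of 0xC0000000, which the thread does not state, and `lowmem_virt_to_phys()` and `in_window()` are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split; not confirmed in this thread. */
#define ASSUMED_PAGE_OFFSET 0xC0000000u

/* Translate a lowmem kernel virtual address to a physical address. */
static uint32_t lowmem_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= ASSUMED_PAGE_OFFSET);
    return vaddr - ASSUMED_PAGE_OFFSET;
}

/* Is paddr inside the region [start, start + size)? */
static int in_window(uint32_t paddr, uint32_t start, uint32_t size)
{
    return paddr >= start && paddr - start < size;
}
```

Under that assumption, 0xc30a0000 corresponds to physical 0x030a0000, that is 48 MiB + 640 KiB, which lies just above a 32M@16M window (16 MiB to 48 MiB). Whether the panicking kernel was actually booted with such memmap= arguments is not stated here, so this is only a data point to check, not a diagnosis.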

Could I get the testcase so I can try to recreate the
problem here? Please let me know.

Regards, Hari

2004-11-27 06:03:01