2006-10-04 00:18:23

by Steven Truong

Subject: kexec / kdump kernel panic

Hi, I have a dual Xeon 3.2 GHz box running CentOS 4.3, and this box is
in a cluster. It keeps bailing out with a kernel panic type of error,
and I cannot determine for sure whether it is a kernel or a hardware
problem. I have been experimenting with kexec and kdump in the hope of
capturing a kernel dump to debug it.

I have followed the instructions in linux-2.6.18
Documentation/kdump/kdump.txt closely but still have not been able to
load the captured kernel for the kernel panic situation.

I have booted the system kernel, Linux 2.6.18, with
crashkernel=128M@16M. I compiled this system kernel with KEXEC, SYSFS,
DEBUG_INFO and CRASH_DUMP enabled. When the box is up with this
system kernel, I can see that the total memory is 128 MB less than
the physical memory.
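
As a quick sanity check on the numbers above, here is a minimal sketch that turns a crashkernel= value of the form size@offset into byte counts. The helper names to_bytes and crashkernel_bytes are made up for illustration, and the sketch assumes only the K, M and G suffixes and that an @offset part is always present.

```sh
# Convert a size such as 128M or 16M to bytes (K, M, G suffixes only).
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%[Kk]} * 1024 )) ;;
        *[Mm]) echo $(( ${1%[Mm]} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%[Gg]} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print "size_bytes offset_bytes" for a value such as 128M@16M.
# Assumes the value always contains an @offset part.
crashkernel_bytes() {
    size=${1%%@*}
    offset=${1#*@}
    echo "$(to_bytes "$size") $(to_bytes "$offset")"
}
```

Running crashkernel_bytes 128M@16M prints the reserved size and its start offset in bytes; printf '0x%x\n' on the second number gives the hex form that can be compared with CONFIG_PHYSICAL_START.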

For the crash/captured kernel, I had SMP disabled and KEXEC,
CRASH_DUMP, and PROC_VMCORE enabled, with PHYSICAL_START=0x1000000.

I first tested with the following command and saw that the
crash/captured kernel booted up the box without going through the BIOS
initialization.

/usr/sbin/kexec -l /boot/vmlinux \
    --initrd=/boot/initrd-2.6.18-kdump.img --args-linux \
    --append="root=/dev/sda3 init 1"

However, when I tried to load the crash/captured kernel for the kernel
panic situation, I just got a failed to load kernel /boot/vmlinux error
message. I used the following command to load it:

/usr/sbin/kexec -p /boot/vmlinux \
    --initrd=/boot/initrd-2.6.18-kdump.img --args-linux \
    --append="root=/dev/sda3 irqpoll init 1"

I made sure that vmlinux is not a bzImage file by using this command:

readelf -h /boot/vmlinux

and I was able to see the output of this command. If I run the same
command on a bzImage file, I do not see anything. So I am pretty sure
the kernel file vmlinux is OK.
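
The same check can be scripted without readelf by looking at the first four bytes of the file, which for any ELF image are 0x7f followed by the letters E, L, F. This is only a sketch; the function name is_elf is made up for illustration, and it says nothing about whether the image is linked at the right address.

```sh
# Succeeds if the file starts with the 4-byte ELF magic (0x7f 'E' 'L' 'F').
is_elf() {
    # od prints the first 4 bytes as hex; tr strips spaces and the newline.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

For example, is_elf /boot/vmlinux && echo "ELF image" would print the message only for an ELF file such as an uncompressed vmlinux.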

I ran strace on the second command but did not learn anything useful
from it, and I could not find an error message in any log file.

I used kexec-tools-1.101 and kexec-tools-1.101-kdump10.patch.


2006-10-04 03:47:00

by Valdis Klētnieks

Subject: Re: kexec / kdump kernel panic

On Tue, 03 Oct 2006 17:18:21 PDT, Steven Truong said:

> /usr/sbin/kexec -p /boot/vmlinux
> --initrd=/boot/initrd-2.6.18-kdump.img --args-linux
> --append="root=/dev/sda3 irqpoll init 1"

If the /boot/vmlinux is the one you usually use to boot, that won't work.

Your usual vmlinux is almost certainly linked to load at the 1M line,
and you need a kernel linked to load at the 16M line (as set in crashkernel=).

See the CONFIG_PHYSICAL_START config option, and there's other details
in Documentation/kdump/kdump.txt - it looks like you have most of it right,
except you need to build *TWO* specially configured kernels (your production
one with KEXEC support and a few other things, and then the dump kernel
with a different PHYSICAL_START and a few settings).
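
One way to see where a given vmlinux is linked to load is to look at its program headers. The sketch below reads the output of readelf -lW on stdin and assumes that the physical load address shows up in the PhysAddr column (the fourth field) of the first LOAD header. The function names first_load_paddr and loads_at are made up for illustration.

```sh
# Reads `readelf -lW` output on stdin and prints the PhysAddr column
# of the first LOAD program header.
first_load_paddr() {
    awk '$1 == "LOAD" { print $4; exit }'
}

# Succeeds if the first LOAD segment's physical address equals $1
# (for example 0x1000000 for the 16M line).
loads_at() {
    paddr=$(first_load_paddr)
    [ -n "$paddr" ] && [ $(( $paddr )) -eq $(( $1 )) ]
}
```

Usage would look like: readelf -lW /boot/vmlinux | loads_at 0x1000000 && echo "linked at 16M".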


2006-10-04 21:39:10

by Steven Truong

Subject: Re: kexec / kdump kernel panic

Hi, Valdis. No, I actually used 2 different kernels for this: one
for system kernel and the other for captured/crash kernel.

System kernel .config file with these options

CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y

make; make modules_install; make install

System kernel Grub entry

title CentOS (2.6.18)
root (hd0,0)
kernel /vmlinuz-2.6.18 ro root=/dev/sda3 crashkernel=128M@16M rhgb quiet
initrd /initrd-2.6.18.img


Crash/captured kernel .config file with these options
CONFIG_LOCALVERSION="-kdump"
# CONFIG_SMP is not set
CONFIG_KEXEC=y <-------------------------------------------------------------
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_PROC_VMCORE=y

The /boot/vmlinux file is the one found in the linux-2.6.18kdump
directory after I ran make and make modules_install for the crash kernel.

Am I missing something? Or did I do something wrong? Is my vmlinux OK,
or how do I go about obtaining an uncompressed ELF image of the crash
kernel?

Thank you for all the help.
Steven.

On 10/3/06, Valdis Klētnieks wrote:
> On Tue, 03 Oct 2006 17:18:21 PDT, Steven Truong said:
>
> > /usr/sbin/kexec -p /boot/vmlinux
> > --initrd=/boot/initrd-2.6.18-kdump.img --args-linux
> > --append="root=/dev/sda3 irqpoll init 1"
>
> If the /boot/vmlinux is the one you usually use to boot, that won't work.
>
> Your usual vmlinux is almost certainly linked to load at the 1M line,
> and you need a kernel linked to load at the 16M line (as set in crashkernel=).
>
> See the CONFIG_PHYSICAL_START config option, and there's other details
> in Documentation/kdump/kdump.txt - it looks like you have most of it right,
> except you need to build *TWO* specially configured kernels (your production
> one with KEXEC support and a few other things, and then the dump kernel
> with a different PHYSICAL_START and a few settings).
>
>
>

2006-10-04 22:37:39

by Keith Mannthey

Subject: Re: kexec / kdump kernel panic

On 10/4/06, Steven Truong wrote:
> Hi, Valdis. No, I actually used 2 different kernels for this: one
> for system kernel and the other for captured/crash kernel.
>
<snip>
> CONFIG_PHYSICAL_START=0x1000000
<snip>
> CONFIG_PHYSICAL_START=0x1000000
>

In both cases you have the same CONFIG_PHYSICAL_START? I thought the
kexec kernel needed to start at a different location than the original
kernel.
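
A small sketch for comparing the two builds: it pulls CONFIG_PHYSICAL_START out of each .config file and checks whether the values differ. The function names physical_start and different_start are made up for illustration, and the .config paths passed to them are whatever the two build directories happen to be.

```sh
# Print the CONFIG_PHYSICAL_START value from a kernel .config file.
physical_start() {
    sed -n 's/^CONFIG_PHYSICAL_START=//p' "$1"
}

# Succeeds if the two .config files use different physical start addresses.
different_start() {
    a=$(physical_start "$1")
    b=$(physical_start "$2")
    [ -n "$a" ] && [ -n "$b" ] && [ $(( $a )) -ne $(( $b )) ]
}
```

With the two configurations quoted above, both files contain 0x1000000, so different_start would return a non-zero status.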

Thanks,
Keith

2006-10-04 22:54:25

by Vivek Goyal

Subject: Re: kexec / kdump kernel panic

On Wed, Oct 04, 2006 at 02:38:53PM -0700, Steven Truong wrote:
> Hi, Valdis. No, I actually used 2 different kernels for this: one
> for system kernel and the other for captured/crash kernel.
>
> System kernel .config file with these options
>
> CONFIG_KEXEC=y
> CONFIG_CRASH_DUMP=y
> CONFIG_PHYSICAL_START=0x1000000
> CONFIG_SYSFS=y
> CONFIG_DEBUG_INFO=y
>

Steven, you don't have to enable CONFIG_CRASH_DUMP in your system kernel.
The moment you enable it, the configuration assumes by default that this
is the capture kernel and sets the value of CONFIG_PHYSICAL_START to 16MB
(0x1000000) instead of 1MB (0x100000).

Your procedure seems to be right. Please also paste the output of
/proc/iomem from the first kernel.
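
For reference, a minimal sketch of pulling the reserved region out of that output. It assumes the reservation appears in /proc/iomem as a line of the form start-end : Crash kernel with hexadecimal addresses. The function names crash_region and crash_region_size are made up for illustration.

```sh
# Reads /proc/iomem-style text on stdin and prints "start end" (hex,
# without 0x) for the line labelled "Crash kernel".
crash_region() {
    sed -n 's/^ *\([0-9a-fA-F]*\)-\([0-9a-fA-F]*\) : Crash kernel$/\1 \2/p'
}

# Prints the size of the crash kernel region in bytes, or fails if
# no such region is listed.
crash_region_size() {
    set -- $(crash_region)
    [ $# -eq 2 ] || return 1
    echo $(( 0x$2 - 0x$1 + 1 ))
}
```

Usage would look like crash_region_size < /proc/iomem; with crashkernel=128M@16M the expected result is 134217728 bytes, and a failure would suggest that no region was reserved.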

You can find more info at the following link:

http://lse.sourceforge.net/kdump/

I am also copying this mail to the fastboot mailing list, where
kexec/kdump discussions generally take place.

Thanks
Vivek