Hi, I have a dual Xeon 3.2 GHz box running CentOS 4.3, and this box is
in a cluster. It keeps bailing out with a kernel panic, and I cannot
determine for sure whether the cause is a kernel or a hardware problem.
I have been trying to set up kexec and kdump so that I can capture a
kernel dump and debug it.
I have followed the instructions in the linux-2.6.18
Documentation/kdump/kdump.txt closely, but I still have not been able to
load the captured kernel for the kernel panic case.
I have the system kernel, Linux 2.6.18, booted with
crashkernel=128M@16M. I compiled this system kernel with KEXEC, SYSFS,
DEBUG_INFO and CRASH_DUMP enabled. When the box is up on this
system kernel, I can see that the total memory is 128 MB less than
the physical memory.
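As a quick sanity check on the reservation, here is a minimal sketch that converts a crashkernel=SIZE@OFFSET value into the physical address range it implies. The helper names to_bytes and crash_range are made up for illustration, and only the K, M and G suffixes are handled.

```shell
# Convert a size such as 128M or 16M to bytes (K, M and G suffixes only).
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%[Kk]} * 1024 )) ;;
        *[Mm]) echo $(( ${1%[Mm]} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%[Gg]} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print the physical range implied by a SIZE@OFFSET value as 0xSTART-0xEND.
crash_range() {
    size=$(to_bytes "${1%@*}")     # part before the @
    start=$(to_bytes "${1#*@}")    # part after the @
    printf '0x%x-0x%x\n' "$start" $(( start + size - 1 ))
}

crash_range 128M@16M   # prints 0x1000000-0x8ffffff
```

For 128M@16M the range starts at 0x1000000, the same address as the PHYSICAL_START value used for the crash kernel.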
For the crash/captured kernel, I disabled SMP and enabled KEXEC,
CRASH_DUMP, and PROC_VMCORE, with PHYSICAL_START=0x1000000.
I first tested with the following command and saw that the
crash/captured kernel booted the box without going through BIOS
initialization:
/usr/sbin/kexec -l /boot/vmlinux \
--initrd=/boot/initrd-2.6.18-kdump.img --args-linux \
--append="root=/dev/sda3 init 1"
However, when I tried to load the crash/captured kernel for the kernel
panic case, I just got a "failed to load kernel /boot/vmlinux" error
message. I used the following command to load it:
/usr/sbin/kexec -p /boot/vmlinux \
--initrd=/boot/initrd-2.6.18-kdump.img --args-linux \
--append="root=/dev/sda3 irqpoll init 1"
I made sure that vmlinux is not a bzImage file by running this command:
readelf -h /boot/vmlinux
It printed the ELF header. If I run it on a bzImage file, I see
nothing. So I am pretty sure the kernel file vmlinux is OK.
I also ran the second command under strace but did not learn anything
useful from it, and I could not find an error message in any log file.
I used kexec-tools-1.101 with kexec-tools-1.101-kdump10.patch.
On Tue, 03 Oct 2006 17:18:21 PDT, Steven Truong said:
> /usr/sbin/kexec -p /boot/vmlinux
> --initrd=/boot/initrd-2.6.18-kdump.img --args-linux
> --append="root=/dev/sda3 irqpoll init 1"
If the /boot/vmlinux is the one you usually use to boot, that won't work.
Your usual vmlinux is almost certainly linked to load at the 1M line,
and you need a kernel linked to load at the 16M line (as set in crashkernel=).
See the CONFIG_PHYSICAL_START config option; there are other details
in Documentation/kdump/kdump.txt. It looks like you have most of it right,
except you need to build *TWO* specially configured kernels (your production
one with KEXEC support and a few other things, and then the dump kernel
with a different PHYSICAL_START and a few settings).
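One way to check where a given vmlinux is linked to load is to look at the PhysAddr column of the LOAD segments in `readelf -l` output and compare it with the reserved crash region. The sketch below assumes the usual readelf program-header layout, with PhysAddr as the fourth field on each LOAD line. The names load_paddrs and all_in_range are illustrative helpers, not part of kexec-tools.

```shell
# Print the physical load address of every LOAD segment.
# Expects `readelf -l` output on stdin.
load_paddrs() {
    awk '$1 == "LOAD" { print $4 }'
}

# Succeed only if every address read from stdin lies within [$1, $2].
all_in_range() {
    while read -r addr; do
        [ $(( $addr )) -ge $(( $1 )) ] && [ $(( $addr )) -le $(( $2 )) ] || return 1
    done
    return 0
}

# Example use against the 128M@16M reservation (0x1000000-0x8ffffff):
#   readelf -l /boot/vmlinux | load_paddrs | all_in_range 0x1000000 0x8ffffff \
#       && echo "load addresses fall inside the crash region"
```

A kernel linked at the 1M line would report 0x00100000 here and fail the range check, which matches the situation described above.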
Hi, Valdis. No, I actually used 2 different kernels for this: one
for system kernel and the other for captured/crash kernel.
System kernel .config file with these options
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_SYSFS=y
CONFIG_DEBUG_INFO=y
make; make modules_install; make install
System kernel Grub entry
title CentOS (2.6.18)
root (hd0,0)
kernel /vmlinuz-2.6.18 ro root=/dev/sda3 crashkernel=128M@16M rhgb quiet
initrd /initrd-2.6.18.img
Crash/captured kernel .config file with these options
CONFIG_LOCALVERSION="-kdump"
# CONFIG_SMP is not set
CONFIG_KEXEC=y <-------------------------------------------------------------
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_PROC_VMCORE=y
The /boot/vmlinux is the one found in the linux-2.6.18kdump directory
after I ran make and make modules_install for the crash kernel.
Am I missing something, or did I do something wrong? Is my vmlinux OK,
or how do I go about obtaining an uncompressed ELF image of the crash
kernel?
Thank you for all the help.
Steven.
On 10/4/06, Steven Truong wrote:
> Hi, Valdis. No, I actually used 2 different kernels for this: one
> for system kernel and the other for captured/crash kernel.
>
<snip>
> CONFIG_PHYSICAL_START=0x1000000
<snip>
> CONFIG_PHYSICAL_START=0x1000000
>
In both cases you have the same CONFIG_PHYSICAL_START? I thought the
kexec kernel needed to start at a different location than the original
kernel.
Thanks,
Keith
On Wed, Oct 04, 2006 at 02:38:53PM -0700, Steven Truong wrote:
> Hi, Valdis. No, I actually used 2 different kernels for this: one
> for system kernel and the other for captured/crash kernel.
>
> System kernel .config file with these options
>
> CONFIG_KEXEC=y
> CONFIG_CRASH_DUMP=y
> CONFIG_PHYSICAL_START=0x1000000
> CONFIG_SYSFS=y
> CONFIG_DEBUG_INFO=y
>
Steven, you don't have to enable CONFIG_CRASH_DUMP in your system kernel.
The moment you enable it, the configuration assumes by default that this
is the capture kernel and sets CONFIG_PHYSICAL_START to 16MB (0x1000000)
instead of 1MB (0x100000).
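To see which of the two values a given build ended up with, a small sketch like the following prints CONFIG_PHYSICAL_START from a .config in MiB. The function name phys_start_mib is hypothetical, and the .config is read from stdin.

```shell
# Print CONFIG_PHYSICAL_START from a kernel .config (read on stdin) in MiB.
# Fails if the option is not set in the file.
phys_start_mib() {
    value=$(sed -n 's/^CONFIG_PHYSICAL_START=//p')
    [ -n "$value" ] || return 1
    echo $(( $value / 1048576 ))
}

# Example use from the top of a kernel build tree:
#   phys_start_mib < .config
```

A result of 16 corresponds to 0x1000000 and a result of 1 to 0x100000.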
Your procedure seems to be right. Please also paste the output of
/proc/iomem from the first kernel.
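The line of interest in /proc/iomem is the one labelled "Crash kernel". A small sketch for pulling it out, assuming the usual "start-end : name" layout of that file (crash_region is an illustrative helper name):

```shell
# Print the address range of the "Crash kernel" entry.
# Expects /proc/iomem-style text on stdin.
crash_region() {
    awk -F' : ' '$2 == "Crash kernel" { gsub(/ /, "", $1); print $1 }'
}

# Example use on the running system kernel:
#   crash_region < /proc/iomem
```

If nothing is printed, the running kernel did not reserve a crash region, and the crashkernel= boot parameter is worth rechecking first.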
You can find more info at the following link:
http://lse.sourceforge.net/kdump/
I am also copying this mail to the fastboot mailing list, where
kexec/kdump discussions generally take place.
Thanks
Vivek