LinuxLists.cc - [RFC Patch 0/6] slimdump: Enable slimdump if crashing kernel memory is not required

2011-05-26 17:07:45

Subject: [RFC Patch 0/6] slimdump: Enable slimdump if crashing kernel memory is not required

Hi All,
Please find below a patchset that implements 'slimdump', a new type of
coredump (captured through kdump) that retains only essential debugging
information and discards the old kernel's memory (which is mostly
irrelevant) when the system crash is triggered by a fatal hardware error.

When a system crashes due to an unrecoverable memory error, kdump would
ordinarily attempt to read/capture the crashing kernel's memory, which is
undesirable for two reasons:
- The old kernel's memory is irrelevant for debugging crashes caused by
hardware errors (such as physical memory corruption causing fatal
machine check exceptions, or MCEs).
- The kdump capture kernel might experience a second MCE when reading
the corrupt memory area, with fatal results (including failure to
capture the coredump).

'slimdump' avoids both hazards: it is a light-weight dump that increases
the reliability of the system by avoiding unsafe operations.
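The patch summary below names an extended panic function, xpanic(), and a PANIC_MCE flag that signals the need for a slimdump. The following user-space C model is only a sketch of that idea: a flag-carrying panic call forwards to a va_list variant (the thread mentions xpanic() invoking vpanic), and the dump path picks a slimdump when the MCE flag is set. The names model_xpanic, model_vpanic and select_dump_kind, the flag values and the global state are all hypothetical, not the signatures in Andi Kleen's tree, and the model ignores how the flag actually reaches the kdump capture kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits: the real PANIC_MCE value is defined by the patch. */
#define PANIC_MCE         (1u << 0)
#define PANIC_KNOWN_FLAGS (PANIC_MCE)

enum dump_kind { DUMP_FULL, DUMP_SLIM };

/* User-space stand-ins for state the kernel would keep at panic time. */
static unsigned int panic_flags;
static char panic_msg[256];

/* va_list variant; a real kernel would print, start kdump and halt here. */
static void model_vpanic(unsigned int flags, const char *fmt, va_list ap)
{
	size_t n;

	assert((flags & ~PANIC_KNOWN_FLAGS) == 0);	/* reject unknown bits */
	panic_flags = flags;
	vsnprintf(panic_msg, sizeof(panic_msg), fmt, ap);
	n = strlen(panic_msg);
	if (n > 0 && panic_msg[n - 1] == '\n')
		panic_msg[n - 1] = '\0';		/* drop trailing newline */
}

/* Extended panic: like panic(fmt, ...) but also carries flag bits. */
static void model_xpanic(unsigned int flags, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	model_vpanic(flags, fmt, ap);
	va_end(ap);
}

/* Fatal MCE: old memory may be poisoned, so capture a slimdump instead. */
static enum dump_kind select_dump_kind(void)
{
	return (panic_flags & PANIC_MCE) ? DUMP_SLIM : DUMP_FULL;
}
```

Keeping the va_list variant separate lets both a plain panic() and the flag-aware xpanic() share one formatting path.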

Summary of the patches
----
Patches 1, 2 - Taken from Andi Kleen's tree
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git; they
introduce an enhanced panic function, xpanic().

Patch 3 - Introduces a new PANIC_MCE flag to signal the need for a
slimdump on MCE-initiated panic calls.

Patch 4 - Appends a new elf-note to the coredump containing MCE-related
registers.

Patch 5 - Creates a slimdump by discarding the old kernel's memory.

Patch 6 - Patch for the 'crash' tool
(http://people.redhat.com/anderson/crash-5.1.5.tar.gz) to recognise the
new elf-notes section.
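Patch 4 stores MCE-related registers as a new elf-note. As a sketch of the standard ELF note layout (an Elf64_Nhdr header followed by a NUL-terminated name and a descriptor, each padded to 4 bytes as in Linux core files), the C function below appends one such note to a buffer. The note name "MCE", the type value NT_MCE_REGS, the helper append_mce_note() and the register set in struct mce_regs are hypothetical; the cover letter does not say which registers or identifiers the patch uses.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Linux core-file notes pad name and descriptor to 4-byte boundaries. */
#define NOTE_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

/* Hypothetical note name and type, chosen for illustration only. */
#define MCE_NOTE_NAME "MCE"
#define NT_MCE_REGS   0x4d434501u

/* Hypothetical snapshot of machine-check registers. */
struct mce_regs {
	uint64_t mcg_status;
	uint64_t mci_status;
	uint64_t mci_addr;
	uint64_t mci_misc;
};

/* Append one note (header, padded name, padded descriptor) to buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t append_mce_note(unsigned char *buf, size_t cap,
			      const struct mce_regs *regs)
{
	Elf64_Nhdr nhdr;
	size_t namesz = sizeof(MCE_NOTE_NAME);	/* includes the NUL */
	size_t total = sizeof(nhdr) + NOTE_ALIGN(namesz) +
		       NOTE_ALIGN(sizeof(*regs));
	unsigned char *p = buf;

	assert(buf != NULL && regs != NULL);
	if (cap < total)
		return 0;

	nhdr.n_namesz = (Elf64_Word)namesz;
	nhdr.n_descsz = (Elf64_Word)sizeof(*regs);
	nhdr.n_type = NT_MCE_REGS;

	memset(buf, 0, total);			/* zero the padding bytes */
	memcpy(p, &nhdr, sizeof(nhdr));
	p += sizeof(nhdr);
	memcpy(p, MCE_NOTE_NAME, namesz);
	p += NOTE_ALIGN(namesz);
	memcpy(p, regs, sizeof(*regs));
	return total;
}
```

With a 4-byte name and four 64-bit registers, the note occupies 12 + 4 + 32 = 48 bytes.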

The patches have been tested inside a VM and work as expected. The
user-space tool used for copying the coredump is 'cp' (testing with
makedumpfile will be done soon).

Example
--------
Normal vmcore
-----------
# ls -lh /home/prasadkr/vmcore.usual
-r-------- 1 root root 1.6G May 17 01:07 /home/prasadkr/vmcore.usual

slimdump
-----------
# ls -lh /home/prasadkr/vmcore.SlimDump
-r-------- 1 root root 1.8K May 18 23:28 /home/prasadkr/vmcore.SlimDump
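The size difference (1.6G versus 1.8K) follows from what is left out. A kdump vmcore is an ELF core file whose PT_LOAD program headers describe the old kernel's memory, while PT_NOTE carries the elf-notes. One plausible way to produce a slimdump, sketched below under that assumption, is to keep only the non-PT_LOAD headers; the function name drop_memory_segments() is hypothetical, and Patch 5 may discard memory differently.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Compact the program header table in place, keeping every entry except
 * PT_LOAD (which describes the crashed kernel's memory).
 * Returns the new entry count; a caller would still have to store it in
 * e_phnum and recompute p_offset values before writing the file out. */
static size_t drop_memory_segments(Elf64_Phdr *phdrs, size_t phnum)
{
	size_t kept = 0;

	assert(phdrs != NULL || phnum == 0);
	for (size_t i = 0; i < phnum; i++) {
		if (phdrs[i].p_type == PT_LOAD)
			continue;			/* discard old memory */
		phdrs[kept++] = phdrs[i];		/* keep PT_NOTE etc. */
	}
	return kept;
}
```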
# ./crash vmlinux vmcore.SlimDump

crash 5.1.5
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

"System crashed due to a hardware memory error. No coredump available."
#
#
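In the session above, 'crash' recognises the slimdump through Patch 6. A reader of the dump could locate the new elf-note by walking the PT_NOTE segment; the C sketch below does this with bounds checks on every step. The helper find_note() and its match-by-type-only policy are assumptions for illustration, not the crash tool's code; a real reader should also compare the note name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Name and descriptor are padded to 4 bytes, as in Linux core files. */
#define NOTE_ALIGN(x) (((x) + 3u) & ~(size_t)3u)

/* Walk a PT_NOTE segment of len bytes and return a pointer to the
 * descriptor of the first note whose n_type matches, or NULL if there is
 * none or a note is truncated. Stores the descriptor size in *descsz. */
static const unsigned char *find_note(const unsigned char *seg, size_t len,
				      Elf64_Word type, Elf64_Word *descsz)
{
	size_t off = 0;

	assert(seg != NULL || len == 0);
	while (len - off >= sizeof(Elf64_Nhdr)) {
		Elf64_Nhdr h;
		size_t name_end, desc_end;

		memcpy(&h, seg + off, sizeof(h));	/* avoid unaligned reads */
		name_end = off + sizeof(h) + NOTE_ALIGN((size_t)h.n_namesz);
		desc_end = name_end + NOTE_ALIGN((size_t)h.n_descsz);
		if (name_end > len || desc_end > len)
			return NULL;			/* truncated note */
		if (h.n_type == type) {
			if (descsz)
				*descsz = h.n_descsz;
			return seg + name_end;
		}
		off = desc_end;				/* next note */
	}
	return NULL;
}
```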

Thanks,
K.Prasad
_______________________________________________________
ltcras mailing list

2011-05-26 17:12:24

Subject: [RFC Patch 0/6] slimdump: Enable slimdump if crashing kernel memory is not required

Subject: [Patch 1/6] XPANIC: Add extended panic interface

Subject: [Patch 2/6] x86: mce: Convert mce code to xpanic

Subject: [Bugfix][Patch 3/3] Invoke vpanic inside xpanic function

Subject: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: [RFC Patch 6/6] Crash: Recognise slim coredumps and process new elf-note sections

Subject: Re: [RFC Patch 0/6] slimdump: Enable slimdump if crashing kernel memory is not required

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [Patch 1/6] XPANIC: Add extended panic interface

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 6/6] Crash: Recognise slim coredumps and process new elf-note sections

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [Patch 1/6] XPANIC: Add extended panic interface

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [Patch 1/6] XPANIC: Add extended panic interface

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [Patch 2/6] x86: mce: Convert mce code to xpanic

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 6/6] Crash: Recognise slim coredumps and process new elf-note sections

Subject: Re: [RFC Patch 6/6] Crash: Recognise slim coredumps and process new elf-note sections

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 6/6] Crash: Recognise slim coredumps and process new elf-note sections

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information

Subject: Re: [RFC Patch 4/6] PANIC_MCE: Introduce a new panic flag for fatal MCE, capture related information