2005-09-21 12:00:24

by Hiro Yoshioka

[permalink] [raw]
Subject: Linux Kernel Dump Summit 2005

To whom it may concern,

We had a Linux Kernel Dump Summit 2005.

The participants were:

Dump tools Session
diskdump -- Fujitsu
mkdump -- NTT Data Intellilink
LTD -- Hitachi
kdump -- Turbolinux
Summary -- Miracle Linux

Dump Analysis tools Session
Alicia/crash -- Uniadex

Other participants were:
VA Linux/NEC/NSSOL/IPA/OSDL/Toshiba

Some of the discussion topics were (but not limited to):

- What kind of information do we need?
trace information
all of registers
the last log of panic, oops
LTD (Linux Tough Dump) has some nice features

- We need a partial dump
- We have to minimize the down time

- We have to dump all memory
how can we distinguish kernel pages from user pages if
kernel data is corrupted

- Cases where we are not able to dump data
device
power management
we need a generic mechanism to reset a device

- Hang
NMI watch dog
mount

- It is very difficult to debug a memory corruption bug
- hardware error

- Where will we go to?
Collaboration between IHVs and the Linux kernel community is needed

Dump Analysis tools are very important

- There is a concern that the development process of 'crash'
is not open.
- Do we have to extend gdb?
- We'd like to collaborate on 'crash'

- kexec/kdump, mkdump, and LTD all use a second kernel
to take the dump.

- We have to share the test data, checklists, and test tools of
dump tool development.

We agreed to continue holding the Linux Kernel Dump Summit.

Regards,
Hiro


2005-10-06 12:17:45

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi, Hiro,

On Wed, 21 Sep 2005, Hiro Yoshioka wrote:
>
> We had a Linux Kernel Dump Summit 2005.

> - We need a partial dump
> - We have to minimize the down time
>
> - We have to dump all memory
> how can we distinguish from the kernel and user if
> kernel data is corrupted

As memory size grows, the time and space for capturing kernel
crash dump really matter.

We discussed two strategies in the dump summit.

1. Partial dump
2. Full dump with compression


PARTIAL DUMP
============

Partial dump captures only pages that are essential for later
analysis, possibly by using some mark in mem_map[].

This certainly reduces both time and space of crash dump, but
there is a risk because no one can guarantee that a dropped page
is really unnecessary in analysis (it can be a tragedy if
analysis went unsolved because of the dropped page).

Another risk is a corruption of mem_map[] (or other kernel
structure), which makes the identification of necessary pages
unreliable.

So it would be best if a user could select the level of partial
dump. A careful user may always choose a full dump, while a
user who is tracking a well-reproducible kernel bug may choose
a fast and small dump.


FULL DUMP WITH COMPRESSION
==========================

Those who still want a full dump, including me, are interested
in dump compression. For example, the LKCD format (at least v7
format) supports pagewise compression with the deflate
algorithm. A dump analyze tool "crash" can transparently
analyze the compressed dump file in this format.

Compression will reduce the storage space to a certain degree,
and may also reduce the time if the dump process is I/O bound.
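
To make the pagewise idea concrete, here is a small user-space sketch
(my own illustration, not the actual LKCD v7 on-disk layout): each 4KB
page is deflated independently with zlib's compress2(), and a tiny
per-page record says whether the page is stored compressed or raw.

/* Illustrative only: the real LKCD v7 per-page header and flags differ. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SZ 4096

struct page_rec {
    unsigned long pfn;        /* page frame number */
    unsigned long size;       /* payload size on disk */
    int compressed;           /* 1 = deflate, 0 = raw */
};

static int write_page(FILE *out, unsigned long pfn, const unsigned char *page)
{
    unsigned char buf[PAGE_SZ + 64];  /* room for worst-case deflate output */
    uLongf len = sizeof(buf);
    struct page_rec rec = { .pfn = pfn };

    if (compress2(buf, &len, page, PAGE_SZ, Z_BEST_SPEED) == Z_OK &&
        len < PAGE_SZ) {
        rec.size = len;
        rec.compressed = 1;
    } else {                          /* did not shrink: store it raw */
        memcpy(buf, page, PAGE_SZ);
        rec.size = PAGE_SZ;
        rec.compressed = 0;
    }
    if (fwrite(&rec, sizeof(rec), 1, out) != 1)
        return -1;
    return fwrite(buf, 1, rec.size, out) == rec.size ? 0 : -1;
}

int main(void)
{
    static unsigned char page[PAGE_SZ];  /* stand-in for one dump page */
    FILE *out = fopen("pages.dump", "wb");

    if (!out)
        return 1;
    write_page(out, 0, page);
    fclose(out);
    return 0;
}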


WHICH IS BETTER?
================

I wrote a small compression tool for LKCD v7 format to see how
effective the compression is, and it turned out that the time
and size of compression were very much similar to that of gzip,
not surprisingly.

Compressing a 32GB dump file took about 40 minutes on Pentium 4
Xeon 3.0GHz, which is not good enough because the dump without
compression took only 5 minutes; eight times slower.

Besides, the compression ratios varied quite a bit. Some dump
files could not be compressed well (the worst case I found was
only 10% reduction in size).


After examining the LKCD compression format, I must conclude that
the partial dump is the only way to go when time and size really
matter.

Now I'd like to see how effective the existing partial dump
functionalities are.


Regards,

--
OBATA Noboru ([email protected])

2005-10-06 14:43:32

by Hiro Yoshioka

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Obata san,

Thanks for your comments. I really appreciate your effort.

Regards,
Hiro

From: OBATA Noboru <[email protected]>
Subject: Re: Linux Kernel Dump Summit 2005
Date: Thu, 06 Oct 2005 21:17:18 +0900 (JST)
Message-ID: <[email protected]>

> Hi, Hiro,
>
> On Wed, 21 Sep 2005, Hiro Yoshioka wrote:
> >
> > We had a Linux Kernel Dump Summit 2005.
>
> > - We need a partial dump
> > - We have to minimize the down time
> >
> > - We have to dump all memory
> > how can we distinguish from the kernel and user if
> > kernel data is corrupted
>
> As memory size grows, the time and space for capturing kernel
> crash dump really matter.
>
> We discussed two strategies in the dump summit.
>
> 1. Partial dump
> 2. Full dump with compression
>
>
> PARTIAL DUMP
> ============
>
> Partial dump captures only pages that are essential for later
> analysis, possibly by using some mark in mem_map[].
>
> This certainly reduces both time and space of crash dump, but
> there is a risk because no one can guarantee that a dropped page
> is really unnecessary in analysis (it can be a tragedy if
> analysis went unsolved because of the dropped page).
>
> Another risk is a corruption of mem_map[] (or other kernel
> structure), which makes the identification of necessary pages
> unreliable.
>
> So there would be best if a user can select the level of partial
> dump. A careful user may always choose a full dump, while a
> user who is tracking the well-reproducible kernel bug may choose
> fast and small dump.
>
>
> FULL DUMP WITH COMPRESSION
> ==========================
>
> Those who still want a full dump, including me, are interested
> in dump compression. For example, the LKCD format (at least v7
> format) supports pagewise compression with the deflate
> algorithm. A dump analyze tool "crash" can transparently
> analyze the compressed dump file in this format.
>
> The compression will reduce the storage space at certain degree,
> and may also reduce the time if a dump process were I/O bounded.
>
>
> WHICH IS BETTER?
> ================
>
> I wrote a small compression tool for LKCD v7 format to see how
> effective the compression is, and it turned out that the time
> and size of compression were very much similar to that of gzip,
> not surprisingly.
>
> Compressing a 32GB dump file took about 40 minutes on Pentium 4
> Xeon 3.0GHz, which is not good enough because the dump without
> compression took only 5 minutes; eight times slower.
>
> Besides, the compress ratios were somewhat picky. Some dump
> files could not be compressed well (the worst case I found was
> only 10% reduction in size).
>
>
> After examining the LKCD compress format, I must conclude that
> the partial dump is the only way to go when time and size really
> matter.
>
> Now I'd like to see how effective the existing partial dump
> functionalities are.
>
>
> Regards,
>
> --
> OBATA Noboru ([email protected])

2005-10-10 08:45:58

by Pavel Machek

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi!

> FULL DUMP WITH COMPRESSION
> ==========================
>
> Those who still want a full dump, including me, are interested
> in dump compression. For example, the LKCD format (at least v7
> format) supports pagewise compression with the deflate
> algorithm. A dump analyze tool "crash" can transparently
> analyze the compressed dump file in this format.
>
> The compression will reduce the storage space at certain degree,
> and may also reduce the time if a dump process were I/O bounded.

I'd say that compression does not help much; it can only speed things up
by a factor of two at best. But...

>
> WHICH IS BETTER?
> ================
>
> I wrote a small compression tool for LKCD v7 format to see how
> effective the compression is, and it turned out that the time
> and size of compression were very much similar to that of gzip,
> not surprisingly.
>
> Compressing a 32GB dump file took about 40 minutes on Pentium 4
> Xeon 3.0GHz, which is not good enough because the dump without
> compression took only 5 minutes; eight times slower.

....you probably want to look at the suspend2.net project. They have a
special compressor aimed at compressing exactly this kind of data,
fast enough to be an improvement.
Pavel

--
if you have sharp zaurus hardware you don't need... you know my address

2005-10-11 00:50:01

by Andrew Morton

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

OBATA Noboru <[email protected]> wrote:
>
> > We had a Linux Kernel Dump Summit 2005.

I was rather expecting that the various groups which are interested in
crash dumping would converge around kdump once it was merged. But it seems
that this is not the case and that work continues on other strategies.

Is that a correct impression? If so, what shortcoming(s) in kdump are
causing people to be reluctant to use it?

2005-10-11 04:45:28

by Hiro Yoshioka

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi Andrew,

From: Andrew Morton <[email protected]>
> OBATA Noboru <[email protected]> wrote:
> >
> > > We had a Linux Kernel Dump Summit 2005.
>
> I was rather expecting that the various groups which are interested in
> crash dumping would converge around kdump once it was merged. But it seems
> that this is not the case and that work continues on other strategies.

My impression is that most crash dump developers would like
to converge on the kexec/kdump approach. However, they are still
developing their own dump tools.

The reasons are
1) They have to maintain the dump tools and support their users.
Many users are still using 2.4 kernels so merging kdump into 2.6
kernel does not help them.
2) Commercial Linux Distros (Red Hat/Suse/MIRACLE(Asianux)/Turbo etc) use
LKCD/diskdump/netdump etc.
Almost no users use a vanilla kernel so kdump does not have users yet.

> Is that a correct impression? If so, what shortcoming(s) in kdump are
> causing people to be reluctant to use it?

I think kdump is the way to go; however, it may take time.

Regards,
Hiro

2005-10-12 08:29:00

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi, Pavel,

On Mon, 10 Oct 2005, Pavel Machek wrote:
>
> > Compressing a 32GB dump file took about 40 minutes on Pentium 4
> > Xeon 3.0GHz, which is not good enough because the dump without
> > compression took only 5 minutes; eight times slower.
>
> ....you probably want to look at suspend2.net project. They have
> special compressor aimed at compressing exactly this kind of data,
> fast enough to be improvement.

Thank you for pointing me to the interesting project.

I looked at their patch to find that the special compressor uses
the LZF compression algorithm. (Correct me if I'm wrong.)

So I made a quick comparison between cp, lzf(*) and gzip as
follows. The INFILE is a file that showed the worst compression
ratio last time.

(*) A simple compression tool based on the LZF algorithm, included in
liblzf, available from http://www.goof.com/pcg/marc/liblzf.html


$ cp INFILE /dev/null # this puts whole INFILE in a cache

$ /usr/bin/time cp INFILE /mnt/OUTFILE-cp
0.23user 14.16system 0:35.94elapsed 40%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (115major+15minor)pagefaults 0swaps

$ /usr/bin/time ./lzf -c < INFILE > /mnt/OUTFILE-lzf
35.04user 13.10system 0:54.30elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (84major+74minor)pagefaults 0swaps

$ /usr/bin/time gzip -1c < INFILE > /mnt/OUTFILE-gzip-1
186.84user 11.73system 3:20.36elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (90major+93minor)pagefaults 0swaps


The results can be summarized as follows. The lzf tool is
only about 1.5 times slower than cp (a plain file copy), while
achieving a compression ratio close to that of gzip -1.


CMD | NET TIME (in seconds) | OUTPUT SIZE (in bytes)
---------+--------------------------------+------------------------
cp | 35.94 (usr 0.23, sys 14.16) | 2,121,438,352 (100.0%)
lzf | 54.30 (usr 35.04, sys 13.10) | 1,959,473,330 ( 92.3%)
gzip -1 | 200.36 (usr 186.84, sys 11.73) | 1,938,686,487 ( 91.3%)
---------+--------------------------------+------------------------


Although it is too early to say lzf's compress ratio is good
enough, its compression speed is impressive indeed. And the
result also suggests that it is too early to give up the idea of
full dump with compression.
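
For reference, a minimal sketch (my code, not the lzf tool used above) of
how liblzf's lzf_compress() is typically called per page: it returns the
compressed size, or 0 when the page would not shrink, in which case the
caller stores the page uncompressed.

#include <stdio.h>
#include <string.h>
#include "lzf.h"   /* from liblzf */

#define PAGE_SZ 4096

int main(void)
{
    static unsigned char page[PAGE_SZ];   /* stand-in for one dump page */
    static unsigned char out[PAGE_SZ];
    /* Ask for at most PAGE_SZ - 1 output bytes, so success means it shrank. */
    unsigned int n = lzf_compress(page, PAGE_SZ, out, PAGE_SZ - 1);

    if (n == 0) {                         /* incompressible: keep the raw page */
        memcpy(out, page, PAGE_SZ);
        n = PAGE_SZ;
    }
    printf("stored %u bytes (%s)\n", n, n < PAGE_SZ ? "lzf" : "raw");
    return 0;
}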

Pavel, thank you again for your advice and I'll welcome further
suggestions.

--
OBATA Noboru ([email protected])

2005-10-12 08:30:53

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Tue, 11 Oct 2005, Hiro Yoshioka wrote:
>
> The reasons are
> 1) They have to maintain the dump tools and support their users.
> Many users are still using 2.4 kernels so merging kdump into 2.6
> kernel does not help them.
> 2) Commercial Linux Distros (Red Hat/Suse/MIRACLE(Asianux)/Turbo etc) use
> LKCD/diskdump/netdump etc.
> Almost no users use a vanilla kernel so kdump does not have users yet.

Agreed.

I am testing (or tasting ;-) kdump myself, and find it really
impressive and promising. Thank you to all who have worked on it.

In terms of users, however, the majority of commercial users
still use the 2.4 kernels of commercial Linux distributions. This
is especially true for careful users who have large systems
because switching to 2.6 kernels without regression is not an
easy task. So merging kdump into the mainline kernel does not
directly mean that these users start using it now.

Rather, merging kdump has much meaning for commercial Linux
distributors, who should be planning how and when to include
kdump in their distros.

> > Is that a correct impression? If so, what shortcoming(s) in kdump are
> > causing people to be reluctant to use it?
>
> I think the way to go is the kdump however it may take time.

Agreed.

I'd say commercial users are not reluctant to use kdump, but
they are just waiting for kdump-ready distros. So in turn, we
still have some time left for improving kdump further before
kdump-ready distros are shipped to users, and I would like to be
involved in such improvement hereafter.

Thinking about the requirements of enterprise systems, the
challenges for kdump will be:

- Reliability
+ Hardware-related issues

- Manageability
+ Easy configuration
+ Automated dump-capture and restart
+ Time and space for capturing dump
+ Handling two kernels

- Flexibility
+ Hook points before booting the 2nd kernel

My short impressions follow. I understand that kdump/kexec
developers are already discussing and working on some of the issues
above, and I would be grateful if someone could tell me about the current
status, or point me to the past lkml threads.


Reliability
-----------

In terms of reliability, hardware-related issues, such as a
device reinitialization problem, an ongoing DMA problem, and
possibly a pending interrupts problem, must be carefully
resolved.

Manageability
-------------

As for manageability, it would be nice if a user could easily set up
kdump just by writing DEVICE=/dev/sdc6 to one's
/etc/sysconfig/kdump and starting the kdump service, for example.
It is also desirable that the action taken after capturing a dump
(halt, reboot, or poweroff) be configurable. I believe these are
userspace tasks.

The time and space problem in capturing a huge crash dump has
already been raised. Partial dump and dump compression technology must be
explored.

One of my worries is that the current kdump requires two
distinct kernels (one for normal use, and one for capturing dumps) to
work, and I'm not fully convinced that the use of two kernels
is the only solution. Well, I heard that this design
better solves the ongoing DMA problem (please correct me if
other reasons are prominent), but from a pure management point
of view, handling one kernel is easier than handling two.

Flexibility
-----------

To minimize the downtime, a crashed kernel would want to
communicate with clustering software/firmware to help it detect
the failure quickly. This can be generalized by making
appropriate hook points (or notifier lists) in kdump.

Perhaps these hooks can be used to try resetting devices when
reinitialization of devices in the 2nd kernel tends to fail.


Sorry if I'm bringing up already-discussed issues again, but
I believe that addressing the above issues will help commercial
users in the future, and so I would like to discuss again
how these issues can be addressed.

Regards,

--
OBATA Noboru ([email protected])

2005-10-12 09:03:20

by Felix Oxley

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wednesday 12 October 2005 09:28, OBATA Noboru wrote:

> CMD | NET TIME (in seconds) | OUTPUT SIZE (in bytes)
> ---------+--------------------------------+------------------------
> cp | 35.94 (usr 0.23, sys 14.16) | 2,121,438,352 (100.0%)
> lzf | 54.30 (usr 35.04, sys 13.10) | 1,959,473,330 ( 92.3%)
> gzip -1 | 200.36 (usr 186.84, sys 11.73) | 1,938,686,487 ( 91.3%)
> ---------+--------------------------------+------------------------
>
> Although it is too early to say lzf's compress ratio is good
> enough, its compression speed is impressive indeed.

As you say, the speed of lzf relative to gzip is impressive.

However if the properties of the kernel dump mean that it is not suitable for
compression then surely it is not efficient to spend any time on it.

>And the
> result also suggests that it is too early to give up the idea of
> full dump with compression.

Are you sure? :-)
If we are talking about systems with 32GB of memory then we must be talking
about organisations who can afford an extra 100GB of disk space just for
keeping their kernel dump files.

I would expect that speed of recovery would always be the primary concern.
Would you agree?

regards,
Felix



2005-10-12 09:10:05

by Pavel Machek

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi!

> > CMD | NET TIME (in seconds) | OUTPUT SIZE (in bytes)
> > ---------+--------------------------------+------------------------
> > cp | 35.94 (usr 0.23, sys 14.16) | 2,121,438,352 (100.0%)
> > lzf | 54.30 (usr 35.04, sys 13.10) | 1,959,473,330 ( 92.3%)
> > gzip -1 | 200.36 (usr 186.84, sys 11.73) | 1,938,686,487 ( 91.3%)
> > ---------+--------------------------------+------------------------
> >
> > Although it is too early to say lzf's compress ratio is good
> > enough, its compression speed is impressive indeed.
>
> As you say, the speed of lzf relative to gzip is impressive.
>
> However if the properties of the kernel dump mean that it is not suitable for
> compression then surely it is not efficient to spend any time on it.
>
> >And the
> > result also suggests that it is too early to give up the idea of
> > full dump with compression.
>
> Are you sure? :-)
> If we are talking about systems with 32GB of memory then we must be taking
> about organisations who can afford an extra 100GB of disk space just for
> keeping their kernel dump files.
>
> I would expect that speed of recovery would always be the primary concern.
> Would you agree?

Notice that the suspend2 project actually introduced compression *for
speed*. Done right, it is faster to write the data
compressed. See Jamie Lokier's description of how to *never* slow down.

Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

2005-10-12 09:57:08

by Felix Oxley

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005


Thank you for helping a clueless newbie :-)

> Notice that suspend2 project actually introduced compression *for
> speed*. Doing it right means that it is faster to do it
> compressed.

I see!
The little benchmarks here: http://wiki.suspend2.net/BenchMarks
show a 15% speed _increase_ with compression.

> See Jamie Lokier's description how to *never* slow down.
Sorry, where is this?

So, if compression is a no-brainer, then the user just needs to
select no dump, partial dump, or full dump, whichever suits their circumstances?
Can this be set from user space?

regards,
Felix

2005-10-12 10:08:32

by Pavel Machek

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On St 12-10-05 10:56:46, Felix Oxley wrote:
>
> Thank you for helping a clueless newbie :-)
>
> > Notice that suspend2 project actually introduced compression *for
> > speed*. Doing it right means that it is faster to do it
> > compressed.
>
> I see!
> Little benchmarks here: http://wiki.suspend2.net/BenchMarks
> shows 15% speed _increase_ with compression.
>
> > See Jamie Lokier's description how to *never* slow down.
> Sorry, where is this?

Somewhere on the lkml, *long* ago. Basically the idea is to have one
thread writing to disk, and a second thread doing compression. If
no compressed pages are available, just write uncompressed ones. That
way compression can only speed things up.
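
A user-space sketch of that scheme (my own code, not the suspend2 or
Jamie Lokier implementation; zlib stands in for whatever compressor is
used, and the output record format is invented for illustration):

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SZ 4096
#define NPAGES  1024                        /* stand-in for the memory image */

static unsigned char mem[NPAGES][PAGE_SZ];   /* pages to dump */
static unsigned char zpage[NPAGES][PAGE_SZ]; /* compressed copies */
static unsigned long zsize[NPAGES];          /* 0 = not ready (or not smaller) */

/* Second thread: compress pages into the side buffer and publish them. */
static void *compressor(void *unused)
{
    unsigned char tmp[PAGE_SZ + 64];

    (void)unused;
    for (int i = 0; i < NPAGES; i++) {
        uLongf len = sizeof(tmp);
        if (compress2(tmp, &len, mem[i], PAGE_SZ, Z_BEST_SPEED) == Z_OK &&
            len < PAGE_SZ) {
            memcpy(zpage[i], tmp, len);
            __atomic_store_n(&zsize[i], len, __ATOMIC_RELEASE);
        }
    }
    return NULL;
}

/* First thread: write pages in order, taking whichever form is ready. */
int main(void)
{
    FILE *out = fopen("dump.out", "wb");
    pthread_t tid;

    if (!out)
        return 1;
    pthread_create(&tid, NULL, compressor, NULL);

    for (int i = 0; i < NPAGES; i++) {
        unsigned long n = __atomic_load_n(&zsize[i], __ATOMIC_ACQUIRE);
        fwrite(&i, sizeof(i), 1, out);          /* page index */
        if (n) {                                /* compressed copy is ready */
            fwrite(&n, sizeof(n), 1, out);
            fwrite(zpage[i], 1, n, out);
        } else {                                /* not ready: write it raw */
            unsigned long raw = PAGE_SZ;
            fwrite(&raw, sizeof(raw), 1, out);
            fwrite(mem[i], 1, PAGE_SZ, out);
        }
    }
    pthread_join(tid, NULL);
    fclose(out);
    return 0;
}

Since the writer never blocks on the compressor, the elapsed time is
bounded by the uncompressed write; in a real implementation the
compressor would work some number of pages ahead of the writer so that
more pages actually go out compressed.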

Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

2005-10-12 18:03:04

by Andy Isaacson

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wed, Oct 12, 2005 at 12:07:31PM +0200, Pavel Machek wrote:
> On St 12-10-05 10:56:46, Felix Oxley wrote:
> > > See Jamie Lokier's description how to *never* slow down.
> > Sorry, where is this?
>
> Somewhere on the lkml, *long* ago. Basically idea is to have one
> thread doing writing to disk, and second thread doing compression. If
> no compressed pages are available, just write uncompressed ones. That
> way compression can only speed things up.

That's Message-ID: <[email protected]>, dated
2004-03-27 14:49:45. (Oh look, sendmail encodes the date in the
Message-ID in exactly that format.)

This technique only works for DMA-capable IO -- PIO will make it suck --
but attempting to dump 32GB via PIO would be insane anyways, so...

-andy

2005-10-12 21:02:52

by Jerome Lacoste

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On 10/12/05, Felix Oxley <[email protected]> wrote:
>
> Thank you for helping a clueless newbie :-)
>
> > Notice that suspend2 project actually introduced compression *for
> > speed*. Doing it right means that it is faster to do it
> > compressed.
>
> I see!
> Little benchmarks here: http://wiki.suspend2.net/BenchMarks
> shows 15% speed _increase_ with compression.

But in the LZF case, there's 100 M more memory in the cache. That
certainly has some I/O perf. impact, right?

Jerome

2005-10-12 21:10:41

by Felix Oxley

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wednesday 12 October 2005 12:05, jerome lacoste wrote:
>
> But in the LZF case, there's 100 M more memory in the cache. That
> certainly has some I/O perf. impact, right?
>

mm.. well spotted.
It would be better if both measurements were taken from the same starting
point, e.g. immediately after boot.

2005-10-12 22:34:35

by Felix Oxley

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wednesday 12 October 2005 19:03, Andy Isaacson wrote:

> That's Message-ID: <[email protected]>, dated
> 2004-03-27 14:49:45. (Oh look, sendmail encodes the date in the
> Message-ID in exactly that format.)
>

Thanks for the link.
Here it is in an easier format:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108056647514134&w=2

2005-10-13 05:51:48

by Maneesh Soni

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wed, Oct 12, 2005 at 05:30:43PM +0900, OBATA Noboru wrote:
> On Tue, 11 Oct 2005, Hiro Yoshioka wrote:
> >
> > The reasons are
> > 1) They have to maintain the dump tools and support their users.
> > Many users are still using 2.4 kernels so merging kdump into 2.6
> > kernel does not help them.
> > 2) Commercial Linux Distros (Red Hat/Suse/MIRACLE(Asianux)/Turbo etc) use
> > LKCD/diskdump/netdump etc.
> > Almost no users use a vanilla kernel so kdump does not have users yet.

As of now I can see that Red Hat has put kexec/kdump in the FC5 devel tree
(rawhide), and hopefully it will be merged into FC5.

> Agreed.
>
> I am testing (or tasting ;-) kdump myself, and find it really
> impressive and promising. Thank you all who have worked on.
>
> In term of users, however, the majority of commercial users
> still use 2.4 kernels of commercial Linux distributions. This
> is especially true for careful users who have large systems
> because switching to 2.6 kernels without regression is not an
> easy task. So merging kdump into the mainline kernel does not
> directly mean that these users start using it now.
>
> Rather, merging kdump has much meaning for commercial Linux
> distributors, who should be planning how and when to include
> kdump in their distros.
>
> > > Is that a correct impression? If so, what shortcoming(s) in kdump are
> > > causing people to be reluctant to use it?
> >
> > I think the way to go is the kdump however it may take time.
>
> Agreed.
>
> I'd say commercial users are not reluctant to use kdump, but
> they are just waiting for kdump-ready distros. So in turn, we
> still have some time left for improving kdump further before
> kdump-ready distros are shipped to users, and I would like to be
> involved in such improvement hereafter.
>
> Thinking about the requirements in enterprise systems,
> challenges of kdump will be:
>
> - Reliability
> + Hardware-related issues
>
> - Manageability
> + Easy configuration
> + Automated dump-capture and restart
> + Time and space for capturing dump
> + Handling two kernels
>
> - Flexibility
> + Hook points before booting the 2nd kernel
>
> My short impressions follow. I understand that kdump/kexec
> developers are already discussing and working on some issues
> above, and I am grateful if someone tell me about the current
> status, or point me to the past lkml threads.

Many of the discussions are on the fastboot mailing list. As of now
work is being done to port kdump to the x86_64 and ppc64 architectures
and to tackle the device initialization issues.

>
> Reliability
> -----------
>
> In terms of reliability, hardware-related issues, such as a
> device reinitialization problem, an ongoing DMA problem, and
> possibly a pending interrupts problem, must be carefully
> resolved.

As of now the idea is to tackle these issues on a per-driver basis,
as and when they are reported. It seems there may not be any generic way
to solve device initialization.
>
> Manageability
> -------------
>
> As for manageability, it is nice if a user can easily setup
> kdump just by writing DEVICE=/dev/sdc6 to one's
> /etc/sysconfig/kdump and start the kdump service, for example.
> It is also desirable that an action taken after capturing a dump
> (halt, reboot, or poweroff) is configurable. I believe these are
> userspace tasks.

These are user space things and mostly distro specific. Though there
are some prototypes for automatically loading the second kernel
and automatically saving the captured dump using initrd at
http://lse.sf.net/kdump/

> Time and space problem in capturing huge crash dump is raised
> already. The partial dump and dump compress technology must be
> explored.
Agreed, any collaboration in this area is greatly appreciated.

> One of my worries is that the current kdump requires distinct
> two kernels (one for normal use, and one for capturing dumps) to
> work. And I'm not fully convinced whether a use of two kernels
> is the only solution or not. Well, I heard that this decision
> better solves the ongoing DMA problem (please correct me if
> other reasons are prominent), but from a pure management point
> of view handing one kernel is happier than two kernels.

I think there have been some efforts toward a relocatable
kernel, which would facilitate running the same kernel as both the regular
and the dump-capture kernel, though at a different physical start address.

> Flexibility
> -----------
>
> To minimize the downtime, a crashed kernel would want to
> communicate with clustering software/firmware to help it detect
> the failure quickly. This can be generalized by making
> appropriate hook points (or notifier lists) in kdump.
>
Sorry, I am not getting what is being said here. I think the right thing
is to always minimize what a crashed kernel is supposed to do. So why, and what,
should a crashed kernel communicate to anyone?

> Perhaps these hooks can be used to try reseting devices when
> reinitialization of devices in the 2nd kernel tends to fail.


Thanks
Maneesh
--
Maneesh Soni
Linux Technology Center,
IBM India Software Labs,
Bangalore, India
email: [email protected]
Phone: 91-80-25044990

2005-10-13 14:28:19

by Troy Heber

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On 10/10/05 17:49, Andrew Morton wrote:
>
> But it seems that this is not the case and that work continues on other
> strategies. Is that a correct impression? If so, what shortcoming(s) in
> kdump are causing people to be reluctant to use it?

True. There are many of us who continue to work on LKCD. Kdump is extremely
promising, but it's simply not ready for commercial use. Last I checked it
was i386 only and there are several conditions that can result in not being
able to generate a crash dump. LKCD, using a non-interrupt-driven
polling mode (a la diskdump), has become quite capable of generating crash
dumps from very nasty situations.

When kdump gets to the point where it works as well as LKCD on IA-64, i386,
and x86_64 I'll be happy to switch over.

Troy

2005-10-17 11:15:55

by Takao Indoh

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi,
I am a developer of diskdump.

On Mon, 10 Oct 2005 17:49:31 -0700, Andrew Morton wrote:
>I was rather expecting that the various groups which are interested in
>crash dumping would converge around kdump once it was merged. But it seems
>that this is not the case and that work continues on other strategies.
>
>Is that a correct impression? If so, what shortcoming(s) in kdump are
>causing people to be reluctant to use it?


I hope all of the current dump functions (diskdump, LKCD, netdump, etc.) will be
integrated into kdump. I think it is possible if the following two
problems with kdump are solved.

(1) problem of reliability

It seems that kdump has some problems regarding hardware.
Noboru OBATA said:

>In terms of reliability, hardware-related issues, such as a
>device reinitialization problem, an ongoing DMA problem, and
>possibly a pending interrupts problem, must be carefully
>resolved.

I think it is necessary to verify how reliably the 2nd kernel can be
booted.


(2) memory size problem

If the memory size is huge, the time for dumping and the size of the dump
file become a serious problem. Diskdump has partial dump and
compression functions to solve this problem. Kdump does not have such functions
yet.


The 2nd issue (memory size problem) may be solved by exporting
diskdump's functions to kdump.

I hope these issues will be solved first. (We can wait
for the completion of kdump development by operating the current
diskdump and/or LKCD in the meantime.)

Takao Indoh

2005-10-18 13:49:01

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi, Indoh-san,

On Mon, 17 Oct 2005, Takao Indoh wrote:
>
> The 2nd issue (memory size problem) may be solved by exporting
> diskdump's functions to kdump.

Could you briefly explain the implementation of partial dump in
diskdump for those who are not familiar with it?

- Levels of partial dump (supported page categories)
- How to identify the category (kernel data structure used)

Regards,

--
OBATA Noboru ([email protected])

2005-10-18 13:49:00

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wed, 12 Oct 2005, Felix Oxley wrote:
>
> On Wednesday 12 October 2005 09:28, OBATA Noboru wrote:
>
> > CMD | NET TIME (in seconds) | OUTPUT SIZE (in bytes)
> > ---------+--------------------------------+------------------------
> > cp | 35.94 (usr 0.23, sys 14.16) | 2,121,438,352 (100.0%)
> > lzf | 54.30 (usr 35.04, sys 13.10) | 1,959,473,330 ( 92.3%)
> > gzip -1 | 200.36 (usr 186.84, sys 11.73) | 1,938,686,487 ( 91.3%)
> > ---------+--------------------------------+------------------------
> >
> > Although it is too early to say lzf's compress ratio is good
> > enough, its compression speed is impressive indeed.
>
> As you say, the speed of lzf relative to gzip is impressive.
>
> However if the properties of the kernel dump mean that it is not suitable for
> compression then surely it is not efficient to spend any time on it.

Sorry, my last result was misleading. The dumpfile used above
was the one which showed the _worst_ compression ratio. So it
does not necessarily mean that kernel dump is not suitable for
compression.

I will retest with the normal dumpfiles.

> >And the
> > result also suggests that it is too early to give up the idea of
> > full dump with compression.
>
> Are you sure? :-)
> If we are talking about systems with 32GB of memory then we must be taking
> about organisations who can afford an extra 100GB of disk space just for
> keeping their kernel dump files.
>
> I would expect that speed of recovery would always be the primary concern.
> Would you agree?

Well, it depends on the user. It seems to be a trade-off
between resources for the dump and problem traceability.

I have had a bitter experience analyzing a partial dump. The
dump completely lacked the PTE pages of user processes, and I had
to give up the analysis. A partial dump carries a risk of failure
in analysis.

So what I'd suggest is that the level of partial dump should be
tunable by the user when implemented as kdump functionality.
Then, a user who wants faster recovery or has limited storage
may choose a partial dump, and a careful user who has plenty of
storage may choose a full dump.

By the way, if the speed of recovery is _really_ the primary
concern for a user, I'd suggest that the user form a cluster and
continue operation by failover. (Then the crashed node should
have enough time to generate a full dump.)

Regards,

--
OBATA Noboru ([email protected])

2005-10-18 14:11:19

by Hugh Dickins

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Tue, 18 Oct 2005, OBATA Noboru wrote:
>
> I have a bitter experience in analyzing a partial dump. The
> dump completely lacks the PTE pages of user processes and I had
> to give up analysis then. A partial dump has a risk of failure
> in analysis.

Page tables of user processes are very often essential in a dump.
Data pages of user processes are almost always just a waste of
space and time in a dump. Please don't judge against partial
dumps on the basis of one that was badly selected.

Hugh

2005-10-18 14:54:23

by Carsten Otte

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Andrew Morton wrote:
> I was rather expecting that the various groups which are interested in
> crash dumping would converge around kdump once it was merged. But it seems
> that this is not the case and that work continues on other strategies.
>
> Is that a correct impression? If so, what shortcoming(s) in kdump are
> causing people to be reluctant to use it?
On 390, we have standalone dump. That is a tool you can install on a disk
with zipl (like lilo) and that you boot when your server has crashed.
Newer machines also have a built-in hardware feature that does this from the
service element (that is, a laptop computer attached to the big box).
When running on z/VM, there is a command you can enter on z/VM's console
which causes z/VM to create a dump of Linux's memory.
Unlike kdump, we can even take a dump if our system is so badly
corrupted that you don't even get a panic message. As far as I know, kdump
requires reserving memory for the extra kernel prior to the crash, which
is not the case with our solutions.
--

Carsten Otte
IBM Linux technology center
ARCH=s390

2005-10-19 03:13:34

by Takao Indoh

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Hi,

On Tue, 18 Oct 2005 22:48:23 +0900 (JST), OBATA Noboru wrote:

>> The 2nd issue (memory size problem) may be solved by exporting
>> diskdump's functions to kdump.
>
>Could you briefly explain the implementation of partial dump in
>diskdump for those who are not familiar with it?
>
>- Levels of partial dump (supported page categories)
>- How to indentify the category (kernel data structure used)

Ok.
Partial dump of diskdump defines 5 filters.

#define DUMP_EXCLUDE_CACHE 0x00000001 /* Exclude LRU & SwapCache pages*/
#define DUMP_EXCLUDE_CLEAN 0x00000002 /* Exclude all-zero pages */
#define DUMP_EXCLUDE_FREE 0x00000004 /* Exclude free pages */
#define DUMP_EXCLUDE_ANON 0x00000008 /* Exclude Anon pages */
#define DUMP_SAVE_PRIVATE 0x00000010 /* Save private pages */

You can select each filter independently for a partial dump. (Therefore, there
are 32 levels of partial dump.)

1) DUMP_EXCLUDE_CACHE

This filter uses only page flags of struct page.
If the following condition is true, the page is not dumped.

!PageAnon(page) && (PageLRU(page) || PageSwapCache(page))

2) DUMP_EXCLUDE_CLEAN

If this filter is enabled, a page which is filled with zero is not
dumped.

3) DUMP_EXCLUDE_FREE

If this filter is enabled, free pages are not dumped. Diskdump finds free
pages from the free_list of zone->free_area.

4) DUMP_EXCLUDE_ANON

This filter uses only page flags of struct page.
If the following condition is true, the page is not dumped.

PageAnon(page)

5) DUMP_SAVE_PRIVATE

This filter is different from the others. Even if you specify
DUMP_EXCLUDE_CACHE, a page which has the PG_private flag is dumped if this
filter is enabled.



DUMP_EXCLUDE_FREE has some risks. If this filter is enabled, diskdump
scans the free-page linked lists. If a list is corrupt, diskdump may hang.
Therefore, I always use level 19 (EXCLUDE_CACHE & EXCLUDE_CLEAN &
SAVE_PRIVATE).

DUMP_EXCLUDE_CACHE reduces the dump size effectively when the file caches
in memory are big. I don't use DUMP_EXCLUDE_ANON because user data (user
stack, thread stack, mutexes, etc.) is sometimes needed to investigate a
dump.
DUMP_SAVE_PRIVATE is needed for filesystems. Filesystems (journals) use
PG_private pages, so these pages are necessary to investigate
filesystem trouble.
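
To make the combination concrete, here is a small user-space mock (not
diskdump source; the fields below are hypothetical stand-ins for the real
PageAnon()/PageLRU()/PageSwapCache() tests and the free-list scan) showing
how the filters compose into a per-page predicate. Note that level 19 is
simply 0x01 | 0x02 | 0x10 = 0x13 = 19.

#include <stdbool.h>
#include <stdio.h>

#define DUMP_EXCLUDE_CACHE 0x00000001
#define DUMP_EXCLUDE_CLEAN 0x00000002
#define DUMP_EXCLUDE_FREE  0x00000004
#define DUMP_EXCLUDE_ANON  0x00000008
#define DUMP_SAVE_PRIVATE  0x00000010

struct mock_page {                   /* stand-in for struct page state */
    bool anon, lru, swapcache, zero, free, priv;
};

static bool should_dump(const struct mock_page *p, unsigned int level)
{
    /* DUMP_SAVE_PRIVATE keeps PG_private pages that DUMP_EXCLUDE_CACHE
     * would otherwise drop (e.g. journal pages). */
    if ((level & DUMP_EXCLUDE_CACHE) &&
        !p->anon && (p->lru || p->swapcache) &&
        !((level & DUMP_SAVE_PRIVATE) && p->priv))
        return false;
    if ((level & DUMP_EXCLUDE_CLEAN) && p->zero)
        return false;
    if ((level & DUMP_EXCLUDE_FREE) && p->free)
        return false;
    if ((level & DUMP_EXCLUDE_ANON) && p->anon)
        return false;
    return true;
}

int main(void)
{
    unsigned int level = DUMP_EXCLUDE_CACHE | DUMP_EXCLUDE_CLEAN |
                         DUMP_SAVE_PRIVATE;             /* == 19 */
    struct mock_page cache_page   = { .lru = true };
    struct mock_page journal_page = { .lru = true, .priv = true };

    printf("level %u: cache page %s, journal page %s\n", level,
           should_dump(&cache_page, level)   ? "kept" : "dropped",
           should_dump(&journal_page, level) ? "kept" : "dropped");
    return 0;
}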

If there are other useful filters, please let me know.


These filters could probably be used for kdump as well, but I don't know how I
can find the kernel data structures (for example, the page flags of struct page)
when kdump dumps memory.


Takao Indoh

2005-10-19 19:00:50

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Tue, Oct 18, 2005 at 03:10:24PM +0100, Hugh Dickins wrote:
> On Tue, 18 Oct 2005, OBATA Noboru wrote:
> >
> > I have a bitter experience in analyzing a partial dump. The
> > dump completely lacks the PTE pages of user processes and I had
> > to give up analysis then. A partial dump has a risk of failure
> > in analysis.
>
> Page tables of user processes are very often essential in a dump.
> Data pages of user processes are almost always just a waste of
> space and time in a dump. Please don't judge against partial
> dumps on the basis of one that was badly selected.

We've had hard-to-reproduce problems out in the field where being able
to find the data pages of the user process was critical to figuring
out what the heck was going on. So I wouldn't be quite so eager to
dismiss the need for user pages. There are times when they come in
quite handy....

- Ted

2005-10-27 07:52:21

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wed, 19 Oct 2005, Takao Indoh wrote:
> >
> > Could you briefly explain the implementation of partial dump in
> > diskdump for those who are not familiar with it?
> >
> > - Levels of partial dump (supported page categories)
> > - How to indentify the category (kernel data structure used)
>
> Ok.
> Partial dump of diskdump defines 5 filters.
>
> #define DUMP_EXCLUDE_CACHE 0x00000001 /* Exclude LRU & SwapCache pages*/
> #define DUMP_EXCLUDE_CLEAN 0x00000002 /* Exclude all-zero pages */
> #define DUMP_EXCLUDE_FREE 0x00000004 /* Exclude free pages */
> #define DUMP_EXCLUDE_ANON 0x00000008 /* Exclude Anon pages */
> #define DUMP_SAVE_PRIVATE 0x00000010 /* Save private pages */

> DUMP_EXCLUDE_FREE has some risks. If this filter is enable, diskdump
> scans free page linked lists. If the list is corrupt, diskdump may hang.
> Therefore, I always use level-19 (EXCLUDE_CACHE & EXCLUDE_CLEAN &
> SAVE_PRIVATE).
>
> DUMP_EXCLUDE_CACHE reduces dump size effectively when file caches on
> memory are big. I don't use DUMP_EXCLUDE_ANON because user data(user
> stack, thread stack, mutex, etc.) is sometimes needed to investigate
> dump.
> DUMP_SAVE_PRIVATE is needed for filesystem. Filesystem (journal) uses
> PG_private pages, so these pages is necessary to investigate
> trouble of filesystem.

Thank you for the description of the filters as well as the recommended
filter combination.

I'm just wondering about the use of DUMP_EXCLUDE_CLEAN. When a zero
page is excluded from a dump, how do people know? What I'm afraid
of is that people would see an error (e.g., no such page in the dump)
while analyzing such a dump and be confused about why: is it because it
was page cache, or zero-cleared, or...? Any ideas?

Regards,

--
OBATA Noboru ([email protected])

2005-10-27 07:51:55

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Wed, 19 Oct 2005, "Theodore Ts'o" wrote:
>
> On Tue, Oct 18, 2005 at 03:10:24PM +0100, Hugh Dickins wrote:
> > On Tue, 18 Oct 2005, OBATA Noboru wrote:
> > >
> > > I have a bitter experience in analyzing a partial dump. The
> > > dump completely lacks the PTE pages of user processes and I had
> > > to give up analysis then. A partial dump has a risk of failure
> > > in analysis.
> >
> > Page tables of user processes are very often essential in a dump.
> > Data pages of user processes are almost always just a waste of
> > space and time in a dump. Please don't judge against partial
> > dumps on the basis of one that was badly selected.

My apologies. What should be blamed is the bad partial dump
implementation, not the partial dump itself.

But I don't think data pages of user processes are almost always
useless, as Ted's comment shows.

> We've had hard-to-reproduce problems out in the field where being able
> to find the data pages of the user process was critical to figuring
> out what the heck was going on. So I wouldn't be quite so eager to
> dismiss the need for user pages. There are times when they come in
> quite handy....

I agree.

When a system crashes, a user may want to _avoid_ the cause of
the crash and continue operation until the bugs are fixed and well
tested.

Then we try to find a way to avoid the specific situation that
caused the crash. Sometimes this can be done by changing
resource limits, timeouts, or some fancy feature in the XXX.conf of
the user programs.

To investigate the behavior of user processes, having data pages
of user processes in a dump is mandatory.

Regards,

--
OBATA Noboru ([email protected])

2005-10-27 07:52:22

by Noboru OBATA

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

On Thu, 13 Oct 2005, Maneesh Soni wrote:
>
> Many of the discussions are on fastboot mailing list. As of now
> work is being done to port kdump to x86_64 and ppc64 architectures
> and tackling the device initialization issues.

Thanks.

> > Reliability
> > -----------
> >
> > In terms of reliability, hardware-related issues, such as a
> > device reinitialization problem, an ongoing DMA problem, and
> > possibly a pending interrupts problem, must be carefully
> > resolved.
>
> As of now the idea is to tackle these issues as per driver basis,
> as and when reported. It seems there may not be any generic way
> to solve device initialization.

Agreed. A generic way would be something like what is done in
device_shutdown(). That is, make major drivers implement their
own "reset" code in struct device_driver, and call it upon a
crash. It would be nice if the reset code also stopped DMA
transfers.

I understand that doing this in the first kernel carries the risk that
following a device chain upon a crash could cause further
problems when kernel memory is corrupted. But the
driver-dependent reset code is still best implemented in its own
device_driver structure. Hmm...
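
To illustrate the shape of the proposal, here is a self-contained
user-space mock (plain C, not kernel code; struct device_driver has no
such "crash reset" member today, and every name below is hypothetical):

#include <stdio.h>

struct mock_driver {
    const char *name;
    /* Proposed hook: quiesce the hardware and stop DMA; 0 on success.
     * NULL means the driver does not implement it yet. */
    int (*crash_reset)(struct mock_driver *drv);
};

static int nic_crash_reset(struct mock_driver *drv)
{
    printf("%s: masking interrupts, stopping DMA rings\n", drv->name);
    return 0;
}

static struct mock_driver drivers[] = {
    { .name = "nic0",    .crash_reset = nic_crash_reset },
    { .name = "legacy0", .crash_reset = NULL },
};

/* Would be called from the crash path, before booting the capture kernel,
 * much like device_shutdown() walks devices at reboot. */
static void crash_reset_all(void)
{
    for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
        struct mock_driver *drv = &drivers[i];

        if (!drv->crash_reset) {
            printf("%s: no reset hook, relying on the 2nd kernel\n", drv->name);
            continue;
        }
        if (drv->crash_reset(drv))
            printf("%s: reset failed\n", drv->name);
    }
}

int main(void)
{
    crash_reset_all();
    return 0;
}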

> > Manageability
> > -------------
> >
> > As for manageability, it is nice if a user can easily setup
> > kdump just by writing DEVICE=/dev/sdc6 to one's
> > /etc/sysconfig/kdump and start the kdump service, for example.
> > It is also desirable that an action taken after capturing a dump
> > (halt, reboot, or poweroff) is configurable. I believe these are
> > userspace tasks.
>
> These are user space things and mostly distro specific. Though there
> are some prototypes done for automatically loading the second kernel
> and autmoatically saving the captured dump using initrd at
> http://lse.sf.net/kdump/

Interesting. I'll try it.

My concern in this area is that the device name (e.g.,
/dev/sdc6) in the first kernel may not be the same in the second
kernel due to the order in which device drivers are loaded. I hope the
UUID support in recent filesystems will help.

> > One of my worries is that the current kdump requires distinct
> > two kernels (one for normal use, and one for capturing dumps) to
> > work. And I'm not fully convinced whether a use of two kernels
> > is the only solution or not. Well, I heard that this decision
> > better solves the ongoing DMA problem (please correct me if
> > other reasons are prominent), but from a pure management point
> > of view handing one kernel is happier than two kernels.
>
> I think there were some efforts being done in having a relocatable
> kernel, which can facilitate running the same kernel as regular and
> dump capture kernel, though at different physical start address.

Hmm...

I'm wondering how the second kernel (and its associated device
drivers) will be provided when kdump-ready distros are shipped.

> > Flexibility
> > -----------
> >
> > To minimize the downtime, a crashed kernel would want to
> > communicate with clustering software/firmware to help it detect
> > the failure quickly. This can be generalized by making
> > appropriate hook points (or notifier lists) in kdump.
> >
> Sorry, I am not getting what is being said here. I think the right thing
> is to always minimize what a crashed kernel is supposed to do. So, why/what
> should a crashed kernel communicate to someone.

The idea is to provide some hooks for a system administrator
that run upon a crash.

One would want to use such a hook to provide faster failover
in a clustering system. Usually a failover is executed when the
heartbeat between nodes is lost. Such detection takes, say,
10 or 30 seconds, depending on the configuration. But if the
oopsed node can announce the failure to the others, the failure
can be detected more quickly, possibly in less than a second.

Another use of this hook would be resetting devices. If device
drivers in distros do not support robust reinitialization in the
second kernel, one would want to use this hook to reset the
device so that kdump works.
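
To sketch the crash-notification case: the kernel already has a panic
notifier chain, and something like the module below (my illustration, not
part of kdump; the "tell the cluster" step is only a printk placeholder
because it is site-specific) could announce the failure at panic time.
Where exactly such a hook should sit relative to the switch into the
capture kernel is the open design question here. On 2005-era kernels the
registration call is notifier_chain_register() and panic_notifier_list
is declared in linux/kernel.h rather than linux/panic_notifier.h.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/panic_notifier.h>

static int crash_announce(struct notifier_block *nb,
                          unsigned long event, void *data)
{
    /* Keep this path minimal: e.g. poke a pre-mapped NVRAM/IPMI/watchdog
     * interface that the cluster manager polls. Here we only log; on
     * panic, data points to the panic message string. */
    pr_emerg("crash hook: announcing failure (%s)\n", (char *)data);
    return NOTIFY_DONE;
}

static struct notifier_block crash_announce_nb = {
    .notifier_call = crash_announce,
    .priority      = 200,       /* run early among the panic notifiers */
};

static int __init crash_hook_init(void)
{
    atomic_notifier_chain_register(&panic_notifier_list, &crash_announce_nb);
    return 0;
}

static void __exit crash_hook_exit(void)
{
    atomic_notifier_chain_unregister(&panic_notifier_list, &crash_announce_nb);
}

module_init(crash_hook_init);
module_exit(crash_hook_exit);
MODULE_LICENSE("GPL");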

I'm not fully convinced that doing the minimum in a crashed kernel
is always right. As for preparing the hardware conditions for the
second kernel, doing more in the crashed kernel may make sense
because most drivers expect that.

Anyway, providing appropriate hooks seems like a good
compromise to me.

Regards,

--
OBATA Noboru ([email protected])

2005-10-14 09:19:43

by Takahashi, Hideki

[permalink] [raw]
Subject: Re: Linux Kernel Dump Summit 2005

Dear all,

We have prepared a mailing list for the Linux Kernel Dump Summit.

English: [email protected]
Japanese: [email protected]

We held the Linux Kernel Dump Summit 2005 in Japan on Sep. 16, 2005,
with members from many Japanese companies.
Below is a list of the companies whose members are currently on the lkds mailing list.

Hitachi, Intel, IPA, MIRACLE LINUX, NEC, NEC Soft
NS Solutions, NTT DATA, NTT DATA INTELLILINK,
OSDL Japan, TechStyle, TOSHIBA, turbolinux,
UNIADEX, VA Linux Systems Japan

This mailing list will be used to send information (& news) about
Linux Kernel Dump Summit to the members.

Click on the following URL to subscribe, unsubscribe, and change
your preferences:

http://lists.sourceforge.net/lists/listinfo/alicia-lkds

---

To subscribe to the Japanese mailing list, please send a mail to

[email protected]

Then the list administrator will respond to you in Japanese.

Best regards,
All of the Japan lkds members

-----------------
Hideki Takahashi
UNIADEX, Ltd., Software Product Support
