On Sat, Jan 10, 2009 at 1:44 AM, Nicholas Miell <[email protected]> wrote:
> On Fri, 2009-01-09 at 20:05 -0800, Linus Torvalds wrote:
>>
>> On Fri, 9 Jan 2009, Nicholas Miell wrote:
>> >
>> > It's only too big if you always keep it in memory, and I wasn't
>> > suggesting that.
>>
>> Umm. We're talking kernel panics here. If it's not in memory, it doesn't
>> exist as far as the kernel is concerned.
>>
>> If it doesn't exist, it cannot be reported.
>
> The idea was that the kernel would generate a crash dump and then after
> the reboot, a post-processing tool would do something with it. (e.g. run
> the dump through crash to get a stack trace using separate debug info or
> ship the entire dump off to a collection server or something).
>
>> > And this is where we disagree. I believe that crash dumps should be the
>> > norm and all the reasons you have against crash dumps in general are in
>> > fact reasons against Linux's sub-par implementation of crash dumps in
>> > specific.
>>
>> Good luck with that. Go ahead and try it. You'll find it wasn't so easy
>> after all.
>>
>> > So, here I am, a non-enterprise end user with a non-stale kernel who'd
>> > love to be able to give you a crash dump (or, more likely, a stack trace
>> > created from that crash dump), but I can't because Linux crash dumps are
>> > stuck in the enterprise ghetto.
>>
>> No, you're stuck because you apparently have your mind stuck on a
>> crash-dump, and aren't willing to look at alternatives.
>>
>> You could use a network console. Trust me - if you can't set up a network
>> console, you have no business mucking around with crash dumps.
>
> netconsole requires a second computer. Feel free to mail me one. :)
>
>> And if the crash is hard enough that you can't get any output from that,
>> again, a crash dump wouldn't exactly help, would it?
>>
>> > Hell, I'd be happy if I could get the normal panic text written to
>> > disk, but since the hard part is the actual writing to disk, there's no
>> > reason not to do the full crash dump if you can.
>>
>> Umm. And why do you think the two have anything to do with each other?
>>
>> Only insane people want the kernel to write to disk when it has problems.
>> Sane people try to write to something that doesn't potentially overwrite
>> their data. Like the network.
>>
>> Which is there. Try it. Trust me, it's a _hell_ of a lot more likely to
>> work than a crash dump.
>
> Well, yes, but that has everything to do with how terrible kdump is and
> nothing to do with the idea of crash dumps in general.
>
> Anyway, we've strayed off topic long enough, I'm sure everyone in the Cc
> list would be happy to stop getting an earful about the merits of crash
> dumps.
Yes, especially from someone who lacks the ability to properly
configure kdump. I'm fairly surprised others are giving you a free
pass when you keep asserting how broken kdump is with such hollow
criticism. I rely heavily on kdump and it works quite well (kvm
integration was lacking but has improved).
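As a rough illustration of what a "properly configured" kdump involves, here is a minimal POSIX shell sketch. It assumes the usual two-part setup: memory reserved with a crashkernel= boot parameter, and a capture kernel loaded with kexec -p, which kernels that expose it report through /sys/kernel/kexec_crash_loaded. The function name kdump_ready and the CMDLINE/CRASH_LOADED overrides are hypothetical. They exist only so the check can be exercised against ordinary files.

```shell
#!/bin/sh
# kdump_ready (hypothetical helper): succeed only if crash-kernel memory was
# reserved at boot and a capture kernel is currently loaded.
# CMDLINE and CRASH_LOADED are hypothetical overrides used for testing.
kdump_ready() {
    cmdline=${CMDLINE:-/proc/cmdline}
    loaded=${CRASH_LOADED:-/sys/kernel/kexec_crash_loaded}

    # Step 1: memory for the capture kernel must be reserved via crashkernel=.
    if ! grep -q 'crashkernel=' "$cmdline"; then
        echo "no crashkernel= reservation on the command line" >&2
        return 1
    fi

    # Step 2: a capture kernel must have been loaded (kexec -p ...).
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "no capture kernel loaded" >&2
        return 1
    fi

    echo "kdump armed"
}
```

On a real system it would simply be run as `kdump_ready`. It only checks configuration and says nothing about whether the capture kernel's drivers will cope with the hardware after a crash.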
Now, that said, I too value crash dumps, and I will stray a bit off-topic
(subject changed): I do have one significant debuggability issue when
using crash dumps. On x86_64 I'm unable to get line number information
for symbols that reside in a module.
I tried to get some insight on this from Dave Anderson (crash utility
developer/maintainer) here:
http://www.mail-archive.com/[email protected]/msg01101.html
Dave was helpful but ultimately couldn't explain why I'm unable to get
line numbers with my x86_64 kernel.org kernels (he and Red Hat see the
same problem with RHEL4 kernels).
I strongly respect those cc'd and have to believe someone can help me
cut through this.
I've updated scripts/package/mkspec so that the 'make rpm' target
produces RedHat-style kernel rpms (modeled on Red Hat's kernel.spec). This
includes creating debuginfo rpms. I've attached a patch that works on
2.6.28, but again the resulting debuginfo doesn't provide line numbers
for the kernel modules under crash!?
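For readers unfamiliar with debuginfo packaging, the core mechanism is to copy each module's DWARF sections (including .debug_line, which crash needs to map addresses to source lines) into a separate file, then strip them from the shipped module. Below is a minimal sketch using GNU objcopy. It assumes a Red Hat-style layout of /usr/lib/debug/<original path>.debug. The function names are hypothetical, and this is neither the attached mkspec patch nor rpm's own debuginfo tooling.

```shell
#!/bin/sh
# debug_path (hypothetical): where a Red Hat-style debuginfo package is
# assumed to place the separated debug file for a given module path.
debug_path() {
    printf '/usr/lib/debug%s.debug\n' "$1"
}

# split_debuginfo (hypothetical): move DWARF out of a module with GNU objcopy.
split_debuginfo() {
    mod=$1
    dbg=$(debug_path "$mod")
    mkdir -p "$(dirname "$dbg")" || return 1
    # Keep only the debug sections (.debug_info, .debug_line, ...) in the copy.
    objcopy --only-keep-debug "$mod" "$dbg" || return 1
    # Modules need their symbol table to load, so strip debug sections only.
    objcopy --strip-debug "$mod" || return 1
    # Record the debug file's name (and CRC) in a .gnu_debuglink section.
    objcopy --add-gnu-debuglink="$dbg" "$mod"
}
```

Whether crash then finds the line information for a module also depends on how it is told to load module symbols (for example its mod -S command), which this sketch does not address.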
If anyone has some insight on what I might be missing (as part of the
kernel build) I'd _really_ appreciate it. I provided the mkspec patch
just as a reference; if the RPM stuff is too opaque, please just help
me understand the bare mechanics of what is needed without mkspec (in
the meantime I'll try the debuginfo generation patch Sam Ravnborg
recently posted too).
regards,
Mike
* Mike Snitzer <[email protected]> wrote:
> Yes, especially from someone who lacks the ability to properly configure
> kdump. I'm fairly surprised others are giving you a free pass when you
> keep asserting how broken kdump is with such hollow criticism. I rely
> heavily on kdump and it works quite well (kvm integration was lacking
> but has improved).
hm, you say you rely heavily on kdump ... for what exactly, and how does
it help the upstream Linux kernel?
I see a single fix from you in the whole repository:
ffc41cf: nbd: prevent sock_xmit from attempting to use a NULL socket
... and that single fix is a NULL pointer dereference that ought to have
been quite debuggable from a plain oops alone.
In practice i rarely see bugfixes that were debugged via kdump. Normal
oops based fixes outnumber kdump based fixes by a ratio of 1:100 or worse
- and kdump is readily available these days - just nobody configures it.
For example, in the whole kernel repo there's just 45 commits that mention
'kdump' [excluding those commits that develop kdump itself]:
$ git log --pretty=format:"%h: %s" --no-merges -i --grep="kdump" |
grep -viE 'kdump|kexec|dump|mem' | wc -l
45
Contrast that to the 5900 commits that contain the string 'oops' or
'crash':
$ git log --pretty=format:"%h: %s" --no-merges -i -E --grep="oops|crash" |
wc -l
5900
That's a ratio of 1:131. (and probably optimistic in favor of kdump.)
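The arithmetic behind that ratio can be written as a small shell sketch: count the matching one-line log entries, then divide. The helper names are hypothetical. count_lines reads the same "%h: %s" lines the git log commands above produce, and ratio rounds to the nearest integer.

```shell
#!/bin/sh
# count_lines (hypothetical): count stdin lines NOT matching a
# case-insensitive extended regex; with an empty pattern, count every line.
count_lines() {
    if [ -z "$1" ]; then
        grep -c ''
    else
        grep -vciE "$1"
    fi
}

# ratio (hypothetical): B/A rounded to the nearest integer, i.e. the "131"
# in "1:131".
ratio() {
    echo $(( ($2 + $1 / 2) / $1 ))
}

# Example, inside a kernel tree:
#   k=$(git log --pretty=format:"%h: %s" --no-merges -i --grep="kdump" |
#       count_lines 'kdump|kexec|dump|mem')
#   o=$(git log --pretty=format:"%h: %s" --no-merges -i -E --grep="oops|crash" |
#       count_lines '')
#   echo "1:$(ratio "$k" "$o")"
```

One design note: grep -c also counts a final line that lacks a trailing newline, which is what --pretty=format: emits for the last commit. wc -l does not count that line, so totals from this helper can be one higher than a bare wc -l.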
Note, i don't have any negative feelings towards kdump - some people use it
and enterprise folks with their frozen, immutable kernels love it - it
just has not yet given me a reason to have particularly positive feelings
towards it in the upstream kernel space.
Ingo
On Sat, Jan 10, 2009 at 10:34 AM, Ingo Molnar <[email protected]> wrote:
>
> * Mike Snitzer <[email protected]> wrote:
>
>> Yes, especially from someone who lacks the ability to properly configure
>> kdump. I'm fairly surprised others are giving you a free pass when you
>> keep asserting how broken kdump is with such hollow criticism. I rely
>> heavily on kdump and it works quite well (kvm integration was lacking
>> but has improved).
>
> hm, you say you rely heavily on kdump ... for what exactly, and how does
> it help the upstream Linux kernel?
>
> I see a single fix from you in the whole repository:
>
> ffc41cf: nbd: prevent sock_xmit from attempting to use a NULL socket
>
> ... and that single fix is a NULL pointer dereference that ought to have
> been quite debuggable from a plain oops alone.
I've reported various bugs and helped with prototypes for fixes (e.g.
a0da84f3). But by all means belittle me... must be fun.
Baiting me into this fairly irrelevant tangent like you have really shows class.
I've read hundreds of posts from you over the years and don't recall
you being so overtly antagonistic for absolutely no reason. I was simply
saying "kdump doesn't suck"; certainly not as bad as Nicholas would
have us all believe.
Yes, I'm not Ingo Molnar. I don't rewrite the core of Linux with ease
but for the past 10 years I've developed solely on Linux to pay the
bills. I started hacking Linux distributions and have progressed to
kernel development where I primarily focus on storage and filesystems.
As things relate to upstream Linux, I'm particularly good at
backporting and cherry picking upstream advances to help stabilize
enterprise solutions.
I have to believe that you understand not all Linux kernel development
happens upstream.
> In practice i rarely see bugfixes that were debugged via kdump. Normal
> oops based fixes outnumber kdump based fixes by a ratio of 1:100 or worse
> - and kdump is readily available these days - just nobody configures it.
So you're telling me RedHat doesn't rely on kdump at enterprise
customer installations? I find that hard to believe. Few enterprise
customers allow defects to be debugged on-site, sometimes collecting a
crash dump is all you can hope for to make progress. I have to
believe you know this fairly well; if not with direct experience then
through your co-workers? Or am I living in Ingo's version of Linux
hell where kdump is actually useful?
> For example, in the whole kernel repo there's just 45 commits that mention
> 'kdump' [excluding those commits that develop kdump itself]:
>
> $ git log --pretty=format:"%h: %s" --no-merges -i --grep="kdump" |
> grep -viE 'kdump|kexec|dump|mem' | wc -l
> 45
>
> Contrast that to the 5900 commits that contain the string 'oops' or
> 'crash':
>
> $ git log --pretty=format:"%h: %s" --no-merges -i -E --grep="oops|crash" |
> wc -l
> 5900
>
> That's a ratio of 1:131. (and probably optimistic in favor of kdump.)
>
> Note, i don't have any negative feelings towards kdump - some people use it
> and enterprise folks with their frozen, immutable kernels love it - it
> just has not yet given me a reason to have particularly positive feelings
> towards it in the upstream kernel space.
Clearly you don't care about kdump; but please don't abuse your
standing to turn this into a referendum on kdump's existence in the
upstream kernel. Is upstream Linux somehow less pure by the existence
of kdump?
I'm left with a certain disappointment that the amazing Ingo Molnar
took the time to respond to my post yet thought it best to immediately
go negative _and_ off-topic on me with some vendetta against kdump.
Your kdump vs. oops statistics don't help defend Nicholas' "kdump sucks"
assertion either.
So just what was your point (other than to flame me because your
cornflakes were nasty this morning)?
Mike
On Sat, Jan 10, 2009 at 01:21:06PM -0500, Mike Snitzer wrote:
> > In practice i rarely see bugfixes that were debugged via kdump. Normal
> > oops based fixes outnumber kdump based fixes by a ratio of 1:100 or worse
> > - and kdump is readily available these days - just nobody configures it.
>
> So you're telling me RedHat doesn't rely on kdump at enterprise
> customer installations? I find that hard to believe. Few enterprise
> customers allow defects to be debugged on-site, sometimes collecting a
> crash dump is all you can hope for to make progress. I have to
> believe you know this fairly well; if not with direct experience then
> through your co-workers? Or am I living in Ingo's version of Linux
> hell where kdump is actually useful?
In my experience, there are very few kernel versions and hardware for
which kdump works. I've talked to the people who have to make kdump
work, and every 12-18 months, when a new set of enterprise kernels
comes out, they have to go and fix kdump so it works again for the set
of hardware that they care about, and for the kernel version involved.
Part of the problem is one which has infected nearly every single RAS
technology out there, from kdump to Systemtap: the people who
architect and fund these RAS technologies delude themselves into
thinking that they only have to worry about making it work for
enterprise kernels and enterprise users, and to hell with everyone
else --- specifically, kernel developers, who don't matter since
they aren't enterprise users. Heck, until July of last year,
Systemtap wouldn't even ***compile*** out of the box on a
non-enterprise distribution like Ubuntu or Debian. And I still have
yet to make kdump work on a Thinkpad, although I've tried.
Since pretty much no one uses these RAS technologies except enterprise
users, and no one bothers to make it easy for kernel developers,
kernel developers have developed alternate mechanisms for debugging
the Linux kernel --- and they don't involve using Systemtap or kdump,
because in practice, it doesn't work for them at all, or it's too hard
to make it work for them.
And this becomes a vicious cycle; since no one bothers to spend
time making RAS technologies work for everyday use by kernel
developers, bitrot inevitably sets in, and so the RAS developers get
no help from other kernel developers, who are busy fixing their own
problems via different means; and so the RAS developers hunker down,
and spend even more time fixing the bitrot and complaining that no one
helps them or takes them seriously, and the problem gets worse and
worse and worse --- until now there are people who are busily
developing alternatives to Systemtap, just because too many RAS
architects and developers had their priorities wrong, and forgot
to focus on every day kernel developers instead of just enterprise
users.
It's very sad, and it means a lot of investment gets wasted, and work
is getting duplicated as a result.
Oh, well.
- Ted
> In my experience, there are very few kernel versions and hardware for
> which kdump works.
I think that's mostly because kexec from arbitrary context is a
somewhat unstable concept. It requires all drivers to be able
to reinit the hardware from an arbitrary state, and that's just
hard (it's kind of "make suspend/resume work everywhere"
and then a little harder and we know how long that took)
We also don't really have any tools to help make this
easier to implement for driver developers, e.g. some self-test
that restarted drivers regularly to check this.
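One way such a self-test might look from user space is sketched below. It repeatedly unbinds and rebinds a device through the driver core's sysfs bind/unbind files. The function name, the SYSFS override, and the PCI address in the example are hypothetical. Note that this exercises driver teardown and re-probe, which is related to, but not the same as, reinitializing hardware left in an arbitrary state by a crashed kernel.

```shell
#!/bin/sh
# rebind_loop (hypothetical): unbind and rebind DEVICE from DRIVER on BUS,
# COUNT times, via /sys/bus/<bus>/drivers/<driver>/{unbind,bind}.
# SYSFS is a hypothetical override so the loop can be dry-run elsewhere.
rebind_loop() {
    dir=${SYSFS:-/sys}/bus/$1/drivers/$2
    dev=$3
    count=${4:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Write the device name without a trailing newline.
        printf '%s' "$dev" > "$dir/unbind" || return 1
        printf '%s' "$dev" > "$dir/bind" || return 1
        i=$((i + 1))
    done
}

# Example (as root, hypothetical device): rebind_loop pci e1000 0000:02:00.0 100
```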
But you often don't need kdump for crash dumps.
In many cases the system is still alive after an oops
or other problem and you can just do a live dump or even
live crash session to look at data structures.
I used to do this with gdb regularly, but now
usually use crash because it has better tools.
> they aren't enterprise users. Heck, until July of last year,
> Systemtap wouldn't even ***compile*** out of the box on a
> non-enterprise distribution like Ubuntu or Debian.
At least on opensuse releases it tends to work for me.
The biggest PITA used to be the elfutils dependency, which
seemed to come out of an all-the-world-is-Red-Hat mindset among
the developers, but that can be worked around now.
Sometimes one has to patch it up when updating kernels
because some interface used by the runtime has changed again, but
that shouldn't be a demanding task for kernel hackers really.
At least I didn't find it particularly difficult.
I wish they would merge the runtime into mainline though.
-Andi
--
[email protected]
On Sat, 10 Jan 2009, Andi Kleen wrote:
>
> I think that's mostly because kexec from arbitary context is a
> somewhat unstable concept.
I think that's the understatement of the year.
We have tons of problems with standard suspend-to-ram, and that's when the
suspend sequence has done its best to make everything quiescent. Expecting
that we can reinitialize all the hardware at some random point when things
are going haywire is "optimistic" at best.
So of course it will work on some hardware and not others.
I think we've been fairly successful at keeping a running system for
_most_ of our bugs. Even when things go bad with X running, it's quite
often possible to ssh in over the network (although it's often better if
you were already connected) and see the dump.
Not always, obviously. Many dumps really are painful. I'm hoping that
kernel-mode-setting will at least give us the oops message _more_ of the
time.
As far as I'm concerned, digital cameras have been more useful than kernel
dumps to kernel debugging.
Linus
* Linus Torvalds <[email protected]> wrote:
> As far as I'm concerned, digital cameras have been more useful than
> kernel dumps to kernel debugging.
Yes, especially ones with VGA video capture. (I caught an
"oops+triple-fault" crash via that trick once, which was not
serial-console capture-able and which was just a single frame in the 25
fps video of the incident.)
Ingo
* Mike Snitzer <[email protected]> wrote:
> On Sat, Jan 10, 2009 at 10:34 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Mike Snitzer <[email protected]> wrote:
> >
> >> Yes, especially from someone who lacks the ability to properly
> >> configure kdump. I'm fairly surprised others are giving you a free
> >> pass when you keep asserting how broken kdump is with such hollow
> >> criticism. I rely heavily on kdump and it works quite well (kvm
> >> integration was lacking but has improved).
> >
> > hm, you say you rely heavily on kdump ... for what exactly, and how
> > does it help the upstream Linux kernel?
> >
> > I see a single fix from you in the whole repository:
> >
> > ffc41cf: nbd: prevent sock_xmit from attempting to use a NULL socket
> >
> > ... and that single fix is a NULL pointer dereference that ought to have
> > been quite debuggable from a plain oops alone.
>
> I've reported various bugs and helped with prototypes for fixes (e.g.
> a0da84f3). But by all means belittle me... must be fun.
I really did not want to belittle you - but in hindsight it really reads
that way ... sorry about that and how insensitive it was from me! :(
I just wanted to point out that if kdump is useful it must be _visible_.
Ask commit logs to include "this was debugged via kdump" lines, etc. The
upstream kernel must feel that it all matters.
The upstream kernel really has to be ruthless about such things and must
react to how things are, not to how they are wished to be - one of the
most critical things that keeps Linux ticking is people like you who
report and debug problems.
Note, back when kdump was added to the kernel many moons ago i strongly
supported it and helped out with the patches, etc. I still think it might
have the potential to become big - but it needs a ton of tech and care to
reach that level of convenience.
'kdump light' perhaps that dumps the most important data structures like
registers of all CPUs, task struct and the symbol tables, the current task
itself including the kernel stack plus the surrounding 4K of all pointers
that are in current registers and that point into kernel memory - maybe
straight to kerneloops.org [if the user agrees] - or something like that.
Ingo
On Sat, Jan 10, 2009 at 8:28 PM, Ingo Molnar <[email protected]> wrote:
>
> Note, back when kdump was added to the kernel many moons ago i strongly
> supported it and helped out with the patches, etc. I still think it might
> have the potential to become big - but it needs a ton of tech and care to
> reach that level of convenience.
>
> 'kdump light' perhaps that dumps the most important data structures like
> registers of all CPUs, task struct and the symbol tables, the current task
> itself including the kernel stack plus the surrounding 4K of all pointers
> that are in current registers and that point into kernel memory - maybe
> straight to kerneloops.org [if the user agrees] - or something like that.
I think 'kdump light' is a good idea. I'm all for infrastructure that
works better for more people. Having to deal with multi-gigabyte dump
files can be a chore.
The mechanics of dumping your suggested 'light' amount of data vs. all
memory should be configurable (e.g. /sys/kernel/kexec_crash_light).
And this obviously doesn't change the potentially fragile nature of
kexec'ing to a crash kernel from an arbitrary context; or the fact
that drivers can easily be incompatible with cleanly shutting down and
restarting on kexec.
I worked with Eric Biederman testing in the early days of his kexec
work, and the e1000 driver was incompatible with kexec at that time
if it was built into the kernel (the workaround was to use a module and
unload it before kexec, *shudder*; I was using kexec in a custom
bootloader for a storage appliance, not for kdump).
But honestly 99+% of my filesystem/storage induced Linux crashes
kexec/kdump properly (with RHEL5, 2.6.22, 2.6.25, and 2.6.28); so all
the hard work of people like yourself and other kexec/kdump hackers
(upstream and at RedHat) really is paying off for real Linux users!
The fairly recent kvm kdump compatibility work (2340b62f) is a perfect
example of how hard things can be. But it is encouraging to see such
commendable effort being put to making kdump workable for all.
Now if only I could fix line numbers when debugging crashes in x86_64
modules with the crash utility! :)
Regards,
Mike
On Jan 10, 2009 16:15 -0500, Theodore Ts'o wrote:
> In my experience, there are very few kernel versions and hardware for
> which kdump works. I've talked to the people who have to make kdump
> work, and every 12-18 months, when a new set of enterprise kernels
> comes out, they have to go and fix kdump so it works again for the set
> of hardware that they care about, and for the kernel version involved.
I'm sad that netconsole/netdump never made it big. It was fairly useful,
and extending the eth drivers to add the polling mode was trivial to do.
We were using that for a few years, but it got replaced by kdump and it
appears to be less usable IMHO.
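For anyone who has not set it up, the in-tree netconsole module is configured with a single parameter of the form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], as described in the kernel's netconsole documentation. The tiny sketch below assembles that parameter. The helper name and all addresses are hypothetical.

```shell
#!/bin/sh
# netconsole_arg (hypothetical): assemble the netconsole= parameter from
# source port, source IP, interface, target port, target IP, target MAC.
netconsole_arg() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (as root on the machine being debugged):
#   modprobe netconsole "$(netconsole_arg 6665 10.0.0.1 eth1 6666 10.0.0.2 12:34:56:78:9a:bc)"
# On the receiving machine, listen on UDP port 6666 with netcat; the exact
# flags differ between netcat variants.
```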
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Sun, Jan 11, 2009 at 5:11 AM, Andreas Dilger <[email protected]> wrote:
> On Jan 10, 2009 16:15 -0500, Theodore Ts'o wrote:
>> In my experience, there are very few kernel versions and hardware for
>> which kdump works. I've talked to the people who have to make kdump
>> work, and every 12-18 months, when a new set of enterprise kernels
>> comes out, they have to go and fix kdump so it works again for the set
>> of hardware that they care about, and for the kernel version involved.
>
> I'm sad that netconsole/netdump never made it big. It was fairly useful,
> and extending the eth drivers to add the polling mode was trivial to do.
> We were using that for a few years, but it got replaced by kdump and it
> appears to be less usable IMHO.
Less usable in terms of ease of configuration and use? Or
reliability? In practice netdump's non-interrupt-driven polling mode
proved fairly reliable, but it seems to me kdump provides more reliable
dumping than netdump, because you're performing the kdump from a
reliable new dump kernel (provided the initial kexec transition to the
dump kernel works properly).
Mike
> 'kdump light' perhaps that dumps the most important data structures like
> registers of all CPUs, task struct and the symbol tables, the current task
> itself including the kernel stack plus the surrounding 4K of all pointers
> that are in current registers and that point into kernel memory - maybe
> straight to kerneloops.org [if the user agrees] - or something like that.
All you would need for that would be a new custom level in makedumpfile
that dumps only this data (except that for live dumps it can be difficult to get
the full register contents). No kernel changes needed.
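For context, makedumpfile's existing -d option is a bitmask of page types to leave out of the dump: 1 for zero pages, 2 for page cache without private data, 4 for page cache with private data, 8 for user process data, and 16 for free pages. So -d 31 drops all of them. A "light" level would presumably be a further filter of this kind. The small sketch below computes the mask. The helper and its type names are hypothetical.

```shell
#!/bin/sh
# dump_level (hypothetical): OR together makedumpfile -d bits for the named
# page types to exclude from the dump.
dump_level() {
    level=0
    for t in "$@"; do
        case $t in
            zero)          level=$((level | 1)) ;;
            cache)         level=$((level | 2)) ;;
            cache_private) level=$((level | 4)) ;;
            user)          level=$((level | 8)) ;;
            free)          level=$((level | 16)) ;;
            *) echo "unknown page type: $t" >&2; return 1 ;;
        esac
    done
    echo "$level"
}

# Example: makedumpfile -c -d "$(dump_level zero cache cache_private user free)" /proc/vmcore dumpfile
```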
-Andi
--
[email protected]
On Sun, Jan 11, 2009 at 06:11:35PM +0800, Andreas Dilger wrote:
> I'm sad that netconsole/netdump never made it big. It was fairly useful,
> and extending the eth drivers to add the polling mode was trivial to do.
> We were using that for a few years, but it got replaced by kdump and it
> appears to be less usable IMHO.
The netdump I'm familiar with had the misfeature that it didn't do
packet retransmission, so when it was used on a customer network with
any amount of traffic, packets would get dropped and the crash dump
would utterly fail. I honestly can't remember which enterprise distro
shipped it, but I can't say I was terribly impressed. :-(
- Ted
"Mike Snitzer" <[email protected]> writes:
> On Sat, Jan 10, 2009 at 8:28 PM, Ingo Molnar <[email protected]> wrote:
>>
>> Note, back when kdump was added to the kernel many moons ago i strongly
>> supported it and helped out with the patches, etc. I still think it might
>> have the potential to become big - but it needs a ton of tech and care to
>> reach that level of convenience.
>>
>> 'kdump light' perhaps that dumps the most important data structures like
>> registers of all CPUs, task struct and the symbol tables, the current task
>> itself including the kernel stack plus the surrounding 4K of all pointers
>> that are in current registers and that point into kernel memory - maybe
>> straight to kerneloops.org [if the user agrees] - or something like that.
>
> I think 'kdump light' is a good idea. I'm all for infrastructure that
> works better for more people. Having to deal with multi-gigabyte dump
> files can be a chore.
>
> The mechanics of dumping your suggested 'light' amount of data vs. all
> memory should be configurable (e.g. /sys/kernel/kexec_crash_light).
Not in sys because this is a user space configuration issue.
All of the dumping happens from user space. The kernel just provides
access to the state of the previous kernel.
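To illustrate why this is a user space setting, here is a minimal sketch of the capture step as the kdump kernel's user space might run it. The old kernel's memory appears as /proc/vmcore, and how much of it to save is just an argument to the copying tool. DUMP_LEVEL, VMCORE and DRY_RUN are hypothetical stand-ins for whatever a distribution's kdump configuration provides.

```shell
#!/bin/sh
# capture_dump (hypothetical): save the crashed kernel's memory to OUT.
capture_dump() {
    out=$1
    vmcore=${VMCORE:-/proc/vmcore}
    # /proc/vmcore only exists when running as the capture kernel.
    if [ ! -r "$vmcore" ]; then
        echo "$vmcore not readable: not running as a capture kernel?" >&2
        return 1
    fi
    # The "light vs. full" choice is just the dump level picked here.
    set -- makedumpfile -c -d "${DUMP_LEVEL:-31}" "$vmcore" "$out"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```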
> And this obviously doesn't change the potentially fragile nature of
> kexec'ing to a crash kernel from an arbitrary context; or the fact
> that drivers can easily be incompatible with cleanly shutting down and
> restarting on kexec.
Yep. Although the general answer in the kdump case is that if
the kdump kernel is running you have gotten past all of the driver
problems.
> But honestly 99+% of my filesystem/storage induced Linux crashes
> kexec/kdump properly (with RHEL5, 2.6.22, 2.6.25, and 2.6.28); so all
> the hard work of people like yourself and other kexec/kdump hackers
> (upstream and at RedHat) really is paying off for real Linux users!
Thanks. It is good to hear that the code works in the field.
> Now if only I could fix line numbers when debugging crashes in x86_64
> modules with the crash utility! :)
It's a userspace problem...
All of the little usability things are userspace problems.
I won't claim that it is trivial just because it is a userspace problem, but at the same
time there is no reason to wait for any kernel features to merge etc. Someone
just has to scratch an itch and go fix it.
Eric