LinuxLists.cc - [ANNOUNCE] Merkey's Kernel Debugger


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

This patch is formally submitted for consideration for inclusion in the
base linux kernel.

ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch

Jeff

2008-08-03 20:00:26

by Rene Herman


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On 03-08-08 21:36, [email protected] wrote:

> This patch is formally submitted for consideration for inclusion in the
> base linux kernel.
>
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch

Haven't actually looked, but you should've probably waited just a bit
for people to start using and then getting fed up with kgdb...

Rene.

2008-08-04 00:14:26

by Josh Boyer


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Sun, 2008-08-03 at 13:36 -0600, [email protected] wrote:
>
> This patch is formally submitted for consideration for inclusion in the
> base linux kernel.
>
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch

Formally submitted patches should be sent to the list inline. Reviewing
something on an FTP server just becomes that much harder.

josh

2008-08-04 02:40:06


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Sun, 2008-08-03 at 13:36 -0600, [email protected] wrote:
>>
>> This patch is formally submitted for consideration for inclusion in the
>> base linux kernel.
>>
>> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch
>
> Formally submitted patches should be sent to the list inline. Reviewing
> something on an FTP server just becomes that much harder.
>
> josh
>
>

Submitted as inline patches.

:-)

Jeff

2008-08-04 13:42:43


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

[email protected] wrote:
>> On Sun, 2008-08-03 at 13:36 -0600, [email protected] wrote:
>>>
>>> This patch is formally submitted for consideration for inclusion in the
>>> base linux kernel.
>>>
>>> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch
>>
>> Formally submitted patches should be sent to the list inline. Reviewing
>> something on an FTP server just becomes that much harder.
>>
>> josh
>>
>>
>
> Submitted as inline patches.

Some non-technical comments to the patch series:
- Each patch posting in a patch series should have its own Subject and
changelog which specifically describes the included patch.
- The Developer's Certificate of Origin is written simply as a single
line:
Signed-off-by: Jeffrey Vernon Merkey <email@address>
This line needs to be included in the changelog of each patch, i.e.
precedes the diff. (Tools which harvest patches from mboxes are
trained to pick the changelog up from before the diff.)
- The MUA rewrapped some lines.
- File name and date of last change are redundant information and are
better left out of the source files.
- Understandably for a port from other kernels, there are clashes with
Linux kernel's coding style like CamelCase names, comment style,
indentations.
- Why define LONGLONG, WORD, BYTE and so on? They could be plain
unsigned char etc., or u8 etc. if you like it brief.
- Boolean values should be the standard true and false, not locally
defined TRUE and FALSE.
- Usually the #include's are not collected in an intermediary header
(as in patch 7/25) but put directly into the files which require
a particular #include.

I haven't looked in detail at the patches; it's far out of my area of
experience...
--
Stefan Richter
-=====-==--- =--- --=--
http://arcgraph.de/sr/

2008-08-04 14:54:18


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

OK, sounds like I get a D- on patch format submission. I will rework
the patches, switch back to GPL2 (since I guess GPL 3 is still not there
yet) and clean up this list of issues. ULONG, etc. is Microsoft syntax
for cross platform compatibility. Since this is a LINUX SPECIFIC PATCH,
I'll rip out and rework the Gates-isms in the code.

All that aside, the damn thing works so at least folks can start using it while
I perform code beautification.

Jeff

> [email protected] wrote:
>>> On Sun, 2008-08-03 at 13:36 -0600, [email protected] wrote:
>>>>
>>>> This patch is formally submitted for consideration for inclusion in
>>>> the
>>>> base linux kernel.
>>>>
>>>> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch
>>>
>>> Formally submitted patches should be sent to the list inline.
>>> Reviewing
>>> something on an FTP server just becomes that much harder.
>>>
>>> josh
>>>
>>>
>>
>> Submitted as inline patches.
>
> Some non-technical comments to the patch series:
> - Each patch posting in a patch series should have its own Subject and
> changelog which specifically describes the included patch.
> - The Developer's Certificate of Origin is written simply as a single
> line:
> Signed-off-by: Jeffrey Vernon Merkey <email@address>
> This line needs to be included in the changelog of each patch, i.e.
> precedes the diff. (Tools which harvest patches from mboxes are
> trained to pick the changelog up from before the diff.)
> - The MUA rewrapped some lines.
> - File name and date of last change are redundant information and are
> better left out of the source files.
> - Understandably for a port from other kernels, there are clashes with
> Linux kernel's coding style like CamelCase names, comment style,
> indentations.
> - Why define LONGLONG, WORD, BYTE and so on? They could be plain
> unsigned char etc., or u8 etc. if you like it brief.
> - Boolean values should be the standard true and false, not locally
> defined TRUE and FALSE.
> - Usually the #include's are not collected in an intermediary header
> (as in patch 7/25) but put directly into the files which require
> a particular #include.
>
> I haven't looked in detail at the patches; it's far out of my area of
> experience...
> --
> Stefan Richter
> -=====-==--- =--- --=--
> http://arcgraph.de/sr/
>

2008-08-05 09:41:43

by Geert Uytterhoeven


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Mon, 4 Aug 2008, [email protected] wrote:
> yet) and clean up this list of issues. ULONG, etc. is Microsoft syntax
> for cross platform compatibility. Since this is a LINUX SPECIFIC PATCH,

You're aware that the Microsoft assumption

typedef unsigned long ULONG

is not compatible with 64-bit platforms in the rest of the world?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2008-08-05 15:23:43


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Mon, 4 Aug 2008, [email protected] wrote:
>> yet) and clean up this list of issues. ULONG, etc. is Microsoft syntax
>> for cross platform compatibility. Since this is a LINUX SPECIFIC PATCH,
>
> You're aware that the Microsoft assumption
>
> typedef unsigned long ULONG
>
> is not compatible with 64-bit platforms in the rest of the world?
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> [email protected]

No I was not, but I am now. At any rate, I removed the Microsoft-isms
from the code. I can cut yet another patch for git6, but git5 was there
-- GPL2 and all. How about putting it into the kernel, guys -- :-)

Jeff

2008-08-05 15:33:28


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wednesday 06 August 2008 01:02, [email protected] wrote:
> > On Mon, 4 Aug 2008, [email protected] wrote:

> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> > [email protected]
>
> No I was not, but I am now. At any rate, I removed the Microsoft-isms
> from the code. I can cut yet another patch for git6, but git5 was there
> -- GPL2 and all. How about putting it into the kernel, guys -- :-)

Seriously? Because it doesn't seem to have had enough peer review,
it hasn't had widespread testing in somewhere like linux-next or
-mm, and we already have kgdb so you have to also explain why you
can't improve kgdb in the areas it trails mdb.

But the ideal outcome would be if you could contribute patches to
kgdb to the point where it is as good as mdb. It is already in the
tree and supported by a handful of architectures... any chance of
that? (I don't know kernel debugger code, so I ask as an interested
user)

2008-08-05 15:40:55


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Wednesday 06 August 2008 01:02, [email protected] wrote:
>> > On Mon, 4 Aug 2008, [email protected] wrote:
>
>> > --
>> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
>> > [email protected]
>>
>> No I was not, but I am now. At any rate, I removed the Microsoft-isms
>> from the code. I can cut yet another patch for git6, but git5 was there
>> -- GPL2 and all. How about putting it into the kernel, guys -- :-)
>
> Seriously? Because it doesn't seem to have had enough peer review,
> it hasn't had widespread testing in somewhere like linux-next or
> -mm, and we already have kgdb so you have to also explain why you
> can't improve kgdb in the areas it trails mdb.

If you go back to LKML from 2000, this debugger has been around for 10
years. I agree not in the hands of the public, but it's very mature
in comparison to kdb or kgdb.

>
> But the ideal outcome would be if you could contribute patches to
> kgdb to the point where it is as good as mdb. It is already in the
> tree and supported by a handful of architectures... any chance of
> that? (I don't know kernel debugger code, so I ask as an interested
> user)
>

I plan to work on kdb and yes, there is a version of this that runs
as an alternate debugger of kdb - you can even switch back and forth
between them - but that misses the point as well.

I can wait until it's more widespread -- or not.

Jeff

2008-08-05 15:52:27


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wednesday 06 August 2008 01:19, [email protected] wrote:
> > On Wednesday 06 August 2008 01:02, [email protected] wrote:
> >> > On Mon, 4 Aug 2008, [email protected] wrote:
> >> >
> >> > --
> >> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> >> > [email protected]
> >>
> >> No I was not, but I am now. At any rate, I removed the Microsoft-isms
> >> from the code. I can cut yet another patch for git6, but git5 was there
> >> -- GPL2 and all. How about putting it into the kernel, guys -- :-)
> >
> > Seriously? Because it doesn't seem to have had enough peer review,
> > it hasn't had widespread testing in somewhere like linux-next or
> > -mm, and we already have kgdb so you have to also explain why you
> > can't improve kgdb in the areas it trails mdb.
>
> If you go back to LKML from 2000, this debugger has been around for 10
> years. I agree not in the hands of the public, but it's very mature
> in comparison to kdb or kgdb.

OK I don't doubt that at all, but I just mean in terms of being reviewed
by Linux people and how it merges with the current kernel (eg. we now
have a debugger, which was unthinkable in 2000 :)).

> > But the ideal outcome would be if you could contribute patches to
> > kgdb to the point where it is as good as mdb. It is already in the
> > tree and supported by a handful of architectures... any chance of
> > that? (I don't know kernel debugger code, so I ask as an interested
> > user)
>
> I plan to work on kdb and yes, there is a version of this that runs
> as an alternate debugger of kdb - you can even switch back and forth
> between them - but that misses the point as well.
>
> I can wait until it's more widespread -- or not.

That would be great if you do work on kgdb... But I guess I do miss
the point, then. Is there a technical difference with kgdb that cannot
be worked around, a difference of opinion with maintainers, a wish to
have mdb features at short notice?

2008-08-05 15:53:14


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Wednesday 06 August 2008 01:19, [email protected] wrote:
>> > On Wednesday 06 August 2008 01:02, [email protected]
>> wrote:
>> >> > On Mon, 4 Aug 2008, [email protected] wrote:
>> >> >
>> >> > --
>> >> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
>> >> > [email protected]
>> >>
>> >> No I was not, but I am now. At any rate, I removed the
>> Microsoft-isms
>> >> from the code. I can cut yet another patch for git6, but git5 was
>> there
>> >> -- GPL2 and all. How about putting it into the kernel, guys -- :-)
>> >
>> > Seriously? Because it doesn't seem to have had enough peer review,
>> > it hasn't had widespread testing in somewhere like linux-next or
>> > -mm, and we already have kgdb so you have to also explain why you
>> > can't improve kgdb in the areas it trails mdb.
>>
>> If you go back to LKML from 2000, this debugger has been around for 10
>> years. I agree not in the hands of the public, but it's very mature
>> in comparison to kdb or kgdb.
>
> OK I don't doubt that at all, but I just mean in terms of being reviewed
> by Linux people and how it merges with the current kernel (eg. we now
> have a debugger, which was unthinkable in 2000 :)).
>
>
>> > But the ideal outcome would be if you could contribute patches to
>> > kgdb to the point where it is as good as mdb. It is already in the
>> > tree and supported by a handful of architectures... any chance of
>> > that? (I don't know kernel debugger code, so I ask as an interested
>> > user)
>>
>> I plan to work on kdb and yes, there is a version of this that runs
>> as an alternate debugger of kdb - you can even switch back and forth
>> between them - but that misses the point as well.
>>
>> I can wait until it's more widespread -- or not.
>
> That would be great if you do work on kgdb... But I guess I do miss
> the point, then. Is there a technical difference with kgdb that cannot
> be worked around, a difference of opinion with maintainers, a wish to
> have mdb features at short notice?

Nick, it's OK. There have been 27,453 downloads of the patches from my ftp
server since yesterday when I posted it -- from what I am seeing people are
voting with their feet. People can get it and I even posted it to
SourceForge as well. After ten years of working on Linux I thought it
would be nice for something I wrote to end up there. It will happen when
it's time. As it stands, people are using it and it is going to help a lot
of folks, which is what this is all about.

:-)

Jeff

2008-08-05 16:05:52

by Chris Friesen


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

[email protected] wrote:

> If you go back to LKML from 2000, this debugger has been around for 10
> years. I agree not in the hands of the public, but it's very mature
> in comparison to kdb or kgdb.

Without public use, it's difficult to determine that there aren't any
nasty interactions.

If you want to maximize your chances of getting this code into the
kernel, you might want to read Jonathan Corbet's post, "[PATCH] A
development process document, V2". It discusses the normal process, how
to prepare patches for submission, etc.

Chris

2008-08-05 16:39:20


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wednesday 06 August 2008 01:32, [email protected] wrote:
> > On Wednesday 06 August 2008 01:19, [email protected] wrote:

> > That would be great if you do work on kgdb... But I guess I do miss
> > the point, then. Is there a technical difference with kgdb that cannot
> > be worked around, a difference of opinion with maintainers, a wish to
> > have mdb features at short notice?
>
> Nick, it's OK. There have been 27,453 downloads of the patches from my ftp
> server since yesterday when I posted it -- from what I am seeing people are
> voting with their feet. People can get it and I even posted it to
> SourceForge as well. After ten years of working on Linux I thought it
> would be nice for something I wrote to end up there. It will happen when
> it's time. As it stands, people are using it and it is going to help a lot
> of folks, which is what this is all about.
>
> :-)

That's all well and good :). But it didn't exactly answer my question.
My question was not what is the point of you writing these patches, but
what is the point of merging it into the kernel (over the alternatives).
It may seem like a trivial question, but it is one that must be answered
in order to be considered to get merged.

2008-08-05 17:00:18


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> [email protected] wrote:
>
>> If you go back to LKML from 2000, this debugger has been around for 10
>> years. I agree not in the hands of the public, but it's very mature
>> in comparison to kdb or kgdb.
>
> Without public use, it's difficult to determine that there aren't any
> nasty interactions.
>
> If you want to maximize your chances of getting this code into the
> kernel, you might want to read Jonathan Corbet's post, "[PATCH] A
> development process document, V2". It discusses the normal process, how
> to prepare patches for submission, etc.
>
> Chris
>

Read it already. Quite a few large companies are using it at present and
have been since 2000, BTW.

Jeff

2008-08-05 17:06:19


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Wednesday 06 August 2008 01:32, [email protected] wrote:
>> > On Wednesday 06 August 2008 01:19, [email protected]
>> wrote:
>
>> > That would be great if you do work on kgdb... But I guess I do miss
>> > the point, then. Is there a technical difference with kgdb that cannot
>> > be worked around, a difference of opinion with maintainers, a wish to
>> > have mdb features at short notice?
>>
>> Nick, it's OK. There have been 27,453 downloads of the patches from my ftp
>> server since yesterday when I posted it -- from what I am seeing people are
>> voting with their feet. People can get it and I even posted it to
>> SourceForge as well. After ten years of working on Linux I thought it
>> would be nice for something I wrote to end up there. It will happen when
>> it's time. As it stands, people are using it and it is going to help a lot
>> of folks, which is what this is all about.
>>
>> :-)
>
> That's all well and good :). But it didn't exactly answer my question.
> My question was not what is the point of you writing these patches, but
> what is the point of merging it into the kernel (over the alternatives).
> It may seem like a trivial question, but it is one that must be answered
> in order to be considered to get merged.
>

The point is an integrated kernel debugger in Linux (a minimal one), and
given that there are already patches to add tickets and text to locks and
other tools, one more can only help. This is by no means the full MDB
debugger you have seen, just a pared-down core I submitted. The entire MDB
debugger is much larger.

I have been working on it for ten years, and you may or may not have
noticed, I typically do not ask many questions these days from the
community for my appliance and router development, nor ask for help for
any of the companies I have created and sold based on Linux over the past
ten years since I have tools to fix my stuff without needing a hardware
based inverse assembler like most folks need to debug hardware and file
systems on linux these days.

:-)

Jeff


2008-08-05 17:22:58

by Paul Mundt


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Tue, Aug 05, 2008 at 09:19:20AM -0600, [email protected] wrote:
> > On Wednesday 06 August 2008 01:02, [email protected] wrote:
> >> No I was not, but I am now. At any rate, I removed the Microsoft-isms
> >> from the code. I can cut yet another patch for git6, but git5 was there
> >> -- GPL2 and all. How about putting it into the kernel, guys -- :-)
> >
> > Seriously? Because it doesn't seem to have had enough peer review,
> > it hasn't had widespread testing in somewhere like linux-next or
> > -mm, and we already have kgdb so you have to also explain why you
> > can't improve kgdb in the areas it trails mdb.
>
> If you go back to LKML from 2000, this debugger has been around for 10
> years. I agree not in the hands of the public, but it's very mature
> in comparison to kdb or kgdb.
>
That's great, except kgdb has existed in the kernel for various
architectures well before that as well. ppc32's stub dates back to 1998,
sh had it since 2001, mips around the same time, etc, etc. While the
current rework and tidying of the stubs is something new, kgdb itself is
not.

> > But the ideal outcome would be if you could contribute patches to
> > kgdb to the point where it is as good as mdb. It is already in the
> > tree and supported by a handful of architectures... any chance of
> > that? (I don't know kernel debugger code, so I ask as an interested
> > user)
>
> I plan to work on kdb and yes, there is a version of this that runs
> as an alternate debugger of kdb - you can even switch back and forth
> between them - but that misses the point as well.
>
kgdb and kdb are totally different things, kgdb is what is generally
available and worth improving in-kernel.

While it's certainly good to have options, having multiple in-kernel
debuggers is not going to help matters for the vast majority of users. I
agree with Nick, it would be nice to see what we have in-kernel being
extended and worked on by more people, especially those with a background
in these things.

On the other hand, it seems like there's sufficient interest in your
project out-of-tree, so there's not really much point in merging it if
you're content with the interface as it exists today and it continues to
work for your users.

One of the things we can do however is try to provide cleaner
abstractions for the various debuggers to tie in to, so we don't end up
with each debugger piling on its own set of ifdefs in all of the same
places (int3 handling comes to mind, which you could already do more
cleanly through the die chain today). Perhaps it would be more useful to
see what sort of hooks mdb wants in the architecture and core code, how
those overlap with kgdb, and how we might extend kgdb in areas where mdb
is more feature complete.

2008-08-05 17:31:49


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> That's great, except kgdb has existed in the kernel for various
> architectures well before that as well. ppc32's stub dates back to 1998,
> sh had it since 2001, mips around the same time, etc, etc. While the
> current rework and tidying of the stubs is something new, kgdb itself is
> not.
>
>> > But the ideal outcome would be if you could contribute patches to
>> > kgdb to the point where it is as good as mdb. It is already in the
>> > tree and supported by a handful of architectures... any chance of
>> > that? (I don't know kernel debugger code, so I ask as an interested
>> > user)
>>
>> I plan to work on kdb and yes, there is a version of this that runs
>> as an alternate debugger of kdb - you can even switch back and forth
>> between them - but that misses the point as well.
>>
> kgdb and kdb are totally different things, kgdb is what is generally
> available and worth improving in-kernel.
>
> While it's certainly good to have options, having multiple in-kernel
> debuggers is not going to help matters for the vast majority of users. I
> agree with Nick, it would be nice to see what we have in-kernel being
> extended and worked on by more people, especially those with a background
> in these things.

Not your call to make. Kernel debuggers are very personal choices and
it's pure arrogance to assume any of us can make a choice for someone else
with tools. My taste in debuggers is like my taste in food, or women,
or what kind of toothbrush I like to use.

>
> On the other hand, it seems like there's sufficient interest in your
> project out-of-tree, so there's not really much point in merging it if
> you're content with the interface as it exists today and it continues to
> work for your users.
>
> One of the things we can do however is try to provide cleaner
> abstractions for the various debuggers to tie in to, so we don't end up
> with each debugger piling on its own set of ifdefs in all of the same
> places (int3 handling comes to mind, which you could already do more
> cleanly through the die chain today). Perhaps it would be more useful to
> see what sort of hooks mdb wants in the architecture and core code, how
> those overlap with kgdb, and how we might extend kgdb in areas where mdb
> is more feature complete.
>

This is a great suggestion. mdb already uses an alternate debugger
interface with the hooks into traps_XX.c and reboot_XX.c. I still would
like to see it in kernel, but an alternate debugger interface as you
point out is almost a necessity at this point. There's a good example in
mdb.c and mdb-list.c.

Jeff

2008-08-06 03:08:32


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Nick Piggin <[email protected]> writes:
>
> Seriously? Because it doesn't seem to have had enough peer review,
> it hasn't had widespread testing in somewhere like linux-next or
> -mm, and we already have kgdb so you have to also explain why you
> can't improve kgdb in the areas it trails mdb.
>
> But the ideal outcome would be if you could contribute patches to
> kgdb to the point where it is as good as mdb. It is already in the

I don't think kgdb and a simple assembler debugger
are directly comparable. kgdb always requires a remote machine,
which has many advantages, but is also often very inconvenient
or impossible to arrange. A low-overhead assembler debugger
can be always compiled in just in case.

Also at least for the x86 port the debugger interfaces should
be general enough now (see die hooks as a "debug vfs") that it would
be quite possible to have a multitude of debuggers just using
them. In fact that's already the case: kprobes and kgdb and
kdump are all kinds of debuggers using such hooks.

As long as it doesn't impact the core code and the mdb
code itself is considered merge worthy and has clean interfaces
that would seem fine to me. It essentially would just live somewhere in
its own directory using the existing interfaces. My standard
test for seeing if a debugger has clean interfaces is to see
if it can be loaded as a module.

There are enough different debugging styles around that offering
developers different tools of which they can pick whatever suits
them is not a bad idea. Also as everyone knows debugging
is often a major time eater and if more tools are available that
can only help the kernel.

That said, I haven't read the mdb code, am not judging its general
merge-worthiness, and am not completely sure what all the details
of a "netware style debugger" are; this is just a general high-level
comment on debuggers. Judging by the patch sizes, it at least
doesn't seem particularly bloated. But of course it would need full
proper review first.

-Andi

2008-08-06 05:50:42


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wednesday 06 August 2008 13:08, Andi Kleen wrote:
> Nick Piggin <[email protected]> writes:
> > Seriously? Because it doesn't seem to have had enough peer review,
> > it hasn't had widespread testing in somewhere like linux-next or
> > -mm, and we already have kgdb so you have to also explain why you
> > can't improve kgdb in the areas it trails mdb.
> >
> > But the ideal outcome would be if you could contribute patches to
> > kgdb to the point where it is as good as mdb. It is already in the
>
> I don't think kgdb and a simple assembler debugger
> are directly comparable. kgdb always requires a remote machine,
> which has many advantages, but is also often very inconvenient
> or impossible to arrange. A low-overhead assembler debugger
> can be always compiled in just in case.
>
> Also at least for the x86 port the debugger interfaces should
> be general enough now (see die hooks as a "debug vfs") that it would
> be quite possible to have a multitude of debuggers just using
> them. In fact that's already the case: kprobes and kgdb and
> kdump are all kinds of debuggers using such hooks.
>
> As long as it doesn't impact the core code and the mdb
> code itself is considered merge worthy and has clean interfaces
> that would seem fine to me. It essentially would just live somewhere in
> its own directory using the existing interfaces. My standard
> test for seeing if a debugger has clean interfaces is to see
> if it can be loaded as a module.
>
> There are enough different debugging styles around that offering
> developers different tools of which they can pick whatever suits
> them is not a bad idea. Also as everyone knows debugging
> is often a major time eater and if more tools are available that
> can only help the kernel.
>
> That said, I haven't read the mdb code, am not judging its general
> merge-worthiness, and am not completely sure what all the details
> of a "netware style debugger" are; this is just a general high-level
> comment on debuggers. Judging by the patch sizes, it at least
> doesn't seem particularly bloated. But of course it would need full
> proper review first.

OK thanks for the info. I don't actually know debugger code as I
said, so I wasn't against merging mdb if it offers things that
kgdb fundamentally cannot.

If so, then ensuring clean interfaces indeed would seem like a
good first step to getting it merged.

2008-08-06 13:02:12

by Bill Davidsen


Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Andi Kleen wrote:
> Nick Piggin <[email protected]> writes:
>> Seriously? Because it doesn't seem to have had enough peer review,
>> it hasn't had widespread testing in somewhere like linux-next or
>> -mm, and we already have kgdb so you have to also explain why you
>> can't improve kgdb in the areas it trails mdb.
>>
>> But the ideal outcome would be if you could contribute patches to
>> kgdb to the point where it is as good as mdb. It is already in the
>
That idea sounds familiar, the "suspend2" response, when something new
and significantly different is offered, instead of putting it in and
letting people choose in configuration, take the position that what is
there is good enough, and if the author of the new solution will just
drop all their ideas and slap some band-aids on the existing code it
will be "gooder enough" without actually offering people a choice of
something different.

And Andi explains just *why* this is different (and in many cases better):
> I don't think kgdb and a simple assembler debugger
> are directly comparable. kgdb always requires a remote machine,
> which has many advantages, but is also often very inconvenient
> or impossible to arrange. A low-overhead assembler debugger
> can be always compiled in just in case.
>
I totally agree with this, the whole idea of a remote machine implies
that the ability to connect is not what you are debugging.

> Also at least for the x86 port the debugger interfaces should
> be general enough now (see die hooks as a "debug vfs") that it would
> be quite possible to have a multitude of debuggers just using
> them. In fact that's already the case: kprobes and kgdb and
> kdump are all kinds of debuggers using such hooks.
>
> As long as it doesn't impact the core code and the mdb
> code itself is considered merge worthy and has clean interfaces
> that would seem fine to me. It essentially would just live somewhere in
> its own directory using the existing interfaces. My standard
> test for seeing if a debugger has clean interfaces is to see
> if it can be loaded as a module.
>
> There are enough different debugging styles around that offering
> developers different tools of which they can pick whatever suits
> them is not a bad idea. Also as everyone knows debugging
> is often a major time eater and if more tools are available that
> can only help the kernel.
>
In addition to "Bravo!" I will add that tools which work somewhat
differently will increase the chances of having a tool which will work
at all, depending on what's being investigated.

> That said I haven't read the mdb code, not judging on its general
> merge-worthiness or am really completely sure what are all the details
> of a "netware style debugger", just a general high level comment on
> debuggers. At least judging based on the patch sizes it at least
> doesn't seem particularly bloated. But of course it would need full
> proper review first.
>
I would suggest that if it meets coding standards and doesn't break
anything else it could be included in -mm (assume there's no objection
there) and let people beat on it there, with the assumption that unless
problems are found it will be promoted.

The need for a special setup makes spur-of-the-moment investigation of
unusual behavior difficult for anyone but a hard-core developer who does
daily work on a setup with the remote machine available at hand. I think
this new approach would encourage people to do quick checks when the
behavior is observed.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2008-08-06 13:38:36

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Bill Davidsen wrote:
>> Nick Piggin <[email protected]> writes:
>>> and we already have kgdb so you have to also explain why you
>>> can't improve kgdb in the areas it trails mdb.
>>>
>>> But the ideal outcome would be if you could contribute patches to
>>> kgdb to the point where it is as good as mdb. It is already in the
>>
> That idea sounds familiar, the "suspend2" response, when something new
> and significantly different is offered, instead of putting it in and
> letting people choose in configuration, take the position that what is
> there is good enough, and if the author of the new solution will just
> drop all their ideas and slap some band-aids on the existing code it
> will be "gooder enough" without actually offering people a choice of
> something different.

To be fair, choice in "leaf" features like a debugger is not entirely
comparable to choice in central features. If the infrastructure does
not support all use cases reasonably well, it's better to fix the
infrastructure or replace it by a working one, rather than adding a
second infrastructure which is also not general enough.

In this case: Make a side-by-side comparison of features and
shortcomings of the available debuggers (as in Andi's response), then
decide how the best of both worlds can be achieved + used + maintained
most easily --- by having both side-by-side, or by taking over some or
all of one's features into the other. Either way requires contributors
to be interested.
--
Stefan Richter
-=====-==--- =--- --==-
http://arcgraph.de/sr/

2008-08-06 13:54:23

by Olivier Galibert

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wed, Aug 06, 2008 at 09:11:47AM -0400, Bill Davidsen wrote:
> I would suggest that if it meets coding standards and doesn't break
> anything else it could be included in -mm (assume there's no objection
> there) and let people beat on it there, with the assumption that unless
> problems are found it will be promoted.

It's a little too early for that. Right now it's at the phase "how to
make it better integrate with the kernel", with the use of existing
hooks, adding the needed hooks to be more complete, working as a
module, etc. When that is done then the philosophical aspects can
come into play, but it's not there yet.

OG.

2008-08-06 14:07:15

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Wed, Aug 06, 2008 at 09:11:47AM -0400, Bill Davidsen wrote:
>> I would suggest that if it meets coding standards and doesn't break
>> anything else it could be included in -mm (assume there's no objection
>> there) and let people beat on it there, with the assumption that unless
>> problems are found it will be promoted.
>
> It's a little too early for that. Right now it's at the phase "how to
> make it better integrate with the kernel", with the use of existing
> hooks, adding the needed hooks to be more complete, working as a
> module, etc. When that is done then the philosophical aspects can
> come into play, but it's not there yet.
>
> OG.
>

I have removed the hooks into the /arch/x86 sections and converted the
debugger to use kprobes and notify_die as Andi suggested. It also builds
and loads as a module.

One serious point has to do with NMI handling on SMP since the notify_die
handlers use this priority calling mechanism. I am still testing on SMP
but it seems to work -- I just am a little uncomfortable with trusting an
interface (notify_die) that can let someone come in and hook the NMI
handlers when I MUST BE ABLE TO NMI AND HALT non-focus processors first.

I am adding a special NMI state to the chain notifier to handle this case
where IT MUST BE CALLED FIRST and IT MUST BE THE ONLY EVENT CALLED. I
used the DIE_KERNELDEBUG to hook the keyboard handler in
drivers/char/keyboard.c so we have the general hook into kprobes to handle
enter debugger events.
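
The ordering concern above can be illustrated with a rough userspace sketch of a priority-sorted notifier chain. This is not the kernel's notifier code: every `sk_` name, the event constant and the priority values are hypothetical, and only the general behavior (higher priority runs first, a handler can stop the chain) is modeled on how die notifiers are described in this thread.

```c
#include <stddef.h>

/* Hypothetical return codes, loosely modeled on NOTIFY_DONE / NOTIFY_STOP. */
#define SK_NOTIFY_DONE 0
#define SK_NOTIFY_STOP 1
#define SK_EVENT_NMI   1UL   /* hypothetical event number */

struct sk_notifier {
    int (*call)(unsigned long event);
    int priority;                 /* higher value runs earlier */
    struct sk_notifier *next;
};

static struct sk_notifier *sk_chain;

/* Insert before the first entry with a strictly lower priority, so an
 * earlier registrant with the same priority stays ahead of a later one. */
void sk_register(struct sk_notifier *nb)
{
    struct sk_notifier **p = &sk_chain;

    while (*p && (*p)->priority >= nb->priority)
        p = &(*p)->next;
    nb->next = *p;
    *p = nb;
}

/* Walk the chain in priority order until a handler asks to stop. */
int sk_call_chain(unsigned long event)
{
    int ret = SK_NOTIFY_DONE;
    struct sk_notifier *nb;

    for (nb = sk_chain; nb; nb = nb->next) {
        ret = nb->call(event);
        if (ret == SK_NOTIFY_STOP)
            break;
    }
    return ret;
}

static int other_calls, debugger_calls;

static int other_handler(unsigned long event)
{
    (void)event;
    other_calls++;
    return SK_NOTIFY_DONE;
}

/* The debugger claims NMI events and stops the rest of the chain. */
static int debugger_handler(unsigned long event)
{
    debugger_calls++;
    return event == SK_EVENT_NMI ? SK_NOTIFY_STOP : SK_NOTIFY_DONE;
}

static struct sk_notifier other_nb    = { other_handler,    0,          NULL };
static struct sk_notifier debugger_nb = { debugger_handler, 0x7fffffff, NULL };
```

In this sketch the debugger runs first only because nothing else registered earlier with the same maximum priority. That is the weakness being described: a priority number alone cannot guarantee "first and only".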

Jeff

2008-08-06 14:17:34

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Wednesday 06 August 2008 23:11, Bill Davidsen wrote:
> Andi Kleen wrote:
> > Nick Piggin <[email protected]> writes:
> >> Seriously? Because it doesn't seem to have had enough peer review,
> >> it hasn't had widespread testing in somewhere like linux-next or
> >> -mm, and we already have kgdb so you have to also explain why you
> >> can't improve kgdb in the areas it trails mdb.
> >>
> >> But the ideal outcome would be if you could contribute patches to
> >> kgdb to the point where it is as good as mdb. It is already in the
>
> That idea sounds familiar, the "suspend2" response, when something new
> and significantly different is offered, instead of putting it in and
> letting people choose in configuration, take the position that what is
> there is good enough, and if the author of the new solution will just
> drop all their ideas and slap some band-aids on the existing code it
> will be "gooder enough" without actually offering people a choice of
> something different.

No. What I was saying is: first try to integrate them, so that you get the
best of both from one code base. I specifically said that if they
are significantly different and can't be reconciled, then it could be
merged.

2008-08-06 17:22:36

by Jason Wessel

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Andi Kleen wrote:
> Nick Piggin <[email protected]> writes:
>> Seriously? Because it doesn't seem to have had enough peer review,
>> it hasn't had widespread testing in somewhere like linux-next or
>> -mm, and we already have kgdb so you have to also explain why you
>> can't improve kgdb in the areas it trails mdb.
>>
>> But the ideal outcome would be if you could contribute patches to
>> kgdb to the point where it is as good as mdb. It is already in the
>
> I don't think kgdb and a simple assembler debugger
> are directly comparable. kgdb always requires a remote machine,
> which has many advantages, but is also often very inconvenient
> or impossible to arrange. A low-overhead assembler debugger
> can be always compiled in just in case.
>

It depends how you look at the problem. I would agree that the use of
gdb + kgdb vs an assembly debugger are completely different cases.
The kgdb core in the mainline kernel can, however, actually be used to write
such a front end. The kgdb core has an API for I/O and it is
possible to write an I/O module that implements an in kernel assembly
debugger. The kgdb test suite is not a great example, but it is a
complete example of using the kgdb core directly without a second
machine.

If there is truly missing functionality from kgdb in terms of the way
the kgdb core is used vs mdb, it would be good to at least consider
what is missing. It is entirely possible to add functionality such
that mdb could be implemented as a kgdb I/O module. In this case you
would get zero runtime impact when a kgdb I/O
module is not configured, or could use it as an early/late/on-demand
debugger.
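
A minimal userspace sketch of the idea, assuming a front end that talks to the outside world only through a small table of character I/O callbacks. The in-kernel interface is `struct kgdb_io`, registered with `kgdb_register_io_module()`. The struct, the buffer-backed callbacks and the `version` command below are hypothetical stand-ins, not the kgdb API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table, loosely modeled on the shape of struct kgdb_io. */
struct sk_dbg_io {
    const char *name;
    int  (*read_char)(void);            /* next input byte, or -1 if none */
    void (*write_char)(unsigned char);  /* emit one output byte */
};

/* Buffer-backed "device" so the sketch needs no serial line or console. */
static const char *sk_input = "";
static char sk_output[128];
static size_t sk_out_len;

static int buf_read_char(void)
{
    return *sk_input ? (unsigned char)*sk_input++ : -1;
}

static void buf_write_char(unsigned char c)
{
    if (sk_out_len + 1 < sizeof(sk_output)) {
        sk_output[sk_out_len++] = (char)c;
        sk_output[sk_out_len] = '\0';
    }
}

static struct sk_dbg_io buffer_io = { "buffer", buf_read_char, buf_write_char };

static void sk_put_str(const struct sk_dbg_io *io, const char *s)
{
    while (*s)
        io->write_char((unsigned char)*s++);
}

/* Read one line through the ops table and answer it; returns 1 if the
 * command was recognized, 0 otherwise. */
int sk_debugger_command(const struct sk_dbg_io *io)
{
    char cmd[16];
    size_t n = 0;
    int c;

    while ((c = io->read_char()) != -1 && c != '\n' && n + 1 < sizeof(cmd))
        cmd[n++] = (char)c;
    cmd[n] = '\0';

    if (strcmp(cmd, "version") == 0) {
        sk_put_str(io, "sketch-debugger 0.1\n");
        return 1;
    }
    sk_put_str(io, "unknown command\n");
    return 0;
}
```

Because the command loop never touches a device directly, swapping the buffer callbacks for keyboard/screen or serial ones would not change the front end. That is the kind of sharing being suggested here.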

> Also at least for the x86 port the debugger interfaces should
> be general enough now (see die hooks as a "debug vfs") that it would
> be quite possible to have a multitude of debuggers just using
> them. In fact that's already the case: kprobes and kgdb and
> kdump are all kinds of debuggers using such hooks.
>

I would agree that the possibility exists to use the hooks directly,
and clearly the mdb code base as it stands in this patch set does not
accomplish this.

If one were to consider integrating mdb as a kgdb I/O module, it would
have a greater degree of platform independence. The primary arch
dependencies should be narrowed down to the back tracing / disassembly
interface. The SMP / threading / breakpoints / exception handling,
would all be shared between the debugger front ends that way. The mdb
code base currently relies on re-implementing HW/SW breakpoints for
each architecture you desire to support.

Unifying some of the debugging technology is a noble goal where it
makes sense to do so. Using some of the existing kernel hook points
is a first pass requirement before a merge of mdb could be considered
for the mainline kernel.

Jason.

2008-08-06 18:56:49

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> It depends how you look at the problem. I would agree that the use of

Yes, I left out the possibility of "someone writing an in-kernel kgdb UI".
Indeed that would be a possibility.

On the other hand I'm not sure it would save all that much code
versus just directly working on top of die notifiers.

Also the gdb stub interface definitely has its limitations too.

> > be quite possible to have a multitude of debuggers just using
> > them. In fact that's already the cases, kprobes and kgdb and
> > kdump are all kinds of debuggers using such hooks.
> >
>
> I would agree that the possibility exists to use the hooks directly,

It's not just a possibility; they are already used by multiple
debugger-like subsystems, e.g. kprobes is certainly a kind of debugger.

-Andi

2008-08-06 19:48:22

by Rene Herman

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On 05-08-08 18:38, Nick Piggin wrote:

> That's all well and good :). But it didn't exactly answer my
> question. My question was not what is the point of you writing these
> patches, but what is the point of merging it into the kernel (over
> the alternatives). It may seem like a trivial question, but it is one
> that must be answered in order to be considered to get merged.

Nick, please note there is/was some mis-communication between the two of
you with respect to kgdb, the currently merged GNU debugger interface,
and kdb, the SGI kernel debugger.

Merkey responded to you as if you asked about differences with KDB while
what you did was ask about differences with KGDB. Both KDB and MDB are
significantly different from KGDB at least insofar as the latter is a
remote debugger; it requires two machines. KDB and MDB are local.

This makes KDB and MDB more accessible for small-time use at least. The
other most profound advantage is of course that it's not GDB.

Rene.

2008-08-07 13:07:05

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

UPDATE:

As per everyone's recommendations, the debugger has been fully
module-ized, and I have run checkpatch.pl and am cleaning up the slew of
messages checkpatch spits out of its tailpipe.

It would be nice if checkpatch also could FIX those areas it complains about.

I tested kprobes with NMI cross processor calls on SMP and I am unable
to break it, and the module loads and unloads very well. There is a need
for early initialization of the debugger if someone wants to debug kernel
startup and I am including support for this with another .config option,
but I am concerned about the reliance of kprobes on RCU and whether this will
break early init of the debugger. The code looks OK, but another set of
eyes would be helpful when I post the next patch series.

I will generate another patch series after I finish cleaning up the
checkpatch.pl report. I am still going through it.

Also, whoever wrote "/Documentation/volatiles_are_evil" must not have
worked with the busted-ass GNU compiler that optimizes away global
variables and busts SMP dependent code. I am not going to remove the
volatile declarations needed for SMP coordination in the debugger since
the code breaks when removed. GCC will cause massive breakage of SMP code
if you do not declare certain variables as volatile.

Whoever wrote that section doesn't understand low level SMP coding for
operating systems design and apparently has not spent over a week running
down an SMP bug only to discover it was caused by the busted-ass GCC
compiler arbitrarily deciding to optimize away a low level flag used to
signal between processors -- I have spent the time running down Stallman's
bugs.

That text should be removed from the kernel, or qualified to say that it's
advertising for GCC's malfunctioning optimization code.

Jeff

2008-08-07 15:17:51

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Thu, 2008-08-07 at 06:45 -0600, [email protected] wrote:

> Also, whoever wrote "/Documentation/volatiles_are_evil" must not have
> worked with the busted-ass GNU compiler that optimizes away global
> variables and busts SMP dependent code. I am not going to remove the
> volatile declarations needed for SMP coordination in the debugger since
> the code breaks when removed. GCC will cause massive breakage of SMP code
> if you do not declare certain variables as volatile.

Even with proper barrier() usage?

2008-08-07 16:06:34

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> Also, whoever wrote "/Documentation/volatiles_are_evil" must not have
> worked with the busted-ass GNU compiler that optimizes away global
> variables and busts SMP dependent code. I am not going to remove the

The Linux way to handle this is to use gcc memory barriers.
mb()/barrier()/wmb()/rmb()/smp_rmb()/smp_wmb() etc.
Normally everything that volatile can do can be expressed by them.

On x86 such a memory barrier tells gcc that memory might
have been clobbered and needs to be flushed and also prevents the compiler
from reordering memory accesses. On other architectures it also forces ordering
on the CPU level, although that's not needed on x86 (except
in some special situations like using write-combining)

See Documentation/memory-barriers.txt
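
A small sketch of the compiler-barrier approach, written for userspace with GCC or Clang. The `barrier()` macro below uses the same empty-asm "memory" clobber as the kernel's GCC definition. The flag and function names are made up for illustration, and the spin limit exists only so the example always terminates.

```c
/* Compiler barrier: tells GCC that memory may have changed, so cached
 * register copies of globals must be reloaded after this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Deliberately NOT volatile: the barrier in the loop is what forces the
 * compiler to re-read the flag on every iteration. */
static int stop_requested;

/* Spin until another context sets stop_requested, giving up after
 * max_spins polls. Returns 1 if the flag was seen, 0 on timeout. */
int spin_until_stopped(unsigned long max_spins)
{
    unsigned long spins = 0;

    while (!stop_requested) {
        if (++spins >= max_spins)
            return 0;
        barrier();
    }
    return 1;
}
```

Note that `barrier()` constrains only the compiler. Where ordering between CPUs matters on weakly ordered architectures, the `smp_rmb()`/`smp_wmb()`/`smp_mb()` family listed above is what adds the CPU-level ordering.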

-Andi

2008-08-07 16:14:22

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

>> Also, whoever wrote "/Documentation/volatiles_are_evil" must not have
>> worked with the busted-ass GNU compiler that optimizes away global
>> variables and busts SMP dependent code. I am not going to remove the
>
> The Linux way to handle this is to use gcc memory barriers.
> mb()/barrier()/wmb()/rmb()/smp_rmb()/smp_wmb() etc.
> Normally everything that volatile can do can be expressed by them.
>
> On x86 such a memory barrier tells gcc that memory might
> have been clobbered and needs to be flushed and also prevents the compiler
> from reordering memory accesses. On other architectures it also forces
> ordering
> on the CPU level, although that's not needed on x86 (except
> in some special situations like using write-combining)
>
> See Documentation/memory-barriers.txt
>
> -Andi
>
>

Andi,

I'll instrument this as described in the documentation you referenced and
remove the volatile declarations. If this passes testing, I will repost
with these corrections.

Jeff

2008-08-07 17:04:53

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

[email protected] wrote:
>>> Also, whoever wrote "/Documentation/volatiles_are_evil" must not have
>>> worked with the busted-ass GNU compiler that optimizes away global
>>> variables and busts SMP dependent code. I am not going to remove the
>> The Linux way to handle this is to use gcc memory barriers.
>> mb()/barrier()/wmb()/rmb()/smp_rmb()/smp_wmb() etc.
>> Normally everything that volatile can do can be expressed by them.
[...]
>> See Documentation/memory-barriers.txt
[...]
> I'll instrument this as described in the documentation you referenced and
> remove the volatile declarations. If this passes testing, I will repost
> with these corections.

Take care though that neither memory barriers nor volatile are what you
want if accesses need to be atomic on whatever given data structure.
(E.g. bitfield manipulations, counter increments, accesses to virtually
anything that is bigger than an integer or a pointer...)
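
To make the distinction concrete, here is a userspace sketch using C11 `<stdatomic.h>` as a stand-in for the kernel's own atomic operations (in-kernel code would use `atomic_t` and friends instead). The counter, the owner variable and both function names are hypothetical.

```c
#include <stdatomic.h>

/* Number of processors that have parked themselves (hypothetical). */
static atomic_int cpus_halted;

/* CPU currently owning the debugger console, or -1 if unowned. */
static atomic_int focus_owner = -1;

/* Read-modify-write as one indivisible step. A plain "cpus_halted++" on a
 * volatile int could lose updates when two CPUs increment at once. */
int note_cpu_halted(void)
{
    return atomic_fetch_add(&cpus_halted, 1) + 1;
}

/* Claim the console only if nobody owns it yet; returns 1 on success.
 * The compare and the store happen atomically. */
int try_claim_focus(int cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong(&focus_owner, &expected, cpu);
}
```

Neither `volatile` nor a barrier turns the increment or the compare-and-set into a single indivisible operation; that takes an atomic primitive or a lock.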
--
Stefan Richter
-=====-==--- =--- --===
http://arcgraph.de/sr/

2008-08-07 17:46:19

by Christoph Lameter

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Nick Piggin wrote:

> OK thanks for the info. I don't actually know debugger code as I
> said, so I wasn't against merging mdb if it offers things that
> kgdb fundamentally cannot.
>
> If so, then ensuring clean interfaces indeed would seem like a
> good first step to getting it merged.

The competing implementation is kdb not kgdb. kgdb is just a stub for remote
debugging using gdb. kdb is an in-kernel debugger like the one proposed here.

2008-08-07 18:08:36

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Christoph Lameter wrote:
> The competing implementation is kdb not kgdb. kgdb is just a stub for remote
> debugging using gdb. kdb is an in-kernel debugger like the one proposed here.

Is there work underway to get kdb merged? (I'm just asking because I
don't know; I personally don't need kdb nor mdb.)
--
Stefan Richter
-=====-==--- =--- --===
http://arcgraph.de/sr/

2008-08-07 18:15:33

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> Nick Piggin wrote:
>
>> OK thanks for the info. I don't actually know debugger code as I
>> said, so I wasn't against merging mdb if it offers things that
>> kgdb fundamentally cannot.
>>
>> If so, then ensuring clean interfaces indeed would seem like a
>> good first step to getting it merged.
>
> The competing implementation is kdb not kgdb. kgdb is just a stub for
> remote
> debugging using gdb. kdb is an in-kernel debugger like the one proposed
> here.
>

I don't consider them competing, just different tools for people from
different development backgrounds. GNU and DOS/Windows.

Jeff

2008-08-07 19:12:19

by Christoph Lameter

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Stefan Richter wrote:
> Christoph Lameter wrote:
>> The competing implementation is kdb not kgdb. kgdb is just a stub for
>> remote
>> debugging using gdb. kdb is an in-kernel debugger like the one
>> proposed here.
>
> Is there work underway to get kdb merged? (I'm just asking because I
> don't know; I personally don't need kdb nor mdb.)

KDB still exists in patches but the merge effort was given up when Linus
stated that he did not want a kernel debugger. No problem to start merge
attempts again AFAICT. Jay?

2008-08-07 19:47:36

by Jay Lan

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Christoph Lameter wrote:
> Stefan Richter wrote:
>> Christoph Lameter wrote:
>>> The competing implementation is kdb not kgdb. kgdb is just a stub for
>>> remote
>>> debugging using gdb. kdb is an in-kernel debugger like the one
>>> proposed here.
>> Is there work underway to get kdb merged? (I'm just asking because I
>> don't know; I personally don't need kdb nor mdb.)
>
> KDB still exists in patches but the merge effort was given up when Linus
> stated that he did not want a kernel debugger. No problem to start merge
> attempts again AFAICT. Jay?

To merge KDB or any other RAS tool, you need to deal with kdump. Kdump
hijacks panic() before the die calling chain. For KDB or a RAS tool to
work, an infrastructure such as the "add new notifier function" by
Takenori Nagano should be in place.

His last attempt fell short, in my opinion, partly because his
"[PATCH 3/3] Move crash_kexec() into panic_notifier" did not do what it
was meant to do: fit kexec/kdump into the new infrastructure. That is
not fatal; it can be fixed. If the community is interested
in getting a kernel debugger into the kernel, we can continue Takenori's
work. Once the infrastructure is accepted, then merging KDB or any other
kernel debugger will make sense.

Regards,
- jay

>

2008-08-07 19:55:53

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> Christoph Lameter wrote:
>> Stefan Richter wrote:
>>> Christoph Lameter wrote:
>>>> The competing implementation is kdb not kgdb. kgdb is just a stub for
>>>> remote
>>>> debugging using gdb. kdb is an in-kernel debugger like the one
>>>> proposed here.
>>> Is there work underway to get kdb merged? (I'm just asking because I
>>> don't know; I personally don't need kdb nor mdb.)
>>
>> KDB still exists in patches but the merge effort was given up when Linus
>> stated that he did not want a kernel debugger. No problem to start merge
>> attempts again AFAICT. Jay?
>
> To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> hijack panic() before the die calling chain. For KDB or a RAS tool to
> work, an infrastructure such as the "add new notifier function" by
> Takenori Nagano should be in place.
>
> His last attempt fell short, in my opinion, was partly due to his
> "[PATCH 3/3] Move crash_kexec() into panic_notifier" did not do what it
> meant to do: to fit kexec/kdump into the new infrastructure. That is
> not fatal; it can be fixed to make it right. If community is interested
> in getting a kernel debugger to the kernel, we can continue Takenori's
> work. Once the infrastructure is accepted, then merging KDB or any other
> kernel debugger will make sense.
>
> Regards,
> - jay
>
>
>>

As I look through entry_32.S and traps_32.c I do not see where kdump is
hooking the notify_die handler which would intercept calls to a debugger.

Where does kdump hook this path?

Jeff


2008-08-07 20:06:25

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> hijack panic() before the die calling chain. For KDB or a RAS tool to

Imho kdump should just be fixed to use die chains.

-Andi

2008-08-07 20:07:52

by Bernhard Walle

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

* Andi Kleen <[email protected]> [2008-08-07 22:06]:
>
> > To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> > hijack panic() before the die calling chain. For KDB or a RAS tool to
>
> Imho kdump should just be fixed to use die chains.

Well, we had that discussion several times. I'm not against it
(on the contrary, I would like it), but I don't think that repeating the
discussion over and over helps ...

Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development

2008-08-07 20:09:16

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Thu, Aug 07, 2008 at 10:07:37PM +0200, Bernhard Walle wrote:
> * Andi Kleen <[email protected]> [2008-08-07 22:06]:
> >
> > > To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> > > hijack panic() before the die calling chain. For KDB or a RAS tool to
> >
> > Imho kdump should just be fixed to use die chains.
>
> Well, we had that discussion several times. I'm not against it
> (instead, I would like it), but I don't think that repeating the
> discussion over and over does help ...

Just needs some code?

-Andi

2008-08-07 20:11:26

by Bernhard Walle

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

* Andi Kleen <[email protected]> [2008-08-07 22:09]:
>
> On Thu, Aug 07, 2008 at 10:07:37PM +0200, Bernhard Walle wrote:
> > * Andi Kleen <[email protected]> [2008-08-07 22:06]:
> > >
> > > > To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> > > > hijack panic() before the die calling chain. For KDB or a RAS tool to
> > >
> > > Imho kdump should just be fixed to use die chains.
> >
> > Well, we had that discussion several times. I'm not against it
> > (instead, I would like it), but I don't think that repeating the
> > discussion over and over does help ...
>
> Just needs some code?

No, it was rejected with the argument that in the panic case, as little code
as possible should be executed before kexec'ing the panic kernel.

See also: http://kerneltrap.org/node/14050 (for example)

Bernhard
--
Bernhard Walle, SUSE LINUX Products GmbH, Architecture Development

2008-08-07 20:43:36

by Daniel Barkalow

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Tue, 5 Aug 2008, [email protected] wrote:

> Read it already. Quite a few large companies are using it at present and
> have been since 2000, BTW.

The criterion for kernel inclusion isn't really whether it works, however.
It's whether other people would be able to understand it well enough to
support it if you disappear (or if somebody else has changes that require
changes to it). If it works well but isn't nice code, nobody really
benefits from having it in the kernel distribution rather than external
(like it's been for the past 8 years). If it is nice code (somewhat
regardless of whether it happens to work right now), people can work on it
and keep it in sync with the kernel as they change things.

-Daniel
*This .sig left intentionally blank*

2008-08-07 21:24:40

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> On Tue, 5 Aug 2008, [email protected] wrote:
>
>> Read it already. Quite a few large companies are using it at present
>> and
>> have been since 2000, BTW.
>
> The criterion for kernel inclusion isn't really whether it works, however.
> It's whether other people would be able to understand it well enough to
> support it if you disappear (or if somebody else has changes that require
> changes to it). If it works well but isn't nice code, nobody really
> benefits from having it in the kernel distribution rather than external
> (like it's been for the past 8 years). If it is nice code (somewhat
> regardless of whether it happens to work right now), people can work on it
> and keep it in sync with the kernel as they change things.
>
> -Daniel
> *This .sig left intentionally blank*
>

It both works and is nice code. But I may not be impartial.

Jeff

2008-08-07 21:26:31

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

>> On Tue, 5 Aug 2008, [email protected] wrote:
>>
>>> Read it already. Quite a few large companies are using it at present
>>> and
>>> have been since 2000, BTW.
>>
>> The criterion for kernel inclusion isn't really whether it works,
>> however.
>> It's whether other people would be able to understand it well enough to
>> support it if you disappear (or if somebody else has changes that
>> require
>> changes to it). If it works well but isn't nice code, nobody really
>> benefits from having it in the kernel distribution rather than external
>> (like it's been for the past 8 years). If it is nice code (somewhat
>> regardless of whether it happens to work right now), people can work on
>> it
>> and keep it in sync with the kernel as they change things.
>>
>> -Daniel
>> *This .sig left intentionally blank*
>>
>
> It both works and is nice code. But I may not be impartial.
>
> Jeff
>
>

You activate it from the keyboard the same way as kdb -- pause/break or
int X or exceptions.

Jeff

2008-08-07 22:29:12

by Keith Owens

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Andi Kleen (on Thu, 7 Aug 2008 22:06:59 +0200) wrote:
>> To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
>> hijack panic() before the die calling chain. For KDB or a RAS tool to
>
>Imho kdump should just be fixed to use die chains.

Violently agree, especially since the IA64 handling of NMI type
events is significantly different from x86 and requires at least two
callbacks via the die chain.

Alas the kdump authors are adamant that they will not use die chains,
which makes it almost impossible for any other RAS code to coexist with
kdump. This intransigence on the part of kdump is one of the reasons
that I gave up on getting _any_ RAS code (not just KDB) into the Linux
kernel.

See http://kerneltrap.org/node/14050 and
http://marc.info/?l=linux-arch&m=116304508731232&w=2, the latter
explains why you need die chains to handle IA64 correctly. x86
debugging is relatively easy, ia64 is hard due to interactions between
the firmware and the OS, either can stop the other cpus. If your
debugging framework does not handle ia64 INIT and MCA events, then you
cannot debug most of the interesting ia64 events.

In any case, we have gone round this loop too many times for me to care
about it any more. I have given up on Linux RAS code.

2008-08-08 00:29:09

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

[email protected] wrote:
> It would be nice if checkpatch also could FIX those areas it complains about.

scripts/Lindent can at least help with some of the whitespace changes.
It has been a long time since I used it myself though, so I have no idea
how well that works.
--
Stefan Richter
-=====-==--- =--- -=---
http://arcgraph.de/sr/

2008-08-08 01:16:18

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 08:28:54AM +1000, Keith Owens wrote:
> Andi Kleen (on Thu, 7 Aug 2008 22:06:59 +0200) wrote:
> >> To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> >> hijack panic() before the die calling chain. For KDB or a RAS tool to
> >
> >Imho kdump should just be fixed to use die chains.
>
> Violently agree, especially since the IA64 handling of NMI type
> events is significantly different from x86 and requires at least two
> callbacks via the die chain.
>
> Alas the kdump authors are adamant that they will not use die chains,
> which makes it almost impossible for any other RAS code to coexist with
> kdump. This intransigence on the part of kdump is one of the reasons
> that I gave up on getting _any_ RAS code (not just KDB) into the Linux
> kernel.
>

I am doing a quick source code grep and in all the cases except panic,
kdump gets a chance to run in the end. We are running die notifications
first. For example, in the case of nmi, in the case of traps, and
in the case of mce, the notifier list is executed first. So a debugger
or any other RAS tool on the notifier chain will get a chance to
run first.

panic() is the only place where kdump gets a chance to run first and
panic notifiers are not executed.

To me, so far only an in-kernel debugger seems to be a reasonable candidate
that needs to run before kdump after a panic event. If a debugger
is really getting merged into the kernel, then I think the debugger can
put a hook in panic() before kdump. Wouldn't this solve the problem?
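
The ordering described above can be summarized as a toy userspace model. Nothing here is kernel code: the `sk_` functions and step names are hypothetical, and "kdump" returning at all is a simplification, since a real crash_kexec() with a loaded crash kernel does not come back.

```c
#include <string.h>

static char trace[128];          /* records the order in which steps ran */
static int crash_kernel_loaded;  /* 1 if a kdump kernel is "loaded" */

static void record(const char *step)
{
    strcat(trace, step);
    strcat(trace, " ");
}

/* Stand-in for crash_kexec(): returns 1 if it "booted" the crash kernel. */
static int sk_crash_kexec(void)
{
    if (!crash_kernel_loaded)
        return 0;
    record("kdump");
    return 1;
}

/* die/NMI/MCE style path: notifiers run first, kdump afterwards. */
void sk_die(void)
{
    record("die_notifiers");
    sk_crash_kexec();
}

/* panic() path as described: kdump first, so with a crash kernel loaded
 * the panic notifiers are never reached. */
void sk_panic(void)
{
    if (sk_crash_kexec())
        return;
    record("panic_notifiers");
}

/* The proposal: give the debugger an explicit hook ahead of kdump. */
void sk_panic_with_debugger_hook(void)
{
    record("debugger_hook");
    if (sk_crash_kexec())
        return;
    record("panic_notifiers");
}
```

With `crash_kernel_loaded` set, `sk_die()` records the notifiers and then kdump, `sk_panic()` records only kdump, and the hooked variant records the debugger hook followed by kdump.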

Thanks
Vivek

2008-08-08 01:27:23

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Thu, Aug 07, 2008 at 01:34:22PM -0600, [email protected] wrote:
> > Christoph Lameter wrote:
> >> Stefan Richter wrote:
> >>> Christoph Lameter wrote:
> >>>> The competing implementation is kdb not kgdb. kgdb is just a stub for
> >>>> remote
> >>>> debugging using gdb. kdb is an in-kernel debugger like the one
> >>>> proposed here.
> >>> Is there work underway to get kdb merged? (I'm just asking because I
> >>> don't know; I personally don't need kdb nor mdb.)
> >>
> >> KDB still exists in patches but the merge effort was given up when Linus
> >> stated that he did not want a kernel debugger. No problem to start merge
> >> attempts again AFAICT. Jay?
> >
> > To merge KDB or any other RAS tools, you need to deal with kdump. Kdump
> > hijacks panic() before the die calling chain. For KDB or a RAS tool to
> > work, an infrastructure such as the "add new notifier function" by
> > Takenori Nagano should be in place.
> >
> > His last attempt fell short, in my opinion, partly because his
> > "[PATCH 3/3] Move crash_kexec() into panic_notifier" did not do what it
> > was meant to do: fit kexec/kdump into the new infrastructure. That is
> > not fatal; it can be fixed. If the community is interested
> > in getting a kernel debugger into the kernel, we can continue Takenori's
> > work. Once the infrastructure is accepted, then merging KDB or any other
> > kernel debugger will make sense.
> >
> > Regards,
> > - jay
> >
> >
> >>
>
> As I look through entry_32.S and traps_32.c I do not see where kdump is
> hooking the notify_die handler which would intercept calls to a debugger.
>
> Where does kdump hook this path?
>

kdump uses crash_kexec() call for hooking. It hooks in panic(), die_nmi()
and die().

Thanks
Vivek

2008-08-08 02:28:41

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> panic() is the only place where kdump gets a chance to run first and
> panic notifiers are not executed.

To be fully clear panic() that is called outside oops/exception context

s/panic/die notifiers/

>
> To me, so far only an in-kernel debugger seems to be a reasonable candidate

Yes a kernel debugger should be able to hook into panic()

In fact it can do that already by just setting a break point,
but clearly having a real notifier is preferable.

The use case would be then that the kernel debugger would
have some command to trigger a dump.

> which needs to run before kdump after a panic event. If a debugger
> is really getting merged into the kernel, then I think debugger can

kgdb is already merged. Also the x86 notifiers are general
enough that there are a couple of debuggers floating around
that are just using existing interfaces (as in need very little in terms
of core patching)

> put a hook in the panic() before kdump. Wouldn't this solve the problem?

Yes it would, but right now there is no such hook. Also if there
was such a hook kdump could use it like everyone else.

There's a priority scheme in notifiers so you can still run usually last.
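
A minimal user-space sketch of how such a priority scheme orders
callbacks. The toy_* names are hypothetical and this is not the kernel
implementation; the rule modelled (higher priority runs earlier, equal
priorities keep registration order, a callback may stop the walk) is
meant to mirror how kernel notifier chains behave.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOTIFY_DONE 0   /* keep walking the chain */
#define TOY_NOTIFY_STOP 1   /* stop after this callback */

struct toy_notifier {
    int (*call)(struct toy_notifier *self, unsigned long event);
    int priority;                 /* higher value runs earlier */
    struct toy_notifier *next;
};

/* Insert n so the list stays sorted by descending priority. */
static void toy_chain_register(struct toy_notifier **head,
                               struct toy_notifier *n)
{
    assert(n != NULL && n->call != NULL);
    while (*head != NULL && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Walk the chain in order; returns how many callbacks were invoked. */
static int toy_chain_call(struct toy_notifier *head, unsigned long event)
{
    int calls = 0;

    for (struct toy_notifier *n = head; n != NULL; n = n->next) {
        calls++;
        if (n->call(n, event) == TOY_NOTIFY_STOP)
            break;
    }
    return calls;
}

/* Records the priority of each callback in the order it ran. */
static int toy_order[4];
static int toy_order_len;

static int toy_record(struct toy_notifier *self, unsigned long event)
{
    (void)event;
    if (toy_order_len < 4)
        toy_order[toy_order_len++] = self->priority;
    return TOY_NOTIFY_DONE;
}
```

In this model a dump handler registered with a low priority ends up at
the tail of the list, so a debugger with a higher priority is called
before it regardless of registration order.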

-Andi

2008-08-08 08:41:00

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Friday 08 August 2008 03:45, Christoph Lameter wrote:
> Nick Piggin wrote:
> > OK thanks for the info. I don't actually know debugger code as I
> > said, so I wasn't against merging mdb if it offers things that
> > kgdb fundamentally cannot.
> >
> > If so, then ensuring clean interfaces indeed would seem like a
> > good first step to getting it merged.
>
> The competing implementation is kdb not kgdb. kgdb is just a stub for
> remote debugging using gdb. kdb is an in-kernel debugger like the one
> proposed here.

Yes, so Andi said a couple of days ago ;)

2008-08-08 12:08:15

by Cliff Wickman

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

2008-08-08 12:20:20

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> In a partitioned system [I work for SGI, so I'm talking about an Altix],
> there is memory sharing among multiple single-system images. And if
> one of those partitions were to panic the other partitions need to
> be informed that they cannot address the panic'd partition's memory.
> (Once that partition is rebooted any such access will cause an MCA
> in the accessor.)

There are already existing shutdown hooks. Aren't they good enough
for that?

I would feel uneasy about having arbitrary drivers hook into panic().
While I'm sure your code is great there is unfortunately a lot
of crappy driver code around.

-Andi

2008-08-08 13:30:43

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 04:29:16AM +0200, Andi Kleen wrote:
> > panic() is the only place where kdump gets a chance to run first and
> > panic notifiers are not executed.
>
> To be fully clear panic() that is called outside oops/exception context
>
> s/panic/die notifiers/
>
> >
> > To me, so far only an in-kernel debugger seems to be a reasonable candidate
>
> Yes a kernel debugger should be able to hook into panic()
>
> In fact it can do that already by just setting a break point,
> but clearly having a real notifier is preferable.
>
> The use case would be then that the kernel debugger would
> have some command to trigger a dump.
>
> > which needs to run before kdump after a panic event. If a debugger
> > is really getting merged into the kernel, then I think debugger can
>
> kgdb is already merged. Also the x86 notifiers are general
> enough that there are a couple of debuggers floating around
> that are just using existing interfaces (as in need very little in terms
> of core patching)
>
> > put a hook in the panic() before kdump. Wouldn't this solve the problem?
>
> Yes it would, but right now there is no such hook. Also if there
> was such a hook kdump could use it like everyone else.
>
> There's a priority scheme in notifiers so you can still run usually last.

Hi Andi,

IIUC, there are two lists for exception and panic notifications. All the
exception and NMI-related notifications go through "die_chain" and
all the panic notifications are done through "panic_notifier_list".

Are you suggesting that kdump should be put onto panic_notifier_list, in
such a way so that it runs last?

Just a few points to ponder.

- panic_notifier_list is exported and any module can register and make use
of it. As you mentioned in your other mail, there are a lot of drivers out
there with crappy code, and if we do it, all the drivers get a chance
to do stuff after panic() and there is no guarantee that kdump code will
ever get a chance to run.

- Kdump is built on the philosophy that after a panic(), one should do as
little as possible in the kernel and all the actions should be
deferred to the new kernel. That's why we recommend that all the panic
notifier actions (except the debugger) be done in the second kernel. It
does introduce a little delay in notification but it also makes it more
reliable.

- Neil Horman has already provided infrastructure so that one can put
user space code in the second kernel's initrd and it will be executed.
This can be easily done for modules also.

But somehow nobody seems to be interested in doing things in the second
kernel, and everybody wants to run their post-panic code in the first
kernel. So far, except for the debugger, we have not run into any strong
case that needs to run post-panic code in the first kernel and would not
work if the post-panic actions were taken in the second kernel.

That's why there is always resistance from our side to moving kdump to the
panic notifier list: we want modules to do the right thing, which is to
run in the second kernel. The moment kdump is put onto panic_notifier_list,
nobody will think of doing anything in the second kernel (because it takes
extra effort). Everybody will register a panic notifier handler in the first
kernel and be happy.

If everybody thinks that they can do something meaningful in a crashed
kernel after panic() and that's the way to go, then so be it. In that
case we should export all the panic_notifier_list registered users to
userspace (through sysfs or something) and then let the user change the
priority of the various tools on the list based on his needs.

Adding all the infrastructure for changing the priority of handlers from
user space becomes a little messy, and Eric had NAKed such patches. But it
at least allows a user to decide in what order he wants to run the various
tools upon panic().

But given an option, I would think that the debugger should put a breakpoint
in panic() and the rest of the handlers should run in the second kernel.

Thanks
Vivek

2008-08-08 13:40:45

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

>> In a partitioned system [I work for SGI, so I'm talking about an Altix],
>> there is memory sharing among multiple single-system images. And if
>> one of those partitions were to panic the other partitions need to
>> be informed that they cannot address the panic'd partition's memory.
>> (Once that partition is rebooted any such access will cause an MCA
>> in the accessor.)
>
> There are already existing shutdown hooks. Aren't they good enough
> for that?
>
> I would feel uneasy about having arbitrary drivers hook into panic().
> While I'm sure your code is great there is unfortunately a lot
> of crappy driver code around.
>
> -Andi
>

I hooked panic last night and inserted a notify_die hook -- there is even
a state defined for it already -- DIE_PANIC. The rest of the code should
be ok. My only question was where to harvest the regs variable since
panic is not a real exception.

Here's a first stab. You must also add #include <linux/kdebug.h> to the
top of kernel/panic.c.

diff -Naur linux-2.6.27/kernel/panic.c linux-2.6.27-mdb/kernel/panic.c
--- linux-2.6.27/kernel/panic.c 2008-08-07 15:32:29.000000000 -0600
+++ linux-2.6.27-mdb/kernel/panic.c 2008-08-07 15:29:09.000000000 -0600
@@ -82,6 +82,12 @@
 	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
 	bust_spinlocks(0);
 
+	// call the notify_die handler for any resident debuggers which
+	// may be active and pass the message string. On a software
+	// fault return at least some sort of regs for a remote debugger
+	// to look at.
+	notify_die(DIE_PANIC, buf, get_irq_regs(), 0, 0, 0);
+
 	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.

Jeff

2008-08-08 14:49:53

by Cliff Wickman

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 09:29:53AM -0400, Vivek Goyal wrote:
> On Fri, Aug 08, 2008 at 04:29:16AM +0200, Andi Kleen wrote:
> > > panic() is the only place where kdump gets a chance to run first and
> > > panic notifiers are not executed.
> >
> > To be fully clear panic() that is called outside oops/exception context
> >
> > s/panic/die notifiers/
> >
> > >
> > > > To me, so far only an in-kernel debugger seems to be a reasonable candidate
> >
> > Yes a kernel debugger should be able to hook into panic()
> >
> > In fact it can do that already by just setting a break point,
> > but clearly having a real notifier is preferable.
> >
> > The use case would be then that the kernel debugger would
> > have some command to trigger a dump.
> >
> > > which needs to run before kdump after a panic event. If a debugger
> > > is really getting merged into the kernel, then I think debugger can
> >
> > kgdb is already merged. Also the x86 notifiers are general
> > enough that there are a couple of debuggers floating around
> > that are just using existing interfaces (as in need very little in terms
> > of core patching)
> >
> > > put a hook in the panic() before kdump. Wouldn't this solve the problem?
> >
> > Yes it would, but right now there is no such hook. Also if there
> > was such a hook kdump could use it like everyone else.
> >
> > There's a priority scheme in notifiers so you can still run usually last.
>
> Hi Andi,
>
> IIUC, there are two lists for exception and panic notifications. All the
> exception and NMI-related notifications go through "die_chain" and
> all the panic notifications are done through "panic_notifier_list".
>
> Are you suggesting that kdump should be put onto panic_notifier_list, in
> such a way so that it runs last?
>
> Just a few points to ponder.
>
> - panic_notifier_list is exported and any module can register and make use
> of it. As you mentioned in your other mail, there are a lot of drivers out
> there with crappy code, and if we do it, all the drivers get a chance
> to do stuff after panic() and there is no guarantee that kdump code will
> ever get a chance to run.
>
> - Kdump is built on the philosophy that after a panic(), one should do as
> little as possible in the kernel and all the actions should be
> deferred to the new kernel. That's why we recommend that all the panic
> notifier actions (except the debugger) be done in the second kernel. It
> does introduce a little delay in notification but it also makes it more
> reliable.
>
> - Neil Horman has already provided infrastructure so that one can put
> user space code in the second kernel's initrd and it will be executed.
> This can be easily done for modules also.
>
> But somehow nobody seems to be interested in doing things in the second
> kernel, and everybody wants to run their post-panic code in the first
> kernel. So far, except for the debugger, we have not run into any strong
> case that needs to run post-panic code in the first kernel and would not
> work if the post-panic actions were taken in the second kernel.

In the case of the cross-partition driver, running panic notification in the
second kernel is an interesting idea.

I discussed it with Robin Holt, who is more knowledgeable than I on the
details of that driver, and he told me that there is a great deal of
state information needed for the notification. It's easy to do in the
first kernel, but extremely difficult in a second kernel.

Couldn't we have some tunable flexibility in that area, to determine
what should run on a panic, and in what order?

2008-08-08 15:05:54

by Cliff Wickman

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 02:20:52PM +0200, Andi Kleen wrote:
> > In a partitioned system [I work for SGI, so I'm talking about an Altix],
> > there is memory sharing among multiple single-system images. And if
> > one of those partitions were to panic the other partitions need to
> > be informed that they cannot address the panic'd partition's memory.
> > (Once that partition is rebooted any such access will cause an MCA
> > in the accessor.)
>
> There are already existing shutdown hooks. Aren't they good enough
> for that?

For shutdown, yes. But on a panic crash_kexec() gets called
before the panic_notifier_list is run.

> I would feel uneasy about having arbitrary drivers hook into panic().
> While I'm sure your code is great there is unfortunately a lot
> of crappy driver code around.

That is Eric Biederman's concern as well. But it seems we should
have a way for a user/customer to customize those events and their order,
as I noted in a previous post.

--
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824

2008-08-08 16:58:30

by Jay Lan

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

Cliff Wickman wrote:
> On Fri, Aug 08, 2008 at 09:29:53AM -0400, Vivek Goyal wrote:
>> On Fri, Aug 08, 2008 at 04:29:16AM +0200, Andi Kleen wrote:
>>>> panic() is the only place where kdump gets a chance to run first and
>>>> panic notifiers are not executed.
>>> To be fully clear panic() that is called outside oops/exception context
>>>
>>> s/panic/die notifiers/
>>>
>>>> To me, so far only an in-kernel debugger seems to be a reasonable candidate
>>> Yes a kernel debugger should be able to hook into panic()
>>>
>>> In fact it can do that already by just setting a break point,
>>> but clearly having a real notifier is preferable.
>>>
>>> The use case would be then that the kernel debugger would
>>> have some command to trigger a dump.
>>>
>>>> which needs to run before kdump after a panic event. If a debugger
>>>> is really getting merged into the kernel, then I think debugger can
>>> kgdb is already merged. Also the x86 notifiers are general
>>> enough that there are a couple of debuggers floating around
>>> that are just using existing interfaces (as in need very little in terms
>>> of core patching)
>>>
>>>> put a hook in the panic() before kdump. Wouldn't this solve the problem?
>>> Yes it would, but right now there is no such hook. Also if there
>>> was such a hook kdump could use it like everyone else.
>>>
>>> There's a priority scheme in notifiers so you can still run usually last.
>> Hi Andi,
>>
>> IIUC, there are two lists for exception and panic notifications. All the
>> exception and NMI-related notifications go through "die_chain" and
>> all the panic notifications are done through "panic_notifier_list".
>>
>> Are you suggesting that kdump should be put onto panic_notifier_list, in
>> such a way so that it runs last?
>>
>> Just a few points to ponder.
>>
>> - panic_notifier_list is exported and any module can register and make use
>> of it. As you mentioned in your other mail, there are a lot of drivers out
>> there with crappy code, and if we do it, all the drivers get a chance
>> to do stuff after panic() and there is no guarantee that kdump code will
>> ever get a chance to run.
>>
>> - Kdump is built on the philosophy that after a panic(), one should do as
>> little as possible in the kernel and all the actions should be
>> deferred to the new kernel. That's why we recommend that all the panic
>> notifier actions (except the debugger) be done in the second kernel. It
>> does introduce a little delay in notification but it also makes it more
>> reliable.
>>
>> - Neil Horman has already provided infrastructure so that one can put
>> user space code in the second kernel's initrd and it will be executed.
>> This can be easily done for modules also.
>>
>> But somehow nobody seems to be interested in doing things in the second
>> kernel, and everybody wants to run their post-panic code in the first
>> kernel. So far, except for the debugger, we have not run into any strong
>> case that needs to run post-panic code in the first kernel and would not
>> work if the post-panic actions were taken in the second kernel.
>
> In the case of the cross-partition driver, running panic notification in the
> second kernel is an interesting idea.
>
> I discussed it with Robin Holt, who is more knowledgeable than I on the
> details of that driver, and he told me that there is a great deal of
> state information needed for the notification. It's easy to do in the
> first kernel, but extremely difficult in a second kernel.
>
> Couldn't we have some tunable flexibility in that area, to determine
> what should run on a panic, and in what order?

KDB registers to the panic_notifier_list, but since crash_kexec()
takes control early in panic(), the panic_notifier_list is essentially
dead if kdump is chkconfig'ed on.

I think a kernel debugger is not complete if it does not have an option
to create a kernel dump. Unfortunately we have to tell KDB users to
not chkconfig on kdump.

I am working on KDB to allow KDB to co-exist with kdump. But it is
done through a hack to place KDB ahead of crash_kexec(). It would be
preferred to have a formal notifier_list.

Regards,
- jay

>
>

2008-08-08 18:02:27

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

> Are you suggesting that kdump should be put onto panic_notifier_list, in
> such a way so that it runs last?

The point was that kernel debuggers have at least as legitimate a
need as kdump to run early on panic. In particular, they
should run before kdump because kdump can be triggered from
the debugger.

But for modular kernel debuggers the hook would need to be exported,
so in theory everyone could use it. In theory code review should
catch that. Another alternative would be to re-add the old namespaces
patches I posted some time ago, which allowed exporting symbols only
to specific modules (but that would also be unfortunate for out-of-tree
debuggers).

Since we have nearly all the other hooks needed for kernel debuggers
anyway, it doesn't really make sense to stop at panic. So this
earlier requirement should be relaxed.

Perhaps code review can solve the problem?

-Andi

2008-08-11 10:36:46

by Jidong Xiao

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Mon, Aug 4, 2008 at 1:22 AM, <[email protected]> wrote:
>
>
> This is a linux port of the kernel debugger I wrote in 2000 for the
> MANOS/Gadugi Operating System. I created this particular
> port in June of this year from the MANOS/Gadugi source code I released
> under the GNU public license in 2000.
>
> I wrote the SMP debugger used in SMP Netware in 1994 and 1995, and that was
> later rolled into the main Netware kernel, though a lot of folks
> contributed and helped merge it into Netware. This debugger closely resembles
> the legacy Netware kernel debugger, and I find it easier to use than kdb,
> with fewer crashes and problems.
>
> This version is ia32 only at present, but I am completing x86_64 support
> and will post it as it is completed. I basically wrote this tool for
> my own internal use and for my projects since I could not find a debugger
> in linux I was used to. I add support to it as I need it for my own
> internal use.
>
> This linux port of my kernel debugger does not require kdb or the kdb hooks
> and is more minimal than kdb and has some features kdb does not, such as
> Intel style disassembly with dereferencing of data during disassembly
> and a very robust mathematical numeric support with conditional breakpoints.
>
> I created a far more robust version of this debugger in 2001 which
> included source level support, integrated screen and keyboard support,
> remote networking capability, and loader support and licensed it to another
> company. I was placed under a 5 year non-compete not to port
> this tool to Linux until end of year 2007. The folks who licensed it
> did absolutely nothing with it of consequence, and 2007 has come and gone,
> so I am released from the non-compete and decided to port the debugger
> from my old Open Source operating system and I figured it might be as useful
> to others as it has been for my projects.
>
> I will be posting user space modules which can be loaded with this version
> at some point which will enable source level debugging and a bunch of
> other features. These add-ons may get farmed out to another company for
> support.
>
> KNOWN ISSUES
> ------------
>
> This debugger works very well on SMP systems, but some of the directed
> NMI features are intended for post mortem crashed systems.
>
> All numbers entered are interpreted as hex. If you want a number
> interpreted from the debugger console in decimal, add an 'r' to the end of
> the number for "real number".
>
> There is awesome help support in the debugger. Type "help" or "help
> <command name>".
>
> I have support for post-mortem directed NMI breakpoints from the debugger
> console if you want to halt another processor which can hang on some
> systems, but usually does not.
>
> I do not have ia64 support and since Intel in 2000 refused to share ia64
> information with me, I never completed the ia64 section of the debugger.
> If and when I obtain an ia64 system, I will look into putting it in or
> try to adapt the disassembler out of kdb which has ia64 and x86_64 support.
>
>
>
> vanilla kernels
>
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.18-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.19-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.20-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.21-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.22-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.23-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.24-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.25-ia32-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/mdb-2.6.26-ia32-08-02-08.patch
>
>
> Linux distributions
>
> ftp://ftp.wolfmountaingroup.org/pub/mdb/distro/mdb-2.6.16-ia32-SLES-10-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/distro/mdb-2.6.18-el5-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/distro/mdb-2.6.22.5-31-ia32-suse10.3-08-01-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/distro/mdb-2.6.23-ia32-fc8-08-02-08.patch
> ftp://ftp.wolfmountaingroup.org/pub/mdb/distro/mdb-2.6.25.14-ia32-fc9-08-02-08.patch
>
> Any additional code fixes I would appreciate being placed back into the
> tree, so please send them back to me if you find the debugger useful.
>
> I commend Keith Owens for his work on kdb and his project is more than
> welcome to share and adapt the code from mdb if it is useful for their
> project. I am certainly looking seriously at using the disassembler in
> kdb for mdb, with my enhancements rolled into it.
>
> Jeffrey Vernon Merkey
>
>
Well, I think that given that kdb was not accepted by Linus, there
is little chance that mdb will be included in the mainline kernel.
Though I don't know why kgdb is acceptable.

Regards
Jason Xiao

2008-08-11 12:58:17

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 09:50:00AM -0500, Cliff Wickman wrote:
> On Fri, Aug 08, 2008 at 09:29:53AM -0400, Vivek Goyal wrote:
> > On Fri, Aug 08, 2008 at 04:29:16AM +0200, Andi Kleen wrote:
> > > > panic() is the only place where kdump gets a chance to run first and
> > > > panic notifiers are not executed.
> > >
> > > To be fully clear panic() that is called outside oops/exception context
> > >
> > > s/panic/die notifiers/
> > >
> > > >
> > > > > To me, so far only an in-kernel debugger seems to be a reasonable candidate
> > >
> > > Yes a kernel debugger should be able to hook into panic()
> > >
> > > In fact it can do that already by just setting a break point,
> > > but clearly having a real notifier is preferable.
> > >
> > > The use case would be then that the kernel debugger would
> > > have some command to trigger a dump.
> > >
> > > > which needs to run before kdump after a panic event. If a debugger
> > > > is really getting merged into the kernel, then I think debugger can
> > >
> > > kgdb is already merged. Also the x86 notifiers are general
> > > enough that there are a couple of debuggers floating around
> > > that are just using existing interfaces (as in need very little in terms
> > > of core patching)
> > >
> > > > put a hook in the panic() before kdump. Wouldn't this solve the problem?
> > >
> > > Yes it would, but right now there is no such hook. Also if there
> > > was such a hook kdump could use it like everyone else.
> > >
> > > There's a priority scheme in notifiers so you can still run usually last.
> >
> > Hi Andi,
> >
> > IIUC, there are two lists for exception and panic notifications. All the
> > exception and NMI-related notifications go through "die_chain" and
> > all the panic notifications are done through "panic_notifier_list".
> >
> > Are you suggesting that kdump should be put onto panic_notifier_list, in
> > such a way so that it runs last?
> >
> > Just a few points to ponder.
> >
> > - panic_notifier_list is exported and any module can register and make use
> > of it. As you mentioned in your other mail, there are a lot of drivers out
> > there with crappy code, and if we do it, all the drivers get a chance
> > to do stuff after panic() and there is no guarantee that kdump code will
> > ever get a chance to run.
> >
> > - Kdump is built on the philosophy that after a panic(), one should do as
> > little as possible in the kernel and all the actions should be
> > deferred to the new kernel. That's why we recommend that all the panic
> > notifier actions (except the debugger) be done in the second kernel. It
> > does introduce a little delay in notification but it also makes it more
> > reliable.
> >
> > - Neil Horman has already provided infrastructure so that one can put
> > user space code in the second kernel's initrd and it will be executed.
> > This can be easily done for modules also.
> >
> > But somehow nobody seems to be interested in doing things in the second
> > kernel, and everybody wants to run their post-panic code in the first
> > kernel. So far, except for the debugger, we have not run into any strong
> > case that needs to run post-panic code in the first kernel and would not
> > work if the post-panic actions were taken in the second kernel.
>
> In the case of the cross-partition driver, running panic notification in the
> second kernel is an interesting idea.
>
> I discussed it with Robin Holt, who is more knowledgeable than I on the
> details of that driver, and he told me that there is a great deal of
> state information needed for the notification. It's easy to do in the
> first kernel, but extremely difficult in a second kernel.
>

Generally what kind of state information has to be passed?

> Couldn't we have some tunable flexibility in that area, to determine
> what should run on a panic, and in what order?

Maybe that's the way forward. Export the list of registered handlers on
panic_notifier_list through sysfs or debugfs and also provide flexibility
so that a user can change the priorities from userspace. That should work
for all.

Thanks
Vivek

2008-08-11 13:03:57

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Fri, Aug 08, 2008 at 08:03:03PM +0200, Andi Kleen wrote:
> > Are you suggesting that kdump should be put onto panic_notifier_list, in
> > such a way so that it runs last?
>
> The point was that kernel debuggers have at least as legitimate a
> need as kdump to run early on panic. In particular, they
> should run before kdump because kdump can be triggered from
> the debugger.
>

Agreed.

> But for modular kernel debuggers the hook would need to be exported,
> so in theory everyone could use it. In theory code review should
> catch that. Another alternative would be to re-add the old namespaces
> patches I posted some time ago, which allowed exporting symbols only
> to specific modules (but that would also be unfortunate for out-of-tree
> debuggers).
>

Or an easier way is that debuggers can put a breakpoint on panic().

> Since we have nearly all the other hooks needed for kernel debuggers
> anyway, it doesn't really make sense to stop at panic. So this
> earlier requirement should be relaxed.
>

I think given that so many people want kdump on panic_notifier_list,
it would be worthwhile to experiment with a different approach.

- Move kdump to panic_notifier_list.
- Export panic_notifier_list to user space and provide flexibility
so that a user can change the priorities of registered handlers
dynamically.

This will allow an admin to see explicitly what is going to run, and
in what order, in case of a panic, and also let him change that
order.

This kind of list should keep all kinds of users happy. Those who
want to run all the other modules before kdump will be able to
do so, and those who don't can boost the priority of kdump
to put it ahead in the list.
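
As a rough user-space sketch of that idea: a handler's priority is
changed by unlinking it and re-inserting it into a priority-sorted list,
which is roughly what a sysfs or debugfs knob would have to trigger. All
toy_* names and the handler names used with them are hypothetical; no
such interface exists in the kernel discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_handler {
    const char *name;
    int priority;                 /* higher value runs earlier */
    struct toy_handler *next;
};

/* Insert h so the list stays sorted by descending priority. */
static void toy_insert(struct toy_handler **head, struct toy_handler *h)
{
    assert(h != NULL && h->name != NULL);
    while (*head != NULL && (*head)->priority >= h->priority)
        head = &(*head)->next;
    h->next = *head;
    *head = h;
}

/* Unlink h from the list; returns 1 if it was found, 0 otherwise. */
static int toy_remove(struct toy_handler **head, struct toy_handler *h)
{
    for (; *head != NULL; head = &(*head)->next) {
        if (*head == h) {
            *head = h->next;
            h->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* What an admin-facing knob would do: look up by name, then re-sort. */
static int toy_set_priority(struct toy_handler **head, const char *name,
                            int priority)
{
    struct toy_handler *h;

    for (h = *head; h != NULL; h = h->next)
        if (strcmp(h->name, name) == 0)
            break;
    if (h == NULL)
        return 0;

    toy_remove(head, h);
    h->priority = priority;
    toy_insert(head, h);
    return 1;
}

/* Name of the handler that would run at position idx, or NULL. */
static const char *toy_name_at(const struct toy_handler *head, size_t idx)
{
    while (head != NULL && idx-- > 0)
        head = head->next;
    return head != NULL ? head->name : NULL;
}
```

Starting from the order debugger, driver, kdump, raising the priority of
the kdump entry above the driver's moves it to second place without
touching the other entries.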

I think Takenori had some working patches in the past for this. Probably
time to revisit the patches. (Somebody willing to look into it?).

Thanks
Vivek

2008-08-11 13:34:21

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

I found a problem with APIC NMI support which seems to affect all the
debuggers, but appears machine specific -- at least I can reproduce it
with all of the MDB, KDB, and KGDB modules on my ACER 2410 dual
core laptop. It explains the mysterious hangs I would see in KDB all the
time on SMP systems.

The call:

send_IPI_allbutself(vector)

will hard hang an ACER laptop with dual core processors if issued while
any one of the processors is actively inside an INT 1 handler and then takes
a SECOND NMI inside of this path, and nests. It hangs the requesting
(focus) processor during nested interrupts if a target processor A) is
inside an INT 1 exception, B) takes an NMI interrupt, C) returns from the
NMI back into the INT 1, and D) receives a second NMI.

I am aware that a second NMI will not propagate to a processor currently
servicing an NMI until the processor sees an IRET instruction (at least
this is how intel worked years back).
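
That blocking rule can be written down as a tiny state model. It is a
user-space sketch only: struct toy_cpu and the toy_* functions are
invented for illustration, and the "one NMI is remembered while blocked"
behaviour is an assumption of the model, not something taken from this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU NMI state for the model. */
struct toy_cpu {
    bool nmi_blocked;   /* set when an NMI is delivered, cleared by IRET */
    bool nmi_latched;   /* an NMI arrived while blocked and is waiting */
    int  nmi_handled;   /* number of times the NMI handler was entered */
};

/* An NMI arrives. Returns true if the handler is entered right away. */
static bool toy_raise_nmi(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->nmi_blocked) {
        cpu->nmi_latched = true;   /* assumption: at most one is kept */
        return false;
    }
    cpu->nmi_blocked = true;
    cpu->nmi_handled++;
    return true;
}

/*
 * The CPU executes IRET. NMIs are unblocked, and a latched NMI (if any)
 * is delivered immediately. Returns true if a latched NMI was delivered.
 */
static bool toy_iret(struct toy_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->nmi_blocked = false;
    if (cpu->nmi_latched) {
        cpu->nmi_latched = false;
        return toy_raise_nmi(cpu);
    }
    return false;
}
```

Note that in this model any IRET unblocks NMIs, including one that ends
an exception handler nested inside the NMI handler, so exceptions taken
in the NMI path can let a second NMI in earlier than expected.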

I have not been able to reproduce it on the Xeon based motherboards. I
have seen the APIC bus hang this way on my other OS project -- when the
APIC was programmed incorrectly, and assume it must be a bug in the APIC,
how the APIC is programmed by Linux, etc.

I am coding around the problem to prevent such convoluted nesting levels
in MDB (this was from testing), but this was the final test for enabling
SSB and all the fixes before I post an rc3 patch series which really
cleans up the code, and there's a mystery with send_IPI_allbutself().

Jeff

2008-08-11 13:49:18

[permalink] [raw]

Subject: Re: [ANNOUNCE] Merkey's Kernel Debugger

On Mon, Aug 11, 2008 at 07:11:42AM -0600, [email protected] wrote:
> I found a problem with APIC NMI support which seems to affect all the
> debuggers, but appears machine specific -- at least I can reproduce it
> with all of the MDB, KDB, and KGDB modules on my ACER 2410 dual

A couple of laptop BIOSes (e.g. some thinkpads) are unfortunately
not NMI safe. There is no known workaround other than not using NMIs
on these systems.

There's unfortunately no global blacklist for these systems, although
having one would be useful for a couple of subsystems.

-Andi

2008-08-11 16:39:12