2005-09-21 15:21:37

by Dave Anderson

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO tokernel core dumps

"Eric W. Biederman" wrote:

> Vivek Goyal <[email protected]> writes:
>
> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
> > kernel core dumps generated by kdump.
> >
> > o This note mainly communicates the information required by kernel dump
> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
> > the pointer to task struct of panicing task and page size. Page size is
> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
> >

It's not absolutely required for crash -- but more of a convenience. As it is, the
set of NT_PRSTATUS sections can be scoured for the set of active
tasks, and then the process stacks (and potentially the IRQ stacks, then
mapped back to the proper process stack) can be searched for crash_kexec.
So crash works without the task_struct pointer, but it's ugly. It's just that
netdump, diskdump and LKCD all report the panic task_struct pointer,
and it seems a reasonable thing to do.

>
> > o gdb is not affected by this change as gdb need not to parse this note.
>
> A couple of things.
> - The name of your note is terribly generic, so it seems a poor choice.
>
> - Why do we need to capture the page size at the time of the
> crash? Isn't the page size a compile time option? Won't
> sys_getpagesize() get you this information before the crash?
> Why do we need the kernels page size at all?
>

The page size is needed for a whole host of things, primarily for
virtual-to-physical address translations, a bunch of VM-related
commands, etc. For non-x86[_64] systems. the host system may
be using a different page size than the vmcore being examining.
That's probably a rare situation, and using getpagesize() on the
host would work in most cases, or perhaps issuing the page size
as a command line option, but again, it's another convenience item
that gets reported by netdump, diskdump and LKCD.

>
> - Why do you avoid storing the current task on the other cpus?
>

They can be found in the per-cpu runqueues data structure.

>
> - Can't we derive the current task from the existing register information
> already captured. If not would a little extra debug information
> captured at compile time be better?
>
> - You don't address the issue of architectural control registers.
> So you are going to need another note at some point. (Not
> necessarily a bad thing).
>

I guess that's their whole point. This NT_KDUMPINFO or whatever
you want to call it, could cumulatively collect any other data deemed
necessary that falls outside the bounds of the existing note types.

Dave Anderson

>
> > +/*
> > + * NT_KDUMPINFO can be used to communicate additional information required for
> > + * kernel core dumps. Additional information includes kernel configuration
> > + * variables like page size and any other relevant data required by
> > + * debugger (crash in this case).
> > +*/
> > +struct elf_kdumpinfo
> > +{
> > + char page_shift; /* Page size */
> > + struct task_struct *panic_tsk; /* Pointer to panic task_struct */
> > +};
> > +
>
> > #define NT_TASKSTRUCT 4
> > #define NT_AUXV 6
> > #define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
> > -
> > +#define NT_KDUMPINFO 7 /* Used for kernel core dumps */
> >


2005-09-22 09:03:04

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO tokernel core dumps

Dave Anderson <[email protected]> writes:

> "Eric W. Biederman" wrote:
>
>> Vivek Goyal <[email protected]> writes:
>>
>> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
>> > kernel core dumps generated by kdump.
>> >
>> > o This note mainly communicates the information required by kernel dump
>> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
>> > the pointer to task struct of panicing task and page size. Page size is
>> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
>> >
>
> It's not absolutely required for crash -- but more of a convenience. As it is,
> the
> set of NT_PRSTATUS sections can be scoured for the set of active
> tasks, and then the process stacks (and potentially the IRQ stacks, then
> mapped back to the proper process stack) can be searched for crash_kexec.
> So crash works without the task_struct pointer, but it's ugly. It's just that
> netdump, diskdump and LKCD all report the panic task_struct pointer,
> and it seems a reasonable thing to do.

Ok. The point here is to know which task/cpu called panic, rather
than to get the task_struct. That makes a lot of sense, and is
cheap to get. Any note on the crashing cpu that is not captured
by another cpu will give us that information.

My primary concern is while the concept of a task_struct is pretty
stable who is to know how the kernel will change in the future. So
if we don't need to export a task_struct pointer and merely need
to flag the cpu that called panic we can do that much more reliably.

A practical question is how is this solved with a normal multi-threaded
core dumps?

It is not the best thing but order of the notes does and will matter,
it is likely worth Documenting so people don't change it that
PR_STATUS comes first so we can reliably find the first note
for a cpu, after everything has been serialized.

>> > o gdb is not affected by this change as gdb need not to parse this note.
>>
>> A couple of things.
>> - The name of your note is terribly generic, so it seems a poor choice.
>>
>> - Why do we need to capture the page size at the time of the
>> crash? Isn't the page size a compile time option? Won't
>> sys_getpagesize() get you this information before the crash?
>> Why do we need the kernels page size at all?
>>
>
> The page size is needed for a whole host of things, primarily for
> virtual-to-physical address translations, a bunch of VM-related
> commands, etc. For non-x86[_64] systems. the host system may
> be using a different page size than the vmcore being examining.
> That's probably a rare situation, and using getpagesize() on the
> host would work in most cases, or perhaps issuing the page size
> as a command line option, but again, it's another convenience item
> that gets reported by netdump, diskdump and LKCD.

What I was mostly saying is that there are other places this can
be captured. If this is actually a compile time option on all
architectures we should simply add more debug information to vmlinux.
If this varies at runtime we have the option of capturing it from the
kernel before it crashes. We don't need to run code to capture
it while the kernel is dying.

Having just skimmed the include/asm-* it looks like this is indeed
a compile time option so the page size needs to get added to the
information that is easy to get from the vmlinux binary.

We do need a way that we can test to see if a core dump
actually matches the vmlinux we are looking at. Probably
this is capturing all of the information captured by linux/version.h
and linux/compile.h both at runtime and at compile time and
checking to see if they match.

>> - Why do you avoid storing the current task on the other cpus?
>>
>
> They can be found in the per-cpu runqueues data structure.

Thanks. That confirms my impression we are just looking for a flag
that says we are the current cpu.

>> - Can't we derive the current task from the existing register information
>> already captured. If not would a little extra debug information
>> captured at compile time be better?
>>
>> - You don't address the issue of architectural control registers.
>> So you are going to need another note at some point. (Not
>> necessarily a bad thing).
>>
>
> I guess that's their whole point. This NT_KDUMPINFO or whatever
> you want to call it, could cumulatively collect any other data deemed
> necessary that falls outside the bounds of the existing note types.

It is clear we need one or mode notes. Largely once we define what
is in a note we cannot change it. The most we can do is to append
more fields to an existing node type, because we have size information
we can check on. This is because the definition of a note is a kernel
API issue.

Eric

2005-09-22 14:08:39

by Vivek Goyal

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO tokernel core dumps

On Thu, Sep 22, 2005 at 03:01:10AM -0600, Eric W. Biederman wrote:
> Dave Anderson <[email protected]> writes:
>
> > "Eric W. Biederman" wrote:
> >
> >> Vivek Goyal <[email protected]> writes:
> >>
> >> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
> >> > kernel core dumps generated by kdump.
> >> >
> >> > o This note mainly communicates the information required by kernel dump
> >> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
> >> > the pointer to task struct of panicing task and page size. Page size is
> >> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
> >> >
> >
> > It's not absolutely required for crash -- but more of a convenience. As it is,
> > the
> > set of NT_PRSTATUS sections can be scoured for the set of active
> > tasks, and then the process stacks (and potentially the IRQ stacks, then
> > mapped back to the proper process stack) can be searched for crash_kexec.
> > So crash works without the task_struct pointer, but it's ugly. It's just that
> > netdump, diskdump and LKCD all report the panic task_struct pointer,
> > and it seems a reasonable thing to do.
>
> Ok. The point here is to know which task/cpu called panic, rather
> than to get the task_struct. That makes a lot of sense, and is
> cheap to get. Any note on the crashing cpu that is not captured
> by another cpu will give us that information.
>

That makes sense. Sheer presence of a particular note can identify the
crashing cpu and no need to store "current". Only crashing cpu is going
to store that note and that too after respective NT_PRSTATUS.

> My primary concern is while the concept of a task_struct is pretty
> stable who is to know how the kernel will change in the future. So
> if we don't need to export a task_struct pointer and merely need
> to flag the cpu that called panic we can do that much more reliably.
>
> A practical question is how is this solved with a normal multi-threaded
> core dumps?
>
> It is not the best thing but order of the notes does and will matter,
> it is likely worth Documenting so people don't change it that
> PR_STATUS comes first so we can reliably find the first note
> for a cpu, after everything has been serialized.
>
> >> > o gdb is not affected by this change as gdb need not to parse this note.
> >>
> >> A couple of things.
> >> - The name of your note is terribly generic, so it seems a poor choice.
> >>
> >> - Why do we need to capture the page size at the time of the
> >> crash? Isn't the page size a compile time option? Won't
> >> sys_getpagesize() get you this information before the crash?
> >> Why do we need the kernels page size at all?
> >>
> >
> > The page size is needed for a whole host of things, primarily for
> > virtual-to-physical address translations, a bunch of VM-related
> > commands, etc. For non-x86[_64] systems. the host system may
> > be using a different page size than the vmcore being examining.
> > That's probably a rare situation, and using getpagesize() on the
> > host would work in most cases, or perhaps issuing the page size
> > as a command line option, but again, it's another convenience item
> > that gets reported by netdump, diskdump and LKCD.
>
> What I was mostly saying is that there are other places this can
> be captured. If this is actually a compile time option on all
> architectures we should simply add more debug information to vmlinux.
> If this varies at runtime we have the option of capturing it from the
> kernel before it crashes. We don't need to run code to capture
> it while the kernel is dying.
>
> Having just skimmed the include/asm-* it looks like this is indeed
> a compile time option so the page size needs to get added to the
> information that is easy to get from the vmlinux binary.
>
> We do need a way that we can test to see if a core dump
> actually matches the vmlinux we are looking at. Probably
> this is capturing all of the information captured by linux/version.h
> and linux/compile.h both at runtime and at compile time and
> checking to see if they match.
>

I quickly browsed through "crash" code and looks like it is already doing
similiar check (kernel.c, verify_version()). It seems to be retrieving
"linux_banner" from core image and also retrieving banner string from vmlinux
and trying to match. So if banner information can be directly read from the
core image, probably there is no need to export it through notes.

Thanks
Vivek

2005-09-22 16:40:34

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO tokernel core dumps

Vivek Goyal <[email protected]> writes:

> I quickly browsed through "crash" code and looks like it is already doing
> similiar check (kernel.c, verify_version()). It seems to be retrieving
> "linux_banner" from core image and also retrieving banner string from vmlinux
> and trying to match. So if banner information can be directly read from the
> core image, probably there is no need to export it through notes.

Sounds good. We still need to define a note for the cpu control
registers. Do any of the other crash dump solution capture that
information right now?

Eric