2005-09-21 06:56:53

by Vivek Goyal

[permalink] [raw]
Subject: [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps



o This patch adds a new note type NT_KDUMPINFO. This note is added with
kernel core dumps generated by kdump.

o This note mainly communicates the information required by kernel dump
analysis tool "crash" to analyze the kernel core dump. As of now it contains
the pointer to task struct of panicing task and page size. Page size is
irrelevant for i386 but is required for architectures like ia64 and ppc64.

o gdb is not affected by this change as gdb need not to parse this note.

Signed-off-by: Vivek Goyal <[email protected]>
---

linux-2.6.14-rc1-mm1-vivek/arch/i386/kernel/crash.c | 9 +++++++++
linux-2.6.14-rc1-mm1-vivek/include/linux/elf.h | 2 +-
linux-2.6.14-rc1-mm1-vivek/include/linux/elfcore.h | 13 +++++++++++++
3 files changed, 23 insertions(+), 1 deletion(-)

diff -puN arch/i386/kernel/crash.c~kdump-x86-add-note-type-nt-kdump arch/i386/kernel/crash.c
--- linux-2.6.14-rc1-mm1/arch/i386/kernel/crash.c~kdump-x86-add-note-type-nt-kdump 2005-09-21 12:22:14.113474136 +0530
+++ linux-2.6.14-rc1-mm1-vivek/arch/i386/kernel/crash.c 2005-09-21 12:22:14.128471856 +0530
@@ -62,6 +62,7 @@ static void final_note(u32 *buf)
static void crash_save_this_cpu(struct pt_regs *regs, int cpu)
{
struct elf_prstatus prstatus;
+ struct elf_kdumpinfo kdumpinfo;
u32 *buf;

if ((cpu < 0) || (cpu >= NR_CPUS))
@@ -80,6 +81,14 @@ static void crash_save_this_cpu(struct p
elf_core_copy_regs(&prstatus.pr_reg, regs);
buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus,
sizeof(prstatus));
+
+ /* Also store NT_KDUMPINFO */
+ if (cpu == crashing_cpu) {
+ kdumpinfo.page_shift = PAGE_SHIFT;
+ kdumpinfo.panic_tsk = current;
+ buf = append_elf_note(buf, "CORE", NT_KDUMPINFO, &kdumpinfo,
+ sizeof(kdumpinfo));
+ }
final_note(buf);
}

diff -puN include/linux/elfcore.h~kdump-x86-add-note-type-nt-kdump include/linux/elfcore.h
--- linux-2.6.14-rc1-mm1/include/linux/elfcore.h~kdump-x86-add-note-type-nt-kdump 2005-09-21 12:22:14.116473680 +0530
+++ linux-2.6.14-rc1-mm1-vivek/include/linux/elfcore.h 2005-09-21 12:22:14.129471704 +0530
@@ -79,9 +79,22 @@ struct elf_prpsinfo
char pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
};

+/*
+ * NT_KDUMPINFO can be used to communicate additional information required for
+ * kernel core dumps. Additional information includes kernel configuration
+ * variables like page size and any other relevant data required by
+ * debugger (crash in this case).
+*/
+struct elf_kdumpinfo
+{
+ char page_shift; /* Page size */
+ struct task_struct *panic_tsk; /* Pointer to panic task_struct */
+};
+
#ifndef __KERNEL__
typedef struct elf_prstatus prstatus_t;
typedef struct elf_prpsinfo prpsinfo_t;
+typedef struct elf_kdumpinfo kdumpinfo_t;
#define PRARGSZ ELF_PRARGSZ
#endif

diff -puN include/linux/elf.h~kdump-x86-add-note-type-nt-kdump include/linux/elf.h
--- linux-2.6.14-rc1-mm1/include/linux/elf.h~kdump-x86-add-note-type-nt-kdump 2005-09-21 12:22:14.120473072 +0530
+++ linux-2.6.14-rc1-mm1-vivek/include/linux/elf.h 2005-09-21 12:22:14.130471552 +0530
@@ -391,7 +391,7 @@ typedef struct elf64_shdr {
#define NT_TASKSTRUCT 4
#define NT_AUXV 6
#define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
-
+#define NT_KDUMPINFO 7 /* Used for kernel core dumps */

/* Note header in a PT_NOTE section */
typedef struct elf32_note {
_


2005-09-21 14:29:55

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps

Vivek Goyal <[email protected]> writes:

> o This patch adds a new note type NT_KDUMPINFO. This note is added with
> kernel core dumps generated by kdump.
>
> o This note mainly communicates the information required by kernel dump
> analysis tool "crash" to analyze the kernel core dump. As of now it contains
> the pointer to task struct of panicing task and page size. Page size is
> irrelevant for i386 but is required for architectures like ia64 and ppc64.
>
> o gdb is not affected by this change as gdb need not to parse this note.

A couple of things.
- The name of your note is terribly generic, so it seems a poor choice.

- Why do we need to capture the page size at the time of the
crash? Isn't the page size a compile time option? Won't
sys_getpagesize() get you this information before the crash?
Why do we need the kernels page size at all?

- Why do you avoid storing the current task on the other cpus?

- Can't we derive the current task from the existing register information
already captured. If not would a little extra debug information
captured at compile time be better?

- You don't address the issue of architectural control registers.
So you are going to need another note at some point. (Not
necessarily a bad thing).


> +/*
> + * NT_KDUMPINFO can be used to communicate additional information required for
> + * kernel core dumps. Additional information includes kernel configuration
> + * variables like page size and any other relevant data required by
> + * debugger (crash in this case).
> +*/
> +struct elf_kdumpinfo
> +{
> + char page_shift; /* Page size */
> + struct task_struct *panic_tsk; /* Pointer to panic task_struct */
> +};
> +

> #define NT_TASKSTRUCT 4
> #define NT_AUXV 6
> #define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
> -
> +#define NT_KDUMPINFO 7 /* Used for kernel core dumps */
>

2005-09-22 07:39:36

by Vivek Goyal

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps

On Wed, Sep 21, 2005 at 08:28:32AM -0600, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
> > kernel core dumps generated by kdump.
> >
> > o This note mainly communicates the information required by kernel dump
> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
> > the pointer to task struct of panicing task and page size. Page size is
> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
> >
> > o gdb is not affected by this change as gdb need not to parse this note.
>
> A couple of things.
> - The name of your note is terribly generic, so it seems a poor choice.
>

Yes it seems generic. And I think the reason being that I did not have a
significant number of things to report hence could not create logical groups
and giving very specific names to those groups. As of today, based on the
requirements, could think of only two items (page size, current). There
is still scope for few more things to appear. Hence gave a generic name and
thought more items can be grouped in this note itself until and unless group
becomes really big and need to break.

Do you have any suggestion to make this name better?


> - Why do we need to capture the page size at the time of the
> crash? Isn't the page size a compile time option? Won't
> sys_getpagesize() get you this information before the crash?
> Why do we need the kernels page size at all?
>

Dave has already mentioned about the need of page size. I agree that page
size is a compile time information. Are you hinting at creating a separate
note for reporting page size in user space while loading the capture kernel
and store it in reserved region alongside other elf headers. Not sure, if
it is good to create a separate note type just to report one configuration
variable.

I also thought of enabling "Kernel .config support" (CONFIG_IKCONFIG=y) and
may be "crash" could write a script similar to scripts/extract-ikconfig to
retrieve any configuration variable required from vmlinux itself. But this
option is not enabled by default and will increase kernel image size and
seems too much to do for a single configuration variable.

> - Why do you avoid storing the current task on the other cpus?
>
> - Can't we derive the current task from the existing register information
> already captured.

It can be done but as Dave suggested but that requires significant amount
of job to be done as one has to traverse through the active task stacks and
look for crash_kexec(). An easier/simpler way is that kernel itself can
report it. Netdump, diskdump already do it. I think for simplicity, it
makes sense to export this information from kernel in the form of note.

Only issue I could think of is stack overflow and current might be
corrupted after panic.


> If not would a little extra debug information
> captured at compile time be better?
>

Sorry, I did not get this. Can you elaborate a little more on this. What
kind of debug information can be helpful in determining the current.

> - You don't address the issue of architectural control registers.
> So you are going to need another note at some point. (Not
> necessarily a bad thing).
>

That's true. Probably a new note type shall be required to store
architectural control registers. May be something like NT_KDUMP_CRREG and
NT_KDUMPINFO can continue to capture miscellaneous info.

Thanks
Vivek

>
> > +/*
> > + * NT_KDUMPINFO can be used to communicate additional information required for
> > + * kernel core dumps. Additional information includes kernel configuration
> > + * variables like page size and any other relevant data required by
> > + * debugger (crash in this case).
> > +*/
> > +struct elf_kdumpinfo
> > +{
> > + char page_shift; /* Page size */
> > + struct task_struct *panic_tsk; /* Pointer to panic task_struct */
> > +};
> > +
>
> > #define NT_TASKSTRUCT 4
> > #define NT_AUXV 6
> > #define NT_PRXFPREG 0x46e62b7f /* copied from gdb5.1/include/elf/common.h */
> > -
> > +#define NT_KDUMPINFO 7 /* Used for kernel core dumps */
> >

2005-09-22 07:49:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps

Vivek Goyal <[email protected]> wrote:
>
> > - Why do you avoid storing the current task on the other cpus?
> >
> > - Can't we derive the current task from the existing register information
> > already captured.
>
> It can be done but as Dave suggested but that requires significant amount
> of job to be done as one has to traverse through the active task stacks and
> look for crash_kexec(). An easier/simpler way is that kernel itself can
> report it. Netdump, diskdump already do it. I think for simplicity, it
> makes sense to export this information from kernel in the form of note.
>
> Only issue I could think of is stack overflow and current might be
> corrupted after panic.
>

Yes, traversing the task_structs in a crashed kernel sounds like a poor
idea.

2005-09-22 08:33:00

by Vivek Goyal

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps

On Thu, Sep 22, 2005 at 12:46:48AM -0700, Andrew Morton wrote:
> Vivek Goyal <[email protected]> wrote:
> >
> > > - Why do you avoid storing the current task on the other cpus?
> > >
> > > - Can't we derive the current task from the existing register information
> > > already captured.
> >
> > It can be done but as Dave suggested but that requires significant amount
> > of job to be done as one has to traverse through the active task stacks and
> > look for crash_kexec(). An easier/simpler way is that kernel itself can
> > report it. Netdump, diskdump already do it. I think for simplicity, it
> > makes sense to export this information from kernel in the form of note.
> >
> > Only issue I could think of is stack overflow and current might be
> > corrupted after panic.
> >
>
> Yes, traversing the task_structs in a crashed kernel sounds like a poor
> idea.
>

Andrew, traversal of task_structs will not be done in crashed kernel. It
will be done in user space by "crash" (if required, during post crash analysis).

In this case we are simply copying "current" pointer of panicking task in
an elf note.

kdumpinfo.panic_tsk = current;

Thanks
Vivek

2005-09-22 09:13:20

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO to kernel core dumps

Vivek Goyal <[email protected]> writes:

> On Wed, Sep 21, 2005 at 08:28:32AM -0600, Eric W. Biederman wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
>> > kernel core dumps generated by kdump.
>> >
>> > o This note mainly communicates the information required by kernel dump
>> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
>> > the pointer to task struct of panicing task and page size. Page size is
>> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
>> >
>> > o gdb is not affected by this change as gdb need not to parse this note.
>>
>> A couple of things.
>> - The name of your note is terribly generic, so it seems a poor choice.
>>
>
> Yes it seems generic. And I think the reason being that I did not have a
> significant number of things to report hence could not create logical groups
> and giving very specific names to those groups. As of today, based on the
> requirements, could think of only two items (page size, current). There
> is still scope for few more things to appear. Hence gave a generic name and
> thought more items can be grouped in this note itself until and unless group
> becomes really big and need to break.

I'm not certain exactly how but we do need to capture the kernel version
so we can verify the vmlinux binary and the core dump match.

> Do you have any suggestion to make this name better?

Not presently. That still doesn't mean it isn't worth brain storming
about.

>> - Why do we need to capture the page size at the time of the
>> crash? Isn't the page size a compile time option? Won't
>> sys_getpagesize() get you this information before the crash?
>> Why do we need the kernels page size at all?
>>
>
> Dave has already mentioned about the need of page size. I agree that page
> size is a compile time information. Are you hinting at creating a separate
> note for reporting page size in user space while loading the capture kernel
> and store it in reserved region alongside other elf headers. Not sure, if
> it is good to create a separate note type just to report one configuration
> variable.

It's in my reply to Dave but we need a way to get that kind of information
out of vmlinux.

>> - Why do you avoid storing the current task on the other cpus?
>>
>> - Can't we derive the current task from the existing register information
>> already captured.

In my reply to Dave I noted that flagging the cpu is cheaper and more
reliably that returning current.

>> If not would a little extra debug information
>> captured at compile time be better?
>>
>
> Sorry, I did not get this. Can you elaborate a little more on this. What
> kind of debug information can be helpful in determining the current.

The dwarf2 debug information can tell you what registers a variable
is stored in. I was thinking of something along those lines, for
reporting to the debugger where current was stored.

>> - You don't address the issue of architectural control registers.
>> So you are going to need another note at some point. (Not
>> necessarily a bad thing).
>>
>
> That's true. Probably a new note type shall be required to store
> architectural control registers. May be something like NT_KDUMP_CRREG and
> NT_KDUMPINFO can continue to capture miscellaneous info.

Sounds right. I am slightly less concerned so long as we restrict
ourselves to an append only discipline with regards to the information
in that note.

Eric