To: Dave Anderson <anderson@redhat.com>
Cc: vgoyal@in.ibm.com, Morton Andrew Morton <akpm@osdl.org>,
       Fastboot mailing list <fastboot@lists.osdl.org>,
       linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO
 tokernel  core dumps
References: <20050921065633.GC3780@in.ibm.com>
	<m1mzm6ebqn.fsf@ebiederm.dsl.xmission.com>
	<43317980.D6AEA859@redhat.com>
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 22 Sep 2005 03:01:10 -0600
In-Reply-To: <43317980.D6AEA859@redhat.com> (Dave Anderson's message of
 "Wed, 21 Sep 2005 11:17:20 -0400")
Message-ID: <m1d5n1cw89.fsf@ebiederm.dsl.xmission.com>
User-Agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5038
Lines: 112

Dave Anderson <anderson@redhat.com> writes:

> "Eric W. Biederman" wrote:
>
>> Vivek Goyal <vgoyal@in.ibm.com> writes:
>>
>> > o This patch adds a new note type NT_KDUMPINFO. This note is added with
>> >   kernel core dumps generated by kdump.
>> >
>> > o This note mainly communicates the information required by kernel dump
>> > analysis tool "crash" to analyze the kernel core dump. As of now it contains
>> >   the pointer to task struct of panicing task and page size. Page size is
>> > irrelevant for i386 but is required for architectures like ia64 and ppc64.
>> >
>
> It's not absolutely required for crash -- but more of a convenience.  As it is,
> the
> set of NT_PRSTATUS sections can be scoured for the set of active
> tasks, and then the process stacks (and potentially the IRQ stacks, then
> mapped back to the proper process stack) can be searched for crash_kexec.
> So crash works without the task_struct pointer, but it's ugly.  It's just that
> netdump, diskdump and LKCD all report the panic task_struct pointer,
> and it seems a reasonable thing to do.

Ok.  The point here is to know which task/cpu called panic, rather
than to get the task_struct.   That makes a lot of sense, and is
cheap to get.  Any note on the crashing cpu that is not captured
by another cpu will give us that information.  

My primary concern is while the concept of a task_struct is pretty
stable who is to know how the kernel will change in the future.  So
if we don't need to export a task_struct pointer and merely need
to flag the cpu that called panic we can do that much more reliably.

A practical question is how is this solved with a normal multi-threaded
core dumps?

It is not the best thing but order of the notes does and will matter,
it is likely worth Documenting so people don't change it that
PR_STATUS comes first so we can reliably find the first note
for a cpu, after everything has been serialized.

>> > o gdb is not affected by this change as gdb need not to parse this note.
>>
>> A couple of things.
>> - The name of your note is terribly generic, so it seems a poor choice.
>>
>> - Why do we need to capture the page size at the time of the
>>   crash?  Isn't the page size a compile time option?  Won't
>>   sys_getpagesize() get you this information before the crash?
>>   Why do we need the kernels page size at all?
>>
>
> The page size is needed for a whole host of things, primarily for
> virtual-to-physical address translations, a bunch of VM-related
> commands, etc.  For non-x86[_64] systems. the host system may
> be using a different page size than the vmcore being examining.
> That's probably a rare situation, and using getpagesize() on the
> host would work in most cases, or perhaps issuing the page size
> as a command line option, but again, it's another convenience item
> that gets reported by netdump, diskdump and LKCD.

What I was mostly saying is that there are other places this can
be captured.  If this is actually a compile time option on all
architectures we should simply add more debug information to vmlinux.
If this varies at runtime we have the option of capturing it from the
kernel before it crashes.  We don't need to run code to capture
it while the kernel is dying.

Having just skimmed the include/asm-* it looks like this is indeed
a compile time option so the page size needs to get added to the
information that is easy to get from the vmlinux binary.

We do need a way that we can test to see if a core dump
actually matches the vmlinux we are looking at.  Probably
this is capturing all of the information captured by linux/version.h
and linux/compile.h both at runtime and at compile time and
checking to see if they match.

>> - Why do you avoid storing the current task on the other cpus?
>>
>
> They can be found in the per-cpu runqueues data structure.

Thanks.  That confirms my impression we are just looking for a flag
that says we are the current cpu.

>> - Can't we derive the current task from the existing register information
>>   already captured.  If not would a little extra debug information
>>   captured at compile time be better?
>>
>> - You don't address the issue of architectural control registers.
>>   So you are going to need another note at some point. (Not
>>   necessarily a bad thing).
>>
>
> I guess that's their whole point.  This NT_KDUMPINFO or whatever
> you want to call it, could cumulatively collect any other data deemed
> necessary that falls outside the bounds of the existing note types.

It is clear we need one or mode notes.  Largely once we define what
is in a note we cannot change it.  The most we can do is to append
more fields to an existing node type, because we have size information
we can check on.  This is because the definition of a note is a kernel
API issue.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/