Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965190AbVIVJDE (ORCPT ); Thu, 22 Sep 2005 05:03:04 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965241AbVIVJDD (ORCPT ); Thu, 22 Sep 2005 05:03:03 -0400 Received: from ebiederm.dsl.xmission.com ([166.70.28.69]:31134 "EHLO ebiederm.dsl.xmission.com") by vger.kernel.org with ESMTP id S965190AbVIVJDB (ORCPT ); Thu, 22 Sep 2005 05:03:01 -0400 To: Dave Anderson Cc: vgoyal@in.ibm.com, Morton Andrew Morton , Fastboot mailing list , linux kernel mailing list Subject: Re: [Fastboot] [PATCH] Kdump(x86): add note type NT_KDUMPINFO tokernel core dumps References: <20050921065633.GC3780@in.ibm.com> <43317980.D6AEA859@redhat.com> From: ebiederm@xmission.com (Eric W. Biederman) Date: Thu, 22 Sep 2005 03:01:10 -0600 In-Reply-To: <43317980.D6AEA859@redhat.com> (Dave Anderson's message of "Wed, 21 Sep 2005 11:17:20 -0400") Message-ID: User-Agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5038 Lines: 112 Dave Anderson writes: > "Eric W. Biederman" wrote: > >> Vivek Goyal writes: >> >> > o This patch adds a new note type NT_KDUMPINFO. This note is added with >> > kernel core dumps generated by kdump. >> > >> > o This note mainly communicates the information required by kernel dump >> > analysis tool "crash" to analyze the kernel core dump. As of now it contains >> > the pointer to task struct of panicing task and page size. Page size is >> > irrelevant for i386 but is required for architectures like ia64 and ppc64. >> > > > It's not absolutely required for crash -- but more of a convenience. As it is, > the > set of NT_PRSTATUS sections can be scoured for the set of active > tasks, and then the process stacks (and potentially the IRQ stacks, then > mapped back to the proper process stack) can be searched for crash_kexec. > So crash works without the task_struct pointer, but it's ugly. It's just that > netdump, diskdump and LKCD all report the panic task_struct pointer, > and it seems a reasonable thing to do. Ok. The point here is to know which task/cpu called panic, rather than to get the task_struct. That makes a lot of sense, and is cheap to get. Any note on the crashing cpu that is not captured by another cpu will give us that information. My primary concern is while the concept of a task_struct is pretty stable who is to know how the kernel will change in the future. So if we don't need to export a task_struct pointer and merely need to flag the cpu that called panic we can do that much more reliably. A practical question is how is this solved with a normal multi-threaded core dumps? It is not the best thing but order of the notes does and will matter, it is likely worth Documenting so people don't change it that PR_STATUS comes first so we can reliably find the first note for a cpu, after everything has been serialized. >> > o gdb is not affected by this change as gdb need not to parse this note. >> >> A couple of things. >> - The name of your note is terribly generic, so it seems a poor choice. >> >> - Why do we need to capture the page size at the time of the >> crash? Isn't the page size a compile time option? Won't >> sys_getpagesize() get you this information before the crash? >> Why do we need the kernels page size at all? >> > > The page size is needed for a whole host of things, primarily for > virtual-to-physical address translations, a bunch of VM-related > commands, etc. For non-x86[_64] systems. the host system may > be using a different page size than the vmcore being examining. > That's probably a rare situation, and using getpagesize() on the > host would work in most cases, or perhaps issuing the page size > as a command line option, but again, it's another convenience item > that gets reported by netdump, diskdump and LKCD. What I was mostly saying is that there are other places this can be captured. If this is actually a compile time option on all architectures we should simply add more debug information to vmlinux. If this varies at runtime we have the option of capturing it from the kernel before it crashes. We don't need to run code to capture it while the kernel is dying. Having just skimmed the include/asm-* it looks like this is indeed a compile time option so the page size needs to get added to the information that is easy to get from the vmlinux binary. We do need a way that we can test to see if a core dump actually matches the vmlinux we are looking at. Probably this is capturing all of the information captured by linux/version.h and linux/compile.h both at runtime and at compile time and checking to see if they match. >> - Why do you avoid storing the current task on the other cpus? >> > > They can be found in the per-cpu runqueues data structure. Thanks. That confirms my impression we are just looking for a flag that says we are the current cpu. >> - Can't we derive the current task from the existing register information >> already captured. If not would a little extra debug information >> captured at compile time be better? >> >> - You don't address the issue of architectural control registers. >> So you are going to need another note at some point. (Not >> necessarily a bad thing). >> > > I guess that's their whole point. This NT_KDUMPINFO or whatever > you want to call it, could cumulatively collect any other data deemed > necessary that falls outside the bounds of the existing note types. It is clear we need one or mode notes. Largely once we define what is in a note we cannot change it. The most we can do is to append more fields to an existing node type, because we have size information we can check on. This is because the definition of a note is a kernel API issue. Eric - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/