Subject: Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation
To: Borislav Petkov
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
 tglx@linutronix.de, mingo@redhat.com, x86@kernel.org,
 akpm@linux-foundation.org, bhe@redhat.com, dyoung@redhat.com,
 Jonathan Corbet, linux-doc@vger.kernel.org
References: <20181202030839.29945-1-lijiang@redhat.com>
 <20181202030839.29945-2-lijiang@redhat.com>
 <20181203150809.GA4794@zn.tnic>
From: lijiang
Message-ID: <779dbae7-f6e2-e9e4-bdd0-0a9e6ec62487@redhat.com>
Date: Tue, 4 Dec 2018 17:35:09 +0800
In-Reply-To: <20181203150809.GA4794@zn.tnic>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-12-03 23:08, Borislav Petkov wrote:
> Add some more Ccs.
>

Thanks a lot. It would be good to have more people review and improve
this document together.

> On Sun, Dec 02, 2018 at 11:08:38AM +0800, Lianbo Jiang wrote:
>> This document lists some variables that export to vmcoreinfo, and briefly
>> describles what these variables indicate. It should be instructive for
>> many people who do not know the vmcoreinfo, and it also normalizes the
>> exported variable as a standard ABI between kernel and use-space.
>
> Yeah, I'm not sure about it being an ABI. Apparently, it is considered
> too tightly coupled to the kernel for it to be an ABI.
>
> Regardless, thanks for doing that.
>

It's my pleasure.

>> Suggested-by: Borislav Petkov
>> Signed-off-by: Lianbo Jiang
>> ---
>>  Documentation/kdump/vmcoreinfo.txt | 400 +++++++++++++++++++++++++++++
>>  1 file changed, 400 insertions(+)
>>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
>>
>> diff --git a/Documentation/kdump/vmcoreinfo.txt b/Documentation/kdump/vmcoreinfo.txt
>
> Aren't we adding new docs in rst format only or what is the logic there?
>
> Jon?
>
>> new file mode 100644
>> index 000000000000..c6759be14af7
>> --- /dev/null
>> +++ b/Documentation/kdump/vmcoreinfo.txt
>> @@ -0,0 +1,400 @@
>> +================================================================
>> + Documentation for Vmcoreinfo
>> +================================================================
>> +
>> +=======================
>> +What is the vmcoreinfo?
>> +=======================
>> +The vmcoreinfo contains the first kernel's various information, for
>
> The first sentence here should be explaining what VMCOREINFO is: it is
> an ELF PT_NOTE section. So that people can go, oh ok, it is a special
> ELF section, when reading.
>
> Then, MAKEDUMPFILE(8) spells VMCOREINFO in all caps and I think we
> should do that too here, for ease of recognition.
>

This is good advice.

> Btw, do we have a makedumpfile switch or a tool/script which dumps
> VMCOREINFO contents in human-readable form?
>

Generating VMCOREINFO is easy in the first kernel, for example:

  # makedumpfile -g VMCOREINFO -x vmlinux
  # file VMCOREINFO
  VMCOREINFO: ASCII text

> Maybe something nicer than:
>
> $ hexdump -C /proc/kcore
>
>> +example, structure size, page size, symbol values and field offset,
>> +etc. These data are encapsulated into an elf format, and these data
>> +will also help user-space tools(e.g. makedumpfile, crash) analyze the
>> +first kernel's memory usage.
>> +
>> +================
>> +Common variables
>> +================
>> +
>> +init_uts_ns.name.release
>> +========================
>> +The number of OS release.
>> +
>> +PAGE_SIZE
>> +=========
>> +The size of a page. It is usually 4k bytes.
>> +
>> +init_uts_ns
>> +===========
>> +This is the UTS namespace, which is used to isolate two specific elements
>> +of the system that relate to the uname system call. The UTS namespace is
>> +named after the data structure used to store information returned by the
>> +uname system call.
>
> Those non-obvious exports should also have a short explanation why
> they're part of VMCOREINFO.
>
>> +
>> +node_online_map
>> +===============
>> +It is a macro definition, actually it is an arrary node_states[N_ONLINE],
>> +and it represents the set of online node in a system, one bit position
>> +per node number.
>
> Ditto.
>
> So yeah, people can find out what those things are but I think it is
> more important to state here *why* they're part of VMCOREINFO and how
> they're used and why they're exported.
>

This is a good question. The two *why* questions are easy to answer:
user-space tools need basic information such as symbol values, field
offsets and structure sizes. Otherwise, these tools won't know how to
analyze the memory of the crash kernel.

As for the second question, 'how they are used', the answer can be found
in the user-space tools themselves, such as makedumpfile and crash. So the
kernel document may not need to explain it any further. On the other hand,
if we must put this content into the kernel document, I have to say that
would be a hard task.

> Who knows, some might turn out to be not needed anymore. :)
>
>> +
>> +swapper_pg_dir
>> +=============
>> +It is always an array, it gerenally stands for the pgd for the kernel.
>> +When mmu is enabled in config file, the 'swapper_pg_dir' is valid.
>> +
>> +_stext
>> +======
>> +It is an assemble directive that defines the beginning of the text section.
>
> That's an assembly symbol.
>
>> +In gerenal, the '_stext' indicates the kernel start address.
>> +
>> +vmap_area_list
>> +==============
>> +It stores the virtual area list, makedumpfile can get the vmalloc start
>> +value according to this variable.
>
> "... from this variable."
>
>> +
>> +mem_map
>> +=======
>> +Physical addresses are translated to struct pages by treating them as an
>> +index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
>> +to the right will treat it as a PFN from physical address 0, which is also
>> +an index within the mem_map array.
>> +
>> +In a word, it can map the address to struct page.
>
> "In short, ... "
>
>> +
>> +contig_page_data
>> +================
>> +Makedumpfile can get the pglist_data structure according to this symbol
>
> Please look up in the dictionary what "according" means. Using it in
> this context is at least weird.
>

Thank you for pointing out my mistake. I'm going to look it up in the
Collins dictionary and write this word down in my notebook with a pen.

>> +'contig_page_data'. The pglist_data structure is used to describe the
>> +memory layout.
>> +
>> +mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
>> +==========================================================================
>> +Export the address of 'mem_section' array, and it's length, structure size,
>> +and the 'section_mem_map' offset.
>> +
>> +It exists in the sparse memory mapping model, and it is also somewhat
>> +similar to the mem_map variable, both of them will help to translate
>> +the address.
>> +
>> +page
>> +====
>> +The size of a 'page' structure.
>> +
>> +pglist_data
>> +===========
>> +The size of a 'pglist_data' structure.
>> +
>> +zone
>> +====
>> +The size of a 'zone' structure.
>> +
>> +free_area
>> +=========
>> +The size of a 'free_area' structure.
>> +
>> +list_head
>> +=========
>> +The size of a 'list_head' structure.
>> +
>> +nodemask_t
>> +==========
>> +The size of a 'nodemask_t' type.
>> +
>> +(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
>> + compound_order|compound_head)
>> +===================================================================
>> +The page structure is a familiar concept for most of linuxer, there is no
>> +need to explain too much.
>
> Just delete that sentence.
>
>> To know more information, please refer to the
>> +definition of the page struct(include/linux/mm_types.h).
>> +
>> +(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|node_
>> + spanned_pages|node_id)
>> +===================================================================
>> +On NUMA machines, each NUMA node would have a pg_data_t to describe
>
> s/would have/has/
>
>> +it's memory layout. On UMA machines there is a single pglist_data which
>> +describes the whole memory.
>> +
>> +The pglist_data structure contains these varibales, here export their
>                                            ^^^^^^^^^
>
> Before you send next time, run the *whole* text through a spellchecker.
>
>> +offset in the pglist_data structure, which is defined in this file
>> +"include/linux/mmzone.h".
>
> You don't have to state where stuff is defined - I hope everyone
> should be able to grep.
>

Thanks for your detailed comments. I will address them in patch v3.

Regards,
Lianbo

> ...
>
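
To illustrate the two points discussed above (the ASCII text written by
"makedumpfile -g", and the PAGE_SHIFT arithmetic described under mem_map),
here is a minimal sketch of how a user-space tool might read such a file.
It assumes the file holds one KEY=VALUE entry per line, for example
PAGESIZE=4096 or SIZE(page)=64. The sample values are made up, and the
helper names parse_vmcoreinfo() and phys_to_pfn() are hypothetical, not
taken from makedumpfile or crash.

```python
# Hypothetical sample text; real VMCOREINFO contents differ per kernel
# version and architecture. The values below are invented for illustration.
SAMPLE = """\
OSRELEASE=4.19.0
PAGESIZE=4096
SYMBOL(swapper_pg_dir)=ffffffff82210000
SIZE(page)=64
OFFSET(page.flags)=0
"""


def parse_vmcoreinfo(text):
    """Split KEY=VALUE lines into a dict; values are kept as strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        info[key] = value
    return info


def phys_to_pfn(paddr, page_size):
    """Shift a physical address right by PAGE_SHIFT to get its PFN."""
    # log2 of the page size; assumes page_size is a power of two
    page_shift = page_size.bit_length() - 1
    return paddr >> page_shift


info = parse_vmcoreinfo(SAMPLE)
page_size = int(info["PAGESIZE"])
print(info["OSRELEASE"], page_size, hex(phys_to_pfn(0x12345678, page_size)))
```

Using the PFN directly as an index into mem_map only applies when memory
starts at physical address 0, as the quoted text says. With the sparse
memory model, a tool would go through mem_section instead.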