Subject: Re: [PATCH 1/2 v6] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name
To: Borislav Petkov, Bjorn Helgaas
Cc: linux-kernel@vger.kernel.org, kexec@lists.infradead.org, x86@kernel.org, tglx@linutronix.de, mingo@redhat.com, akpm@linux-foundation.org, dyoung@redhat.com, bhe@redhat.com
From: lijiang
Date: Fri, 16 Nov 2018 11:25:55 +0800

On 2018-11-15 18:39, Borislav Petkov wrote:
> + Bjorn.
>
> On Thu, Nov 15, 2018 at 01:44:07PM +0800, lijiang wrote:
>> At present, the upstream kernel does not pass the e820 reserved ranges to the
>> second kernel, which might cause two problems:
>>
>> The first one is the MMCONFIG issue, the PCI MMCONFIG(extended mode) requires
>> the reserved region otherwise it falls back to legacy mode, which might lead to
>> the hot-plug device could not be recognized in kdump kernel.
>
> Well, this still doesn't explain it fully.
> Let's look at a box:
>
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x00000000000997ff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000000099800-0x000000000009ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000065642fff] usable
> [    0.000000] BIOS-e820: [mem 0x0000000065643000-0x0000000067fb8fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000067fb9000-0x00000000689e8fff] ACPI NVS
> [    0.000000] BIOS-e820: [mem 0x00000000689e9000-0x0000000068bf5fff] ACPI data
> [    0.000000] BIOS-e820: [mem 0x0000000068bf6000-0x000000006f7fffff] usable
> [    0.000000] BIOS-e820: [mem 0x000000006f800000-0x000000008fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fd000000-0x00000000fe7fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fec80000-0x00000000fed00fff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000ff800000-0x00000001007fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000100800000-0x000000603fffffff] usable
>
> this one has 8 reserved regions. Does that mean that we need to pass
> them *all* 8 to the second kernel so that MMCONFIG works?
>
> Or is it only one reserved region which is needed for MMCONFIG?
>

On my machine, the PCI MMCONFIG region [mem 0x80000000-0x8fffffff] is reserved
in e820. This address range lies within the e820 reserved region
[mem 0x0000000078000000-0x000000008fffffff].
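
To make the containment claim concrete, here is a minimal user-space sketch in
C. It is not kernel code: struct mem_range and range_contains() are
hypothetical names used only for illustration, and the two ranges are the
values quoted for this machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range, as printed in the e820 boot log. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Return true if 'inner' lies entirely within 'outer'. */
static bool range_contains(struct mem_range outer, struct mem_range inner)
{
	assert(outer.start <= outer.end && inner.start <= inner.end);
	return inner.start >= outer.start && inner.end <= outer.end;
}

/* The e820 reserved region and the MMCONFIG window reported on this machine. */
static const struct mem_range e820_reserved = { 0x78000000ULL, 0x8fffffffULL };
static const struct mem_range mmconfig      = { 0x80000000ULL, 0x8fffffffULL };
```

Calling range_contains(e820_reserved, mmconfig) from a test program shows that
the MMCONFIG window is fully covered by that single reserved entry on this
machine; on a box with a different layout the same check would have to be run
against each reserved entry.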
The kernel prints the following log:

[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000008bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000008c000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000029920fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029921000-0x0000000029921fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000029922000-0x0000000062278fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000062279000-0x0000000062378fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000062379000-0x000000006238bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000006238c000-0x000000006238cfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000006238d000-0x000000006240bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000006240c000-0x000000006264bfff] reserved
[    0.000000] BIOS-e820: [mem 0x000000006264c000-0x000000006266dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000006266e000-0x00000000626cdfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000626ce000-0x000000006278dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000006278e000-0x000000006278efff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000006278f000-0x0000000062807fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000062808000-0x000000006280afff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000006280b000-0x000000006280cfff] usable
[    0.000000] BIOS-e820: [mem 0x000000006280d000-0x000000006280dfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000006280e000-0x000000006286afff] usable
[    0.000000] BIOS-e820: [mem 0x000000006286b000-0x000000006286efff] reserved
[    0.000000] BIOS-e820: [mem 0x000000006286f000-0x00000000682f8fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000682f9000-0x0000000068b05fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000068b06000-0x0000000068b09fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000068b0a000-0x0000000068b1afff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000068b1b000-0x0000000068b1dfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000068b1e000-0x0000000071d1dfff] usable
[    0.000000] BIOS-e820: [mem 0x0000000071d1e000-0x0000000071d2dfff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000071d2e000-0x0000000071d3dfff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000071d3e000-0x0000000071d4dfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000071d4e000-0x0000000077ffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000078000000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000087effffff] usable
[    0.000000] BIOS-e820: [mem 0x000000087f000000-0x000000087fffffff] reserved
......
[    0.082649] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x80000000-0x8fffffff] (base 0x80000000)
[    0.083610] PCI: MMCONFIG at [mem 0x80000000-0x8fffffff] reserved in E820
......

For the PCI MMCONFIG issue alone, passing only the e820 reserved region
[mem 0x0000000078000000-0x000000008fffffff] to the second kernel should be
enough on this machine, but the MMCONFIG region differs from machine to
machine. In addition, there is a more serious problem: kdump does not work
at all on some machines.

> Bjorn, do you know what the detection logic should be to map the correct
> reserved region (or regions) for MMCONFIG?
>
> Now, even if we don't map that reserved region and MMCONFIG falls back
> to legacy mode, why is that a problem for the kdump kernel? Why does
> the kdump kernel need the hotplug device? What would be the use case?
> Hotplug a SATA drive to store the memory dump to it ... or?
>

A simple case: hotplug a PCI network card and use ssh/nfs to dump the vmcore.
If the PCI MMCONFIG region is not reserved in the kdump kernel, the hotplugged
PCI device cannot be recognized, so the PCI network card won't work.

>> Another one is that the e820 reserved ranges do not setup in kdump kernel, which
>> could cause kdump can't work in some machines.
>> To know more information, please
>> refer to the [PATCH 2/2 v6] patch log.
>
> Yah, I still don't understand *why* we need the reserved ranges in the
> second kernel. Once we've figured out the *why* we can look at the *how*.
>

Here is an example from SME kdump that may make this easier to understand.

Because the e820 reserved ranges are not set up in the kdump kernel, functions
that depend on them give wrong results. The relevant call chain is:

early_memremap()->
early_memremap_pgprot_adjust()->
memremap_should_map_decrypted()->
e820__get_entry_type()

Please focus on early_memremap_pgprot_adjust() and
memremap_should_map_decrypted().

In the first kernel, these ranges sit in the e820 reserved ranges, so
memremap_should_map_decrypted() returns true, that is to say, the reserved
memory is treated as decrypted, and early_memremap_pgprot_adjust() calls
pgprot_decrypted() to clear the memory encryption mask.

In the second kernel, the e820 reserved ranges are not passed in, so these
ranges are no longer found as reserved in the e820 table and
memremap_should_map_decrypted() returns false, that is to say, the memory is
treated as encrypted, and early_memremap_pgprot_adjust() calls
pgprot_encrypted() to set the memory encryption mask.

In fact, the e820 reserved memory is still decrypted in the second kernel, so
the mapping is wrong. This issue must be fixed, otherwise kdump won't work in
this case.

The e820 reserved ranges are needed in the kdump kernel, so they have to be
passed to it.

Hope this is helpful.

Thanks,
Lianbo

> Thx.
>
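
For illustration only, the e820 part of the decision described above can be
modelled in user space as follows. This is a simplified sketch, not the kernel
implementation: every model_* name and both tables are hypothetical, the real
memremap_should_map_decrypted() performs further checks that are left out
here, and the probed address is just an example inside the reserved region
from the log above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for e820 entry types (illustrative values only). */
enum model_e820_type {
	MODEL_E820_NONE = 0,	/* range is not described by the table */
	MODEL_E820_RAM,
	MODEL_E820_RESERVED,
	MODEL_E820_ACPI,
	MODEL_E820_NVS,
};

struct model_e820_entry {
	uint64_t start;
	uint64_t end;		/* inclusive */
	enum model_e820_type type;
};

/* Model of the e820 lookup: type of the entry that covers [start, end]. */
static enum model_e820_type
model_get_entry_type(const struct model_e820_entry *tbl, size_t n,
		     uint64_t start, uint64_t end)
{
	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		if (start >= tbl[i].start && end <= tbl[i].end)
			return tbl[i].type;
	}
	return MODEL_E820_NONE;
}

/* Model of the decision: firmware-owned ranges are mapped decrypted. */
static bool model_should_map_decrypted(const struct model_e820_entry *tbl,
				       size_t n, uint64_t start, uint64_t end)
{
	switch (model_get_entry_type(tbl, n, start, end)) {
	case MODEL_E820_RESERVED:
	case MODEL_E820_ACPI:
	case MODEL_E820_NVS:
		return true;
	default:
		return false;	/* unknown or RAM: treated as encrypted */
	}
}

/* First kernel: the reserved entry is present in the table. */
static const struct model_e820_entry first_kernel[] = {
	{ 0x71d4e000ULL, 0x77ffffffULL, MODEL_E820_RAM },
	{ 0x78000000ULL, 0x8fffffffULL, MODEL_E820_RESERVED },
};

/* kdump kernel: the reserved entry was not passed along (hypothetical). */
static const struct model_e820_entry kdump_kernel[] = {
	{ 0x71d4e000ULL, 0x77ffffffULL, MODEL_E820_RAM },
};
```

With the first table, model_should_map_decrypted() returns true for a page at
0x80000000; with the second table it returns false for the same page, although
nothing about the memory itself has changed. That mismatch is the problem
described above.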