Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp1891350imu; Wed, 21 Nov 2018 03:40:07 -0800 (PST) X-Google-Smtp-Source: AJdET5cp5k03hc0eRbWlSdAFfbpmPdTS753uVOpO4lltBsqRLahz7NNJsrqZr3fbZluyNG0A2ZtS X-Received: by 2002:a62:4851:: with SMTP id v78mr6482442pfa.97.1542800407605; Wed, 21 Nov 2018 03:40:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1542800407; cv=none; d=google.com; s=arc-20160816; b=JBqj21roSmg6IK0jwaFYmZMbaPoq+XDLPUACTmy70yRxDjT9eij97WZEGp/7vSHk1a UGcI7Qq+1OGTgln27WP3Y0AKT5Qf3gImE41zS342PhvtWDHxlt4+/kwyOE4+5s4Z8JS4 uZ3bgzkl5X7JZFLclRdqD+pdnm1hgcSyibyB78lUtNRxtPAd8EkFfHSlZupnHUgGHXGC j4i4dfdpqHm5+AO5N1n+8Az/BG2slm/iV/PRADLyAusnbd890OsumKvIlFC7jFBHl4Y3 NnYurWSujM3n+HYAl+l3qZ9XoJc9UWkv8gklHEgwd2KlE+nph+KGznKqiM2H/FUtDMis aa2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=E6pUy6Fv3yuvMe7nPjbdtIJr/M+QR4Q10NsKuBNP8Qk=; b=tO3X2QfvOT6rRsVsKQIA7phGstpHaKaXKr4hfjv5XfRu4yd0olT4PYyJRNH7PIyx89 0GXEl2S3Xi5ejKEzyd/3SxVf8x+JF4ZJIaEls2XXnDAse0lSV/bOII3p/hYMHjLuHsdx hYKoFSbPexTeMJ3hICObYoMNvguhUQyGtsCezzkvhVn/S+5oPwOQGhCIlLnaJQ4qNS/2 IX9RQzg/q6ORK52XDx0jhs6SIgyq8ksEV9HW1UfHywqYFc0i1W9/8TFtpQnyditbrs1T /nuHSwyRPnZzdv0QtGH3Q+q2siTxiFGdP0iJgZA9iSCfnJRYb63CiP2B8n7g6SOq1o7I dUqw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r197si29628347pfc.116.2018.11.21.03.39.53; Wed, 21 Nov 2018 03:40:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728819AbeKUV2T (ORCPT + 99 others); Wed, 21 Nov 2018 16:28:19 -0500 Received: from mx1.redhat.com ([209.132.183.28]:41376 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726172AbeKUV2T (ORCPT ); Wed, 21 Nov 2018 16:28:19 -0500 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E018130820F2; Wed, 21 Nov 2018 10:54:23 +0000 (UTC) Received: from localhost.localdomain (ovpn-12-61.pek2.redhat.com [10.72.12.61]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 04D2B604A4; Wed, 21 Nov 2018 10:54:13 +0000 (UTC) Subject: Re: [PATCH 1/2 v6] x86/kexec_file: add e820 entry in case e820 type string matches to io resource name To: Borislav Petkov Cc: Bjorn Helgaas , linux-kernel@vger.kernel.org, kexec@lists.infradead.org, x86@kernel.org, tglx@linutronix.de, mingo@redhat.com, akpm@linux-foundation.org, dyoung@redhat.com, bhe@redhat.com References: <20181114072926.13312-1-lijiang@redhat.com> <20181114072926.13312-2-lijiang@redhat.com> <20181114112600.GD13926@zn.tnic> <9eb61523-7a08-24c4-ac15-050537bd9203@redhat.com> <20181115103959.GB26448@zn.tnic> <20181118115251.GB19380@zn.tnic> From: lijiang Message-ID: Date: Wed, 21 Nov 2018 18:54:09 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20181118115251.GB19380@zn.tnic> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Wed, 21 Nov 2018 10:54:24 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2018年11月18日 19:52, Borislav Petkov 写道: > On Fri, Nov 16, 2018 at 11:25:55AM +0800, lijiang wrote: >> For the pci mmconfig issue, it should be good enough that the e820 reserved region >> [mem 0x0000000078000000-0x000000008fffffff] is only passed to the second kernel, but >> the pci mmconfig region is not the same in another machine. > > Yes. And now the question is, *which* reserved regions need to be mapped > for the second kernel to function properly? How do we figure that out? > >> A simple case, hotplug a pci network card and use the ssh/nfs to dump the vmcore. >> If the pci mmconfig region is not reserved in kdump kernel, the pci hotplug device >> could not be recognized. So the pci network card won't work. > > Yes that's a good example; put *that* example in your commit message. > >> Here, there is an example about SME kdump. Maybe it can help to better understand. > > You keep pasting that and I've read it already. And you keep repeating > that the reserved regions need to be mapped in the second kernel and I'm > asking, how do we determine *which* regions should we pass to the second > kernel? > For the pci mmconfig issue, it is clear according to Dave and Bjorn's comment. So i'd like to explain two things: 1. why the SME kdump kernel does not work without the reserved ranges. The first kernel can get e820 table's information from BIOS or bootloader, but for kdump kernel, it can not use this method to get e820 table. Maybe you have known who should pass the e820 ranges to the second kernel, it is just the kexec-tools. Unfortunately, kernel does not pass the e820 reserved ranges to the kdump kernel, when use the kexec_file_load syscall to load the kernel image and initramfs. At the early boot stage, the caller will use the early_memremap() to remap the address space, such as DMI, ACPI and Firmware, etc. Here, i only use an example to illustrate the issue what the DMI encountered in kdump kernel. At the early boot stage, the DMI will use the early_memremap() to remap the address ranges [0xF0000-0xFFFFF], and then check for the SMBIOS entry point signature. And when the caller checks the SMBIOS entry point signature, the caller will use the early_memremap() to remap another reserved ranges [0x0x6286B000-0x6286EFFF] again, which is reported by firmware(or SMBIOS). Please refer to the code: drivers/firmware/dmi_scan.c dmi_scan_machine()-> / dmi_early_remap()-> early_memremap() \ dmi_smbios3_present()->dmi_walk_early()->dmi_early_remap()->early_memremap() Obviously, the DMI remapped ranges [0xF0000-0xFFFFF] and [0x6286b000-0x6286efff] are not in the e820 table for the kdump kernel. Therefore, when SME is enabled, these regions will be consider to be encrypted according to the following code: arch/x86/mm/ioremap.c. Actually, these regions are still decrypted in kdump kernel. It has gone wrong. Note: When SME is enabled, if the memory regions are decrypted, but the caller marks the memory pages as encrypted, the content of pages is unpredictable when read from memory. early_memremap()-> early_memremap_pgprot_adjust()-> memremap_should_map_decrypted()-> e820__get_entry_type() static bool memremap_should_map_decrypted(resource_size_t phys_addr, unsigned long size) { int is_pmem; ...... /* Check if the address is outside kernel usable area */ switch (e820__get_entry_type(phys_addr, phys_addr + size - 1)) { case E820_TYPE_RESERVED: case E820_TYPE_ACPI: case E820_TYPE_NVS: case E820_TYPE_UNUSABLE: /* For SEV, these areas are encrypted */ if (sev_active()) break; /* Fallthrough */ case E820_TYPE_PRAM: return true; default: break; } return false; } Similarly, for ACPI, firmware, etc. They will also encounter the same problems, eventually, which causes the system fails to start. That is the reason why the SME kdump kernel does not work without the reserved ranges. 2. why the all reserved ranges are passed to the second kernel(or which regions should we pass to the second kernel?) As previously mentioned, for the DMI, the reserved regions[0xF0000-0xFFFFF] and [0x0x6286B000-0x6286EFFF] are accurately passed to the second kernel, the DMI can work well on my machine. But for the ACPI, firmware, etc. How to deal with this? I think that we have to find all places where to call the early_memremap(), and determine whether these reserved regions need be passed to the second kernel, and then add the code for kdump kernel in these code paths. It is really too expensive to do this. Furthermore, i only did a same thing in kernel, just like the kexec-tools pass the all reserved ranges to the second kernel. If you understood this issue, you might ignore it, please. Thanks. Lianbo > If we should pass *all* reserved regions, why? > > IOW, I'm looking for the *why* first. > > Thx. >