Date: Fri, 27 Sep 2019 13:42:08 +0800
From: Dave Young
To: Kairui Song
Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Thomas Lendacky, Baoquan He, Lianbo Jiang, the arch/x86 maintainers, "kexec@lists.infradead.org"
Subject: Re: [PATCH v3 2/2] x86/kdump: Reserve extra memory when SME or SEV is active
Message-ID: <20190927054208.GA13426@dhcp-128-65.nay.redhat.com>
References: <20190910151341.14986-1-kasong@redhat.com> <20190910151341.14986-3-kasong@redhat.com> <20190911055618.GA104115@gmail.com>

On 09/25/19 at 06:36pm, Kairui Song wrote:
> On Wed, Sep 11, 2019 at 1:56 PM Ingo Molnar wrote:
> > * Kairui Song wrote:
> >
> > > Since commit c7753208a94c ("x86, swiotlb: Add memory encryption support"),
> > > SWIOTLB will be enabled even if there is less than 4G of memory when SME
> > > is active, to support DMA of devices that not support address with the
> > > encrypt bit.
> > >
> > > And commit aba2d9a6385a ("iommu/amd: Do not disable SWIOTLB if SME is
> > > active") make the kernel keep SWIOTLB enabled even if there is an IOMMU.
> > >
> > > Then commit d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory
> > > encryption") will always force SWIOTLB to be enabled when SEV is active
> > > in all cases.
> > >
> > > Now, when either SME or SEV is active, SWIOTLB will be force enabled,
> > > and this is also true for kdump kernel. As a result kdump kernel will
> > > run out of already scarce pre-reserved memory easily.
> > >
> > > So when SME/SEV is active, reserve extra memory for SWIOTLB to ensure
> > > kdump kernel have enough memory, except when "crashkernel=size[KMG],high"
> > > is specified or any offset is used. As for the high reservation case, an
> > > extra low memory region will always be reserved and that is enough for
> > > SWIOTLB. Else if the offset format is used, user should be fully aware
> > > of any possible kdump kernel memory requirement and have to organize the
> > > memory usage carefully.
> > >
> > > Signed-off-by: Kairui Song
> > > ---
> > >  arch/x86/kernel/setup.c | 20 +++++++++++++++++---
> > >  1 file changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 71f20bb18cb0..ee6a2f1e2226 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -530,7 +530,7 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > >  						unsigned long long *crash_size,
> > >  						bool high)
> > >  {
> > > -	unsigned long long base, size;
> > > +	unsigned long long base, size, mem_enc_req = 0;
> > >
> > >  	base = *crash_base;
> > >  	size = *crash_size;
> > > @@ -561,11 +561,25 @@ static int __init crashkernel_find_region(unsigned long long *crash_base,
> > >  	if (high)
> > >  		goto high_reserve;
> > >
> > > +	/*
> > > +	 * When SME/SEV is active and not using high reserve,
> > > +	 * it will always required an extra SWIOTLB region.
> > > +	 */
> > > +	if (mem_encrypt_active())
> > > +		mem_enc_req = ALIGN(swiotlb_size_or_default(), SZ_1M);
> > > +
> > >  	base = memblock_find_in_range(CRASH_ALIGN,
> > > -					CRASH_ADDR_LOW_MAX, size,
> > > +					CRASH_ADDR_LOW_MAX,
> > > +					size + mem_enc_req,
> > >  					CRASH_ALIGN);
>
> Hi Ingo,
>
> I re-read my previous reply, it's long and tedious, let me try to make
> a more effective reply:
>
> > What sizes are we talking about here?
>
> The size here is how much memory will be reserved for kdump kernel, to
> ensure kdump kernel and userspace can run without OOM.
> >
> >  - What is the possible size range of swiotlb_size_or_default()
>
> swiotlb_size_or_default() returns the swiotlb size, it's specified by
> user using swiotlb=, or default size (64MB)
> >
> >  - What is the size of CRASH_ADDR_LOW_MAX (the old limit)?
>
> It's 4G.
> >
> >  - Why do we replace one fixed limit with another fixed limit instead of
> >    accurately sizing the area, with each required feature adding its own
> >    requirement to the reservation size?
>
> It's quite hard to "accurately sizing the area".
>
> No way to tell the exact amount of memory kdump needs, we can only estimate.
> Kdump kernel use different cmdline, drivers and components will have
> special handling for kdump, and userspace is totally different.

Agreed with the above, but for the specific problem in this patch there
should be other ways.

My first thought was to do generic handling in the swiotlb code, with
something like the kdump_memory_reserve(size) that Ingo suggested. But
according to you swiotlb init happens late, so it cannot increase the
size. OTOH, reserving another region for kdump in swiotlb would cause
other issues.

So let's think about other improvements. For example, see if you can call
kdump_memory_reserve(size) in the AMD SME init path, e.g. in
mem_encrypt_init(). Is that before the crashkernel reservation? If
doable, it would at least be cleaner than the code in this patch.

Thanks
Dave
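
For illustration only, here is a rough userspace model of the
kdump_memory_reserve(size) idea discussed above. It is not kernel code
and not a proposed implementation. kdump_memory_reserve() does not exist
in the tree; kdump_extra_size, crashkernel_total() and ALIGN_UP() are
names made up for this sketch (ALIGN_UP() is meant to round the same way
the kernel's ALIGN() does for power-of-two alignments). The model only
works if every feature registers its requirement before the crashkernel
region is sized, which is exactly the ordering question raised above.

```c
#include <assert.h>

#define SZ_1M (1ULL << 20)

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical: sum of extra memory that features requested for kdump. */
static unsigned long long kdump_extra_size;

/*
 * Hypothetical helper modeled on the kdump_memory_reserve(size) idea.
 * Each feature (e.g. memory encryption needing SWIOTLB) adds its own
 * requirement, rounded up to 1M as the patch does for the SWIOTLB size.
 */
static void kdump_memory_reserve(unsigned long long size)
{
	assert(size > 0);	/* zero-sized requests are rejected in this sketch */
	kdump_extra_size += ALIGN_UP(size, SZ_1M);
}

/*
 * Hypothetical: the size the crashkernel reservation would pass to the
 * allocator, i.e. the user-specified size plus all registered extras.
 */
static unsigned long long crashkernel_total(unsigned long long size)
{
	return size + kdump_extra_size;
}
```

In this model, a memory encryption init path that registers the default
64MB SWIOTLB size via kdump_memory_reserve(64 * SZ_1M) would turn a
crashkernel=256M request into a 320M search, without the crashkernel
code having to know about SME/SEV at all.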