Date: Tue, 21 Jun 2022 17:35:54 +0800
From: Baoquan He
To: "Leizhen (ThunderTown)"
Cc: Catalin Marinas, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, x86@kernel.org,
Peter Anvin" , Eric Biederman , Rob Herring , Frank Rowand , devicetree@vger.kernel.org, Dave Young , Vivek Goyal , kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Will Deacon , linux-arm-kernel@lists.infradead.org, Jonathan Corbet , linux-doc@vger.kernel.org, Randy Dunlap , Feng Zhou , Kefeng Wang , Chen Zhou , John Donnelly , Dave Kleikamp Subject: Re: [PATCH 5/5] arm64: kdump: Don't defer the reservation of crash high memory Message-ID: References: <20220613080932.663-1-thunder.leizhen@huawei.com> <20220613080932.663-6-thunder.leizhen@huawei.com> <4ad5f8c9-a411-da4e-f626-ead83d107bca@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4ad5f8c9-a411-da4e-f626-ead83d107bca@huawei.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/21/22 at 03:56pm, Leizhen (ThunderTown) wrote: > > > On 2022/6/21 13:33, Baoquan He wrote: > > Hi, > > > > On 06/13/22 at 04:09pm, Zhen Lei wrote: > >> If the crashkernel has both high memory above DMA zones and low memory > >> in DMA zones, kexec always loads the content such as Image and dtb to the > >> high memory instead of the low memory. This means that only high memory > >> requires write protection based on page-level mapping. The allocation of > >> high memory does not depend on the DMA boundary. So we can reserve the > >> high memory first even if the crashkernel reservation is deferred. > >> > >> This means that the block mapping can still be performed on other kernel > >> linear address spaces, the TLB miss rate can be reduced and the system > >> performance will be improved. > > > > Ugh, this looks a little ugly, honestly. > > > > If that's for sure arm64 can't split large page mapping of linear > > region, this patch is one way to optimize linear mapping. Given kdump > > setting is necessary on arm64 server, the booting speed is truly > > impacted heavily. > > There is also a performance impact when running. Yes, indeed, the TLB flush will happen more often. > > > > > However, I would suggest letting it as is with below reasons: > > > > 1) The code will complicate the crashkernel reservatoin code which > > is already difficult to understand. > > Yeah, I feel it, too. > > > 2) It can only optimize the two cases, first is CONFIG_ZONE_DMA|DMA32 > > disabled, the other is crashkernel=,high is specified. While both > > two cases are corner case, most of systems have CONFIG_ZONE_DMA|DMA32 > > enabled, and most of systems have crashkernel=xM which is enough. > > Having them optimized won't bring benefit to most of systems. > > The case of CONFIG_ZONE_DMA|DMA32 disabled have been resolved by > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for platforms with no DMA memory zones"). > Currently the performance problem to be optimized is that DMA is enabled. Yes, the disabled CONFIG_ZONE_DMA|DMA32 case has avoided the problem since its boundary is decided already at that time. Crashkenrel=,high can slso avoid this benefitting from the top done memblock allocating. However, the crashkerne=xM which now gets the fallback support is the main syntax we will use, that still has the problem. 
> >
> > 3) Besides, crashkernel=,high can be handled earlier because
> > arm64 always has memblock.bottom_up == false currently, thus we
> > don't need to worry about the lower limit of the crashkernel,high
> > reservation for now. If memblock.bottom_up is set true in the future,
> > this patch won't work any more.
> >
> >
> > ...
> > crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
> >                                        crash_base, crash_max);
> >
> > So, in my opinion, we can leave the current NON_BLOCK|SECT mapping
> > caused by crashkernel reserving as is, since no regression is brought.
> > And meanwhile, turn to checking whether there's any way to make the
> > contiguous linear mapping work with later splitting. Patches 4 and 5
> > in this patchset don't make much sense to me, frankly speaking.
>
> OK. As discussed earlier, I can rethink whether there is a better way
> for patches 4-5, and this time focus on patches 1-2. In this way, all
> the functions are complete, and only optimization is left.

Sounds nice, thx.

> >
> >>
> >> Signed-off-by: Zhen Lei
> >> ---
> >>  arch/arm64/mm/init.c | 71 ++++++++++++++++++++++++++++++++++++++++----
> >>  1 file changed, 65 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >> index fb24efbc46f5ef4..ae0bae2cafe6ab0 100644
> >> --- a/arch/arm64/mm/init.c
> >> +++ b/arch/arm64/mm/init.c
> >> @@ -141,15 +141,44 @@ static void __init reserve_crashkernel(int dma_state)
> >>  	unsigned long long crash_max = CRASH_ADDR_LOW_MAX;
> >>  	char *cmdline = boot_command_line;
> >>  	int dma_enabled = IS_ENABLED(CONFIG_ZONE_DMA) || IS_ENABLED(CONFIG_ZONE_DMA32);
> >> -	int ret;
> >> +	int ret, skip_res = 0, skip_low_res = 0;
> >>  	bool fixed_base;
> >>
> >>  	if (!IS_ENABLED(CONFIG_KEXEC_CORE))
> >>  		return;
> >>
> >> -	if ((!dma_enabled && (dma_state != DMA_PHYS_LIMIT_UNKNOWN)) ||
> >> -	    (dma_enabled && (dma_state != DMA_PHYS_LIMIT_KNOWN)))
> >> -		return;
> >> +	/*
> >> +	 * In the following table:
> >> +	 * X,high  means crashkernel=X,high
> >> +	 * unknown means dma_state = DMA_PHYS_LIMIT_UNKNOWN
> >> +	 * known   means dma_state = DMA_PHYS_LIMIT_KNOWN
> >> +	 *
> >> +	 * The first two columns indicate the status, and the last two
> >> +	 * columns indicate the phase in which crash high or low memory
> >> +	 * needs to be reserved.
> >> +	 *  ---------------------------------------------------
> >> +	 * | DMA enabled | X,high used |  unknown  |   known   |
> >> +	 *  ---------------------------------------------------
> >> +	 * |      N            N       |    low    |    NOP    |
> >> +	 * |      Y            N       |    NOP    |    low    |
> >> +	 * |      N            Y       | high/low  |    NOP    |
> >> +	 * |      Y            Y       |   high    |    low    |
> >> +	 *  ---------------------------------------------------
> >> +	 *
> >> +	 * But in this function, the crash high memory allocation of
> >> +	 * crashkernel=Y,high and the crash low memory allocation of
> >> +	 * crashkernel=X[@offset] for crashk_res are mixed at one place.
> >> +	 * So the table above needs to be adjusted as below:
> >> +	 *  ---------------------------------------------------
> >> +	 * | DMA enabled | X,high used |  unknown  |   known   |
> >> +	 *  ---------------------------------------------------
> >> +	 * |      N            N       |    res    |    NOP    |
> >> +	 * |      Y            N       |    NOP    |    res    |
> >> +	 * |      N            Y       |res/low_res|    NOP    |
> >> +	 * |      Y            Y       |    res    |  low_res  |
> >> +	 *  ---------------------------------------------------
> >> +	 *
> >> +	 */
> >>
> >>  	/* crashkernel=X[@offset] */
> >>  	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
> >> @@ -169,10 +198,33 @@
> >>  	else if (ret)
> >>  		return;
> >>
> >> +	/* See the third row of the second table above, NOP */
> >> +	if (!dma_enabled && (dma_state == DMA_PHYS_LIMIT_KNOWN))
> >> +		return;
> >> +
> >> +	/* See the fourth row of the second table above */
> >> +	if (dma_enabled) {
> >> +		if (dma_state == DMA_PHYS_LIMIT_UNKNOWN)
> >> +			skip_low_res = 1;
> >> +		else
> >> +			skip_res = 1;
> >> +	}
> >> +
> >>  		crash_max = CRASH_ADDR_HIGH_MAX;
> >>  	} else if (ret || !crash_size) {
> >>  		/* The specified value is invalid */
> >>  		return;
> >> +	} else {
> >> +		/* See the 1-2 rows of the second table above, NOP */
> >> +		if ((!dma_enabled && (dma_state == DMA_PHYS_LIMIT_KNOWN)) ||
> >> +		    (dma_enabled && (dma_state == DMA_PHYS_LIMIT_UNKNOWN)))
> >> +			return;
> >> +	}
> >> +
> >> +	if (skip_res) {
> >> +		crash_base = crashk_res.start;
> >> +		crash_size = crashk_res.end - crashk_res.start + 1;
> >> +		goto check_low;
> >> +	}
> >>
> >>  	fixed_base = !!crash_base;
> >> @@ -202,9 +254,18 @@
> >>  		return;
> >>  	}
> >>
> >> +	crashk_res.start = crash_base;
> >> +	crashk_res.end = crash_base + crash_size - 1;
> >> +
> >> +check_low:
> >> +	if (skip_low_res)
> >> +		return;
> >> +
> >>  	if ((crash_base >= CRASH_ADDR_LOW_MAX) &&
> >>  	    crash_low_size && reserve_crashkernel_low(crash_low_size)) {
> >>  		memblock_phys_free(crash_base, crash_size);
> >> +		crashk_res.start = 0;
> >> +		crashk_res.end = 0;
> >>  		return;
> >>  	}
> >>
> >> @@ -219,8 +280,6 @@
> >>  	if (crashk_low_res.end)
> >>  		kmemleak_ignore_phys(crashk_low_res.start);
> >>
> >> -	crashk_res.start = crash_base;
> >> -	crashk_res.end = crash_base + crash_size - 1;
> >>  	insert_resource(&iomem_resource, &crashk_res);
> >>  }
> >>
> >> --
> >> 2.25.1
> >>
> >
> > .
> >
>
> --
> Regards,
> Zhen Lei
>
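P.S. The adjusted table in the patch comment condenses to a small
predicate. As a minimal sketch (the helper name and its bool
parameters are invented for illustration; only the table itself comes
from the patch):

	#include <stdbool.h>

	/*
	 * Sketch of the adjusted table: whether reserve_crashkernel()
	 * has work to do in a given phase. "known" stands for
	 * dma_state == DMA_PHYS_LIMIT_KNOWN, i.e. the second call.
	 */
	static bool phase_has_work(bool dma_enabled, bool high_used,
				   bool known)
	{
		if (!dma_enabled)
			return !known;	/* rows 1 and 3: reserve everything early */
		if (!high_used)
			return known;	/* row 2: defer the whole reservation */
		return true;		/* row 4: res early, low_res once known */
	}

For example, phase_has_work(true, true, false) returns true, matching
the "res" cell of row 4 in the "unknown" column, while
phase_has_work(true, false, false) returns false, matching that row 2
is a NOP until the DMA limit is known.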