Date: Fri, 1 Jul 2022 19:51:58 +0300
From: Mike Rapoport
To: "guanghui.fgh"
Cc: baolin.wang@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
	akpm@linux-foundation.org, david@redhat.com, jianyong.wu@arm.com,
	james.morse@arm.com, quic_qiancai@quicinc.com,
	christophe.leroy@csgroup.eu, jonathan@marek.ca, mark.rutland@arm.com,
	thunder.leizhen@huawei.com, anshuman.khandual@arm.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	geert+renesas@glider.be, ardb@kernel.org, linux-mm@kvack.org,
	yaohongbo@linux.alibaba.com, alikernel-developer@linux.alibaba.com
Subject: Re: [PATCH v3] arm64: mm: fix linear mapping mem access performance
	degradation
References: <1656586222-98555-1-git-send-email-guanghuifeng@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 01, 2022 at 12:36:00PM +0800, guanghui.fgh wrote:
> Thanks.
> 
> On 2022/6/30 21:46, Mike Rapoport wrote:
> > Hi,
> > 
> > On Thu, Jun 30, 2022 at 06:50:22PM +0800, Guanghui Feng wrote:
> > > arm64 can build 2M/1G block/section mappings. When using a DMA/DMA32
> > > zone (crashkernel enabled, rodata full disabled, kfence disabled),
> > > the mem_map will use non block/section mappings (because crashkernel
> > > requires shrinking the region at page granularity).
> > > But this degrades performance when doing large contiguous memory
> > > accesses in the kernel (memcpy/memmove, etc).
> > >
> > > There have been many related changes and discussions:
> > > commit 031495635b46 ("arm64: Do not defer reserve_crashkernel() for
> > > platforms with no DMA memory zones")
> > > commit 0a30c53573b0 ("arm64: mm: Move reserve_crashkernel() into
> > > mem_init()")
> > > commit 2687275a5843 ("arm64: Force NO_BLOCK_MAPPINGS if crashkernel
> > > reservation is required")
> > >
> > > This patch changes mem_map to use block/section mappings with
> > > crashkernel. First, build block/section mappings (normally 2M or 1G)
> > > for all available memory in mem_map and reserve the crashkernel
> > > memory. Then walk the pagetable, splitting block/section mappings
> > > into non block/section mappings (normally 4K) [[[only]]] for the
> > > crashkernel memory. The linear memory mapping thus uses block/section
> > > mappings as much as possible. This reduces CPU dTLB misses
> > > conspicuously and accelerates memory access, giving about a 10-20%
> > > performance improvement.
> > 
> > ...
> > 
> > > Signed-off-by: Guanghui Feng
> > > ---
> > >  arch/arm64/include/asm/mmu.h |   1 +
> > >  arch/arm64/mm/init.c         |   8 +-
> > >  arch/arm64/mm/mmu.c          | 231 ++++++++++++++++++++++++++++++-------------
> > >  3 files changed, 168 insertions(+), 72 deletions(-)
> > 
> > ...
> > 
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index 626ec32..4b779cf 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -42,6 +42,7 @@
> > >  #define NO_BLOCK_MAPPINGS	BIT(0)
> > >  #define NO_CONT_MAPPINGS	BIT(1)
> > >  #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
> > > +#define NO_SEC_REMAPPINGS	BIT(3)	/* rebuild with non block/sec mapping */
> > >  
> > >  u64 idmap_t0sz = TCR_T0SZ(VA_BITS_MIN);
> > >  u64 idmap_ptrs_per_pgd = PTRS_PER_PGD;
> > > @@ -156,11 +157,12 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
> > >  }
> > >  
> > >  static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > > -		     phys_addr_t phys, pgprot_t prot)
> > > +		     phys_addr_t phys, pgprot_t prot, int flags)
> > >  {
> > >  	pte_t *ptep;
> > >  
> > > -	ptep = pte_set_fixmap_offset(pmdp, addr);
> > > +	ptep = (flags & NO_SEC_REMAPPINGS) ? pte_offset_kernel(pmdp, addr) :
> > > +	       pte_set_fixmap_offset(pmdp, addr);
> > >  	do {
> > >  		pte_t old_pte = READ_ONCE(*ptep);
> > > @@ -176,7 +178,8 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > >  		phys += PAGE_SIZE;
> > >  	} while (ptep++, addr += PAGE_SIZE, addr != end);
> > >  
> > > -	pte_clear_fixmap();
> > > +	if (!(flags & NO_SEC_REMAPPINGS))
> > > +		pte_clear_fixmap();
> > >  }
> > >  
> > >  static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > > @@ -208,16 +211,59 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> > >  		next = pte_cont_addr_end(addr, end);
> > >  
> > >  		/* use a contiguous mapping if the range is suitably aligned */
> > > -		if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > > +		if (!(flags & NO_SEC_REMAPPINGS) &&
> > > +		    (((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
> > >  		    (flags & NO_CONT_MAPPINGS) == 0)
> > >  			__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
> > >  
> > > -		init_pte(pmdp, addr, next, phys, __prot);
> > > +		init_pte(pmdp, addr, next, phys, __prot, flags);
> > >  
> > >  		phys += next - addr;
> > >  	} while (addr = next, addr != end);
> > >  }
> > >  
> > > +static void init_pmd_remap(pud_t *pudp, unsigned long addr, unsigned long end,
> > > +			   phys_addr_t phys, pgprot_t prot,
> > > +			   phys_addr_t (*pgtable_alloc)(int), int flags)
> > > +{
> > > +	unsigned long next;
> > > +	pmd_t *pmdp;
> > > +	phys_addr_t map_offset;
> > > +	pmdval_t pmdval;
> > > +
> > > +	pmdp = pmd_offset(pudp, addr);
> > > +	do {
> > > +		next = pmd_addr_end(addr, end);
> > > +
> > > +		if (!pmd_none(*pmdp) && pmd_sect(*pmdp)) {
> > > +			phys_addr_t pte_phys = pgtable_alloc(PAGE_SHIFT);
> > > +			pmd_clear(pmdp);
> > > +			pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
> > > +			if (flags & NO_EXEC_MAPPINGS)
> > > +				pmdval |= PMD_TABLE_PXN;
> > > +			__pmd_populate(pmdp, pte_phys, pmdval);
> > > +			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> > > +
> > > +			map_offset = addr - (addr & PMD_MASK);
> > > +			if (map_offset)
> > > +				alloc_init_cont_pte(pmdp, addr & PMD_MASK, addr,
> > > +						    phys - map_offset, prot,
> > > +						    pgtable_alloc,
> > > +						    flags & (~NO_SEC_REMAPPINGS));
> > > +
> > > +			if (next < (addr & PMD_MASK) + PMD_SIZE)
> > > +				alloc_init_cont_pte(pmdp, next,
> > > +						    (addr & PUD_MASK) + PUD_SIZE,
> > > +						    next - addr + phys,
> > > +						    prot, pgtable_alloc,
> > > +						    flags & (~NO_SEC_REMAPPINGS));
> > > +		}
> > > +		alloc_init_cont_pte(pmdp, addr, next, phys, prot,
> > > +				    pgtable_alloc, flags);
> > > +		phys += next - addr;
> > > +	} while (pmdp++, addr = next, addr != end);
> > > +}
> > 
> > There is still too much duplicated code here and in init_pud_remap().
> > 
> > Did you consider something like this:
> > 
> > void __init map_crashkernel(void)
> > {
> > 	int flags = NO_EXEC_MAPPINGS | NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > 	u64 size;
> > 
> > 	/*
> > 	 * check if crash kernel is supported, reserved etc
> > 	 */
> > 
> > 	size = crashk_res.end + 1 - crashk_res.start;
> > 
> > 	__remove_pgd_mapping(swapper_pg_dir,
> > 			     __phys_to_virt(crashk_res.start), size);
> > 	__create_pgd_mapping(swapper_pg_dir, crashk_res.start,
> > 			     __phys_to_virt(crashk_res.start), size,
> > 			     PAGE_KERNEL, early_pgtable_alloc, flags);
> > }
> 
> I'm trying to do this.
> But I think it is the inverse process of the memory mapping, and it also
> generates duplicated code (boundary checks, pagetable modification).
> 
> When removing the pgd mapping, it may split a pud/pmd section, which also
> needs to [[[rebuild and clear]]] some pagetables.

Well, __remove_pgd_mapping() is probably overkill, but
unmap_hotplug_pmd_range() and unmap_hotplug_pud_range() should do,
depending on the size of the crash kernel.

-- 
Sincerely yours,
Mike.