Message-ID: <9e5758e2-c9a7-2253-ee69-9979ae31afdd@linux.alibaba.com>
Date: Thu, 14 Apr 2022 11:43:10 +0800
Subject: Re: [PATCH RFC v1] arm64: mm: change mem_map to use block/section mapping with crashkernel
To: Catalin Marinas
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 baolin.wang@linux.alibaba.com
References: <1649754476-8713-1-git-send-email-guanghuifeng@linux.alibaba.com>
From: "guanghui.fgh"
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks for your response!

On 2022/4/14 0:53, Catalin Marinas wrote:
> On Tue, Apr 12, 2022 at 05:07:56PM +0800, Guanghui Feng wrote:
>> There are many changes and discussions:
>> commit 031495635b46
>> commit 1a8e1cef7603
>> commit 8424ecdde7df
>> commit 0a30c53573b0
>> commit 2687275a5843
>>
>> When using DMA/DMA32 zone and crashkernel, disable rodata full and kfence,
>> mem_map will use non block/section mapping(for crashkernel requires to shrink
>> the region in page granularity). But it will degrade performance when doing
>> larging continuous mem access in kernel(memcpy/memmove, etc).
>>
>> This patch firstly do block/section mapping at mem_map, reserve crashkernel
>> memory. And then walking pagetable to split block/section mapping
>> to non block/section mapping [only] for crashkernel mem. We will accelerate
>> mem access about 10-20% performance improvement, and reduce the cpu dTLB miss
>> conspicuously on some platform with this optimization.
> Do you actually have some real world use-cases where this improvement
> matters? I don't deny that large memcpy over the kernel linear map may
> be slightly faster but where does this really matter?

In fio testing there is a performance gap of about 10-20%.
The test method:

1. Prepare the environment with this shell script:

    set -x
    modprobe -r brd
    modprobe brd rd_nr=1 rd_size=134217728
    dmsetup remove_all
    wipefs -a --force /dev/ram0
    mkfs -t ext4 -E lazy_itable_init=0,lazy_journal_init=0 -q -F /dev/ram0
    mkdir -p /fs/ram0
    mount -t ext4 /dev/ram0 /fs/ram0
    #sed -i s/scan_lvs = 1/scan_lvs = 1/ /etc/lvm/lvm.conf

2. Prepare the fio job file [x.fio]:

    [global]
    bs=4k
    ioengine=psync
    iodepth=128
    size=8G
    direct=1
    runtime=30
    invalidate=1
    #fallocate=native
    group_reporting
    thread=1
    time_based=1
    rw=read
    directory=/fs/ram0
    #filename=/dev/ram0
    numjobs=1

    [task_0]
    cpus_allowed=16
    stonewall=1

3. Run the fio test case:

    sudo fio x.fio

-----------------------------------------------------

I have also tested memcpy in both environments (block/section mapping
and non block/section mapping):

1. Allocate many contiguous pages (src/dst: 10000 * 2^10 bytes):
   alloc_pages(GFP_KERNEL, 10)
2. memcpy from src to dst

>> +static void init_crashkernel_pmd(pud_t *pudp, unsigned long addr,
>> +				 unsigned long end, phys_addr_t phys,
>> +				 pgprot_t prot,
>> +				 phys_addr_t (*pgtable_alloc)(int), int flags)
>> +{
>> +	phys_addr_t map_offset;
>> +	unsigned long next;
>> +	pmd_t *pmdp;
>> +	pmdval_t pmdval;
>> +
>> +	pmdp = pmd_offset(pudp, addr);
>> +	do {
>> +		next = pmd_addr_end(addr, end);
>> +		if (!pmd_none(*pmdp) && pmd_sect(*pmdp)) {
>> +			phys_addr_t pte_phys = pgtable_alloc(PAGE_SHIFT);
>> +			pmd_clear(pmdp);
>> +			pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
>> +			if (flags & NO_EXEC_MAPPINGS)
>> +				pmdval |= PMD_TABLE_PXN;
>> +			__pmd_populate(pmdp, pte_phys, pmdval);
>> +			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> The architecture requires us to do a break-before-make here, so
> pmd_clear(), TLBI, __pmd_populate() - in this order. And that's where it
> gets tricky, if the kernel happens to access this pmd range while it is
> unmapped, you'd get a translation fault.

OK, thanks.
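To make the ordering point above concrete, here is a minimal user-space sketch. It is not kernel code: the one-slot "page table", the single cached TLB copy, and every toy_* and split_* name are hypothetical and exist only for illustration. It models the stale block translation that break-before-make is meant to rule out, and compares the order in the quoted patch (clear, populate, flush) with the required order (clear, TLBI, populate).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
enum { ENTRY_NONE = 0, ENTRY_BLOCK = 1, ENTRY_TABLE = 2 };

struct toy_mmu {
	int slot;      /* what the page-table slot currently holds */
	int tlb;       /* what the TLB has cached (ENTRY_NONE = nothing) */
	bool conflict; /* set if a stale TLB entry and a new valid entry coexist */
};

static void toy_write_slot(struct toy_mmu *m, int val)
{
	m->slot = val;
	/* A cached old translation alongside a different valid entry is
	 * the situation break-before-make is meant to prevent. */
	if (m->tlb != ENTRY_NONE && val != ENTRY_NONE && m->tlb != val)
		m->conflict = true;
}

static void toy_tlbi(struct toy_mmu *m)
{
	m->tlb = ENTRY_NONE;
}

/* Order in the quoted patch: clear, populate, then flush. */
static bool split_populate_then_flush(void)
{
	struct toy_mmu m = { ENTRY_BLOCK, ENTRY_BLOCK, false };

	toy_write_slot(&m, ENTRY_NONE);
	toy_write_slot(&m, ENTRY_TABLE);
	toy_tlbi(&m);
	return m.conflict;
}

/* Break-before-make order: clear, flush, then populate. */
static bool split_break_before_make(void)
{
	struct toy_mmu m = { ENTRY_BLOCK, ENTRY_BLOCK, false };

	toy_write_slot(&m, ENTRY_NONE);
	toy_tlbi(&m);
	toy_write_slot(&m, ENTRY_TABLE);
	return m.conflict;
}
```

In this model only the clear/TLBI/populate order avoids the conflict. The model says nothing about the second problem raised above: an access to the range while the entry is cleared would still take a translation fault.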
+		if (map_offset)
+			alloc_init_cont_pte(pmdp, addr & PMD_MASK, addr,
+					    phys - map_offset, prot,
+					    pgtable_alloc, flags);
+
+
+		map_offset = addr - (addr & PUD_MASK);
+		if (map_offset)
+			alloc_init_cont_pmd(pudp, addr & PUD_MASK, addr,
+					    phys - map_offset, prot,
+					    pgtable_alloc, flags);
+

Sorry, there is a defect here. When rebuilding the normal pmd/pte
mappings (those outside the crashkernel memory), the flags should have
NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS stripped in some cases:
!(can_set_direct_map() || IS_ENABLED(CONFIG_KFENCE)).
That way we use block/section mappings wherever possible.
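A rough sketch of the intended flag handling, written as standalone C so it can be compiled outside the kernel. The flag bit values, the helper name normal_region_flags(), and the two boolean parameters standing in for can_set_direct_map() and IS_ENABLED(CONFIG_KFENCE) are assumptions made for illustration, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are assumed here for illustration only. */
#define NO_BLOCK_MAPPINGS (1 << 0)
#define NO_CONT_MAPPINGS  (1 << 1)
#define NO_EXEC_MAPPINGS  (1 << 2)

/*
 * Hypothetical helper: compute the flags to use when rebuilding the
 * mappings that lie outside the crashkernel region. The two booleans
 * stand in for can_set_direct_map() and IS_ENABLED(CONFIG_KFENCE).
 */
static int normal_region_flags(int flags, bool can_set_direct_map,
			       bool kfence_enabled)
{
	/* Nothing needs page granularity, so allow block/cont mappings. */
	if (!(can_set_direct_map || kfence_enabled))
		flags &= ~(NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
	return flags;
}
```

The crashkernel range itself would keep the original flags; only the surrounding pmd/pte rebuild would use the stripped value.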