From: "guanghui.fgh"
To: Will Deacon
Cc: baolin.wang@linux.alibaba.com, catalin.marinas@arm.com, akpm@linux-foundation.org, david@redhat.com, jianyong.wu@arm.com, james.morse@arm.com, quic_qiancai@quicinc.com, christophe.leroy@csgroup.eu, jonathan@marek.ca, mark.rutland@arm.com, thunder.leizhen@huawei.com, anshuman.khandual@arm.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, rppt@kernel.org, geert+renesas@glider.be, ardb@kernel.org, linux-mm@kvack.org, yaohongbo@linux.alibaba.com, alikernel-developer@linux.alibaba.com
Subject: Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance degradation
Date: Mon, 4 Jul 2022 22:34:07 +0800
Message-ID: <6977c692-78ca-5a67-773e-0389c85f2650@linux.alibaba.com>
In-Reply-To: <20220704142313.GE31684@willie-the-truck>
References: <1656777473-73887-1-git-send-email-guanghuifeng@linux.alibaba.com> <20220704103523.GC31437@willie-the-truck> <73f0c53b-fd17-c5e9-3773-1d71e564eb50@linux.alibaba.com> <20220704111402.GA31553@willie-the-truck> <4accaeda-572f-f72d-5067-2d0999e4d00a@linux.alibaba.com> <20220704131516.GC31684@willie-the-truck> <2ae1cae0-ee26-aa59-7ed9-231d67194dce@linux.alibaba.com> <20220704142313.GE31684@willie-the-truck>
X-Mailing-List: linux-kernel@vger.kernel.org

Thanks.

On 2022/7/4 22:23, Will Deacon wrote:
> On Mon, Jul 04, 2022 at 10:11:27PM +0800, guanghui.fgh wrote:
>> On 2022/7/4 21:15, Will Deacon wrote:
>>> On Mon, Jul 04, 2022 at 08:05:59PM +0800, guanghui.fgh wrote:
>>>>>> 1. Quoted from arch/arm64/mm/init.c:
>>>>>>
>>>>>> "Memory reservation for crash kernel either done early or deferred
>>>>>> depending on DMA memory zones configs (ZONE_DMA) --
>>>>>>
>>>>>> In absence of ZONE_DMA configs arm64_dma_phys_limit initialized
>>>>>> here instead of max_zone_phys(). This lets early reservation of
>>>>>> crash kernel memory which has a dependency on arm64_dma_phys_limit.
>>>>>> Reserving memory early for crash kernel allows linear creation of block
>>>>>> mappings (greater than page-granularity) for all the memory bank ranges.
>>>>>> In this scheme a comparatively quicker boot is observed.
>>>>>>
>>>>>> If ZONE_DMA configs are defined, crash kernel memory reservation
>>>>>> is delayed until DMA zone memory range size initialization performed in
>>>>>> zone_sizes_init(). The defer is necessary to steer clear of DMA zone
>>>>>> memory range to avoid overlap allocation.
>>>>>>
>>>>>> [[[
>>>>>> So crash kernel memory boundaries are not known when mapping all bank memory
>>>>>> ranges, which otherwise means not possible to exclude crash kernel range
>>>>>> from creating block mappings so page-granularity mappings are created for
>>>>>> the entire memory range.
>>>>>> ]]]"
>>>>>>
>>>>>> Namely, the init order is: memblock init ---> linear mem mapping (4k
>>>>>> mapping for crashkernel, which requires page-granularity changes) --->
>>>>>> zone DMA limit ---> reserve crashkernel.
>>>>>> So when ZONE_DMA is enabled and crashkernel is used, the linear mem
>>>>>> mapping uses 4k mappings.
>>>>>
>>>>> Yes, I understand that is how things work today but I'm saying that we may
>>>>> as well leave the crashkernel mapped (at block granularity) if
>>>>> !can_set_direct_map() and then I think your patch becomes a lot simpler.
>>>>
>>>> But page-granularity mappings are necessary for the crash kernel memory
>>>> range for shrinking its size via the /sys/kernel/kexec_crash_size
>>>> interface (quoted from arch/arm64/mm/init.c).
>>>> So this patch splits the block/section mapping into 4k page-granularity
>>>> mappings for the crashkernel mem.
>>>
>>> Why? I don't see why the mapping granularity is relevant at all if we
>>> always leave the whole thing mapped.
>>>
>> There is another reason.
>>
>> When loading of the crashkernel finishes, do_kexec_load will use
>> arch_kexec_protect_crashkres to invalidate all the page table entries for
>> the crashkernel mem (protecting the crashkernel mem from access).
>>
>> arch_kexec_protect_crashkres--->set_memory_valid--->...--->apply_to_pmd_range
>>
>> In apply_to_pmd_range there is a check: BUG_ON(pud_huge(*pud)). So if
>> the crashkernel uses block/section mappings, there will be an error.
>>
>> Namely, the crashkernel mem needs non-block/section mappings before
>> shrinking.
>
> Well, yes, but we can change arch_kexec_[un]protect_crashkres() not to do
> that if we're leaving the thing mapped, no?
>
> Will

I think we should use arch_kexec_[un]protect_crashkres for the crashkernel mem.

Once the crashkernel mem page tables are invalidated, there is no chance of reading or writing the crashkernel mem by mistake.

If we don't use arch_kexec_[un]protect_crashkres to invalidate the crashkernel mem page tables, stray writes to this memory could cause crashkernel boot errors and vmcore saving errors.

Can we change arch_kexec_[un]protect_crashkres to support block/section mappings? (But we would also need to remap when shrinking.)

Thanks.
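
To make the last question a bit more concrete, here is a minimal userspace sketch in C. It is not kernel code and not the patch under discussion. It only models the idea of toggling validity at block granularity where a whole block is covered, and splitting a block into page entries only when a partial range (as in shrinking) has to be changed. All names (toy_block, toy_map, toy_set_valid and so on) and the map size are hypothetical, and the model ignores TLB maintenance and everything else a real page-table update involves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages and 2 MiB blocks, i.e. 512 pages per block. */
#define PAGES_PER_BLOCK 512
#define NR_BLOCKS       8   /* hypothetical size of the toy linear map */

/* Hypothetical toy model: one entry per block-sized slice of the map. */
struct toy_block {
	bool is_block;                     /* true: one block mapping */
	bool block_valid;                  /* validity of the block mapping */
	bool page_valid[PAGES_PER_BLOCK];  /* used only after a split */
};

static struct toy_block toy_map[NR_BLOCKS];

/* Start with everything mapped (valid) at block granularity. */
static void toy_map_init(void)
{
	for (size_t b = 0; b < NR_BLOCKS; b++) {
		toy_map[b].is_block = true;
		toy_map[b].block_valid = true;
	}
}

/* Split one block into page entries that inherit the block's validity. */
static void toy_split_block(struct toy_block *blk)
{
	if (!blk->is_block)
		return;
	for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
		blk->page_valid[i] = blk->block_valid;
	blk->is_block = false;
}

/*
 * Set validity for pages [start, start + nr).
 * A block that is fully covered is toggled as a block; a block that is
 * only partially covered is split first and toggled page by page.
 */
static void toy_set_valid(size_t start, size_t nr, bool valid)
{
	size_t end = start + nr;

	assert(end <= (size_t)NR_BLOCKS * PAGES_PER_BLOCK);

	for (size_t pfn = start; pfn < end; ) {
		struct toy_block *blk = &toy_map[pfn / PAGES_PER_BLOCK];
		size_t off = pfn % PAGES_PER_BLOCK;
		size_t span = PAGES_PER_BLOCK - off;

		if (span > end - pfn)
			span = end - pfn;

		if (blk->is_block && span == PAGES_PER_BLOCK) {
			blk->block_valid = valid;
		} else {
			toy_split_block(blk);
			for (size_t i = 0; i < span; i++)
				blk->page_valid[off + i] = valid;
		}
		pfn += span;
	}
}

/* Look up the validity of a single page in the toy map. */
static bool toy_page_valid(size_t pfn)
{
	const struct toy_block *blk = &toy_map[pfn / PAGES_PER_BLOCK];

	return blk->is_block ? blk->block_valid
			     : blk->page_valid[pfn % PAGES_PER_BLOCK];
}
```

In this model, "protecting" a block-aligned reserved region never splits anything, and only the block that contains the new boundary is split when part of the region is made valid again. Whether something like this is acceptable for the real arch_kexec_[un]protect_crashkres path is exactly the open question above.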