Message-ID: <1457b22f-8327-4bd0-8d18-1cc7ec31a3e4@arm.com>
Date: Thu, 26 Oct 2023 13:30:00 +0100
Subject: Re: [PATCH] arm64: mm: drop tlb flush operation
 when clearing the access bit
To: Anshuman Khandual, Barry Song <21cnbao@gmail.com>
Cc: Baolin Wang, catalin.marinas@arm.com, will@kernel.org,
 akpm@linux-foundation.org, v-songbaohua@oppo.com, yuzhao@google.com,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
References: <2f55f62b-cae2-4eee-8572-1b662a170880@arm.com>
From: Robin Murphy

On 26/10/2023 7:01 am, Anshuman Khandual wrote:
>
>
> On 10/26/23 11:24, Barry Song wrote:
>> On Thu, Oct 26, 2023 at 12:55 PM Anshuman Khandual wrote:
>>>
>>>
>>>
>>> On 10/24/23 18:26, Baolin Wang wrote:
>>>> Now ptep_clear_flush_young() is only called by folio_referenced() to
>>>> check if the folio was referenced, and now it will call a tlb flush on
>>>> ARM64 architecture. However the tlb flush can be expensive on ARM64
>>>> servers, especially for the systems with a large CPU numbers.
>>>
>>> TLB flush would be expensive on *any* platform with large CPU numbers ?
>>>
>>>>
>>>> Similar to the x86 architecture, below comments also apply equally to
>>>> ARM64 architecture. So we can drop the tlb flush operation in
>>>> ptep_clear_flush_young() on ARM64 architecture to improve the performance.
>>>> "
>>>> /* Clearing the accessed bit without a TLB flush
>>>>  * doesn't cause data corruption.
>>>>  * [ It could cause incorrect
>>>>  * page aging and the (mistaken) reclaim of hot pages, but the
>>>>  * chance of that should be relatively low. ]
>>>>  *
>>>>  * So as a performance optimization don't flush the TLB when
>>>>  * clearing the accessed bit, it will eventually be flushed by
>>>>  * a context switch or a VM operation anyway. [ In the rare
>>>>  * event of it not getting flushed for a long time the delay
>>>>  * shouldn't really matter because there's no real memory
>>>>  * pressure for swapout to react to. ]
>>>>  */
>>>
>>> If always true, this sounds generic enough for all platforms, why only
>>> x86 and arm64 ?
>>>
>>>> "
>>>> Running the thpscale to show some obvious improvements for compaction
>>>> latency with this patch:
>>>>                                       base               patched
>>>> Amean     fault-both-1      1093.19 (   0.00%)     1084.57 *   0.79%*
>>>> Amean     fault-both-3      2566.22 (   0.00%)     2228.45 *  13.16%*
>>>> Amean     fault-both-5      3591.22 (   0.00%)     3146.73 *  12.38%*
>>>> Amean     fault-both-7      4157.26 (   0.00%)     4113.67 *   1.05%*
>>>> Amean     fault-both-12     6184.79 (   0.00%)     5218.70 *  15.62%*
>>>> Amean     fault-both-18     9103.70 (   0.00%)     7739.71 *  14.98%*
>>>> Amean     fault-both-24    12341.73 (   0.00%)    10684.23 *  13.43%*
>>>> Amean     fault-both-30    15519.00 (   0.00%)    13695.14 *  11.75%*
>>>> Amean     fault-both-32    16189.15 (   0.00%)    14365.73 *  11.26%*
>>>>
>>>>                          base      patched
>>>> Duration User          167.78       161.03
>>>> Duration System       1836.66      1673.01
>>>> Duration Elapsed      2074.58      2059.75
>>>
>>> Could you please point to the test repo you are running ?
>>>
>>>>
>>>> Barry Song submitted a similar patch [1] before, that replaces the
>>>> ptep_clear_flush_young_notify() with ptep_clear_young_notify() in
>>>> folio_referenced_one(). However, I'm not sure if removing the tlb flush
>>>> operation is applicable to every architecture in kernel, so dropping
>>>> the tlb flush for ARM64 seems a sensible change.
>>>
>>> The reasoning provided here sounds generic when true, hence there seems
>>> to be no justification to keep it limited just for arm64 and x86.
>>> Also what about pmdp_clear_flush_young_notify() when THP is enabled. Should
>>> that also not do a TLB flush after clearing access bit ? Although arm64
>>> does not enable __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH, rather depends on
>>> the generic pmdp_clear_flush_young() which also does a TLB flush via
>>> flush_pmd_tlb_range() while clearing the access bit.
>>>
>>>>
>>>> Note: I am okay for both approach, if someone can help to ensure that
>>>> all architectures do not need the tlb flush when clearing the accessed
>>>> bit, then I also think Barry's patch is better (hope Barry can resend
>>>> his patch).
>>>
>>> This paragraph belongs after the '----' below and not part of the commit
>>> message.
>>>
>>>>
>>>> [1] https://lore.kernel.org/lkml/20220617070555.344368-1-21cnbao@gmail.com/
>>>>
>>>> Signed-off-by: Baolin Wang
>>>> ---
>>>>  arch/arm64/include/asm/pgtable.h | 31 ++++++++++++++++---------------
>>>>  1 file changed, 16 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index 0bd18de9fd97..2979d796ba9d 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -905,21 +905,22 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
>>>>  static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>>>>  					 unsigned long address, pte_t *ptep)
>>>>  {
>>>> -	int young = ptep_test_and_clear_young(vma, address, ptep);
>>>> -
>>>> -	if (young) {
>>>> -		/*
>>>> -		 * We can elide the trailing DSB here since the worst that can
>>>> -		 * happen is that a CPU continues to use the young entry in its
>>>> -		 * TLB and we mistakenly reclaim the associated page. The
>>>> -		 * window for such an event is bounded by the next
>>>> -		 * context-switch, which provides a DSB to complete the TLB
>>>> -		 * invalidation.
>>>> -		 */
>>>> -		flush_tlb_page_nosync(vma, address);
>>>> -	}
>>>> -
>>>> -	return young;
>>>> +	/*
>>>> +	 * This comment is borrowed from x86, but applies equally to ARM64:
>>>> +	 *
>>>> +	 * Clearing the accessed bit without a TLB flush doesn't cause
>>>> +	 * data corruption. [ It could cause incorrect page aging and
>>>> +	 * the (mistaken) reclaim of hot pages, but the chance of that
>>>> +	 * should be relatively low. ]
>>>> +	 *
>>>> +	 * So as a performance optimization don't flush the TLB when
>>>> +	 * clearing the accessed bit, it will eventually be flushed by
>>>> +	 * a context switch or a VM operation anyway. [ In the rare
>>>> +	 * event of it not getting flushed for a long time the delay
>>>> +	 * shouldn't really matter because there's no real memory
>>>> +	 * pressure for swapout to react to. ]
>>>> +	 */
>>>> +	return ptep_test_and_clear_young(vma, address, ptep);
>>>>  }
>>>>
>>>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>
>>> There are three distinct concerns here
>>>
>>> 1) What are the chances of this misleading existing hot page reclaim process
>>> 2) How secondary MMU such as SMMU adapt to change in mappings without a flush
>>> 3) Could this break the architecture rule requiring a TLB flush after access
>>>    bit clear on a page table entry
>>
>> In terms of all of above concerns, though 2 is different, which is an
>> issue between cpu and non-cpu, i feel kernel has actually dropped tlb
>> flush at least for mglru, there is no flush in lru_gen_look_around(),
>>
>> static bool folio_referenced_one(struct folio *folio,
>> 		struct vm_area_struct *vma, unsigned long address, void *arg)
>> {
>> 	...
>>
>> 	if (pvmw.pte) {
>> 		if (lru_gen_enabled() &&
>> 		    pte_young(ptep_get(pvmw.pte))) {
>> 			lru_gen_look_around(&pvmw);
>> 			referenced++;
>> 		}
>>
>> 		if (ptep_clear_flush_young_notify(vma, address,
>> 						  pvmw.pte))
>> 			referenced++;
>> 	}
>>
>> 	return true;
>> }
>>
>> and so is in walk_pte_range() of vmscan.
>> linux has been surviving with all above concerns for a while,
>> believing it or not :-)
>
> Although the first two concerns could be worked upon in the SW, kernel surviving
> after breaking arch rules explicitly is not a correct state to be in IMHO.

AFAICS it's not breaking any rules as such. Indeed the architecture
requires a TLBI to ensure that the access flag update is observed by the
CPU, but what's being proposed here is relaxing Linux's own expectations
to say actually we don't mind if the CPU *doesn't* observe the update
straight away.

Thanks,
Robin.