Received: by 2002:ad5:4acb:0:0:0:0:0 with SMTP id n11csp4162357imw; Tue, 19 Jul 2022 00:35:39 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sZzGaar5ZlbarUAArcxxhxL8rIoRhmMThgOwGjGpcWISEeZe9MPpsinfxqzj/vMQkXiPyq X-Received: by 2002:a63:ce4a:0:b0:419:c8f4:d2a3 with SMTP id r10-20020a63ce4a000000b00419c8f4d2a3mr21515573pgi.515.1658216138665; Tue, 19 Jul 2022 00:35:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658216138; cv=none; d=google.com; s=arc-20160816; b=WrBEnlZJno7m2uWdLeZoSArN0M59KWNi8w3utO1PzqaToO2/D5bjo4FawcP7L8PyNk g/k1RjG7DjCEhuCxdI1xIFQlRKobX8wWRornhAAhE7a7lER9jkWhXD+07T4AST18X00p zj2qyuPDeOvQqJ9tJ/CGCAOc5MBQ026fWwmDNjPUT9S3H/Rl7v86w1TtrTNzr2TDCIxr EAC6OLEz54uB6r5uFhsAcUgGDDmIlmFvt4VsROGT33D3wkcF1wnVgN53mSJfoaoxYlJc Xn9cWWC4bbQ9RvcGJOXl7EZQFmXAGD4V3lXELmfjsPZ+dLpywK3C5MQMDptmQAZSmZXz +W7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=Mfzex07FFd38e51uzVHolcsK8k3nV2wRGUZJqe3f6qA=; b=lZZjs4oxADSMvy1udbSjRay3eoRA5wxm1ssko6/68Cn716Pq67BDujH6sU9Fn15HrL h1xLZsAPogEAbEizS1/QeUx8QYNXGheBO7IyrPkmkMXZ9/KOBwPZP854G3xF7XCETXpE QQffQP0mEtSkc3Zfyn/nEjSkdoZnypdWGvLTaDOlp2PqSbs7Zhb3J2E9oJ1lGE7wu8x2 sjqpD5F3GmPi3kSVaeHzxywdWekPqS/AQmOunz8ikfTXEpECde14Xk9X81CsG4zVRi1P oYJV7kxLmchy9eO8sCXYe7wrwXUUABGS1Pxm3SOPCoIB/1Cx/t39Y1JUXhSuaRfJIQWo pwEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ETvzc37z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r83-20020a632b56000000b0041a03c40631si32060pgr.149.2022.07.19.00.35.24; Tue, 19 Jul 2022 00:35:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ETvzc37z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233997AbiGSHBV (ORCPT + 99 others); Tue, 19 Jul 2022 03:01:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35650 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233852AbiGSHBT (ORCPT ); Tue, 19 Jul 2022 03:01:19 -0400 Received: from mail-ej1-x631.google.com (mail-ej1-x631.google.com [IPv6:2a00:1450:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95F4912D12 for ; Tue, 19 Jul 2022 00:01:16 -0700 (PDT) Received: by mail-ej1-x631.google.com with SMTP id sz17so25385403ejc.9 for ; Tue, 19 Jul 2022 00:01:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Mfzex07FFd38e51uzVHolcsK8k3nV2wRGUZJqe3f6qA=; b=ETvzc37zWuZjJGQMYrMeVUTDIWMi58PlRsqJP6hJPLj3ZNkEAuLhUmCQJn2I6FPm0G m26v3t7gI3apjnLNGwFnFQwGKK2N/uXz1YDLt2pfHRMvXpXg+aJ/5LmTLTi3Xz4Zhc4R ZMBaAkZlvkt5Fo4lNSLVgFqW7A4D7DebTme1Cx6F4VNY2S+1jPDu1+bfSaj+eczVhESc r5uLDMXTKz85748c8fCwst9nR7YZWIkR/9Bw9UpJf8kDY/jH0Hw2EytfRjQdjdWFsft8 aNlyHcSG2OzNSHRbB0UgaGErRL+pJ19aP5TzwzGIUBHCGBAMPBfozOBG0HpEQFJMOXFF S5Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Mfzex07FFd38e51uzVHolcsK8k3nV2wRGUZJqe3f6qA=; b=tmQ9LkuEZWhcLQk0oJ6uijhpbQcTxOCZDZOeex1TGv6KpGY6H9ezCi+4Z4KVkn6uTs 7oF5rZHn+QXT9/Wa/Ec/UEioqiyhoQdOOChtTz3nNWsT7mmIq5aBabpY06hSEibokm4d abZvs16dKx/ss6K0Kxb+kYzCgdHSJ3XkATY4OvT4hCbOEw4tle4Jr790+MC2gsGdCuFY ANgHqXEB+iNoJfOWMJ6T/xFmKgtZ4G0Sufd+udJa+e+f+zyw/zD0aXKG4SiSdgri8M/6 ux5a4/w3jWLXxKmD7AhkhAG+xjQtBBc7H3mtEiGPRR6rMGJ1x0pI0gOJuEV+CwnuTUDW ajVA== X-Gm-Message-State: AJIora/BaSmMRS7v+qxHWCMnCl5bB0uZKDqPwGGM8PH+RRpQPGGpizKo KPQimA9rlsLQ9NnYsIR3rzwkV2NC2d/FZjnysLU= X-Received: by 2002:a17:906:8a45:b0:72b:31d4:d537 with SMTP id gx5-20020a1709068a4500b0072b31d4d537mr30001953ejc.170.1658214075093; Tue, 19 Jul 2022 00:01:15 -0700 (PDT) MIME-Version: 1.0 References: <20220718090050.2261-1-21cnbao@gmail.com> <87mtd62apo.fsf@yhuang6-desk2.ccr.corp.intel.com> <87zgh5232o.fsf@yhuang6-desk2.ccr.corp.intel.com> <416a06f6-ca7d-d4a9-2cda-af0ad6e28261@arm.com> <87o7xl1wpb.fsf@yhuang6-desk2.ccr.corp.intel.com> <87k0891uj6.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87k0891uj6.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Barry Song <21cnbao@gmail.com> Date: Tue, 19 Jul 2022 19:01:03 +1200 Message-ID: Subject: Re: [RESEND PATCH v3] arm64: enable THP_SWAP for arm64 To: "Huang, Ying" Cc: Anshuman Khandual , Andrew Morton , Catalin Marinas , LAK , Linux-MM , Steven Price , Will Deacon , Andrea Arcangeli , =?UTF-8?B?6YOt5YGl?= , hanchuanhua , Johannes Weiner , Hugh Dickins , LKML , Minchan Kim , Yang Shi , Barry Song , =?UTF-8?B?5byg6K+X5piOKFNpbW9uIFpoYW5nKQ==?= Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jul 19, 2022 at 6:33 PM Huang, Ying wrote: > > Barry Song <21cnbao@gmail.com> writes: > > > On Tue, Jul 19, 2022 at 5:47 PM Huang, Ying wrote: > >> > >> Barry Song <21cnbao@gmail.com> writes: > >> > >> > On Tue, Jul 19, 2022 at 3:59 PM Barry Song <21cnbao@gmail.com> wrote: > >> >> > >> >> On Tue, Jul 19, 2022 at 3:35 PM Anshuman Khandual > >> >> wrote: > >> >> > > >> >> > > >> >> > > >> >> > On 7/19/22 08:58, Huang, Ying wrote: > >> >> > > Anshuman Khandual writes: > >> >> > > > >> >> > >> On 7/19/22 06:53, Barry Song wrote: > >> >> > >>> On Tue, Jul 19, 2022 at 12:44 PM Huang, Ying wrote: > >> >> > >>>> > >> >> > >>>> Barry Song <21cnbao@gmail.com> writes: > >> >> > >>>> > >> >> > >>>>> From: Barry Song > >> >> > >>>>> > >> >> > >>>>> THP_SWAP has been proven to improve the swap throughput significantly > >> >> > >>>>> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay > >> >> > >>>>> splitting THP after swapped out"). > >> >> > >>>>> As long as arm64 uses 4K page size, it is quite similar with x86_64 > >> >> > >>>>> by having 2MB PMD THP. THP_SWAP is architecture-independent, thus, > >> >> > >>>>> enabling it on arm64 will benefit arm64 as well. > >> >> > >>>>> A corner case is that MTE has an assumption that only base pages > >> >> > >>>>> can be swapped. We won't enable THP_SWAP for ARM64 hardware with > >> >> > >>>>> MTE support until MTE is reworked to coexist with THP_SWAP. > >> >> > >>>>> > >> >> > >>>>> A micro-benchmark is written to measure thp swapout throughput as > >> >> > >>>>> below, > >> >> > >>>>> > >> >> > >>>>> unsigned long long tv_to_ms(struct timeval tv) > >> >> > >>>>> { > >> >> > >>>>> return tv.tv_sec * 1000 + tv.tv_usec / 1000; > >> >> > >>>>> } > >> >> > >>>>> > >> >> > >>>>> main() > >> >> > >>>>> { > >> >> > >>>>> struct timeval tv_b, tv_e;; > >> >> > >>>>> #define SIZE 400*1024*1024 > >> >> > >>>>> volatile void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, > >> >> > >>>>> MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > >> >> > >>>>> if (!p) { > >> >> > >>>>> perror("fail to get memory"); > >> >> > >>>>> exit(-1); > >> >> > >>>>> } > >> >> > >>>>> > >> >> > >>>>> madvise(p, SIZE, MADV_HUGEPAGE); > >> >> > >>>>> memset(p, 0x11, SIZE); /* write to get mem */ > >> >> > >>>>> > >> >> > >>>>> gettimeofday(&tv_b, NULL); > >> >> > >>>>> madvise(p, SIZE, MADV_PAGEOUT); > >> >> > >>>>> gettimeofday(&tv_e, NULL); > >> >> > >>>>> > >> >> > >>>>> printf("swp out bandwidth: %ld bytes/ms\n", > >> >> > >>>>> SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b))); > >> >> > >>>>> } > >> >> > >>>>> > >> >> > >>>>> Testing is done on rk3568 64bit quad core processor Quad Core > >> >> > >>>>> Cortex-A55 platform - ROCK 3A. > >> >> > >>>>> thp swp throughput w/o patch: 2734bytes/ms (mean of 10 tests) > >> >> > >>>>> thp swp throughput w/ patch: 3331bytes/ms (mean of 10 tests) > >> >> > >>>>> > >> >> > >>>>> Cc: "Huang, Ying" > >> >> > >>>>> Cc: Minchan Kim > >> >> > >>>>> Cc: Johannes Weiner > >> >> > >>>>> Cc: Hugh Dickins > >> >> > >>>>> Cc: Andrea Arcangeli > >> >> > >>>>> Cc: Anshuman Khandual > >> >> > >>>>> Cc: Steven Price > >> >> > >>>>> Cc: Yang Shi > >> >> > >>>>> Signed-off-by: Barry Song > >> >> > >>>>> --- > >> >> > >>>>> -v3: > >> >> > >>>>> * refine the commit log; > >> >> > >>>>> * add a benchmark result; > >> >> > >>>>> * refine the macro of arch_thp_swp_supported > >> >> > >>>>> Thanks to the comments of Anshuman, Andrew, Steven > >> >> > >>>>> > >> >> > >>>>> arch/arm64/Kconfig | 1 + > >> >> > >>>>> arch/arm64/include/asm/pgtable.h | 6 ++++++ > >> >> > >>>>> include/linux/huge_mm.h | 12 ++++++++++++ > >> >> > >>>>> mm/swap_slots.c | 2 +- > >> >> > >>>>> 4 files changed, 20 insertions(+), 1 deletion(-) > >> >> > >>>>> > >> >> > >>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > >> >> > >>>>> index 1652a9800ebe..e1c540e80eec 100644 > >> >> > >>>>> --- a/arch/arm64/Kconfig > >> >> > >>>>> +++ b/arch/arm64/Kconfig > >> >> > >>>>> @@ -101,6 +101,7 @@ config ARM64 > >> >> > >>>>> select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP > >> >> > >>>>> select ARCH_WANT_LD_ORPHAN_WARN > >> >> > >>>>> select ARCH_WANTS_NO_INSTR > >> >> > >>>>> + select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES > >> >> > >>>>> select ARCH_HAS_UBSAN_SANITIZE_ALL > >> >> > >>>>> select ARM_AMBA > >> >> > >>>>> select ARM_ARCH_TIMER > >> >> > >>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h > >> >> > >>>>> index 0b6632f18364..78d6f6014bfb 100644 > >> >> > >>>>> --- a/arch/arm64/include/asm/pgtable.h > >> >> > >>>>> +++ b/arch/arm64/include/asm/pgtable.h > >> >> > >>>>> @@ -45,6 +45,12 @@ > >> >> > >>>>> __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1) > >> >> > >>>>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > >> >> > >>>>> > >> >> > >>>>> +static inline bool arch_thp_swp_supported(void) > >> >> > >>>>> +{ > >> >> > >>>>> + return !system_supports_mte(); > >> >> > >>>>> +} > >> >> > >>>>> +#define arch_thp_swp_supported arch_thp_swp_supported > >> >> > >>>>> + > >> >> > >>>>> /* > >> >> > >>>>> * Outside of a few very special situations (e.g. hibernation), we always > >> >> > >>>>> * use broadcast TLB invalidation instructions, therefore a spurious page > >> >> > >>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > >> >> > >>>>> index de29821231c9..4ddaf6ad73ef 100644 > >> >> > >>>>> --- a/include/linux/huge_mm.h > >> >> > >>>>> +++ b/include/linux/huge_mm.h > >> >> > >>>>> @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio, > >> >> > >>>>> return split_huge_page_to_list(&folio->page, list); > >> >> > >>>>> } > >> >> > >>>>> > >> >> > >>>>> +/* > >> >> > >>>>> + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to > >> >> > >>>>> + * limitations in the implementation like arm64 MTE can override this to > >> >> > >>>>> + * false > >> >> > >>>>> + */ > >> >> > >>>>> +#ifndef arch_thp_swp_supported > >> >> > >>>>> +static inline bool arch_thp_swp_supported(void) > >> >> > >>>>> +{ > >> >> > >>>>> + return true; > >> >> > >>>>> +} > >> >> > >>>> > >> >> > >>>> How about the following? > >> >> > >>>> > >> >> > >>>> static inline bool arch_wants_thp_swap(void) > >> >> > >>>> { > >> >> > >>>> return IS_ENABLED(ARCH_WANTS_THP_SWAP); > >> >> > >>>> } > >> >> > >>> > >> >> > >>> This looks good. then i'll need to change arm64 to > >> >> > >>> > >> >> > >>> +static inline bool arch_thp_swp_supported(void) > >> >> > >>> +{ > >> >> > >>> + return IS_ENABLED(ARCH_WANTS_THP_SWAP) && !system_supports_mte(); > >> >> > >>> +} > >> >> > >> > >> >> > >> Why ? CONFIG_THP_SWAP depends on ARCH_WANTS_THP_SWAP. In folio_alloc_swap(), > >> >> > >> IS_ENABLED(CONFIG_THP_SWAP) enabled, will also imply ARCH_WANTS_THP_SWAP too > >> >> > >> is enabled. Hence checking for ARCH_WANTS_THP_SWAP again does not make sense > >> >> > >> either in the generic fallback stub, or in arm64 platform override. Because > >> >> > >> without ARCH_WANTS_THP_SWAP enabled, arch_thp_swp_supported() should never > >> >> > >> be called in the first place. > >> >> > > > >> >> > > For the only caller now, the checking looks redundant. But the original > >> >> > > proposed implementation as follows, > >> >> > > > >> >> > > static inline bool arch_thp_swp_supported(void) > >> >> > > { > >> >> > > return true; > >> >> > > } > >> >> > > > >> >> > > will return true even on architectures that don't support/want THP swap. > >> >> > > >> >> > But the function will never be called on for those platforms. > >> >> > > >> >> > > That will confuse people too. > >> >> > > >> >> > I dont see how. > >> >> > > >> >> > > > >> >> > > And the "redundant" checking has no run time overhead, because compiler > >> >> > > will do the trick. > >> >> > I understand that, but dont think this indirection is necessary. > >> >> > >> >> Hi Anshuman, Hi Ying, > >> >> Thanks for the comments of both of you. Does the below look ok? > >> >> > >> >> generic, > >> >> > >> >> static inline bool arch_wants_thp_swap(void) > >> >> { > >> >> return IS_ENABLED(CONFIG_THP_SWAP); > >> >> } > >> >> > >> > > >> > sorry, i actually meant arch_thp_swp_supported() but not > >> > arch_wants_thp_swap() in generic code, > >> > > >> > static inline bool arch_thp_swp_supported(void) > >> > { > >> > return IS_ENABLED(CONFIG_THP_SWAP); > >> > } > >> > >> IS_ENABLED(CONFIG_THP_SWAP) doesn't match the name too. It's an option > >> selected by users. arch_thp_swp_supported() is to report the > >> capability. > > > > Hi Ying, > > CONFIG_THP_SWAP implicitly includes ARCH_WANTS_THP_SWAP. So it seems > > a bit odd to have still another arch_wants_thp_swap(). > > if the name of arch_thp_swp_supported is not sensible to you, will > > thp_swp_supported() > > without arch_ make more sense? a similar example is, > > > > static inline bool gigantic_page_runtime_supported(void) > > { > > return IS_ENABLED(CONFIG_ARCH_HAS_GIGANTIC_PAGE); > > } > > Here, the capability of the architecture is reported. But > CONFIG_THP_SWAP is a user option. > > I'm OK with the function name "arch_thp_swp_supported()". I just think > that we should implement the function in a way that is consistent with > the function name as much as possible. That is, don't return true on > architectures that THP swap isn't supported in fact. My point is that having a generic thp_swp_supported() which can combine both IS_ENABLED(CONFIG_THP_SWP) && arch_thp_swp_thing(). then we can always only call this rather than checking two conditions. Although there is only one caller for this moment, we might get more later. So always calling this single function might make our life easier. we can treat static inline bool arch_thp_swp_supported(void) { return IS_ENABLED(CONFIG_THP_SWAP); } as a virtual function in base class. thp_swp is generically true if platforms are able to enable CONFIG_THP_SWAP. Derived classes like arm64 can overwrite it to false in some particular cases based on their unique characteristics. > > > Otherwise, can we just keep the code as is according to Anshuman's suggestion? > > Although I still think my way is better, I will not force you to do > that. If you don't think that is better, you can use your original > implementation. Ok, fair enough. And thanks for your reviewing :-) Thanks Barry