Date: Tue, 4 Jul 2023 14:20:01 +0100
Subject: Re: [PATCH v2 3/5] mm: Default implementation of arch_wants_pte_order()
To: Yu Zhao
Cc: Andrew Morton, Matthew Wilcox, "Kirill A.
Shutemov", Yin Fengwei, David Hildenbrand, Catalin Marinas,
 Will Deacon, Anshuman Khandual, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <20230703135330.1865927-4-ryan.roberts@arm.com>
From: Ryan Roberts

On 03/07/2023 20:50, Yu Zhao wrote:
> On Mon, Jul 3, 2023 at 7:53 AM Ryan Roberts wrote:
>>
>> arch_wants_pte_order() can be overridden by the arch to return the
>> preferred folio order for pte-mapped memory. This is useful as some
>> architectures (e.g. arm64) can coalesce TLB entries when the physical
>> memory is suitably contiguous.
>>
>> The first user for this hint will be FLEXIBLE_THP, which aims to
>> allocate large folios for anonymous memory to reduce page faults and
>> other per-page operation costs.
>>
>> Here we add the default implementation of the function, used when the
>> architecture does not define it, which returns the order corresponding
>> to 64K.
>
> I don't really mind a non-zero default value. But people would ask why
> non-zero and why 64KB. Probably you could argue this is the largest size
> all known archs support if they have TLB coalescing. For x86, AMD CPUs
> would want to override this. I'll leave it to Fengwei to decide
> whether Intel wants a different default value.
>
> Also I don't like the vma parameter because it makes
> arch_wants_pte_order() a mix of hw preference and vma policy.
> From my POV, the function should be only about the former; the latter
> should be decided by arch-independent MM code. However, I can live
> with it if ARM MM people think this is really what you want. ATM, I'm
> skeptical they do.

Here's the big picture for what I'm trying to achieve:

- In the common case, I'd like all programs to get a performance bump by
  automatically and transparently using large anon folios - so no
  explicit requirement on the process to opt-in.

- On arm64, in the above case, I'd like the preferred folio size to be
  64K; from the (admittedly limited) testing I've done, that's about
  where the performance knee is and it doesn't appear to increase the
  memory wastage very much. It also has the benefit that for 4K base
  pages this is the contpte size (order-4), so I can take full benefit
  of contpte mappings transparently to the process. And for 16K this is
  the HPA size (order-2).

- On arm64 when the process has marked the VMA for THP (or when
  transparent_hugepage=always) but the VMA does not meet the
  requirements for a PMD-sized mapping (or we failed to allocate, ...)
  then I'd like to map using contpte. For 4K base pages this is 64K
  (order-4), for 16K this is 2M (order-7) and for 64K this is 2M
  (order-5). The 64K base page case is very important since the PMD
  size for that base page is 512MB, which is almost impossible to
  allocate in practice.

So one approach would be to define arch_wants_pte_order() as always
returning the contpte size (remove the vma parameter). Then
max_anon_folio_order() in memory.c could do this:

#define MAX_ANON_FOLIO_ORDER_NOTHP	ilog2(SZ_64K >> PAGE_SHIFT)

static inline int max_anon_folio_order(struct vm_area_struct *vma)
{
	int order = arch_wants_pte_order();

	/*
	 * Fix up the default case, which returns 0, because
	 * PAGE_ALLOC_COSTLY_ORDER can't be used directly in pgtable.h.
	 */
	order = order ? order : PAGE_ALLOC_COSTLY_ORDER;

	if (hugepage_vma_check(vma, vma->vm_flags, false, true, true))
		return order;
	else
		return min(order, MAX_ANON_FOLIO_ORDER_NOTHP);
}

This moves the SW policy into memory.c and gives you
PAGE_ALLOC_COSTLY_ORDER (or whatever default we decide on) as the
default for arches with no override, and also meets all my goals above.

>
>> Signed-off-by: Ryan Roberts
>
> After another CPU vendor, e.g., Fengwei, and an ARM MM person, e.g.,
> Will give the green light:
> Reviewed-by: Yu Zhao
>
>> ---
>>  include/linux/pgtable.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index a661a17173fa..f7e38598f20b 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -13,6 +13,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>
>>  #if 5 - defined(__PAGETABLE_P4D_FOLDED) - defined(__PAGETABLE_PUD_FOLDED) - \
>>  	defined(__PAGETABLE_PMD_FOLDED) != CONFIG_PGTABLE_LEVELS
>> @@ -336,6 +337,18 @@ static inline bool arch_has_hw_pte_young(void)
>>  }
>>  #endif
>>
>> +#ifndef arch_wants_pte_order
>> +/*
>> + * Returns preferred folio order for pte-mapped memory. Must be in range [0,
>> + * PMD_SHIFT-PAGE_SHIFT) and must not be order-1 since THP requires large folios
>
> The warning is helpful.
>
>> + * to be at least order-2.
>> + */
>> +static inline int arch_wants_pte_order(struct vm_area_struct *vma)
>> +{
>> +	return ilog2(SZ_64K >> PAGE_SHIFT);
>> +}
>> +#endif
>> +
>>  #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
>>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>>  					unsigned long address,