Message-ID: <644b1519-b44f-4128-8e5e-52ee5e02b404@arm.com>
Date: Fri, 27 Oct 2023 13:27:27 +0100
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To:
 David Hildenbrand, Yu Zhao
Cc: Andrew Morton, Matthew Wilcox, Yin Fengwei, Catalin Marinas,
 Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain,
 Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes,
 Vlastimil Babka, Hugh Dickins, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230929114421.3761121-1-ryan.roberts@arm.com>
 <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com>
From: Ryan Roberts
X-Mailing-List: linux-kernel@vger.kernel.org

On 26/10/2023 16:19, David Hildenbrand wrote:
> [...]
>
>>>> Hi,
>>>>
>>>> I wanted to remind people in the THP cabal meeting, but that either
>>>> didn't happen or Zoom decided to not let me join :)
>>
>> I didn't make it yesterday either - was having to juggle child care.
>
> I think it didn't happen, or started quite late (>20 min).
>
>>
>>>>
>>>>>
>>>>> It's been a week since the mm alignment meeting discussion we had around
>>>>> prerequisites and the ABI. I haven't heard any further feedback on the ABI
>>>>> proposal, so I'm going to be optimistic and assume that nobody has found any
>>>>> fatal flaws in it :).
>>>>
>>>> After saying in the call probably 10 times that people should comment
>>>> here if there are reasonable alternatives worth discussing, call me
>>>> "optimistic" as well; but, it's only been a week and people might still
>>>> be thinking about this.
>>>>
>>>> There were two things discussed in the call:
>>>>
>>>> * Yu brought up "lists" so we can have priorities. As briefly discussed
>>>>   in the call, this (a) might not be needed right now in an initial
>>>>   version; (b) the kernel might be able to handle that (or many cases)
>>>>   automatically, TBD. Adding lists now would kind-of set the semantics
>>>>   of that interface in stone. As you describe below, the approach
>>>>   discussed here could easily be extended to cover priorities, if need
>>>>   be.
>>>
>>> I want to expand on this: the argument that "if you could allocate a
>>> higher order you should use it" is too simplistic. There are many
>>> reasons in addition to the one above that we want to "fall back" to
>>> higher orders, e.g., those higher orders are not on PCP or from the
>>> local node. When we consider the sequence of orders to try, user
>>> preference is just one of the parameters to the cost function. The
>>> bottom line is that I think we should all agree that there needs to be
>>> a cost function down the road, whatever it looks like. Otherwise I
>>> don't know how we can make "auto" happen.
>
> I agree that there needs to be a cost function, and as pagecache showed that's
> independent of initial enablement.
>
>>
>> I don't dispute that this sounds like it could be beneficial, but I see it as
>> research to happen further down the road (as you say), and we don't know what
>> that research might conclude. Also, I think the scope of this is bigger than
>> anonymous memory - you would also likely want to look at the policy for page
>> cache folio order too, since today that's based solely on readahead.
>> So I see it as an optimization that is somewhat orthogonal to small-sized THP.
>
> Exactly my thoughts.
>
> The important thing is that we should plan ahead that we still have the option
> to let the admin configure if we cannot make this work automatically in the kernel.
>
> What we'll need, nobody knows. Maybe it's a per-size priority, maybe it's a
> single global toggle.
>
>>
>> The proposed interface does not imply any preference order - it only states
>> which sizes the user wants the kernel to select from, so I think there is lots
>> of freedom to change this down the track if the kernel wants to start using the
>> buddy allocator's state as a signal to make its decisions.
>
> Yes.
>
> [...]
>
>>>> Jup, same opinion here. But again, I'm very happy to hear other
>>>> alternatives and why they are better.
>>>
>>> I'm not against David's proposal but I want to hear a lot more about
>>> "lots of flexibility for growth" before I'm fully convinced.
>>
>> My point was that in an abstract sense, there are properties a user may wish to
>> apply individually to a size, which is catered for by having a per-size
>> directory into which we can add more files if/when requirements for new per-size
>> properties arise. There are also properties that may be applied globally, for
>> which we have the top-level transparent_hugepage directory where properties can
>> be extended or added.
>
> Exactly, well said.
>
>>
>> For your case around tighter integration with the buddy allocator, I could
>> imagine a per-size file allowing the user to specify if the kernel should allow
>> splitting a higher order to make a THP of that size (I'm not suggesting that's a
>> good idea, I'm just pointing out that this sort of thing is possible with the
>> interface). And we have discussed how the global enabled property could be
>> extended to support "auto" [1].
>>
>> But perhaps what we really need are lots more ideas for future directions for
>> small-sized THP to allow us to evaluate this interface more widely.
>
> David R. motivated a future size-aware setting of the defrag option. As
> discussed we might want something similar to shmem_enabled. What will happen with
> khugepaged, nobody knows yet :)
>
> I could imagine exposing per-size boolean read-only properties like
> "native-hw-size" (PMD, cont-pte). But these things require much more thought.

FWIW, the reason I opted for the "recommend" special case in the v5 posting was
because that felt like an easy thing to also add to the command line in future.
Having a separate file, native-hw-size, that the user has to read then enable
through another file is not very command-line friendly, if you want the
hw-preferred size(s) enabled from boot.

Maybe the wider observation is "how does the proposed interface translate to the
kernel command line if needed in future?".
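
To make the earlier point concrete (the interface only names a set of sizes,
and the choice among them is left to the kernel), here is a minimal userspace
sketch in Python. It is not kernel code and not how this series allocates
folios. PAGE_SHIFT = 12 (4K base pages), fitting_orders(), pick_order() and the
cost callback are all hypothetical names for illustration. The default policy
(largest order that fits) is an assumption too.

```python
from typing import Callable, Iterable, List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB base pages


def fitting_orders(enabled: Iterable[int], addr: int,
                   vma_start: int, vma_end: int) -> List[int]:
    """Return the enabled orders whose naturally aligned block around
    addr lies entirely inside [vma_start, vma_end), in ascending order."""
    result = []
    for order in sorted(set(enabled)):
        size = 1 << (PAGE_SHIFT + order)
        start = addr & ~(size - 1)  # align the fault address down
        if start >= vma_start and start + size <= vma_end:
            result.append(order)
    return result


def pick_order(enabled: Iterable[int], addr: int,
               vma_start: int, vma_end: int,
               cost: Callable[[int], float] = lambda order: -order
               ) -> Optional[int]:
    """Pick the fitting order with the lowest cost, or None if nothing fits.

    The user-supplied set carries no preference order; the policy lives
    entirely in the hypothetical cost callback (default: prefer larger).
    """
    candidates = fitting_orders(enabled, addr, vma_start, vma_end)
    return min(candidates, key=cost, default=None)


if __name__ == "__main__":
    enabled = {2, 4, 9}  # 16K, 64K and 2M with 4K base pages
    # Fault at 0x12345 inside a 128K mapping starting at 64K.
    print(pick_order(enabled, 0x12345, 0x10000, 0x30000))  # 4
    # Same user setting, different (hypothetical) policy: prefer smaller.
    print(pick_order(enabled, 0x12345, 0x10000, 0x30000,
                     cost=lambda order: order))            # 2
```

The second call gives a different answer for the same set of enabled sizes,
because only the cost callback changed. In this toy model, that is the sense
in which a cost function could be added later without changing what the user
specified.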