Message-ID: <3d49bcbf-1f9b-48e8-a91a-ede0762b795c@arm.com>
Date: Thu, 7 Dec 2023 14:45:38 +0000
Subject: Re: [PATCH v8 04/10] mm: thp: Support allocation of anonymous multi-size THP
From: Ryan Roberts
To: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao,
 Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
 Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov", John Hubbard,
 David Rientjes, Vlastimil Babka, Hugh Dickins, Kefeng Wang,
 Barry Song <21cnbao@gmail.com>, Alistair Popple
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
References: <20231204102027.57185-1-ryan.roberts@arm.com>
 <20231204102027.57185-5-ryan.roberts@arm.com>
 <71040a8c-4ea1-4f21-8ac8-65f7c25c217e@redhat.com>
 <126c3b71-1acc-4851-9986-4228cb8a8660@arm.com>
 <94806b4f-2370-4999-9586-2c936955cb87@redhat.com>
In-Reply-To: <94806b4f-2370-4999-9586-2c936955cb87@redhat.com>

On 07/12/2023 13:28, David Hildenbrand wrote:
>>>
>>> Right, but you know from the first loop which order is applicable (and will be
>>> fed to the second loop) and could just pte_unmap(pte) + tryalloc. If that fails,
>>> remap and try with the next orders.
>>
>> You mean something like this?
>>
>>     pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
>>     if (!pte)
>>         return ERR_PTR(-EAGAIN);
>>
>>     order = highest_order(orders);
>>     while (orders) {
>>         addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>>         if (!pte_range_none(pte + pte_index(addr), 1 << order)) {
>>             order = next_order(&orders, order);
>>             continue;
>>         }
>>
>>         pte_unmap(pte);
>>
>>         folio = vma_alloc_folio(gfp, order, vma, addr, true);
>>         if (folio) {
>>             clear_huge_page(&folio->page, vmf->address, 1 << order);
>>             return folio;
>>         }
>>
>>         pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
>>         if (!pte)
>>             return ERR_PTR(-EAGAIN);
>>
>>         order = next_order(&orders, order);
>>     }
>>
>>     pte_unmap(pte);
>>
>> I don't really like that because if high order folio allocations fail, then you
>> are calling pte_range_none() again for the next lower order; once that check has
>> succeeded for an order it shouldn't be required for any lower orders. In this
>> case you also have lots of pte map/unmap.
>
> I see what you mean.
>
>>
>> The original version feels more efficient to me.
>
> Yes it is. Adding in some comments might help, like
>
> /*
>  * Find the largest order where the aligned range is completely prot_none(). Note
>  * that all remaining orders will be completely prot_none().
>  */
> ...
>
> /* Try allocating the largest of the remaining orders. */

OK added.

>
>>
>>>
>>> That would make the code certainly easier to understand. That "orders" magic of
>>> constructing, filtering, walking is confusing :)
>>>
>>>
>>> I might find some time today to see if there is an easy way to cleanup all what
>>> I spelled out above. It really is a mess. But likely that cleanup could be
>>> deferred (but you're touching it, so ... :) ).
>>
>> I'm going to ignore the last 5 words.
>> I heard the "that cleanup could be
>> deferred" part loud and clear though :)
>
> :)
>
> If we could stop passing orders into thp_vma_allowable_orders(), that would
> probably be the biggest win. It's just all a confusing mess.

I tried an approach like you suggested in the other thread originally, but I
struggled to define exactly what "thp_vma_configured_orders()" should mean;
ideally, I just want "all the THP orders that are currently enabled for this
VMA+flags". But some callers want to enforce_sysfs and others don't, so you
probably have to at least pass that flag. Then you have DAX, which explicitly
ignores enforce_sysfs, but only in a page fault. And shmem, which ignores
enforce_sysfs, but only outside of a page fault. So it quickly becomes pretty
complex: it is basically thp_vma_allowable_orders() as currently defined.

If this could be a simple function then it could be inline and, as you say, we
could do the masking in the caller and exit early for the order-0 case. But it
is very complex (at least if you want to retain logic equivalent to what
thp_vma_allowable_orders() has), so I'm not sure how to do the order-0 early
exit without passing in the orders bitfield. And we are unlikely to exit early
anyway, because PMD-sized THP is likely enabled, and since we didn't pass in an
orders bitfield it wasn't filtered out.

In short, I can't see a solution that's better than the one I have. But if you
have something in mind and can spell it out, then I'll have a go at tidying it
up and integrating it into the series. Otherwise I really would prefer to leave
it for a separate series.