Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp1722640rdb; Thu, 7 Dec 2023 07:12:53 -0800 (PST) X-Google-Smtp-Source: AGHT+IE+PSGnmcHJQ4TPOew/eDXI9Mef+ItbX1sF9LnHvR9z/fSYwWchK/BEkO2GtOmQdazldbgQ X-Received: by 2002:a05:6a20:be03:b0:186:2389:a73e with SMTP id ge3-20020a056a20be0300b001862389a73emr3016552pzb.55.1701961972763; Thu, 07 Dec 2023 07:12:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701961972; cv=none; d=google.com; s=arc-20160816; b=TNnceqSlLxfdh7qLwfkQJUW0DPGm6luefPkTYHfprHK99GRSL3ptctq7Ecjbfd6FNJ 7jFW2+18H0Uu6nkhFao1uiThS0d5pYr8O6QC2saeMdTD6yB3Vt6wj5Rc4n5Ji99XwtXL 1yDIMm7qXtlvQ6/dwgPUUe1rnzjBN3sRh/0Ecy30oHNL/mf4aoh+2Wo09ntn2uNuNUll LDtS/isEuRiaKTPNNkcA06swa6o0X0EieOhkq8dJzpHV8wHurXslVzCzHFTD4P655kxD o0ABtm2SCpn2fyY1mroDTsp3o2LMyxwqOA5sAzhqzpo3mfiMuM4gDzfcVZKj54ZnD+9X UtmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=RiXvUpLsykTBnw8k00bNjQDFpxdournga1BpTaQjpEU=; fh=Hhg/Irzq7NxjNc8I6dFyEpRureAOclKOpRByptRv8N8=; b=0w6IVBbHhCh/pN/5eQTZRGi9qmkmpdrrq4FdqqT5AAuY+sGxcezoUASmrqu5KHua/y z1msdE7JL/qi6Nd5sL4Bu5qYhaZHYd+uFtppBpqtBnWsSIex6EsiFwS0p2UUEbXLIRn6 nDaSUI0hAIgymFqXaPKYNRbMAboJTmHf0/oCsruofI3jyyrA6wE5mKMRPoEGZwtZJ5d8 wOFsxUhzGpptSo3pmMqTmsziF2FDpzlq1s96VkwI0AFMazMeTpuARs/uvDC4grT4OsxE Q/JBfDpJy0KNZVv9Lzs3OV4DYkNWPfQBapYyYFfeqI4X3Sas76I21kHHgovuY7CFLs2+ 2HvQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from morse.vger.email (morse.vger.email. [23.128.96.31]) by mx.google.com with ESMTPS id bz31-20020a056a02061f00b005c6c950c3bdsi1349447pgb.642.2023.12.07.07.12.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Dec 2023 07:12:52 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) client-ip=23.128.96.31; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 0A309809C3AC; Thu, 7 Dec 2023 07:12:23 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233043AbjLGPMH (ORCPT + 99 others); Thu, 7 Dec 2023 10:12:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232097AbjLGPMG (ORCPT ); Thu, 7 Dec 2023 10:12:06 -0500 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id ECC26D5B for ; Thu, 7 Dec 2023 07:12:11 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1A6611042; Thu, 7 Dec 2023 07:12:57 -0800 (PST) Received: from [10.1.32.134] (XHFQ2J9959.cambridge.arm.com [10.1.32.134]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 769EC3F762; Thu, 7 Dec 2023 07:12:08 -0800 (PST) Message-ID: <787eb131-759c-4cd3-a2b7-39caf818cffc@arm.com> Date: Thu, 7 Dec 2023 15:12:07 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v8 04/10] mm: thp: Support allocation of anonymous multi-size THP Content-Language: en-GB To: David Hildenbrand , Andrew Morton , Matthew Wilcox , Yin Fengwei , Yu Zhao , Catalin Marinas , Anshuman Khandual , Yang Shi , "Huang, Ying" , Zi Yan , Luis Chamberlain , Itaru Kitayama , "Kirill A. Shutemov" , John Hubbard , David Rientjes , Vlastimil Babka , Hugh Dickins , Kefeng Wang , Barry Song <21cnbao@gmail.com>, Alistair Popple Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org References: <20231204102027.57185-1-ryan.roberts@arm.com> <20231204102027.57185-5-ryan.roberts@arm.com> <71040a8c-4ea1-4f21-8ac8-65f7c25c217e@redhat.com> <126c3b71-1acc-4851-9986-4228cb8a8660@arm.com> <94806b4f-2370-4999-9586-2c936955cb87@redhat.com> <3d49bcbf-1f9b-48e8-a91a-ede0762b795c@arm.com> <369ec8d3-ef6a-4a4e-84e2-2c91b8293929@redhat.com> From: Ryan Roberts In-Reply-To: <369ec8d3-ef6a-4a4e-84e2-2c91b8293929@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Thu, 07 Dec 2023 07:12:23 -0800 (PST) On 07/12/2023 15:01, David Hildenbrand wrote: > On 07.12.23 15:45, Ryan Roberts wrote: >> On 07/12/2023 13:28, David Hildenbrand wrote: >>>>> >>>>> Right, but you know from the first loop which order is applicable (and will be >>>>> fed to the second loop) and could just pte_unmap(pte) + tryalloc. If that >>>>> fails, >>>>> remap and try with the next orders. >>>> >>>> You mean something like this? >>>> >>>>      pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); >>>>      if (!pte) >>>>          return ERR_PTR(-EAGAIN); >>>> >>>>      order = highest_order(orders); >>>>      while (orders) { >>>>          addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); >>>>          if (!pte_range_none(pte + pte_index(addr), 1 << order)) { >>>>              order = next_order(&orders, order); >>>>              continue; >>>>          } >>>> >>>>          pte_unmap(pte); >>>>                  folio = vma_alloc_folio(gfp, order, vma, addr, true); >>>>          if (folio) { >>>>              clear_huge_page(&folio->page, vmf->address, 1 << order); >>>>              return folio; >>>>          } >>>> >>>>          pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); >>>>          if (!pte) >>>>              return ERR_PTR(-EAGAIN); >>>> >>>>          order = next_order(&orders, order); >>>>      } >>>> >>>>      pte_unmap(pte); >>>> >>>> I don't really like that because if high order folio allocations fail, then you >>>> are calling pte_range_none() again for the next lower order; once that check >>>> has >>>> succeeded for an order it shouldn't be required for any lower orders. In this >>>> case you also have lots of pte map/unmap. >>> >>> I see what you mean. >>> >>>> >>>> The original version feels more efficient to me. >>> Yes it is. Adding in some comments might help, like >>> >>> /* >>>   * Find the largest order where the aligned range is completely prot_none(). >>> Note >>>   * that all remaining orders will be completely prot_none(). >>>   */ >>> ... >>> >>> /* Try allocating the largest of the remaining orders. */ >> >> OK added. >> >>> >>>> >>>>> >>>>> That would make the code certainly easier to understand. That "orders" >>>>> magic of >>>>> constructing, filtering, walking is confusing :) >>>>> >>>>> >>>>> I might find some time today to see if there is an easy way to cleanup all >>>>> what >>>>> I spelled out above. It really is a mess. But likely that cleanup could be >>>>> deferred (but you're touching it, so ... :) ). >>>> >>>> I'm going to ignore the last 5 words. I heard the "that cleanup could be >>>> deferred" part loud and clear though :) >>> >>> :) >>> >>> If we could stop passing orders into thp_vma_allowable_orders(), that would >>> probably >>> be the biggest win. It's just all a confusing mess. >> >> >> >> I tried an approach like you suggested in the other thread originally, but I >> struggled to define exactly what "thp_vma_configured_orders()" should mean; >> Ideally, I just want "all the THP orders that are currently enabled for this >> VMA+flags". But some callers want to enforce_sysfs and others don't, so you >> probably have to at least pass that flag. Then you have DAX which explicitly > > Yes, the flags would still be passed. It's kind of the "context". > >> ignores enforce_sysfs, but only in a page fault. And shmem, which ignores >> enforce_sysfs, but only outside of a page fault. So it quickly becomes pretty >> complex. It is basically thp_vma_allowable_orders() as currently defined. > > Yeah, but moving the "can we actually fit a THP in there" check out of the picture. > >> >> If this could be a simple function then it could be inline and as you say, we >> can do the masking in the caller and exit early for the order-0 case. But it is >> very complex (at least if you want to retain the equivalent logic to what >> thp_vma_allowable_orders() has) so I'm not sure how to do the order-0 early exit >> without passing in the orders bitfield. And we are unlikely to exit early >> because PMD-sized THP is likely enabled and because we didn't pass in a orders >> bitfield, that wasn't filtered out. >> >> In short, I can't see a solution that's better than the one I have. But if you >> have something in mind, if you can spell it out, then I'll have a go at tidying >> it up and integrating it into the series. Otherwise I really would prefer to >> leave it for a separate series. > > I'm playing with some cleanups, but they can all be built on top if they > materialize. OK, I'm going to post a v9 then. And cross my fingers and hope that's the final version.