Message-ID: <61f875fe-d2e0-4a46-baeb-b6cd7b765267@arm.com>
Date: Mon, 4 Sep 2023 11:05:03 +0100
From: Ryan Roberts
Subject: Re: [PATCH v5 3/5] mm: LARGE_ANON_FOLIO for improved performance
To: Yang Shi, Matthew Wilcox
Cc: David Hildenbrand, "Huang, Ying", Andrew Morton, Yin Fengwei,
 Yu Zhao, Catalin Marinas, Anshuman Khandual, Zi Yan, Luis Chamberlain,
 Itaru Kitayama, "Kirill A. Shutemov", linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
References: <20230810142942.3169679-1-ryan.roberts@arm.com>
 <20230810142942.3169679-4-ryan.roberts@arm.com>
 <87v8dg6lfu.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <5c9ba378-2920-4892-bdf0-174e47d528b7@arm.com>
 <87cyz43s63.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <4e14730b-4e4c-de30-04bb-9f3ec4a93754@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/09/2023 18:18, Yang Shi wrote:
> On Fri, Sep 1, 2023 at 9:13 AM Matthew Wilcox wrote:
>>
>> On Thu, Aug 31, 2023 at 10:15:09AM -0700, Yang Shi wrote:
>>> On Thu, Aug 31, 2023 at 12:57 AM David Hildenbrand wrote:
>>>> Let's talk about that in a bi-weekly MM session. (I proposed it as a
>>>> topic for next week).
>>>>
>>>> As raised in another mail, we can then discuss
>>>> * how we want to call this feature (transparent large pages? there is
>>>>   the concern that "THP" might confuse users. Maybe we can consider
>>>>   "large" the more generic version and "huge" only PMD-size, TBD)
>>>
>>> I tend to agree. "Huge" means PMD-mappable (transparent or HugeTLB),
>>> "Large" means any order but less than PMD-mappable order, "Gigantic"
>>> means PUD mappable. This should incur the least confusion IMHO.
>>
>> "Large" means any order > 0. The limitation to <= PMD_ORDER is simply
>> because I don't want to go through the whole VM and fix all the places
>> that assume that pmd_page() returns a head page. The benefit to doing so
>> is quite small, and the work to achieve it is quite large.
>> The amount of work needed should decrease over time as we convert
>> more code to folios, so deferring it is the right decision today.
>
> Yeah, I agree. And we are on the same page.
>
>>
>> But nobody should have the impression that large folios are smaller
>> than PMD size, nor even less than or equal. Just like they shouldn't
>> think that large folios depend on CONFIG_TRANSPARENT_HUGEPAGE. They do
>> today, but that's purely an implementation detail that will be removed
>> eventually.
>
> Yes, THP should be just a special case of large folio from page table
> point of view (for example, PMD-mappable vs non-PMD-mappable).
>
>>
>>>> I think there *really* has to be a way to disable it for a running
>>>> system, otherwise no distro will dare pulling it in, even after we
>>>> figured out the other stuff.
>>>
>>> TBH I really don't like to tie large folio to THP toggles. THP
>>> (PMD-mappable) is just a special case of LAF. The large folio should
>>> be tried whenever it is possible ideally. But I do agree we may not be
>>> able to achieve the ideal case at the time being, and also understand
>>> the concern about regression in early adoption, so a knob that can
>>> disable large folio may be needed for now. But it should be just a
>>> simple binary knob (on/off), and should not be a part of kernel ABI
>>> (temporary and debugging only) IMHO.
>>
>> Best of luck trying to remove it after you've shipped it ... we've
>> never been able to remove any of the THP toggles, only make them more
>> complicated.
>
> Fingers crossed... and my point is we should try to avoid making
> things more complicated. It may be hard...
>
>>
>>> One more thing we may discuss is whether huge page madvise APIs should
>>> take effect for large folio or not.
>>
>> They already do for file large folios; we listen to MADV_HUGEPAGE and
>> attempt to allocate PMD_ORDER folios for faults.
>
> OK, file folio may be simpler than anonymous.
> For anonymous folio, there may be two potential cases depending on our
> choice:
>
> Tie large folio to THP knobs:
>   MADV_HUGEPAGE - large folio if THP is on/no large folio if THP is off
>   MADV_NOHUGEPAGE - no large folio
>
> Not tie large folio to THP knob:
>   MADV_HUGEPAGE - always large folio
>   MADV_NOHUGEPAGE - shall create large folio?
>

In my mind, the debate on how LAF and MADV_NOHUGEPAGE should interact is
concluded; David has explained a QEMU live migration use case, which would
break if a LAF was allocated for a VMA with MADV_NOHUGEPAGE (see [1]).

Given that LAF and THP controls must be tied together at MADV_NOHUGEPAGE as a
minimum, it makes most sense to me to expose LAF to user space as a
generalization of THP rather than as a separate, independent feature. And if
taking such a route, Huang Ying's suggestion at [2] sounds like a good
starting point.

Anyway, let's discuss in the mm meeting as David requested.

[1] https://lore.kernel.org/linux-mm/b936041c-08a7-e844-19e7-eafc4ddf63b9@redhat.com/
[2] https://lore.kernel.org/linux-mm/87v8dg6lfu.fsf@yhuang6-desk2.ccr.corp.intel.com/
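
For illustration only, the two options in the quoted MADV_* table, combined
with the MADV_NOHUGEPAGE conclusion above, could be sketched as a small
user-space decision function. This is not kernel code and not a proposed
interface: the names laf_policy, laf_advice and laf_allowed() are all
hypothetical, and "THP on/off" is reduced to a single boolean as in the table.
The sketch assumes MADV_NOHUGEPAGE forbids large folios under both options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which of the two options from the quoted table is in force. */
enum laf_policy {
	POLICY_TIED_TO_THP,	/* "Tie large folio to THP knobs" */
	POLICY_INDEPENDENT,	/* "Not tie large folio to THP knob" */
};

/* Hypothetical: the madvise() advice last applied to the VMA. */
enum laf_advice {
	ADVICE_HUGEPAGE,	/* MADV_HUGEPAGE */
	ADVICE_NOHUGEPAGE,	/* MADV_NOHUGEPAGE */
};

/*
 * Hypothetical helper: may an anonymous fault in this VMA be served with a
 * large folio? thp_on stands in for the "THP is on/off" column of the table.
 */
bool laf_allowed(enum laf_policy policy, enum laf_advice advice, bool thp_on)
{
	assert(advice == ADVICE_HUGEPAGE || advice == ADVICE_NOHUGEPAGE);

	/* Assumed here: MADV_NOHUGEPAGE wins under both options. */
	if (advice == ADVICE_NOHUGEPAGE)
		return false;

	/* MADV_HUGEPAGE: always allowed if independent, else follows THP. */
	if (policy == POLICY_INDEPENDENT)
		return true;
	return thp_on;
}
```

With this sketch, the two options differ only for MADV_HUGEPAGE when THP is
off; for MADV_NOHUGEPAGE they give the same answer.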