Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp35116125rwd; Mon, 10 Jul 2023 02:50:12 -0700 (PDT) X-Google-Smtp-Source: APBJJlFMLmO3z4tzAKoXWuzFkwYBqoyHppQK5BUNubC6zbliQU3Ew+2F7vvGRKcY62sbAbvAC7nw X-Received: by 2002:a05:6a21:998a:b0:12a:30da:59 with SMTP id ve10-20020a056a21998a00b0012a30da0059mr11658093pzb.45.1688982611977; Mon, 10 Jul 2023 02:50:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688982611; cv=none; d=google.com; s=arc-20160816; b=RKjEj6ZjxMaMjPWyvJMu5qJZkgFMif+cV39n96/AMKQOP2B1nUiMEqFnzt52gnCILo 3QWqhuXGMEhYCPir4J3YpH7kXRSUSyiy8jBDoijzuYNGrfL6YaXKQAiCYgG+9n6lfz44 eiqAbtWHXJr27R9ZMMaVQDiwu+66os6rxefuR5NbdL5zfLOl7DuxMRxANT1CrCUQPnQv d2TKDuVMsglRR2l3W4ZFEFOWCCAlgmdVe+VkvtDKMSHvKmbyFUYJ5yX9cdf21x9KiaFn nmQz6v3ZUycqelPLB1X7gfOxrsAggYdk5mVpZy1C+aoTKY9Si6UWoO/QJ1IlvKi3ZZKs 80/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=yGosFH1ml7clbjsm/DfDtBduunSXeYaHR4Wok9XLtHQ=; fh=fdRB1Ba8h+wqZwueZYXh3muZcK9qCV+IYPaHCc2HKgA=; b=VBpB2svsSxJpvmxb8JHHFl3Z22lTe4yYJ3eL289BKICzahgri6FRVjNxGyK2IijWcZ dOCDFZtGh3vkxKXOm3NkuTGq++XsMqW/wZKV4o8KTrikvl+5a/B55DSCB16mewRZABRd wGzM1DUBoZ05jE2D/13Rvs1PIdRSsl7jnSLGxNFCybwJRgHj87n2NYdIUJLshJJf62gZ 8JrLY/PXPqubgd/vtylQTFq0cMPX7vY97G7MsVXfqFlM35GF8k3GMdIAsZOIGIqEQZ9Z a1HTxr/+AWaFaqHwirGCdZE3+PAvU21MokaIbabArFbGLIMsPCf92cdVN2XHYPClcMyu 1r5g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u2-20020a170902714200b001b9d9212e2csi2831488plm.238.2023.07.10.02.50.00; Mon, 10 Jul 2023 02:50:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229627AbjGJJnk (ORCPT + 99 others); Mon, 10 Jul 2023 05:43:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60090 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233263AbjGJJnN (ORCPT ); Mon, 10 Jul 2023 05:43:13 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 758EB2708 for ; Mon, 10 Jul 2023 02:40:02 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 187FE2B; Mon, 10 Jul 2023 02:40:44 -0700 (PDT) Received: from [10.57.77.63] (unknown [10.57.77.63]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 08C693F740; Mon, 10 Jul 2023 02:39:59 -0700 (PDT) Message-ID: Date: Mon, 10 Jul 2023 10:39:58 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios To: "Huang, Ying" Cc: Andrew Morton , Matthew Wilcox , "Kirill A. Shutemov" , Yin Fengwei , David Hildenbrand , Yu Zhao , Catalin Marinas , Will Deacon , Anshuman Khandual , Yang Shi , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20230703135330.1865927-1-ryan.roberts@arm.com> <20230703135330.1865927-3-ryan.roberts@arm.com> <877crcgmj1.fsf@yhuang6-desk2.ccr.corp.intel.com> <6379dd13-551e-3c73-422a-56ce40b27deb@arm.com> <87ttucfht7.fsf@yhuang6-desk2.ccr.corp.intel.com> <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Ryan Roberts In-Reply-To: <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/07/2023 10:01, Huang, Ying wrote: > Ryan Roberts writes: > >> On 10/07/2023 06:37, Huang, Ying wrote: >>> Ryan Roberts writes: >>> >>>> Somehow I managed to reply only to the linux-arm-kernel list on first attempt so >>>> resending: >>>> >>>> On 07/07/2023 09:21, Huang, Ying wrote: >>>>> Ryan Roberts writes: >>>>> >>>>>> With the introduction of large folios for anonymous memory, we would >>>>>> like to be able to split them when they have unmapped subpages, in order >>>>>> to free those unused pages under memory pressure. So remove the >>>>>> artificial requirement that the large folio needed to be at least >>>>>> PMD-sized. >>>>>> >>>>>> Signed-off-by: Ryan Roberts >>>>>> Reviewed-by: Yu Zhao >>>>>> Reviewed-by: Yin Fengwei >>>>>> --- >>>>>> mm/rmap.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c >>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644 >>>>>> --- a/mm/rmap.c >>>>>> +++ b/mm/rmap.c >>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma, >>>>>> * page of the folio is unmapped and at least one page >>>>>> * is still mapped. >>>>>> */ >>>>>> - if (folio_test_pmd_mappable(folio) && folio_test_anon(folio)) >>>>>> + if (folio_test_large(folio) && folio_test_anon(folio)) >>>>>> if (!compound || nr < nr_pmdmapped) >>>>>> deferred_split_folio(folio); >>>>>> } >>>>> >>>>> One possible issue is that even for large folios mapped only in one >>>>> process, in zap_pte_range(), we will always call deferred_split_folio() >>>>> unnecessarily before freeing a large folio. >>>> >>>> Hi Huang, thanks for reviewing! >>>> >>>> I have a patch that solves this problem by determining a range of ptes covered >>>> by a single folio and doing a "batch zap". This prevents the need to add the >>>> folio to the deferred split queue, only to remove it again shortly afterwards. >>>> This reduces lock contention and I can measure a performance improvement for the >>>> kernel compilation benchmark. See [1]. >>>> >>>> However, I decided to remove it from this patch set on Yu Zhao's advice. We are >>>> aiming for the minimal patch set to start with and wanted to focus people on >>>> that. I intend to submit it separately later on. >>>> >>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/ >>> >>> Thanks for your information! "batch zap" can solve the problem. >>> >>> And, I agree with Matthew's comments to fix the large folios interaction >>> issues before merging the patches to allocate large folios as in the >>> following email. >>> >>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/ >>> >>> If so, we don't need to introduce the above problem or a large patchset. >> >> I appreciate Matthew's and others position about not wanting to merge a minimal >> implementation while there are some fundamental features (e.g. compaction) it >> doesn't play well with - I'm working to create a definitive list so these items >> can be tracked and tackled. > > Good to know this, Thanks! > >> That said, I don't see this "batch zap" patch as an example of this. It's just a >> performance enhancement that improves things even further than large anon folios >> on their own. I'd rather concentrate on the core changes first then deal with >> this type of thing later. Does that work for you? > > IIUC, allocating large folios upon page fault depends on splitting large > folios in page_remove_rmap() to avoid memory wastage. Splitting large > folios in page_remove_rmap() depends on "batch zap" to avoid performance > regression in zap_pte_range(). So we need them to be done earlier. Or > I miss something? My point was just that large anon folios improves performance significantly overall, despite a small perf regression in zap_pte_range(). That regression is reduced further by a patch from Yin Fengwei to reduce the lock contention [1]. So it doesn't seem urgent to me to get the "batch zap" change in. I'll add it to my list, then prioritize it against the other stuff. [1] https://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/ > > Best Regards, > Huang, Ying