Message-ID: <5c4747b7-60b7-c7b4-3be6-4ecea92cf975@suse.cz>
Date: Wed, 16 Feb 2022 15:37:52 +0100
Subject: Re: [PATCH 5/5] mm/page_alloc: Limit number of high-order pages on PCP during bulk free
To: Mel Gorman, Andrew Morton
Cc: Aaron Lu, Dave Hansen, Michal Hocko, Jesper Dangaard Brouer, LKML, Linux-MM
References: <20220215145111.27082-1-mgorman@techsingularity.net>
 <20220215145111.27082-6-mgorman@techsingularity.net>
From: Vlastimil Babka
In-Reply-To: <20220215145111.27082-6-mgorman@techsingularity.net>

On 2/15/22 15:51, Mel Gorman wrote:
> When a PCP is mostly used for frees then high-order pages can exist on PCP
> lists for some time. This is problematic when the allocation pattern is all
> allocations from one CPU and all frees from another resulting in colder
> pages being used. When bulk freeing pages, limit the number of high-order
> pages that are stored on the PCP lists.
>
> Netperf running on localhost exhibits this pattern and while it does
> not matter for some machines, it does matter for others with smaller
> caches where cache misses cause problems due to reduced page reuse.
> Pages freed directly to the buddy list may be reused quickly while still
> cache hot whereas storing on the PCP lists may be cold by the time
> free_pcppages_bulk() is called.
>
> Using perf kmem:mm_page_alloc, the 5 most used page frames were
>
> 5.17-rc3
>   13041 pfn=0x111a30
>   13081 pfn=0x5814d0
>   13097 pfn=0x108258
>   13121 pfn=0x689598
>   13128 pfn=0x5814d8
>
> 5.17-revert-highpcp
>  192009 pfn=0x54c140
>  195426 pfn=0x1081d0
>  200908 pfn=0x61c808
>  243515 pfn=0xa9dc20
>  402523 pfn=0x222bb8
>
> 5.17-full-series
>  142693 pfn=0x346208
>  162227 pfn=0x13bf08
>  166413 pfn=0x2711e0
>  166950 pfn=0x2702f8
>
> The spread is wider as there is still time before pages freed to one
> PCP get released with a tradeoff between fast reuse and reduced zone
> lock acquisition.
>
> From the machine used to gather the traces, the headline performance
> was equivalent.
>
> netperf-tcp
>                             5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
>                                vanilla  mm-reverthighpcp-v1r1  mm-highpcplimit-v1r12
> Hmean     64         839.93 (   0.00%)      840.77 (   0.10%)      835.34 * -0.55%*
> Hmean     128       1614.22 (   0.00%)     1622.07 *   0.49%*     1604.18 * -0.62%*
> Hmean     256       2952.00 (   0.00%)     2953.19 (   0.04%)     2959.46 (   0.25%)
> Hmean     1024     10291.67 (   0.00%)    10239.17 (  -0.51%)    10287.05 (  -0.04%)
> Hmean     2048     17335.08 (   0.00%)    17399.97 (   0.37%)    17125.73 *  -1.21%*
> Hmean     3312     22628.15 (   0.00%)    22471.97 (  -0.69%)    22414.24 *  -0.95%*
> Hmean     4096     25009.50 (   0.00%)    24752.83 *  -1.03%*    24620.03 *  -1.56%*
> Hmean     8192     32745.01 (   0.00%)    31682.63 *  -3.24%*    32475.31 (  -0.82%)
> Hmean     16384    39759.59 (   0.00%)    36805.78 *  -7.43%*    39291.42 (  -1.18%)
>
> From a 1-socket skylake machine with a small CPU cache that suffers
> more if cache misses are too high
>
> netperf-tcp
>                             5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
>                                vanilla    mm-reverthighpcp-v1     mm-highpcplimit-v1
> Min       64         935.38 (   0.00%)      939.40 (   0.43%)      940.11 (   0.51%)
> Min       128       1831.69 (   0.00%)     1856.15 (   1.34%)     1849.30 (   0.96%)
> Min       256       3560.61 (   0.00%)     3659.25 (   2.77%)     3654.12 (   2.63%)
> Min       1024     13165.24 (   0.00%)    13444.74 (   2.12%)    13281.71 (   0.88%)
> Min       2048     22706.44 (   0.00%)    23219.67 (   2.26%)    23027.31 (   1.41%)
> Min       3312     30960.26 (   0.00%)    31985.01 (   3.31%)    31484.40 (   1.69%)
> Min       4096     35149.03 (   0.00%)    35997.44 (   2.41%)    35891.92 (   2.11%)
> Min       8192     48064.73 (   0.00%)    49574.05 (   3.14%)    48928.89 (   1.80%)
> Min       16384    58017.25 (   0.00%)    60352.93 (   4.03%)    60691.14 (   4.61%)
> Hmean     64         938.95 (   0.00%)      941.50 *   0.27%*      940.47 (   0.16%)
> Hmean     128       1843.10 (   0.00%)     1857.58 *   0.79%*     1855.83 *   0.69%*
> Hmean     256       3573.07 (   0.00%)     3667.45 *   2.64%*     3662.08 *   2.49%*
> Hmean     1024     13206.52 (   0.00%)    13487.80 *   2.13%*    13351.11 *   1.09%*
> Hmean     2048     22870.23 (   0.00%)    23337.96 *   2.05%*    23149.68 *   1.22%*
> Hmean     3312     31001.99 (   0.00%)    32206.50 *   3.89%*    31849.40 *   2.73%*
> Hmean     4096     35364.59 (   0.00%)    36490.96 *   3.19%*    36112.91 *   2.12%*
> Hmean     8192     48497.71 (   0.00%)    49954.05 *   3.00%*    49384.50 *   1.83%*
> Hmean     16384    58410.86 (   0.00%)    60839.80 *   4.16%*    61362.12 *   5.05%*
>
> Note that this was a machine that did not benefit from caching high-order
> pages and performance is almost restored with the series applied. It's not
> fully restored as cache misses are still higher. This is a trade-off
> between optimising for a workload that does all allocs on one CPU and frees
> on another or more general workloads that need high-order pages for SLUB
> and benefit from avoiding zone->lock for every SLUB refill/drain.
>
> Signed-off-by: Mel Gorman

Reviewed-by: Vlastimil Babka

> ---
>  mm/page_alloc.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6881175b27df..cfb3cbad152c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3314,10 +3314,15 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
>  	return true;
>  }
>  
> -static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
> +static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch,
> +		       bool free_high)
>  {
>  	int min_nr_free, max_nr_free;
>  
> +	/* Free everything if batch freeing high-order pages. */
> +	if (unlikely(free_high))
> +		return pcp->count;
> +
>  	/* Check for PCP disabled or boot pageset */
>  	if (unlikely(high < batch))
>  		return 1;
> @@ -3338,11 +3343,12 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
>  	return batch;
>  }
>  
> -static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
> +static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
> +		       bool free_high)
>  {
>  	int high = READ_ONCE(pcp->high);
>  
> -	if (unlikely(!high))
> +	if (unlikely(!high || free_high))
>  		return 0;
>  
>  	if (!test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
> @@ -3362,17 +3368,27 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
>  	struct per_cpu_pages *pcp;
>  	int high;
>  	int pindex;
> +	bool free_high;
>  
>  	__count_vm_event(PGFREE);
>  	pcp = this_cpu_ptr(zone->per_cpu_pageset);
>  	pindex = order_to_pindex(migratetype, order);
>  	list_add(&page->lru, &pcp->lists[pindex]);
>  	pcp->count += 1 << order;
> -	high = nr_pcp_high(pcp, zone);
> +
> +	/*
> +	 * As high-order pages other than THP's stored on PCP can contribute
> +	 * to fragmentation, limit the number stored when PCP is heavily
> +	 * freeing without allocation. The remainder after bulk freeing
> +	 * stops will be drained from vmstat refresh context.
> +	 */
> +	free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);
> +
> +	high = nr_pcp_high(pcp, zone, free_high);
>  	if (pcp->count >= high) {
>  		int batch = READ_ONCE(pcp->batch);
>  
> -		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
> +		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
>  	}
>  }
>
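
As a side note for anyone skimming the thread, below is a rough userspace-only
sketch of the decision the patch adds. It is not the kernel code: the struct is
cut down to the few fields that matter here, the batch scaling and clamping in
the real nr_pcp_free() are left out, and the names toy_pcp, is_free_high() and
toy_nr_pcp_free() are made up for illustration.

#include <stdbool.h>
#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER 3

struct toy_pcp {
	int count;		/* pages currently on the PCP lists */
	int high;		/* normal high watermark */
	int batch;		/* normal bulk-free batch size */
	int free_factor;	/* non-zero while frees dominate allocs */
};

/* Models the new free_high check in free_unref_page_commit(). */
static bool is_free_high(const struct toy_pcp *pcp, unsigned int order)
{
	return pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER;
}

/* Models nr_pcp_free(): drain everything when free_high is set. */
static int toy_nr_pcp_free(const struct toy_pcp *pcp, bool free_high)
{
	if (free_high)
		return pcp->count;
	if (pcp->high < pcp->batch)	/* PCP disabled or boot pageset */
		return 1;
	return pcp->batch;		/* real code scales/clamps this */
}

int main(void)
{
	struct toy_pcp pcp = { .count = 500, .high = 512, .batch = 63,
			       .free_factor = 1 };

	/* order-3 free during a free-heavy phase: drain the whole list */
	printf("order 3: free %d pages\n",
	       toy_nr_pcp_free(&pcp, is_free_high(&pcp, 3)));

	/* order-0 free: the batched behaviour is unchanged */
	printf("order 0: free %d pages\n",
	       toy_nr_pcp_free(&pcp, is_free_high(&pcp, 0)));
	return 0;
}

The point being that a non-zero order up to PAGE_ALLOC_COSTLY_ORDER freed while
pcp->free_factor is set bypasses both watermarks: nr_pcp_high() returns 0 so
the bulk free always triggers, and nr_pcp_free() drains the whole list rather
than a batch.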