Date: Wed, 11 Oct 2023 13:52:19 +0100
From: Mel Gorman
To: Huang Ying
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven,
	Andrew Morton,
	Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen,
	Michal Hocko, Pavel Tatashin, Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 04/10] mm: restrict the pcp batch scale factor to avoid too long latency
Message-ID: <20231011125219.kuoluyuwxzva5q5w@techsingularity.net>
References: <20230920061856.257597-1-ying.huang@intel.com>
 <20230920061856.257597-5-ying.huang@intel.com>
In-Reply-To: <20230920061856.257597-5-ying.huang@intel.com>

On Wed, Sep 20, 2023 at 02:18:50PM +0800, Huang Ying wrote:
> In the page allocator, the PCP (Per-CPU Pageset) is refilled and
> drained in batches to increase page allocation throughput, reduce page
> allocation/freeing latency per page, and reduce zone lock contention.
> But too large a batch size will cause overly long maximal
> allocation/freeing latency, which may punish arbitrary users. So the
> default batch size is chosen carefully (in zone_batchsize(), the value
> is 63 for zones > 1GB) to avoid that.
>
> In commit 3b12e7e97938 ("mm/page_alloc: scale the number of pages that
> are batch freed"), the batch size is scaled up when freeing a large
> number of pages, to improve page freeing performance and reduce zone
> lock contention. A similar optimization can be used when allocating a
> large number of pages too.
>
> To find out a suitable max batch scale factor (that is, the max
> effective batch size), some tests and measurements were done on
> several machines, as follows.
>
> A set of debug patches were implemented as follows:
>
> - Set PCP high to be 2 * batch to reduce the effect of PCP high.
>
> - Disable free batch size scaling to get the raw performance.
>
> - The code that runs with the zone lock held is extracted from
>   rmqueue_bulk() and free_pcppages_bulk() into 2 separate functions to
>   make it easy to measure the function run time with the ftrace
>   function_graph tracer.
>
> - The batch size is hard coded to be 63 (default), 127, 255, 511,
>   1023, 2047, 4095.
>
> Then will-it-scale/page_fault1 is used to generate the page
> allocation/freeing workload. The page allocation/freeing throughput
> (page/s) is measured via will-it-scale. The page allocation/freeing
> average latency (alloc/free latency avg, in us) and the
> allocation/freeing latency at the 99th percentile (alloc/free latency
> 99%, in us) are measured with the ftrace function_graph tracer.
>
> The test results are as follows.
>
> Sapphire Rapids Server
> ======================
> Batch  throughput   free latency  free latency  alloc latency  alloc latency
>        page/s       avg / us      99% / us      avg / us       99% / us
> -----  ----------   ------------  ------------  -------------  -------------
>    63    513633.4           2.33          3.57           2.67           6.83
>   127    517616.7           4.35          6.65           4.22          13.03
>   255    520822.8           8.29         13.32           7.52          25.24
>   511    524122.0          15.79         23.42          14.02          49.35
>  1023    525980.5          30.25         44.19          25.36          94.88
>  2047    526793.6          59.39         84.50          45.22         140.81
>
> Ice Lake Server
> ===============
> Batch  throughput   free latency  free latency  alloc latency  alloc latency
>        page/s       avg / us      99% / us      avg / us       99% / us
> -----  ----------   ------------  ------------  -------------  -------------
>    63    620210.3           2.21          3.68           2.02           4.35
>   127    627003.0           4.09          6.86           3.51           8.28
>   255    630777.5           7.70         13.50           6.17          15.97
>   511    633651.5          14.85         22.62          11.66          31.08
>  1023    637071.1          28.55         42.02          20.81          54.36
>  2047    638089.7          56.54         84.06          39.28          91.68
>
> Cascade Lake Server
> ===================
> Batch  throughput   free latency  free latency  alloc latency  alloc latency
>        page/s       avg / us      99% / us      avg / us       99% / us
> -----  ----------   ------------  ------------  -------------  -------------
>    63    404706.7           3.29          5.03           3.53           4.75
>   127    422475.2           6.12          9.09           6.36           8.76
>   255    411522.2          11.68         16.97          10.90          16.39
>   511    428124.1          22.54         31.28          19.86          32.25
>  1023    414718.4          43.39         62.52          40.00          66.33
>  2047    429848.7          86.64        120.34          71.14         106.08
>
> Comet Lake Desktop
> ==================
> Batch  throughput   free latency  free latency  alloc latency  alloc latency
>        page/s       avg / us      99% / us      avg / us       99% / us
> -----  ----------   ------------  ------------  -------------  -------------
>    63   795183.13           2.18          3.55           2.03           3.05
>   127   803067.85           3.91          6.56           3.85           5.52
>   255   812771.10           7.35         10.80           7.14          10.20
>   511   817723.48          14.17         27.54          13.43          30.31
>  1023   818870.19          27.72         40.10          27.89          46.28
>
> Coffee Lake Desktop
> ===================
> Batch  throughput   free latency  free latency  alloc latency  alloc latency
>        page/s       avg / us      99% / us      avg / us       99% / us
> -----  ----------   ------------  ------------  -------------  -------------
>    63    510542.8           3.13          4.40           2.48           3.43
>   127    514288.6           5.97          7.89           4.65           6.04
>   255    516889.7          11.86         15.58           8.96          12.55
>   511    519802.4          23.10         28.81          16.95          26.19
>  1023    520802.7          45.30         52.51          33.19          45.95
>  2047    519997.1          90.63        104.00          65.26          81.74
>
> From the above data, to restrict the allocation/freeing latency to
> less than 100 us in most cases, the max batch scale factor needs to be
> less than or equal to 5.
>
> So, in this patch, the batch scale factor is restricted to be less
> than or equal to 5.
>
> Signed-off-by: "Huang, Ying"

Acked-by: Mel Gorman

However, it's worth noting that the time to free pages depends on the
CPU, and while the CPUs you tested are reasonable, there are also slower
CPUs out there, and I have at least one report that the time is
excessive. While this patch is fine, there may be a patch on top that
makes this runtime configurable, a Kconfig default, or both.

-- 
Mel Gorman
SUSE Labs