From: "Huang, Ying"
To: Mel Gorman
Cc: Andrew Morton, Arjan Van De Ven, Vlastimil Babka, David Hildenbrand,
    Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
    Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH -V3 8/9] mm, pcp: decrease PCP high if free pages < high watermark
In-Reply-To: <20231019123310.pirier6sk6iaqr3n@techsingularity.net> (Mel Gorman's message of "Thu, 19 Oct 2023 13:33:10 +0100")
References: <20231016053002.756205-1-ying.huang@intel.com>
            <20231016053002.756205-9-ying.huang@intel.com>
            <20231019123310.pirier6sk6iaqr3n@techsingularity.net>
Date: Fri, 20 Oct 2023 11:30:53 +0800
Message-ID: <87o7guezrm.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
X-Mailing-List: linux-kernel@vger.kernel.org
Mel Gorman writes:

> On Mon, Oct 16, 2023 at 01:30:01PM +0800, Huang Ying wrote:
>> One target of PCP is to minimize the number of pages in the PCP if
>> the system's free pages are too few. To reach that target, when page
>> reclaim is active for the zone (ZONE_RECLAIM_ACTIVE), we stop
>> increasing PCP high in the allocation path, and decrease PCP high and
>> free some pages in the freeing path. But this may be too late,
>> because the background page reclaim may already have introduced
>> latency for some workloads. So, in this patch, during page allocation
>> we detect whether the number of free pages of the zone is below the
>> high watermark. If so, we stop increasing PCP high in the allocation
>> path, and decrease PCP high and free some pages in the freeing path.
>> With this, we reduce the possibility of premature background page
>> reclaim caused by a too-large PCP.
>>
>> The high watermark check is done in the allocation path to reduce the
>> overhead in the hotter freeing path.
>>
>> Signed-off-by: "Huang, Ying"
>> Cc: Andrew Morton
>> Cc: Mel Gorman
>> Cc: Vlastimil Babka
>> Cc: David Hildenbrand
>> Cc: Johannes Weiner
>> Cc: Dave Hansen
>> Cc: Michal Hocko
>> Cc: Pavel Tatashin
>> Cc: Matthew Wilcox
>> Cc: Christoph Lameter
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/page_alloc.c        | 33 +++++++++++++++++++++++++++++++--
>>  2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index ec3f7daedcc7..c88770381aaf 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1018,6 +1018,7 @@ enum zone_flags {
>>  	 * Cleared when kswapd is woken.
>>  	 */
>>  	ZONE_RECLAIM_ACTIVE,	/* kswapd may be scanning the zone. */
>> +	ZONE_BELOW_HIGH,	/* zone is below high watermark. */
>>  };
>>
>>  static inline unsigned long zone_managed_pages(struct zone *zone)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 8382ad2cdfd4..253fc7d0498e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2407,7 +2407,13 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
>>  		return min(batch << 2, pcp->high);
>>  	}
>>
>> -	if (pcp->count >= high && high_min != high_max) {
>> +	if (high_min == high_max)
>> +		return high;
>> +
>> +	if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) {
>> +		pcp->high = max(high - (batch << pcp->free_factor), high_min);
>> +		high = max(pcp->count, high_min);
>> +	} else if (pcp->count >= high) {
>>  		int need_high = (batch << pcp->free_factor) + batch;
>>
>>  		/* pcp->high should be large enough to hold batch freed pages */
>> @@ -2457,6 +2463,10 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>>  	if (pcp->count >= high) {
>>  		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
>>  				   pcp, pindex);
>> +		if (test_bit(ZONE_BELOW_HIGH, &zone->flags) &&
>> +		    zone_watermark_ok(zone, 0, high_wmark_pages(zone),
>> +				      ZONE_MOVABLE, 0))
>> +			clear_bit(ZONE_BELOW_HIGH, &zone->flags);
>>  	}
>>  }
>>
>
> This is a relatively fast path and freeing pages should not need to check
> watermarks.

Another factor that mitigates the overhead is that the watermark check
only occurs when we free pages from the PCP to the buddy allocator.
That is, in most cases, only once per 63 pages freed.

> While the overhead is mitigated because it applies only when
> the watermark is below high, that is also potentially an unbounded condition
> if a workload is sized precisely enough. Why not clear this bit when kswapd
> is going to sleep after reclaiming enough pages in a zone?

IIUC, if the number of free pages is kept above the low watermark, then
kswapd has no opportunity to be woken up at all, even if the number of
free pages was at some point below the high watermark.
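For illustration, the new ZONE_BELOW_HIGH branch of nr_pcp_high() can be
modelled in user space. The sketch below is a simplified simulation, not
the kernel code: the function name `model_nr_pcp_high` and the sample
values (batch = 63, as mentioned above) are mine, and `free_factor` is
treated as a shift count, matching `batch << pcp->free_factor` in the
quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the patched nr_pcp_high() logic: when the zone is below the
 * high watermark, pcp->high is stepped down by (batch << free_factor)
 * toward high_min, and the returned threshold lets the PCP drain its
 * current pages (above high_min) back to the buddy allocator. */
static int model_nr_pcp_high(int *pcp_high, int pcp_count, int free_factor,
                             int batch, int high_min, int high_max,
                             bool zone_below_high)
{
    /* Tuning disabled: high is pinned, nothing to adjust. */
    if (high_min == high_max)
        return *pcp_high;

    if (zone_below_high) {
        /* Shrink pcp->high, clamped at high_min. */
        int next = *pcp_high - (batch << free_factor);
        *pcp_high = next > high_min ? next : high_min;
        /* Threshold allows freeing everything above high_min now. */
        return pcp_count > high_min ? pcp_count : high_min;
    }

    /* Not below the high watermark: leave high unchanged here. */
    return *pcp_high;
}
```

With batch = 63 and free_factor = 1, a pcp->high of 640 steps down by
126 on each free-side invocation while the zone stays below the high
watermark, until it bottoms out at high_min.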
> If you agree then a follow-up patch classed as a micro-optimisation is
> sufficient to avoid redoing all the results again. For most of your
> tests, it should be performance-neutral or borderline noise.

--
Best Regards,
Huang, Ying