From: Huang Ying
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Arjan Van De Ven, Huang Ying,
    Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand,
    Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
    Matthew Wilcox, Christoph Lameter
Subject: [PATCH 10/10] mm, pcp: reduce detecting time of consecutive high order page freeing
Date: Wed, 20 Sep 2023 14:18:56 +0800
Message-Id: <20230920061856.257597-11-ying.huang@intel.com>
In-Reply-To: <20230920061856.257597-1-ying.huang@intel.com>
References: <20230920061856.257597-1-ying.huang@intel.com>

In current PCP auto-tuning design, if
the number of pages allocated on a CPU is much larger than the number
freed, the PCP high may reach its maximal value even if the
allocating/freeing depth is small, for example, on the sender side of
network workloads.  If a CPU that was originally used as a sender is
used as a receiver after a context switch, the whole PCP must be filled
up to the maximal high before PCP draining is triggered for consecutive
high-order freeing.  This hurts the performance of some network
workloads.

To solve the issue, this patch tracks consecutive page freeing with a
counter instead of relying on PCP draining, so consecutive page freeing
can be detected much earlier.

On a 2-socket Intel server with 128 logical CPUs, we tested the
SCTP_STREAM_MANY test case of the netperf test suite with 64 pairs of
processes.  With the patch, the network bandwidth improves by 3.1%.
This restores the performance drop caused by PCP auto-tuning.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Mel Gorman
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 include/linux/mmzone.h |  2 +-
 mm/page_alloc.c        | 23 +++++++++++------------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 35b78c7522a7..44f6dc3cdeeb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -689,10 +689,10 @@ struct per_cpu_pages {
 	int batch;		/* chunk size for buddy add/remove */
 	u8 flags;		/* protected by pcp->lock */
 	u8 alloc_factor;	/* batch scaling factor during allocate */
-	u8 free_factor;		/* batch scaling factor during free */
 #ifdef CONFIG_NUMA
 	u8 expire;		/* When 0, remote pagesets are drained */
 #endif
+	short free_count;	/* consecutive free count */
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
 	struct list_head lists[NR_PCP_LISTS];
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e9b7b51688..6ae2a5ebf7a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2375,13 +2375,10 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int batch, int high, bool free
 	max_nr_free = high - batch;
 
 	/*
-	 * Double the number of pages freed each time there is subsequent
-	 * freeing of pages without any allocation.
+	 * Increase the batch number to the number of the consecutive
+	 * freed pages to reduce zone lock contention.
 	 */
-	batch <<= pcp->free_factor;
-	if (batch <= max_nr_free && pcp->free_factor < PCP_BATCH_SCALE_MAX)
-		pcp->free_factor++;
-	batch = clamp(batch, min_nr_free, max_nr_free);
+	batch = clamp_t(int, pcp->free_count, min_nr_free, max_nr_free);
 
 	return batch;
 }
@@ -2408,7 +2405,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 	 * stored on pcp lists
 	 */
 	if (test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags)) {
-		pcp->high = max(high - (batch << pcp->free_factor), high_min);
+		pcp->high = max(high - pcp->free_count, high_min);
 		return min(batch << 2, pcp->high);
 	}
 
@@ -2416,10 +2413,10 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 		return high;
 
 	if (test_bit(ZONE_BELOW_HIGH, &zone->flags)) {
-		pcp->high = max(high - (batch << pcp->free_factor), high_min);
+		pcp->high = max(high - pcp->free_count, high_min);
 		high = max(pcp->count, high_min);
 	} else if (pcp->count >= high) {
-		int need_high = (batch << pcp->free_factor) + batch;
+		int need_high = pcp->free_count + batch;
 
 		/* pcp->high should be large enough to hold batch freed pages */
 		if (pcp->high < need_high)
@@ -2456,7 +2453,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_factor &&
+		free_high = (pcp->free_count >= batch &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
@@ -2464,6 +2461,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	} else if (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) {
 		pcp->flags &= ~PCPF_PREV_FREE_HIGH_ORDER;
 	}
+	if (pcp->free_count < (batch << PCP_BATCH_SCALE_MAX))
+		pcp->free_count += (1 << order);
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2861,7 +2860,7 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 	 * See nr_pcp_free() where free_factor is increased for subsequent
 	 * frees.
 	 */
-	pcp->free_factor >>= 1;
+	pcp->free_count >>= 1;
 	list = &pcp->lists[order_to_pindex(migratetype, order)];
 	page = __rmqueue_pcplist(zone, order, migratetype, alloc_flags, pcp, list);
 	pcp_spin_unlock(pcp);
@@ -5483,7 +5482,7 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta
 	pcp->high_min = BOOT_PAGESET_HIGH;
 	pcp->high_max = BOOT_PAGESET_HIGH;
 	pcp->batch = BOOT_PAGESET_BATCH;
-	pcp->free_factor = 0;
+	pcp->free_count = 0;
 }
 
 static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high_min,
-- 
2.39.2
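
Illustration only, not part of the patch: a minimal user-space sketch of
the free_count bookkeeping that the hunks in mm/page_alloc.c introduce.
It is a rough model under stated assumptions, not kernel code.  The names
struct pcp_model, clamp_int() and the model_*() helpers are hypothetical
and exist only for this example, PCP_BATCH_SCALE_MAX is assumed to be 5,
and the PCPF_* flag checks and the pcp lock are left out.

```c
#include <stdbool.h>

/* Assumed value for illustration; the real constant is defined in the kernel. */
#define PCP_BATCH_SCALE_MAX 5

/* Hypothetical stand-in for the single per_cpu_pages field modelled here. */
struct pcp_model {
	int free_count;	/* pages freed recently; halved by each allocation */
};

/* Plain-C replacement for the kernel's clamp_t(int, ...). */
static int clamp_int(int val, int lo, int hi)
{
	return val < lo ? lo : (val > hi ? hi : val);
}

/* Free path: add the pages of this free until the cap is reached. */
static void model_free(struct pcp_model *pcp, int batch, unsigned int order)
{
	if (pcp->free_count < (batch << PCP_BATCH_SCALE_MAX))
		pcp->free_count += 1 << order;
}

/* Allocation path: each allocation halves the counter. */
static void model_alloc(struct pcp_model *pcp)
{
	pcp->free_count >>= 1;
}

/* Counter part of the free_high test; the flag checks are omitted. */
static bool model_consecutive_free(const struct pcp_model *pcp, int batch)
{
	return pcp->free_count >= batch;
}

/* Number of pages to free in bulk: free_count bounded to [min, max]. */
static int model_nr_free(const struct pcp_model *pcp, int min_nr_free,
			 int max_nr_free)
{
	return clamp_int(pcp->free_count, min_nr_free, max_nr_free);
}
```

In this model, with batch = 63 and order-3 frees, the counter reaches 64
after eight frees, so model_consecutive_free() turns true at that point
without the PCP having to fill up first.  A single model_alloc() then
halves the counter to 32, which is below batch again.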