From: Mel Gorman
To: Andrew Morton
Cc: Hillf Danton, Dave Hansen, Vlastimil Babka, Michal Hocko, LKML,
    Linux-MM, Mel Gorman
Subject: [PATCH 0/6 v2] Calculate pcp->high based on zone sizes and active CPUs
Date: Tue, 25 May 2021 09:01:13 +0100
Message-Id: <20210525080119.5455-1-mgorman@techsingularity.net>
X-Mailer: git-send-email 2.26.2
X-Mailing-List: linux-kernel@vger.kernel.org

Changelog since v1
o Clarification comments
o Sanity check pcp->high during reclaim (dhansen)
o Handle vm.percpu_pagelist_high_fraction in zone_highsize (hdanton)
o Sanity check pcp->batch versus pcp->high

This series has pre-requisites in mmotm so for convenience it is also
available at

https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-pcpburst-v2r3

The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and takes neither the zone
size nor the number of CPUs local to a zone into account. With larger zones
and more CPUs per node, the contention is getting worse. Furthermore, the
fact that vm.percpu_pagelist_fraction adjusts both batch and high values
means that the sysctl can reduce zone lock contention but also increase
allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone, with limited impact on
reclaim and accounting for active CPUs, but leaves pcp->batch static.
It also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but is also
driven by the fact that large batches of page freeing via release_pages()
often show zone lock contention as a major part of the problem. Another
motivation is a bug report based on an older kernel where a multi-terabyte
process can take several minutes to exit. A workaround was to use
vm.percpu_pagelist_fraction to increase the pcp->high value but testing
indicated that a production workload could not use the same values because
of an increase in allocation latencies. Unfortunately, I cannot reproduce
this test case myself as the multi-terabyte machines are in active use but
the series should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite for
storing high-order pages on PCP. PCP only works with order-0 pages, which
is useless for SLUB (when using high orders) and THP (unconditionally). To
store high-order pages on PCP, the pcp->high values need to be increased
first.
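
As a rough illustration of the scaling described above, the following is a
minimal userspace sketch and not the code from the patches. It assumes a
per-zone page budget that is split evenly across the CPUs local to the zone
and then floored at a multiple of batch. The struct, the function name, the
choice of the low watermark as the default budget and the factor of 4 are
all hypothetical and only there to make the idea concrete.

```c
/* Hypothetical stand-in for the zone fields such a calculation would need. */
struct zone_sketch {
	unsigned long managed_pages;   /* pages managed by the zone */
	unsigned long low_wmark_pages; /* low watermark, in pages */
	unsigned int nr_local_cpus;    /* active CPUs local to the zone's node */
};

/*
 * Assumed policy, for illustration only:
 *  - high_fraction == 0: the budget is the low watermark, which keeps the
 *    pages parked on per-cpu lists small relative to reclaim thresholds.
 *  - high_fraction != 0: the budget is managed_pages / high_fraction.
 * The budget is divided by the number of local CPUs and the result is
 * never allowed to drop below 4 * batch.
 */
static unsigned long pcp_high_sketch(const struct zone_sketch *z,
				     unsigned long batch,
				     unsigned long high_fraction)
{
	unsigned long budget, high;
	/* Avoid dividing by zero if no local CPU is reported. */
	unsigned int cpus = z->nr_local_cpus ? z->nr_local_cpus : 1;

	if (high_fraction)
		budget = z->managed_pages / high_fraction;
	else
		budget = z->low_wmark_pages;

	high = budget / cpus;
	if (high < batch * 4)
		high = batch * 4;

	return high;
}
```

With a low watermark of 16384 pages, 16 local CPUs and a batch of 63, the
sketch yields 1024 pages per CPU. Note that batch is only used as a floor
and is never modified, which mirrors the intent of keeping pcp->batch
static while pcp->high scales.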

 Documentation/admin-guide/sysctl/vm.rst |  29 ++--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 196 ++++++++++++++++++------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 212 insertions(+), 72 deletions(-)

-- 
2.26.2