From: Mel Gorman
To: Linux-MM
Cc: Dave Hansen, Matthew Wilcox, Vlastimil Babka, Michal Hocko, Nicholas Piggin, LKML, Mel Gorman
Subject: [RFC PATCH 0/6] Calculate pcp->high based on zone sizes and active CPUs
Date: Fri, 21 May 2021 11:28:20 +0100
Message-Id: <20210521102826.28552-1-mgorman@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

The per-cpu page allocator (PCP) is meant to reduce contention on the zone lock, but the sizing of batch and high is archaic and takes neither the zone size nor the number of CPUs local to a zone into account.
Furthermore, the fact that vm.percpu_pagelist_fraction adjusts both the batch and high values means that the sysctl can reduce zone lock contention but also increase allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales pcp->high based on the size of the local zone, with limited impact on reclaim and accounting for active CPUs, but leaves pcp->batch static. It also adapts the number of pages that can be on the pcp list based on recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but is also driven by the fact that large batches of page freeing via release_pages() often show zone contention as a major part of the problem. Another motivation is a bug report based on an older kernel where a multi-terabyte process can take several minutes to exit. A workaround was to use vm.percpu_pagelist_fraction to increase the pcp->high value, but testing indicated that a production workload could not use the same values because of an increase in allocation latencies. Unfortunately, I cannot reproduce this test case myself as the multi-terabyte machines are in active use, but the series should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite. pcp only works with order-0 pages, which is useless for SLUB (when using high orders) and THP (unconditionally). To store high-order pages on PCP, the pcp->high values need to be increased first.

 Documentation/admin-guide/sysctl/vm.rst |  19 +--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 166 +++++++++++++++++-------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 179 insertions(+), 65 deletions(-)

-- 
2.26.2
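
For readers who want a concrete picture of the scaling described above, here is a minimal userspace sketch. It is not the code from the patches. The helper name sketch_pcp_high(), the notion of a per-zone "page budget", and the floor of four batches are all assumptions made purely for illustration. It only shows the general idea of deriving pcp->high from a zone-sized budget divided among the CPUs local to the zone while pcp->batch stays fixed.

```c
/*
 * Illustrative sketch only; every name here is hypothetical and none of
 * this is taken from the series itself.
 *
 * zone_budget_pages: pages of the local zone we are willing to hold on
 *                    per-cpu lists in total (assumed input).
 * nr_local_cpus:     number of active CPUs local to the zone.
 * batch:             the static pcp->batch value.
 */
static unsigned long sketch_pcp_high(unsigned long zone_budget_pages,
				     unsigned int nr_local_cpus,
				     unsigned long batch)
{
	/* Assumed lower bound so that high never drops below a few batches. */
	unsigned long min_high = batch * 4;
	unsigned long high;

	/* Guard against a zone with no CPUs reported as local. */
	if (nr_local_cpus == 0)
		nr_local_cpus = 1;

	/* Split the zone budget evenly across the local CPUs. */
	high = zone_budget_pages / nr_local_cpus;

	return high > min_high ? high : min_high;
}
```

With a budget of 65536 pages, 16 local CPUs and a batch of 63, the sketch yields a high of 4096 pages per CPU. With a small budget of 1000 pages the assumed floor of 4 * 63 = 252 pages applies instead.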