Date: Mon, 24 May 2021 10:07:26 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Hansen
Cc: Linux-MM, Dave Hansen, Matthew Wilcox, Vlastimil Babka,
	Michal Hocko, Nicholas Piggin, LKML
Subject: Re: [PATCH 3/6] mm/page_alloc: Adjust pcp->high after CPU hotplug events
Message-ID: <20210524090726.GB30378@techsingularity.net>
References: <20210521102826.28552-1-mgorman@techsingularity.net>
	<20210521102826.28552-4-mgorman@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 21, 2021 at 03:13:35PM -0700, Dave Hansen wrote:
> On 5/21/21 3:28 AM, Mel Gorman wrote:
> > The PCP high watermark is based on the number of online CPUs so the
> > watermarks must be adjusted during CPU hotplug. At the time of
> > hot-remove, the number of online CPUs is already adjusted but during
> > hot-add, a delta needs to be applied to update PCP to the correct
> > value. After this patch is applied, the high watermarks are adjusted
> > correctly.
> >
> >   # grep high: /proc/zoneinfo | tail -1
> >   high:  649
> >   # echo 0 > /sys/devices/system/cpu/cpu4/online
> >   # grep high: /proc/zoneinfo | tail -1
> >   high:  664
> >   # echo 1 > /sys/devices/system/cpu/cpu4/online
> >   # grep high: /proc/zoneinfo | tail -1
> >   high:  649
>
> This is actually a comment more about the previous patch, but it
> doesn't really become apparent until the example above.
>
> In your example, you mentioned increased exit() performance by using
> "vm.percpu_pagelist_fraction to increase the pcp->high value". That's
> presumably because of the increased batching effects and fewer lock
> acquisitions.
>

Yes.

> But, logically, doesn't that mean that, the more CPUs you have in a
> node, the *higher* you want pcp->high to be? If we took this to the
> extreme and had an absurd number of CPUs in a node, we could end up
> with a too-small pcp->high value.
>

I see your point but I don't think increasing pcp->high for larger
numbers of CPUs is the right answer because then reclaim can be
triggered simply because too many PCPs have pages.
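To make that concrete, here is a rough sketch with made-up numbers
(illustrative only, not the zoneinfo values above). The series sizes
pcp->high from the low watermark:

	/* hypothetical zone: low_wmark_pages(zone) == 6400, 8 local CPUs */
	high = low_wmark_pages(zone) / nr_local_cpus;	/* 6400 / 8 = 800 */

Pages sitting on PCP lists are not counted as free by the buddy
allocator, so in the worst case, with every PCP list full, 8 * 800 ==
6400 pages are hidden from the free counters -- roughly the low
watermark, and no more. If pcp->high instead scaled up with the number
of CPUs, the number of pages that could be hidden would grow with the
CPU count and kswapd could be woken simply because too many pages are
cached on PCP lists, not because memory is genuinely scarce.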
Addressing your point would require much deeper surgery. zone->lock
would have to be split into a metadata lock and a free page lock. Then
the free areas would have to be split based on some factor -- number
of CPUs or memory size. That gets complex because the page allocator
loop would need to walk multiple arenas as well as multiple zones, and
consider which arena should be examined first. Fragmentation would
also have to be considered because a decision would be needed on
whether a pageblock should fragment or whether other local arenas
should be examined first. Anything that walks PFNs, such as
compaction, would also need to be aware of arenas and their associated
locks. Finally, every acquisition of zone->lock would have to be
audited to determine exactly what it is protecting.

Even with all that, it still makes sense to disassociate pcp->high
from pcp->batch as this series does. There is value in doing something
like this, but it is beyond what this series is trying to do, and
doing the work without introducing regressions would be very
difficult.

> Also, do you worry at all about a zone with a low min_free_kbytes
> seeing increased zone lock contention?
>
> ...
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bf5cdc466e6c..2761b03b3a44 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6628,7 +6628,7 @@ static int zone_batchsize(struct zone *zone)
> >  #endif
> >  }
> >
> > -static int zone_highsize(struct zone *zone)
> > +static int zone_highsize(struct zone *zone, int cpu_online)
> >  {
> >  #ifdef CONFIG_MMU
> >  	int high;
> > @@ -6640,7 +6640,7 @@ static int zone_highsize(struct zone *zone)
> >  	 * CPUs local to a zone. Note that early in boot that CPUs may
> >  	 * not be online yet.
> >  	 */
> > -	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone))));
> > +	nr_local_cpus = max(1U, cpumask_weight(cpumask_of_node(zone_to_nid(zone)))) + cpu_online;
> >  	high = low_wmark_pages(zone) / nr_local_cpus;
>
> Is this "+ cpu_online" bias because the CPU isn't in cpumask_of_node()
> when the CPU hotplug callback occurs? If so, it might be nice to
> mention.

Fixed.

-- 
Mel Gorman
SUSE Labs