Date: Wed, 31 Mar 2021 12:01:37 +0100
From: Mel Gorman
To: Thomas Gleixner
Cc: Linux-MM, Linux-RT-Users, LKML, Chuck Lever, Jesper Dangaard Brouer,
    Matthew Wilcox, Sebastian Andrzej Siewior
Subject: Re: [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock
Message-ID: <20210331110137.GA3697@techsingularity.net>
References: <20210329120648.19040-1-mgorman@techsingularity.net>
    <20210329120648.19040-3-mgorman@techsingularity.net>
    <877dln640j.ffs@nanos.tec.linutronix.de>
In-Reply-To: <877dln640j.ffs@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 31, 2021 at 11:55:56AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 29 2021 at 13:06, Mel Gorman wrote:
> > There is a lack of clarity of what exactly local_irq_save/local_irq_restore
> > protects in page_alloc.c . It conflates the protection of per-cpu page
> > allocation structures with per-cpu vmstat deltas.
> >
> > This patch protects the PCP structure using local_lock which
> > for most configurations is identical to IRQ enabling/disabling.
> > The scope of the lock is still wider than it should be but this is
> > decreased in later patches. The per-cpu vmstat deltas are protected by
> > preempt_disable/preempt_enable where necessary instead of relying on
> > IRQ disable/enable.
>
> Yes, this goes into the right direction and I really appreciate the
> scoped protection for clarity sake.
>

Thanks.

> >  #ifdef CONFIG_MEMORY_HOTREMOVE
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 8a8f1a26b231..01b74ff73549 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -887,6 +887,7 @@ void cpu_vm_stats_fold(int cpu)
> >
> >  		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> >
> > +		preempt_disable();
>
> What's the reason for the preempt_disable() here? A comment would be
> appreciated.
>

Very good question because it's protecting vm_stat_diff and
vm_numa_stat_diff in different contexts and not quite correctly at this
point of the series.

By the end of the series vm_numa_stat_diff is a simple counter and does
not need special protection.

Right now, it's protecting against a read and clear of vm_stat_diff
in two contexts -- cpu_vm_stats_fold and drain_zonestat but it's only
defensive. cpu_vm_stats_fold is only called when a CPU is going dead and
drain_zonestat is called from memory hotplug context. The protection is
necessary only if a new drain_zonestat caller was added without taking
the RMW of vm_stat_diff into account which may never happen.

This whole problem with preemption could be avoided altogether if
this_cpu_xchg was used similar to what is done elsewhere in vmstat
so.... this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64429ca4957f..9528304ce24d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8969,8 +8969,9 @@ void zone_pcp_reset(struct zone *zone)
 	struct per_cpu_zonestat *pzstats;

 	/*
-	 * No race with drain_pages. drain_zonestat disables preemption
-	 * and drain_pages relies on the pcp local_lock.
+	 * No race with drain_pages. drain_zonestat is only concerned with
+	 * vm_*_stat_diff which is updated with this_cpu_xchg and drain_pages
+	 * only cares about the PCP lists protected by local_lock.
 	 */
 	if (zone->per_cpu_pageset != &boot_pageset) {
 		for_each_online_cpu(cpu) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 01b74ff73549..34ff61a145d2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -887,13 +887,11 @@ void cpu_vm_stats_fold(int cpu)

 		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);

-		preempt_disable();
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;

-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -903,13 +901,11 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_stat_diff[i]) {
 				int v;

-				v = pzstats->vm_numa_stat_diff[i];
-				pzstats->vm_numa_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_numa_stat[i]);
 				global_numa_diff[i] += v;
 			}
 #endif
-		preempt_enable();
 	}

 	for_each_online_pgdat(pgdat) {
@@ -943,10 +939,9 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 {
 	int i;

-	preempt_disable();
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		if (pzstats->vm_stat_diff[i]) {
-			int v = pzstats->vm_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 			pzstats->vm_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_stat[i]);
 			atomic_long_add(v, &vm_zone_stat[i]);
@@ -955,14 +950,12 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 #ifdef CONFIG_NUMA
 	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
 		if (pzstats->vm_numa_stat_diff[i]) {
-			int v = pzstats->vm_numa_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);

-			pzstats->vm_numa_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_numa_stat[i]);
 			atomic_long_add(v, &vm_numa_stat[i]);
 		}
 #endif
-	preempt_enable();
 }
 #endif

-- 
Mel Gorman
SUSE Labs
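
The read and clear of vm_stat_diff discussed above can be sketched in
userspace as a rough analogy. This is not kernel code: C11 atomic_exchange
stands in for this_cpu_xchg (which, as far as I understand, only has to be
safe against preemption and interrupts on the local CPU, so the stand-in is
stronger than needed), and NR_ITEMS, struct cpu_stats, struct zone_stats,
fold_two_step and fold_xchg are hypothetical names invented for the
illustration.

```c
/*
 * Userspace analogy only. C11 atomics stand in for the kernel's per-cpu
 * operations; every name defined here is hypothetical, not a kernel API.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_ITEMS 4

struct cpu_stats {
	atomic_int diff[NR_ITEMS];	/* pending per-CPU deltas */
};

struct zone_stats {
	atomic_long total[NR_ITEMS];	/* folded totals */
};

/*
 * Two-step read and clear. An update that lands between the load and the
 * store is overwritten by the store and never reaches the total, which is
 * why this form needs outside protection such as disabling preemption.
 */
long fold_two_step(struct cpu_stats *cs, struct zone_stats *zs)
{
	long folded = 0;

	assert(cs && zs);
	for (size_t i = 0; i < NR_ITEMS; i++) {
		int v = atomic_load(&cs->diff[i]);

		if (!v)
			continue;
		atomic_store(&cs->diff[i], 0);
		atomic_fetch_add(&zs->total[i], v);
		folded += v;
	}
	return folded;
}

/*
 * Single-step read and clear. The exchange returns the old delta and
 * zeroes the slot in one operation, so there is no window between the
 * read and the clear.
 */
long fold_xchg(struct cpu_stats *cs, struct zone_stats *zs)
{
	long folded = 0;

	assert(cs && zs);
	for (size_t i = 0; i < NR_ITEMS; i++) {
		int v = atomic_exchange(&cs->diff[i], 0);

		if (!v)
			continue;
		atomic_fetch_add(&zs->total[i], v);
		folded += v;
	}
	return folded;
}
```

Both functions give the same result when nothing else touches the deltas.
They differ only when an update arrives between the read and the clear,
which is the window the preempt_disable() was covering.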