Date: Wed, 14 Apr 2021 16:18:50 +0100
From: Mel Gorman
To: Vlastimil Babka
Cc: Linux-MM, Linux-RT-Users, LKML, Chuck Lever, Jesper Dangaard Brouer, Matthew Wilcox, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Michal Hocko, Oscar Salvador
Subject: Re: [PATCH 04/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters
Message-ID: <20210414151850.GG3697@techsingularity.net>
References: <20210407202423.16022-1-mgorman@techsingularity.net> <20210407202423.16022-5-mgorman@techsingularity.net> <7a7ec563-0519-a850-563a-9680a7bd00d3@suse.cz>
In-Reply-To: <7a7ec563-0519-a850-563a-9680a7bd00d3@suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 14, 2021 at 02:56:45PM +0200, Vlastimil Babka wrote:
> On 4/7/21 10:24 PM, Mel Gorman wrote:
> > NUMA statistics are maintained at the zone level for hits, misses, foreign
> > etc., but nothing relies on them being perfectly accurate for functional
> > correctness. The counters are used by userspace to get a general overview
> > of a workload's NUMA behaviour, but the page allocator incurs a high cost to
> > maintain perfect accuracy similar to what is required for a vmstat like
> > NR_FREE_PAGES. There is even a sysctl, vm.numa_stat, to allow userspace to
> > turn off the collection of NUMA statistics like NUMA_HIT.
> >
> > This patch converts NUMA_HIT and friends to be NUMA events with similar
> > accuracy to VM events. Slight errors may be introduced, but the overall
> > trend as seen by userspace will be similar.
> > Note that while these counters could be maintained at the node level,
> > doing so would have a user-visible impact.
>
> I guess this kind of inaccuracy is fine. I just don't much like
> fold_vm_zone_numa_events(), which seems to calculate sums of percpu counters and
> then assign the result to zone counters for immediate consumption. That differs
> from other kinds of folds in vmstat that reset the percpu counters to 0, as they
> are treated as diffs to the global counters.
>

The counters that are diffs fit inside an s8, and they are kept small
because their "true" value is sometimes critical, e.g. NR_FREE_PAGES
for watermark checking. So the level of drift has to be controlled, and
the drift must not persist indefinitely, which is why it is folded into
the global counter periodically. The inaccurate counters are only
exported to userspace. There is no need to update them every few
seconds, so fold_vm_zone_numa_events() is only called when a user
cares, but you raise a valid point below.
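To make the distinction between the two kinds of counter concrete, here is a single-threaded userspace model of both. It is a sketch, not kernel code: the names (`mod_diff_counter()`, `STAT_THRESHOLD`, `hit_events`, etc.) are hypothetical, and real vmstat code uses per-CPU variables with preemption or IRQ protection that is omitted here. The fold-on-threshold logic is loosely modelled on what `__mod_zone_page_state()` does with `vm_stat_diff`; the event side only increments a per-CPU slot and sums on read.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4          /* assumption: fixed CPU count for the model */
#define STAT_THRESHOLD 32  /* assumption: the kernel sizes this per zone */

/* Style 1: vmstat-like diff counter (e.g. NR_FREE_PAGES). Per-CPU deltas
 * live in an s8 and are folded into the global value once they exceed the
 * threshold, so the global value drifts by at most NR_CPUS * STAT_THRESHOLD. */
static long global_free;
static signed char free_diff[NR_CPUS];

static void mod_diff_counter(int cpu, int delta)
{
	long x = free_diff[cpu] + delta;

	if (labs(x) > STAT_THRESHOLD) {
		global_free += x; /* fold the accumulated delta */
		x = 0;            /* and reset this CPU's diff */
	}
	assert(labs(x) <= STAT_THRESHOLD); /* remainder still fits in an s8 */
	free_diff[cpu] = (signed char)x;
}

/* Exact value: only obtainable by also summing the per-CPU diffs. */
static long diff_counter_exact(void)
{
	long sum = global_free;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += free_diff[cpu];
	return sum;
}

/* Style 2: event-like counter (e.g. NUMA_HIT after the patch). Each CPU
 * only increments its own slot; nothing is folded until someone reads it. */
static unsigned long hit_events[NR_CPUS];

static void count_hit_event(int cpu)
{
	hit_events[cpu]++;
}

static unsigned long sum_hit_events(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += hit_events[cpu];
	return sum;
}
```

With 200 round-robin increments over four CPUs, each CPU folds once when its diff reaches 33, so `global_free` reads 132 while the exact value is 200. That drift stays within `NR_CPUS * STAT_THRESHOLD`, the bounded inaccuracy watermark checks can tolerate; the event counter is exact when summed but has no cheap global value in between.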
> So it seems that this intermediate assignment to zone counters (using
> atomic_long_set() even) is unnecessary and this could mimic sum_vm_events(), which
> just does the summation on a local array?
>

The atomic is certainly unnecessary, but using a local array is
problematic because of your next point.

> And probably a bit more serious is that vm_events have vm_events_fold_cpu() to
> deal with a CPU going away, but after your patch the stats counted on a CPU just
> disappear from the sums as it goes offline, as there's no such thing for the NUMA
> counters.
>

That is a problem I missed. Even if zonestats were preserved on
hot-remove, fold_vm_zone_numa_events() would not read the offline CPU,
so the totals would jump all over the place across hotplug events. So
some periodic folding is necessary. I would still prefer not to do it
by time; it could instead be done only on overflow or when a file like
/proc/vmstat is read. I'll think about it a bit more and see what I
come up with.

Thanks!

-- 
Mel Gorman
SUSE Labs
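The hotplug issue raised above can be seen in a similar userspace model. Again this is an assumption-laden sketch, not the patch: `sum_numa_hit()` sums only online CPUs, roughly as `sum_vm_events()` does with `for_each_online_cpu()`, and `numa_events_fold_cpu()`, `cpu_go_offline()` and `cpu_go_online()` are hypothetical helpers, the first being an analogue of `vm_events_fold_cpu()` that moves a dead CPU's counts onto a surviving CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* assumption: fixed CPU count for the model */

/* Hypothetical per-CPU NUMA_HIT-style event counters. */
static unsigned long numa_hit[NR_CPUS];
static bool cpu_online[NR_CPUS] = { true, true, true, true };

/* Sum on read, over online CPUs only. */
static unsigned long sum_numa_hit(void)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			sum += numa_hit[cpu];
	return sum;
}

/* Hypothetical analogue of vm_events_fold_cpu(): move the dead CPU's
 * counts onto a surviving CPU so the online-only sum keeps them. */
static void numa_events_fold_cpu(int dead, int survivor)
{
	assert(dead != survivor && cpu_online[survivor]);
	numa_hit[survivor] += numa_hit[dead];
	numa_hit[dead] = 0;
}

/* Hypothetical hotplug path: optionally fold, then mark the CPU offline.
 * The offline CPU's counter is not cleared, mirroring per-CPU data that
 * survives offlining and is visible again once the CPU returns. */
static void cpu_go_offline(int cpu, int survivor, bool fold)
{
	if (fold)
		numa_events_fold_cpu(cpu, survivor);
	cpu_online[cpu] = false;
}

static void cpu_go_online(int cpu)
{
	cpu_online[cpu] = true;
}
```

Without the fold, offlining a CPU makes its hits vanish from the total and onlining it again makes them reappear; with the fold, the total is unchanged. Folding on overflow or when /proc/vmstat is read would be other places to perform the same move and are not modelled here.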