Date: Mon, 1 Mar 2021 14:08:17 -0800 (PST)
From: Hugh Dickins
To: Roman Gushchin
Cc: Hugh Dickins, Andrew Morton, Johannes Weiner, Michal Hocko,
    Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: /proc/sys/vm/stat_refresh skip checking known negative stats
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 28 Feb 2021, Roman Gushchin wrote:
> On Thu, Feb 25, 2021 at 03:14:03PM -0800, Hugh Dickins wrote:
> > vmstat_refresh() can occasionally catch nr_zone_write_pending and
> > nr_writeback when they are transiently negative.
> > The reason is partly that the interrupt which decrements them in
> > test_clear_page_writeback() can come in before
> > __test_set_page_writeback() got to increment them; but transient
> > negatives are still seen even when that is prevented, and we have not
> > yet resolved why (Roman believes that it is an unavoidable consequence
> > of the refresh scheduled on each cpu). But those stats are not buggy,
> > they have never been seen to drift away from 0 permanently: so just
> > avoid the annoyance of showing a warning on them.
> >
> > Similarly avoid showing a warning on nr_free_cma: CMA users have seen
> > that one reported negative from /proc/sys/vm/stat_refresh too, but it
> > does drift away permanently: I believe that's because its
> > incrementation and decrementation are decided by page migratetype, but
> > the migratetype of a pageblock is not guaranteed to be constant.
> >
> > Use switch statements so we can most easily add or remove cases later.
>
> I'm OK with the code, but I can't fully agree with the commit log. I don't think
> there is any mystery around negative values. Let me copy-paste the explanation
> from my original patch:
>
> These warnings* are generated by the vmstat_refresh() function, which
> assumes that atomic zone and numa counters can't go below zero. However,
> on a SMP machine it's not quite right: due to per-cpu caching it can in
> theory be as low as -(zone threshold) * NR_CPUs.
>
> For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
> reached 0. Then we've reclaimed a small number of cma pages on each CPU
> except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are slightly
> positive (the atomic counter is still 0). Then somebody on CPU0 consumes
> all these pages. The number of pages can easily exceed the threshold and
> a negative value will be committed to the atomic counter.
>
> * warnings about negative NR_FREE_CMA_PAGES

Hi Roman, thanks for your Acks on the others - and indeed this is the one
on which disagreement was more to be expected.

I certainly wanted (and included below) a Link to your original patch; and
even wondered whether to paste your description into mine. But I read it
again and still have issues with it.

Mainly, it does not convey at all that touching stat_refresh adds the
per-cpu counts into the global atomics, resetting per-cpu counts to 0.
Which does not invalidate your explanation: races might still manage to
underflow; but it does take the "easily" out of "can easily exceed".

Since I don't use CMA on any machine, I cannot be sure, but it looked like
a bad example to rely upon, because of its migratetype-based accounting.
If you use /proc/sys/vm/stat_refresh frequently enough, without
suppressing the warning, I guess that uncertainty could be resolved by
checking whether nr_free_cma is seen with a negative value in consecutive
refreshes - which would tend to support my migratetype theory - or only
singly - which would support your raciness theory.

> Actually, the same is almost true for ANY other counter. What differs CMA,
> dirty and write pending counters is that they can reach 0 value under
> normal conditions. Other counters are usually not reaching values small
> enough to see negative values on a reasonable sized machine.

Looking through /proc/vmstat now, yes, I can see that there are fewer
counters which hover near 0 than I had imagined: more have a positive
bias, or are monotonically increasing. And I'd be lying if I said I'd
never seen any others than nr_writeback or nr_zone_write_pending caught
negative.

But what are you asking for? Should the patch be changed, to retry the
refresh_vm_stats() before warning, if it sees any negative? Depends on
how terrible one line in dmesg is considered!

> Does it makes sense?
I'm not sure: you were not asking for the patch to be changed, but its
commit log: and I better not say "Roman believes that it is an unavoidable
consequence of the refresh scheduled on each cpu" if that's untrue (or
unclear: now it reads to me as if we're accusing the refresh of messing
things up, whereas it's the non-atomic nature of the refresh which leaves
it vulnerable to races).

Hugh

> >
> > Link: https://lore.kernel.org/linux-mm/20200714173747.3315771-1-guro@fb.com/
> > Reported-by: Roman Gushchin
> > Signed-off-by: Hugh Dickins
> > ---
> >
> >  mm/vmstat.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > --- vmstat2/mm/vmstat.c	2021-02-25 11:56:18.000000000 -0800
> > +++ vmstat3/mm/vmstat.c	2021-02-25 12:42:15.000000000 -0800
> > @@ -1840,6 +1840,14 @@ int vmstat_refresh(struct ctl_table *tab
> >  	if (err)
> >  		return err;
> >  	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
> > +		/*
> > +		 * Skip checking stats known to go negative occasionally.
> > +		 */
> > +		switch (i) {
> > +		case NR_ZONE_WRITE_PENDING:
> > +		case NR_FREE_CMA_PAGES:
> > +			continue;
> > +		}
> >  		val = atomic_long_read(&vm_zone_stat[i]);
> >  		if (val < 0) {
> >  			pr_warn("%s: %s %ld\n",
> > @@ -1856,6 +1864,13 @@ int vmstat_refresh(struct ctl_table *tab
> >  	}
> >  #endif
> >  	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> > +		/*
> > +		 * Skip checking stats known to go negative occasionally.
> > +		 */
> > +		switch (i) {
> > +		case NR_WRITEBACK:
> > +			continue;
> > +		}
> >  		val = atomic_long_read(&vm_node_stat[i]);
> >  		if (val < 0) {
> >  			pr_warn("%s: %s %ld\n",
>