From: Yafang Shao
Date: Sat, 17 Aug 2019 11:33:57 +0800
Subject: Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
To: Roman Gushchin
Cc: Andrew Morton, Linux MM, Michal Hocko, Johannes Weiner, LKML, kernel-team@fb.com, stable@vger.kernel.org

On Sat, Aug 17, 2019 at 8:47 AM Roman Gushchin wrote:
>
> Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> with the hierarchical ones") effectively decreased the precision of
> per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
>
> That's good for displaying in memory.stat, but brings a serious
> regression into the reclaim process.
>
> One issue I've discovered and debugged is the following:
> lruvec_lru_size() can return 0 instead of the actual number of pages
> in the lru list, preventing the kernel from reclaiming the last
> remaining pages. The result is yet another flood of dying memory
> cgroups. The opposite is also happening: scanning an empty lru list
> is a waste of cpu time.
>
> Also, inactive_list_is_low() can return incorrect values, preventing
> the active lru from being scanned and freed. It can fail both because
> the sizes of the active and inactive lists are inaccurate, and because
> the number of workingset refaults isn't precise. In other words,
> the result is pretty random.
>
> I'm not sure if using the approximate number of slab pages in
> count_shadow_nodes() is acceptable, but the issues described above
> are enough to partially revert the patch.
>
> Let's keep the per-memcg vmstats_local counters batched (they are only
> used for displaying stats to userspace), but keep the lruvec stats
> precise. This change fixes the dead memcg flooding on my setup.
>

That may cause some misunderstanding if the local counters are not in
sync with the hierarchical ones (someone may suspect that something has
leaked). If we have to do it like this, I think we had better document
this behavior.

> Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
> Signed-off-by: Roman Gushchin
> Cc: Yafang Shao
> Cc: Johannes Weiner
> ---
>  mm/memcontrol.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 249187907339..3429340adb56 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -746,15 +746,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>         /* Update memcg */
>         __mod_memcg_state(memcg, idx, val);
>
> +       /* Update lruvec */
> +       __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
> +
>         x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
>         if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
>                 struct mem_cgroup_per_node *pi;
>
> -               /*
> -                * Batch local counters to keep them in sync with
> -                * the hierarchical ones.
> -                */
> -               __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
>                 for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
>                         atomic_long_add(x, &pi->lruvec_stat[idx]);
>                 x = 0;
> --
> 2.21.0
>
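
To make the failure mode above concrete, here is a minimal userspace
sketch of the batching scheme. All of the names in it (BATCH,
percpu_pending, local_batched, local_precise, hierarchical,
mod_counter) are made up for illustration, with BATCH standing in for
MEMCG_CHARGE_BATCH; it is not kernel code. It shows how a local
counter that is only updated when the per-cpu batch flushes can read 0
while pages are actually on the list:

/* Userspace sketch of the percpu batching discussed above. */
#include <stdio.h>
#include <stdlib.h>

#define BATCH 32                /* stand-in for MEMCG_CHARGE_BATCH */

static long percpu_pending;     /* per-cpu delta, not yet flushed */
static long local_batched;      /* local counter updated only on flush
                                   (the reverted behavior) */
static long local_precise;      /* local counter updated on every
                                   modification (the restored behavior) */
static long hierarchical;       /* atomic total, updated on flush */

static void mod_counter(long val)
{
        long x;

        local_precise += val;           /* always in sync with reality */

        x = percpu_pending + val;
        if (labs(x) > BATCH) {
                local_batched += x;     /* only sees flushed deltas */
                hierarchical += x;
                x = 0;
        }
        percpu_pending = x;
}

int main(void)
{
        int i;

        /* Add 20 pages to an lru: stays below the batch threshold. */
        for (i = 0; i < 20; i++)
                mod_counter(1);

        printf("precise local: %ld\n", local_precise); /* prints 20 */
        printf("batched local: %ld\n", local_batched); /* prints 0 */
        return 0;
}

With the hunk above applied, pn->lruvec_stat_local behaves like
local_precise; before the revert it behaved like local_batched, which
is how lruvec_lru_size() could report an empty lru that still had
pages on it.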