From: Yafang Shao
Date: Tue, 20 Aug 2019 09:29:14 +0800
Subject: Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"
To: Roman Gushchin
Cc: Andrew Morton, Linux MM, Michal Hocko, Johannes Weiner, LKML, Kernel Team, stable@vger.kernel.org
References: <20190817004726.2530670-1-guro@fb.com> <20190817191419.GA11125@castle> <20190819212034.GB24956@tower.dhcp.thefacebook.com>
In-Reply-To: <20190819212034.GB24956@tower.dhcp.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 20, 2019 at 5:20 AM Roman Gushchin wrote:
>
> On Sun, Aug 18, 2019 at 08:30:15AM +0800, Yafang Shao wrote:
> > On Sun, Aug 18, 2019 at 3:14 AM Roman Gushchin wrote:
> > >
> > > On Sat, Aug 17, 2019 at 11:33:57AM +0800, Yafang Shao wrote:
> > > > On Sat, Aug 17, 2019 at 8:47 AM Roman Gushchin wrote:
> > > > >
> > > > > Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> > > > > with the hierarchical ones") effectively decreased the precision of
> > > > > per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
> > > > >
> > > > > That's good for displaying in memory.stat, but brings a serious regression
> > > > > into the reclaim process.
> > > > >
> > > > > One issue I've discovered and debugged is the following:
> > > > > lruvec_lru_size() can return 0 instead of the actual number of pages
> > > > > on the lru list, preventing the kernel from reclaiming the last
> > > > > remaining pages. The result is yet another flood of dying memory
> > > > > cgroups. The opposite also happens: scanning an empty lru list
> > > > > is a waste of cpu time.
> > > > >
> > > > > Also, inactive_list_is_low() can return incorrect values, preventing
> > > > > the active lru from being scanned and freed. It can fail both because
> > > > > the sizes of the active and inactive lists are inaccurate, and because
> > > > > the number of workingset refaults isn't precise. In other words,
> > > > > the result is pretty random.
> > > > >
> > > > > I'm not sure whether using the approximate number of slab pages in
> > > > > count_shadow_nodes() is acceptable, but the issues described above
> > > > > are enough to partially revert the patch.
> > > > >
> > > > > Let's keep the per-memcg vmstat_local counters batched (they are only
> > > > > used for displaying stats to userspace), but make the lruvec stats
> > > > > precise. This change fixes the dying-memcg flooding on my setup.
> > > > >
> > > >
> > > > That will cause some misunderstanding if the local counters are not in
> > > > sync with the hierarchical ones
> > > > (someone may wonder whether something is leaking).
> > >
> > > Sure, but the actual leakage is a much more serious issue.
> > >
> > > > If we have to do it like this, I think we had better document this behavior.
> > >
> > > Lru size calculations can be done using per-zone counters, which is
> > > actually cheaper, because the number of zones is usually smaller than
> > > the number of cpus. I'll send a corresponding patch on Monday.
> > >
> >
> > Looks like a good idea.
> >
> > > Maybe other use cases can also be converted?
> >
> > We'd better keep the behavior the same across counters. I think you
> > can have a try.
>
> As I said, consistency of the counters is important, but not nearly as
> important as the real behavior of the system. Especially because we are
> talking about per-node memcg statistics, which I believe are mostly used
> for debugging.
>
> So for now I think the right thing to do is to revert the change to fix
> the memory reclaim process. And then we can discuss how to get the
> counters right.
>

Sure.

Thanks,
Yafang