From: Shakeel Butt
Date: Fri, 12 Apr 2019 13:50:02 -0700
Subject: Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
To: Roman Gushchin
Cc: Johannes Weiner, Andrew Morton, Linux MM, Cgroups, LKML, Kernel Team
In-Reply-To: <20190412201534.GB24377@tower.DHCP.thefacebook.com>
References: <20190412151507.2769-1-hannes@cmpxchg.org>
 <20190412151507.2769-4-hannes@cmpxchg.org>
 <20190412201534.GB24377@tower.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 12, 2019 at 1:16 PM Roman Gushchin wrote:
>
> On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> > On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner wrote:
> > >
> > > Right now, when somebody needs to know the recursive memory
> > > statistics and events of a cgroup subtree, they need to walk the
> > > entire subtree and sum up the counters manually.
> > >
> > > There are two issues with this:
> > >
> > > 1. When a cgroup gets deleted, its stats are lost. The state
> > > counters should all be 0 at that point, of course, but the events
> > > are not. When this happens, the event counters, which are
> > > supposed to be monotonic, can go backwards in the parent cgroups.
> >
> > We faced this exact issue as well and had a similar solution.
> >
> > > 2. During regular operation, we always have a certain number of
> > > lazily freed cgroups sitting around that have been deleted, have
> > > no tasks, but have a few cache pages remaining. These groups'
> > > statistics do not change until we eventually hit memory pressure,
> > > but somebody watching, say, memory.stat on an ancestor has to
> > > iterate those every time.
> > >
> > > This patch addresses both issues by introducing recursive
> > > counters at each level that are propagated from the write side
> > > when stats change.
> > >
> > > Upward propagation happens when the per-cpu caches spill over
> > > into the local atomic counter. This is the same thing we do
> > > during charge and uncharge, except that the latter uses atomic
> > > RMWs, which are more expensive; stat changes happen at around the
> > > same rate. In a sparse file test (page faults and reclaim at
> > > maximum CPU speed) with 5 cgroup nesting levels, perf shows
> > > __mod_memcg_page_state at ~1%.
> >
> > (Unrelated to this patchset) I think there should also be a way to
> > get the exact memcg stats. As machines get bigger (more CPUs and
> > larger base page sizes), the accuracy of the stats gets worse.
> > Internally we have an additional interface, memory.stat_exact, for
> > that. However, I am not sure whether an additional interface is
> > better for the upstream kernel, or something like
> > /proc/sys/vm/stat_refresh, which syncs all per-cpu stats.
>
> I was thinking about eventually consistent counters: sync them
> periodically from a worker thread. It should keep the cost of
> reading small, but should increase the accuracy. Will it work for
> you?

A worker-thread-based solution seems fine to me, but Johannes said it
would be best not to traverse the whole tree every few seconds.
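The monotonicity problem in issue 1 above is easy to model outside the
kernel: if an ancestor's event count is derived on the read side by
summing over live children, a child's deletion makes the supposedly
monotonic counter regress, while a count charged to the parent at write
time survives. A toy C sketch, with the caveat that `struct node` and
both function names are illustrative stand-ins, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two accounting schemes discussed above.
 * None of these names exist in the kernel; they are illustrative. */
struct node { long own_events; long propagated; };

/* Read-side aggregation: sum over whatever children still exist. */
static long read_side_total(struct node *children[], int n)
{
	long sum = 0;
	for (int i = 0; i < n; i++)
		if (children[i])
			sum += children[i]->own_events;
	return sum;
}

/* Write-side propagation: charge the parent when the event happens,
 * so deleting the child later cannot make the parent's count drop. */
static void count_event(struct node *parent, struct node *child, long n)
{
	child->own_events += n;
	parent->propagated += n;
}
```

Dropping a child from the read-side sum makes the "monotonic" total go
backwards; the write-side `propagated` count is unaffected.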
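The batched upward propagation ("per-cpu caches spill over into the
local atomic counter") can likewise be sketched in userspace. The
threshold value, the field names, and the single-CPU simplification
below are all assumptions for illustration; the kernel batches per CPU
with its own stock size and uses atomics for the shared counters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of write-side propagation: each
 * group keeps a cached delta and a recursive total; when the delta
 * crosses a batch threshold, it is flushed into the totals of the
 * group and every ancestor. BATCH and the struct are illustrative. */
#define BATCH 64

struct group {
	struct group *parent;
	long percpu_delta;   /* stand-in for one CPU's cached updates */
	long recursive_stat; /* propagated total for the whole subtree */
};

static void mod_stat(struct group *g, long val)
{
	g->percpu_delta += val;
	if (g->percpu_delta >= BATCH || g->percpu_delta <= -BATCH) {
		long x = g->percpu_delta;

		g->percpu_delta = 0;
		/* one walk up the tree per batch, not per update */
		for (struct group *p = g; p; p = p->parent)
			p->recursive_stat += x;
	}
}
```

Readers of an ancestor's stats then see a value that is at most one
batch per CPU stale, without walking the subtree, which is the
read-cheap/write-propagated trade-off the patch describes.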