Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp1152250pxb; Thu, 4 Feb 2021 06:24:08 -0800 (PST) X-Google-Smtp-Source: ABdhPJxtLBST1TY43oGeDdrkxTFlJGXTyQQGTFum/SaH/BADbcOxx92QGz3uAVLTJY6ey48fDPr2 X-Received: by 2002:a17:907:da7:: with SMTP id go39mr7982736ejc.255.1612448647962; Thu, 04 Feb 2021 06:24:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1612448647; cv=none; d=google.com; s=arc-20160816; b=yNNeu2gUpiMWyfuBu2OrnRhERNxtqcQMDfKnYS2XUaVd3i1qSB1IwmTsjA7JVqxo0Q WKaK0iq22sOC5HDDjJCZCtMs3QC11NhCXQNLtbtPUmUsvdtIvoBISWNuEPlzhbQIS2NR 7TO63k3T8dyRqRbNRzgIBdt1+zkfNv/eWwHqjclkgaM0PpQsK5NSNeakJxPrwPc+PU20 TY2VJTMFQntxdPXB3FY5KPV/+Pxq2N4/lhlHCFSqiKbkIqRj+TZpoduLlKsVZhgD/CXi d5giTs5f23mUUdv1coKdmATEDQTtQKOF7QEXpa6kHge7psW/Ekf/mNnXlZcF0Qfz+q5p Z4/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=uNjcnDnZXUPYHHuK46oDIL84tyY2kS2LkC6T9yyAPCU=; b=PUnS0dz0e6iPWyROjHhFkvUy0gw2AJ/waFcREUbTTlBm8kzGRFkGyjeUzXMr2Fptj7 gOObuycKMXqLb2QSjofzkRenzjDtSh94CefezdxP4dN63tzECz3Dkevak7iPhQlYj6vZ +KubAQKh4XiZ+Dm1Bs0E1aoifqZHyljF0O4tqJPyOYZxLkaGtgQd8iZ2AwZfhe+yhFDJ KMRrtmGBj6F75AKQXzA3Ets8GVKCi70a2iGiXPZzFkL70oqGC8x4wPhPqmKWPJVf2td8 mq3DBZYX85UWDmXU+69RD4gyVXyiARFHSoCkLiEd3dXzFaIRh8wDdQXsQeDhk2Dtk1Jb tP+g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=FpVLR3O5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id i8si244216ejv.311.2021.02.04.06.23.42; Thu, 04 Feb 2021 06:24:07 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=FpVLR3O5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=NONE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236603AbhBDOUm (ORCPT + 99 others); Thu, 4 Feb 2021 09:20:42 -0500 Received: from mx2.suse.de ([195.135.220.15]:58368 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236639AbhBDOUI (ORCPT ); Thu, 4 Feb 2021 09:20:08 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1612448358; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=uNjcnDnZXUPYHHuK46oDIL84tyY2kS2LkC6T9yyAPCU=; b=FpVLR3O5yTTiyeAwG18zs+GsUc9KTxfWONmuisoWJ5R5Qc+/oEzDmE8bSbH18CXTe6sjcR iIb3W1B+PoapI11EyyaUQ6YIBmeGMGWmyFSi1EW3GHcp6M4GA6Qx4SolDI790FtL20ziY2 ZigN/iwMkJ+cKFNvY3YBp5715do75N0= Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A88CEB176; Thu, 4 Feb 2021 14:19:18 +0000 (UTC) Date: Thu, 4 Feb 2021 15:19:17 +0100 From: Michal Hocko To: Johannes Weiner Cc: Andrew Morton , Tejun Heo , Roman Gushchin , linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@fb.com Subject: Re: [PATCH 6/7] mm: memcontrol: switch to rstat Message-ID: References: <20210202184746.119084-1-hannes@cmpxchg.org> <20210202184746.119084-7-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210202184746.119084-7-hannes@cmpxchg.org> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 02-02-21 13:47:45, Johannes Weiner wrote: > Replace the memory controller's custom hierarchical stats code with > the generic rstat infrastructure provided by the cgroup core. > > The current implementation does batched upward propagation from the > write side (i.e. as stats change). The per-cpu batches introduce an > error, which is multiplied by the number of subgroups in a tree. In > systems with many CPUs and sizable cgroup trees, the error can be > large enough to confuse users (e.g. 32 batch pages * 32 CPUs * 32 > subgroups results in an error of up to 128M per stat item). This can > entirely swallow allocation bursts inside a workload that the user is > expecting to see reflected in the statistics. > > In the past, we've done read-side aggregation, where a memory.stat > read would have to walk the entire subtree and add up per-cpu > counts. This became problematic with lazily-freed cgroups: we could > have large subtrees where most cgroups were entirely idle. Hence the > switch to change-driven upward propagation. Unfortunately, it needed > to trade accuracy for speed due to the write side being so hot. > > Rstat combines the best of both worlds: from the write side, it > cheaply maintains a queue of cgroups that have pending changes, so > that the read side can do selective tree aggregation. This way the > reported stats will always be precise and recent as can be, while the > aggregation can skip over potentially large numbers of idle cgroups. > > This adds a second vmstats to struct mem_cgroup (MEMCG_NR_STAT + > NR_VM_EVENT_ITEMS) to track pending subtree deltas during upward > aggregation. It removes 3 words from the per-cpu data. It eliminates > memcg_exact_page_state(), since memcg_page_state() is now exact. I am still digesting details and need to look deeper into how rstat works but removing our own stats is definitely a good plan. Especially when there are existing limitations and problems that would need fixing. Just to check that my high level understanding is correct. The transition is effectivelly removing a need to manually sync counters up the hierarchy and partially outsources that decision to rstat core. The controller is responsible just to tell the core how that syncing is done (e.g. which specific counters etc). Excplicit flushes are needed when you want an exact value (e.g. when values are presented to the userspace). I do not see any flushes to be done by the core pro-actively except for clean up on a release. Is the above correct understanding? -- Michal Hocko SUSE Labs