Date: Tue, 11 Aug 2020 11:27:37 -0400
From: Johannes Weiner
To: Roman Gushchin
Cc: Andrew Morton, Dennis Zhou, Tejun Heo, Christoph Lameter,
	Michal Hocko, Shakeel Butt, linux-mm@kvack.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup
Message-ID: <20200811152737.GB650506@cmpxchg.org>
References: <20200623184515.4132564-1-guro@fb.com>
	<20200623184515.4132564-5-guro@fb.com>
In-Reply-To: <20200623184515.4132564-5-guro@fb.com>
On Tue, Jun 23, 2020 at 11:45:14AM -0700, Roman Gushchin wrote:
> Memory cgroups are using large chunks of percpu memory to store vmstat
> data. Yet this memory is not accounted at all, so in the case when there
> are many (dying) cgroups, it's not exactly clear where all the memory is.
>
> Because the size of memory cgroup internal structures can dramatically
> exceed the size of object or page which is pinning it in the memory, it's
> not a good idea to simple ignore it. It actually breaks the isolation
> between cgroups.
>
> Let's account the consumed percpu memory to the parent cgroup.
>
> Signed-off-by: Roman Gushchin
> Acked-by: Dennis Zhou

Acked-by: Johannes Weiner

This makes sense, and the accounting is in line with how we track and
distribute child creation quotas (cgroup.max.descendants and
cgroup.max.depth) up the cgroup tree.

I have one minor comment that isn't a dealbreaker for me:

> @@ -5069,13 +5069,15 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>  	if (!pn)
>  		return 1;
>
> -	pn->lruvec_stat_local = alloc_percpu(struct lruvec_stat);
> +	pn->lruvec_stat_local = alloc_percpu_gfp(struct lruvec_stat,
> +						 GFP_KERNEL_ACCOUNT);
>  	if (!pn->lruvec_stat_local) {
>  		kfree(pn);
>  		return 1;
>  	}
>
> -	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
> +	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
> +						 GFP_KERNEL_ACCOUNT);
>  	if (!pn->lruvec_stat_cpu) {
>  		free_percpu(pn->lruvec_stat_local);
>  		kfree(pn);
> @@ -5149,11 +5151,13 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
>  		goto fail;
>  	}
>
> -	memcg->vmstats_local = alloc_percpu(struct memcg_vmstats_percpu);
> +	memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
> +						GFP_KERNEL_ACCOUNT);
>  	if (!memcg->vmstats_local)
>  		goto fail;
>
> -	memcg->vmstats_percpu = alloc_percpu(struct memcg_vmstats_percpu);
> +	memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu,
> +						 GFP_KERNEL_ACCOUNT);
>  	if (!memcg->vmstats_percpu)
>  		goto fail;
>
> @@ -5202,7 +5206,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  	struct mem_cgroup *memcg;
>  	long error = -ENOMEM;
>
> +	memalloc_use_memcg(parent);
>  	memcg = mem_cgroup_alloc();
> +	memalloc_unuse_memcg();

The disconnect between 1) requesting accounting and 2) which cgroup to
charge is making me uneasy. It makes mem_cgroup_alloc() a bit of a
handgrenade, because accounting to the current task is almost guaranteed
to be wrong if the use_memcg() annotation were to get lost in a refactor
or not make it to a new caller of the function.

The saving grace is that mem_cgroup_alloc() is pretty unlikely to be used
elsewhere. And pretending it's an independent interface would be
overengineering.

But how about the following in mem_cgroup_alloc() and
alloc_mem_cgroup_per_node_info() to document that caller relationship:

	/* We charge the parent cgroup, never the current task */
	WARN_ON_ONCE(!current->active_memcg);
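
For illustration, a minimal sketch of how that assertion could sit in
mem_cgroup_alloc() on top of this patch (elided code marked with "...";
the exact placement is only a suggestion, not part of the patch as posted):

static struct mem_cgroup *mem_cgroup_alloc(void)
{
        struct mem_cgroup *memcg;
        ...
        /* We charge the parent cgroup, never the current task */
        WARN_ON_ONCE(!current->active_memcg);

        memcg->vmstats_local = alloc_percpu_gfp(struct memcg_vmstats_percpu,
                                                GFP_KERNEL_ACCOUNT);
        if (!memcg->vmstats_local)
                goto fail;
        ...
}

That way a new caller that forgets to wrap the allocation in
memalloc_use_memcg(parent)/memalloc_unuse_memcg() trips the warning
instead of silently charging whatever cgroup the current task happens
to be in.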