Date: Mon, 2 May 2022 23:22:09 +0200
From: Michal Koutný
To: Vasily Averin
Cc: Roman Gushchin, Vlastimil Babka, Shakeel Butt, kernel@openvz.org, Florian Westphal, linux-kernel@vger.kernel.org, Michal Hocko, cgroups@vger.kernel.org, Greg Kroah-Hartman, Tejun Heo
Subject: Re: kernfs memcg accounting
References: <7e867cb0-89d6-402c-33d2-9b9ba0ba1523@openvz.org> <20220427140153.GC9823@blackbody.suse.cz> <7509fa9f-9d15-2f29-cb2f-ac0e8d99a948@openvz.org>
In-Reply-To: <7509fa9f-9d15-2f29-cb2f-ac0e8d99a948@openvz.org>
Hello.

On Mon, May 02, 2022 at 10:37:49PM +0300, Vasily Averin wrote:
> I did not understand your statement. Could you please explain it in more details?

Sure, let me expand my perhaps ambiguous and indefinite sentence.

> I see that cgroup_mkdir()->cgroup_create() creates new kernfs node for new
> sub-directory, and with my patch account memory of kernfs node to memcg
> of current process.

Indeed. The variants I'm comparing here are:
a) charge to the creator's memcg,
b) charge to the parent (memcg ancestor) of the created cgroup.

When struct mem_cgroup charging was introduced, there was a similar discussion [1].

I can see the following aspects here:
1) the absolute size of kernfs_node objects,
2) the practical difference between a) and b),
3) consistency with memcg,
4) v1 vs v2 behavior.

Ad 1) -- normally, I'd treat this as negligible (~120 B per struct kernfs_node, ~10 of them per subsys, ~10 subsystems, i.e. roughly 12 KB per cgroup). But I guess the point of this change is exploitative users, where this doesn't hold [2], so the absolute size is not so important.

Ad 2) -- in typical workloads, only top-level cgroups are created by some management entity and lower levels are managed from within, i.e. there is little difference in whom to charge for the created objects.

Ad 3) -- struct mem_cgroup objects are charged to their hierarchical parent, so that dying memcgs can be associated with a subtree, which is where reclaim can deal with them (in contrast with the creator's cgroup). Now, if I'm looking correctly, the kernfs_node objects are not pinned by any residual state (subsystems run kill_css()->css_clear_dir() synchronously from rmdir; the cgroup itself may be RCU delayed). So the argument for charging the way memcg does remains purely one of consistency (there is no practical reason).

Ad 4) -- variant b) becomes slightly awkward when mkdir'ing a cgroup in a non-memcg hierarchy (the charge bubbles up to the root, despite the creator being in a non-root memcg).

How does this reasoning align with your original intention of net device accounting? (Are the creators of net devices inside the container?)

> Do you think it is incorrect and new kernfs node should be accounted
> to memcg of parent cgroup, as mem_cgroup_css_alloc()->mem_cgroup_alloc() does?

I don't think either variant is incorrect. I'd very much prefer consistency with memcg behavior (variant a)), but as I've listed in the arguments above, it seems such consistency can't be easily justified.

> Perhaps you mean that in this case kernfs should not be counted at all,
> as almost all neighboring allocations do?

No, I think it wouldn't help here [2]. (Or which neighboring allocations do you mean? There must be at least nr_cgroups of them.)

(Of course, there's also the traditional performance argument. The cgroup's kernfs_node object shouldn't be problematic, and I can't judge the others (sysfs), but that's nothing to prevent any form of kernfs_node accounting going forward in my eyes.)

HTH,
Michal

[1] https://lore.kernel.org/all/20200729171039.GA22229@blackbody.suse.cz/

[2] Unless this could be constrained by something even bigger that is itself accounted. But only struct mem_cgroup (recursively, its percpu stats) comes to my mind.
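
P.S. For illustration only, a rough sketch of how the two variants could look at the kernfs_node allocation site (this is a sketch, not the actual patch under discussion; parent_memcg is a hypothetical placeholder for however the memcg ancestor of the created cgroup would be looked up):

	/* Variant a): charge the kernfs_node to the creator's memcg.
	 * Only the gfp flag on the existing slab allocation changes,
	 * so whoever runs mkdir pays for the object. */
	kn = kmem_cache_zalloc(kernfs_node_cache, GFP_KERNEL_ACCOUNT);

	/* Variant b): charge to the memcg ancestor of the created cgroup.
	 * set_active_memcg() temporarily redirects __GFP_ACCOUNT charging
	 * away from current's memcg to the given one. */
	struct mem_cgroup *old_memcg;

	old_memcg = set_active_memcg(parent_memcg);
	kn = kmem_cache_zalloc(kernfs_node_cache, GFP_KERNEL_ACCOUNT);
	set_active_memcg(old_memcg);

Variant a) is the "default" kmem accounting behavior one gets for free with __GFP_ACCOUNT; variant b) needs the extra lookup and the active-memcg switch around the allocation.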