Date: Fri, 21 Jul 2023 16:44:08 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Tejun Heo, Andrew Morton, Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, "Matthew Wilcox (Oracle)", Zefan Li, Yu Zhao, Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
Message-ID: <20230721204408.GA1033322@cmpxchg.org>
References: <20230720070825.992023-1-yosryahmed@google.com> <20230720153515.GA1003248@cmpxchg.org>

On Fri, Jul 21, 2023 at 11:47:49AM -0700, Yosry Ahmed wrote:
> On Fri, Jul 21, 2023 at 11:26 AM Tejun Heo wrote:
> >
> > Hello,
> >
> > On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> > > On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo wrote:
> > > > memory at least in our case. The sharing across them comes down to things
> > > > like some common library pages which don't really account for much these
> > > > days.
> > >
> > > Keep in mind that even a single page charged to a memcg and used by
> > > another memcg is sufficient to result in a zombie memcg.
> >
> > I mean, yeah, that's a separate issue or rather a subset which isn't all
> > that controversial. That can be deterministically solved by reparenting to
> > the parent like how slab is handled. I think the "deterministic" part is
> > important here. As you said, even a single page can pin a dying cgroup.
>
> There are serious flaws with reparenting that I mentioned above. We do
> it for kernel memory, but that's because we really have no other
> choice. Oftentimes the memory is not reclaimable and we cannot find an
> owner for it. This doesn't mean it's the right answer for user memory.
>
> The semantics are new compared to normal charging (as opposed to
> recharging, as I explain below). There is an extra layer of
> indirection that we did not (as far as I know) measure the impact of.
> Parents end up with pages that they never used and we have no
> observability into where it came from. Most importantly, over time
> user memory will keep accumulating at the root, reducing the accuracy
> and usefulness of accounting, effectively an accounting leak and
> reduction of capacity. Memory that is not attributed to any user, aka
> system overhead.

Reparenting has been the behavior since the first iteration of cgroups
in the kernel. The initial implementation would loop over the LRUs and
reparent pages synchronously during rmdir. This had some locking
issues, so we switched to the current implementation of just leaving
the zombie memcg behind but neutralizing its controls.

Thanks to Roman's objcg abstraction, we can now go back to the old
implementation of directly moving pages up to avoid the zombies.

However, these were pure implementation changes. The user-visible
semantics never varied: when you delete a cgroup, any leftover
resources are subject to control by the remaining parent cgroups.
Don't remove control domains if you still need to control resources.
But none of this is new or would change in any way! Neutralizing
controls of a zombie cgroup results in the same behavior and
accounting as linking the pages to the parent cgroup's LRU!

The only thing that's new is the zombie cgroups. We can fix that by
effectively going back to the earlier implementation, but thanks to
objcg without the locking problems.

I just wanted to address this, because your description/framing of
reparenting strikes me as quite wrong.