From: Yosry Ahmed
Date: Fri, 21 Jul 2023 13:37:51 -0700
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
To: Tejun Heo
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Roman Gushchin,
    Shakeel Butt, Muchun Song, "Matthew Wilcox (Oracle)", Zefan Li,
    Yu Zhao, Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier",
    Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org
References: <20230720070825.992023-1-yosryahmed@google.com>
    <20230720153515.GA1003248@cmpxchg.org>

On Fri, Jul 21, 2023 at 12:18 PM Tejun Heo wrote:
>
> Hello,
>
> On Fri, Jul 21, 2023 at 11:47:49AM -0700, Yosry Ahmed wrote:
> > On Fri, Jul 21, 2023 at 11:26 AM Tejun Heo wrote:
> > > On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> > > > On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo wrote:
> > > > > memory at least in our case. The sharing across them comes down to things
> > > > > like some common library pages which don't really account for much these
> > > > > days.
> > > >
> > > > Keep in mind that even a single page charged to a memcg and used by
> > > > another memcg is sufficient to result in a zombie memcg.
> > >
> > > I mean, yeah, that's a separate issue or rather a subset which isn't all
> > > that controversial. That can be deterministically solved by reparenting to
> > > the parent like how slab is handled. I think the "deterministic" part is
> > > important here. As you said, even a single page can pin a dying cgroup.
> >
> > There are serious flaws with reparenting that I mentioned above. We do
> > it for kernel memory, but that's because we really have no other
> > choice. Oftentimes the memory is not reclaimable and we cannot find an
> > owner for it. This doesn't mean it's the right answer for user memory.
> >
> > The semantics are new compared to normal charging (as opposed to
> > recharging, as I explain below). There is an extra layer of
> > indirection that we did not (as far as I know) measure the impact of.
> > Parents end up with pages that they never used and we have no
> > observability into where it came from. Most importantly, over time
> > user memory will keep accumulating at the root, reducing the accuracy
> > and usefulness of accounting, effectively an accounting leak and
> > reduction of capacity. Memory that is not attributed to any user, aka
> > system overhead.
>
> That really sounds like the setup is missing cgroup layers tracking
> persistent resources. Most of the problems you describe can be solved by
> adding cgroup layers at the right spots which would usually align with the
> logical structure of the system, right?

It is difficult to track down all persistent/shareable resources and
find the users, especially when both the resources and the users are
dynamically changed. A simple example is text files for a shared
library or sidecar processes that run with different workloads and
need to have their usage charged to the workload, but they may have
memory. For those cases there is no layering that would work. More
practically, sometimes userspace just doesn't even know what exactly
is being shared by whom.

>
> ...
> > I believe recharging is being mis-framed here :)
> >
> > Recharging semantics are not new, it is a shortcut to a process that
> > is already happening that is focused on offline memcgs. Let's take a
> > step back.
>
> Yeah, it does sound better when viewed that way. I'm still not sure what
> extra problems it solves tho. We experienced similar problems but AFAIK all
> of them came down to needing the appropriate hierarchical structure to
> capture how resources are being used on systems.

It solves the problem of zombie memcgs and unaccounted memory.
It is great that in some cases an appropriate hierarchy structure
fixes the problem by accurately capturing how resources are being
shared, but in some cases it's not as straightforward. Recharging
attempts to fix the problem in a way that is more consistent with
current semantics and more appealing than reparenting in terms of
rightful ownership.

Some systems are not rebooted for months. Can you imagine how much
memory can be accumulated at the root (escaping all accounting) over
months of reparenting?

>
> Thanks.
>
> --
> tejun