Date: Fri, 21 Jul 2023 08:26:28 -1000
From: Tejun Heo
To: Yosry Ahmed
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Roman Gushchin,
    Shakeel Butt, Muchun Song, "Matthew Wilcox (Oracle)", Zefan Li,
    Yu Zhao, Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier",
    Greg Thelen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    cgroups@vger.kernel.org
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
References: <20230720070825.992023-1-yosryahmed@google.com>
    <20230720153515.GA1003248@cmpxchg.org>

Hello,

On Fri, Jul 21, 2023 at 11:15:21AM -0700, Yosry Ahmed wrote:
> On Thu, Jul 20, 2023 at 3:31 PM Tejun Heo wrote:
> > memory at least in our case. The sharing across them comes down to things
> > like some common library pages which don't really account for much these
> > days.
>
> Keep in mind that even a single page charged to a memcg and used by
> another memcg is sufficient to result in a zombie memcg.

I mean, yeah, that's a separate issue or rather a subset which isn't all
that controversial. That can be deterministically solved by reparenting to
the parent like how slab is handled. I think the "deterministic" part is
important here. As you said, even a single page can pin a dying cgroup.

> > > Keep in mind that the environment is dynamic, workloads are constantly
> > > coming and going. Even if find the perfect nesting to appropriately
> > > scope resources, some rescheduling may render the hierarchy obsolete
> > > and require us to start over.
> >
> > Can you please go into more details on how much memory is shared for what
> > across unrelated dynamic workloads? That sounds different from other use
> > cases.
>
> I am trying to collect more information from our fleet, but the
> application restarting in a different cgroup is not what is happening
> in our case. It is not easy to find out exactly what is going on on

This is the point that Johannes raised but I don't think the current
proposal would make things more deterministic. From what I can see, it
actually pushes it towards even less predictability. Currently, yeah, some
pages may end up in cgroups which aren't the majority user but it at least
is clear how that would happen. The proposed change adds layers of
indeterministic behaviors on top. I don't think that's the direction we want
to go.

> machines and where the memory is coming from due to the
> indeterministic nature of charging. The goal of this proposal is to
> let the kernel handle leftover memory in zombie memcgs because it is
> not always obvious to userspace what's going on (like it's not obvious
> to me now where exactly is the sharing happening :) ).
>
> One thing to note is that in some cases, maybe a userspace bug or
> failed cleanup is a reason for the zombie memcgs. Ideally, this
> wouldn't happen, but it would be nice to have a fallback mechanism in
> the kernel if it does.

I'm not disagreeing on that. Our handling of pages owned by dying cgroups
isn't great but I don't think the proposed change is an acceptable solution.

Thanks.

-- 
tejun