From: Yosry Ahmed
Date: Thu, 20 Jul 2023 17:07:14 -0700
Subject: Re: [RFC PATCH 0/8] memory recharging for offline memcgs
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Shakeel Butt,
 Muchun Song, "Matthew Wilcox (Oracle)", Tejun Heo, Zefan Li, Yu Zhao,
 Luis Chamberlain, Kees Cook, Iurii Zaikin, "T.J. Mercier", Greg Thelen,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org
References: <20230720070825.992023-1-yosryahmed@google.com>

On Thu, Jul 20, 2023 at 5:02 PM Roman Gushchin wrote:
>
> On Thu, Jul 20, 2023 at 07:08:17AM +0000, Yosry Ahmed wrote:
> > This patch series implements the proposal from the LSF/MM/BPF 2023
> > conference for reducing offline/zombie memcgs by memory recharging [1].
> > The main difference from that proposal is that this series focuses on
> > recharging and does not include eviction of any memory charged to
> > offline memcgs.
> >
> > Two methods of recharging are proposed:
> >
> > (a) Recharging of mapped folios.
> >
> > When a memcg is offlined, queue an asynchronous worker that will walk
> > the lruvec of the offline memcg and try to recharge any mapped folios to
> > the memcg of one of the processes mapping the folio. The main assumption
> > is that a process mapping the folio is the "rightful" owner of the
> > memory.
> >
> > Currently, this is only supported for evictable folios, as the
> > unevictable lru is imaginary and we cannot iterate the folios on it. A
> > separate proposal [2] was made to revive the unevictable lru, which
> > would allow recharging of unevictable folios.
> >
> > (b) Deferred recharging of folios.
> >
> > For folios that are unmapped, or that are mapped but could not be
> > recharged by (a), we rely on deferred recharging.
> > Simply put, any time a folio is accessed or dirtied by a userspace
> > process and that folio is charged to an offline memcg, we will try to
> > recharge it to the memcg of the process accessing the folio. Again, we
> > assume this process should be the "rightful" owner of the memory. This
> > is also done asynchronously to avoid slowing down the data access path.
>
> Unfortunately I have to agree with Johannes, Tejun and others who are not big
> fans of this approach.
>
> Lazy recharging leads to an interesting phenomenon: the memory usage of a
> running workload may suddenly go up only because some other workload is
> terminated and now its memory is being recharged. I find it confusing. It
> also makes it hard to set up limits and/or guarantees.

This can happen today. If memcg A starts accessing some memory and gets
charged for it, and then memcg B also accesses it, memcg B will not be
charged for it. If memcg A later runs into reclaim and the memory is freed,
then the next time memcg B accesses it, memcg B's usage will suddenly go up
as well, only because some other workload experienced reclaim. This is a
very similar scenario, except that the memcg was offlined instead of
reclaimed. As a matter of fact, it's common to try to free up a memcg before
removing it (by lowering the limit or using memory.reclaim). In that case,
the net result would be exactly the same, the only difference being that
recharging avoids freeing the memory and faulting it back in.

>
> In general, I don't think we can handle shared memory well without getting
> rid of the "whoever allocates a page, pays the full price" policy and making
> shared ownership a fully supported concept. Of course, it's a huge amount of
> work and I believe the only way we can achieve it is to compromise on the
> granularity of the accounting. Whether the resulting system will be better
> in real life is hard to say in advance.
>
> Thanks!
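The reply argues that recharging on access leaves the charge where reclaiming the memcg and refaulting would have left it. A minimal userspace toy model in C can check that argument. It assumes first-touch charging and a single shared folio of 512 pages (an arbitrary illustrative size). `struct memcg`, `struct toy_folio`, `toy_access()`, `toy_reclaim()` and `scenario()` are hypothetical names and not kernel APIs. Unlike the RFC, the model recharges synchronously on access rather than from an asynchronous worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: NOT the kernel's struct mem_cgroup or struct folio. */
struct memcg {
	long usage;   /* pages currently charged to this memcg */
	bool online;
};

struct toy_folio {
	struct memcg *owner; /* charged memcg, or NULL when not resident */
	long nr_pages;
};

/*
 * Memcg @m accesses folio @f. Returns true if the folio had to be faulted in.
 * A non-resident folio is charged to whoever faults it in (first touch pays).
 * A resident folio charged to an offline memcg is recharged to the accessor;
 * this toy does that synchronously, whereas the RFC defers it to a worker.
 */
static bool toy_access(struct toy_folio *f, struct memcg *m)
{
	if (f->owner == NULL) {
		f->owner = m;
		m->usage += f->nr_pages;
		return true;
	}
	if (!f->owner->online && f->owner != m) {
		f->owner->usage -= f->nr_pages;
		f->owner = m;
		m->usage += f->nr_pages;
	}
	return false;
}

/* Reclaim frees the folio and uncharges its current owner. */
static void toy_reclaim(struct toy_folio *f)
{
	if (f->owner != NULL) {
		f->owner->usage -= f->nr_pages;
		f->owner = NULL;
	}
}

/*
 * A faults the folio in, B shares it without being charged, then A is
 * offlined. With @reclaim_first, A is emptied first (as with memory.reclaim);
 * otherwise the folio is left in place to be recharged.
 * Reports B's final usage and the total number of faults.
 */
static void scenario(bool reclaim_first, long *b_usage, int *faults)
{
	struct memcg a = { 0, true };
	struct memcg b = { 0, true };
	struct toy_folio f = { NULL, 512 };

	*faults = 0;
	*faults += toy_access(&f, &a); /* A pays on first touch */
	*faults += toy_access(&f, &b); /* B shares it for free */
	if (reclaim_first)
		toy_reclaim(&f);
	a.online = false;              /* A is offlined */
	*faults += toy_access(&f, &b); /* B touches the folio again */

	/* Either way, the folio ends up charged exactly once, to B. */
	assert(f.owner == &b && a.usage + b.usage == f.nr_pages);
	*b_usage = b.usage;
}
```

In this model, `scenario(true, ...)` (reclaim, then refault) leaves B charged 512 pages after 2 faults. `scenario(false, ...)` (recharge) leaves B charged the same 512 pages after only 1 fault. The charge ends up in the same place, and recharging saves only the refault.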