Received: by 2002:a05:7412:2a8c:b0:e2:908c:2ebd with SMTP id u12csp3549274rdh; Thu, 28 Sep 2023 15:27:36 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFtieBQlOLwxqQj2j84LKX9I92lJwwG+2mUvz3alQeCpqBaSvF+1p93GQubLSkHsj2QJatu X-Received: by 2002:a17:90a:4a82:b0:268:5c3b:6f37 with SMTP id f2-20020a17090a4a8200b002685c3b6f37mr2616222pjh.0.1695940056565; Thu, 28 Sep 2023 15:27:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695940056; cv=none; d=google.com; s=arc-20160816; b=AlCNfJB44IwfKKpYpNhMHUtdsc/ssdsCCZJVKhFfMcPg3Nofcw/LE+buFHHjq+eeML 2jXt5XAINUkyoa+6fIEIESrKpuLCj3/zRovsBFwho0XcLe83bjaIhSUE8i8XbzBFSDKl kd40JEOzfz7ikKm7h1cG9jVarMDty6UxD0RqCytdRYDQUEqljt7GHApTCfRc1w1kJRmE Zy8kIvl0yTUKj58qnbjkZ2zmawGCgcnNaRZ6CLHEdCZqXchkBSy5TqUGa9NPh+3VdzSK tmQOvwRESZ3I3t8dTzHYkwpDeZNmWlevuD1y9LN5EUVesGWpZQp5gmlzh4K2XtW1bOyc Ao9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=JpkzH31Yq2C5TtSflxHZb2eMAQY8GuLcUePer8rUOKM=; fh=X9EYCUZyVil4yA+n6Ws+Wzqpb+9fy6g3PtObyAZdAm0=; b=tErBU1qE8RCCoV+Q5ISvKmm1bmtzzPIQG1fUqJlYjKFeJNYddSe7Xk50VsoJdqqUa7 4OiAnw9XJpuKMPE0JIKq0q6M1BRD7H5glsjX7weue43abBM/VEHH3ozNvp7sCvlYDWO6 IYOXtgijxwCcrX+0PsadV7XO6ONgrLnFn/tJFXKSdspIlsTR9t7HHGaCkKi9gfwsNbHM fT1IEsUCOJETtWWjPHrlrBBzaDFo1I3C6yKfX8AXLibIjfeSwzjf5j2Z60GLwhEDTxQU zYzI6D9/WCO6rLVAET45QZYOKhkqGSuEqFOoNNpxqDT83IjkdNXOQfMZAnC9eoaiOe9R 9KXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=lD5VTD3x; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id y12-20020a170903010c00b001c5f9de7a35si14056353plc.109.2023.09.28.15.27.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Sep 2023 15:27:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=lD5VTD3x; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 7DE9F805A78D; Thu, 28 Sep 2023 05:52:57 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232091AbjI1Mwm (ORCPT + 99 others); Thu, 28 Sep 2023 08:52:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59520 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231293AbjI1Mwl (ORCPT ); Thu, 28 Sep 2023 08:52:41 -0400 Received: from mail-qv1-xf31.google.com (mail-qv1-xf31.google.com [IPv6:2607:f8b0:4864:20::f31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82C96139 for ; Thu, 28 Sep 2023 05:52:39 -0700 (PDT) Received: by mail-qv1-xf31.google.com with SMTP id 6a1803df08f44-65afae9e544so50929526d6.2 for ; Thu, 28 Sep 2023 05:52:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1695905558; x=1696510358; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=JpkzH31Yq2C5TtSflxHZb2eMAQY8GuLcUePer8rUOKM=; b=lD5VTD3x8o44NyL6UsJRxVk0rhmdfVE3mmfdkY6BtjP/n5Wesv3dRC7i57Vv0jf8Lp FtTs8yHszU6Ct+4nbdVvY2L4dEETuPq3QP+P2oJfqvEWQsUQEeAJ7vrDD7x0a65Qtrfk W+7ErlyYE4ajE0EnVfexs+kDosZKJ6stFFjPZN1n1/JcFGWDEUmFXt8StoUTqBBmS5dl ZE6bslXpreaayIVShZ5mH0LtW+tRa722Siab9jiUlb3rOYNgDC0CeJCTx3iRjVz0amsL jWZSr9ZE6LJR1q7vvLyJkHHH3hWio6GUvycF6SAhRrAGeTNQoyNM2j2GQb0oVTa3lx/8 Yq1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695905558; x=1696510358; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=JpkzH31Yq2C5TtSflxHZb2eMAQY8GuLcUePer8rUOKM=; b=u3cz+5BswDrEvKnv5FF6DHeWk13HQroCCMs2OZ3iVJCVreMwb5RFHwLewQgJlrNUZN ZCZz5Uc9qWfzdlmioYxpLyoHcaXagnDuNA/ImBTXKVAoGprESS38n5/wIwWjZxFfot6I JPLKYFWJFyDZeSSb5yeJTBu/34DsK96XPfObhiLjeWvt6uzEztV2TU1EJMg07ICAqL4X q1gSqU7K0zfBvuRksP20Rogtt/3qtfV2BwRMXBqRImgjQ1gHuG0KjiucOTEUeMq5KNUG YT5iwfvl/MGzCiyJGZwAXgpq4tZY7a/q0QGK4lijr7OV4bbAky82VUdlNIE+XDjFLUR0 6cjQ== X-Gm-Message-State: AOJu0Yyia3nKh1LO02LjFBH/LGOjh9KF+bLXkq8cWsR02bNSzY9//PrK WBG1aoapXzXQsR3Fv8TkNkrUWA== X-Received: by 2002:a05:6214:1925:b0:64f:3795:c10 with SMTP id es5-20020a056214192500b0064f37950c10mr926742qvb.10.1695905558577; Thu, 28 Sep 2023 05:52:38 -0700 (PDT) Received: from localhost ([2620:10d:c091:400::5:ba06]) by smtp.gmail.com with ESMTPSA id r10-20020a0c9e8a000000b006562b70805bsm3806364qvd.84.2023.09.28.05.52.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Sep 2023 05:52:38 -0700 (PDT) Date: Thu, 28 Sep 2023 08:52:37 -0400 From: Johannes Weiner To: Roman Gushchin Cc: Michal Hocko , Nhat Pham , akpm@linux-foundation.org, riel@surriel.com, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH 0/2] hugetlb memcg accounting Message-ID: <20230928125237.GA407389@cmpxchg.org> References: <20230926194949.2637078-1-nphamcs@gmail.com> <20230927184738.GC365513@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Thu, 28 Sep 2023 05:52:57 -0700 (PDT) On Wed, Sep 27, 2023 at 02:37:47PM -0700, Roman Gushchin wrote: > On Wed, Sep 27, 2023 at 02:47:38PM -0400, Johannes Weiner wrote: > > On Wed, Sep 27, 2023 at 01:21:20PM +0200, Michal Hocko wrote: > > > On Tue 26-09-23 12:49:47, Nhat Pham wrote: > > > > Currently, hugetlb memory usage is not acounted for in the memory > > > > controller, which could lead to memory overprotection for cgroups with > > > > hugetlb-backed memory. This has been observed in our production system. > > > > > > > > This patch series rectifies this issue by charging the memcg when the > > > > hugetlb folio is allocated, and uncharging when the folio is freed. In > > > > addition, a new selftest is added to demonstrate and verify this new > > > > behavior. > > > > > > The primary reason why hugetlb is living outside of memcg (and the core > > > MM as well) is that it doesn't really fit the whole scheme. In several > > > aspects. First and the foremost it is an independently managed resource > > > with its own pool management, use and lifetime. > > > > Honestly, the simpler explanation is that few people have used hugetlb > > in regular, containerized non-HPC workloads. > > > > Hugetlb has historically been much more special, and it retains a > > specialness that warrants e.g. the hugetlb cgroup container. But it > > has also made strides with hugetlb_cma, migratability, madvise support > > etc. that allows much more on-demand use. It's no longer the case that > > you just put a static pool of memory aside during boot and only a few > > blessed applications are using it. > > > > For example, we're using hugetlb_cma very broadly with generic > > containers. The CMA region is fully usable by movable non-huge stuff > > until huge pages are allocated in it. With the hugetlb controller you > > can define a maximum number of hugetlb pages that can be used per > > container. But what if that container isn't using any? Why shouldn't > > it be allowed to use its overall memory allowance for anon and cache > > instead? > > Cool, I remember proposing hugetlb memcg stats several years ago and if > I remember correctly at that time you was opposing it based on the idea > that huge pages are not a part of the overall memcg flow: they are not > a subject for memory pressure, can't be evicted, etc. And thp's were seen > as a long-term replacement. Even though all above it's true, hugetlb has > it's niche and I don't think thp's will realistically replace it any time > soon. Yeah, Michal's arguments very much reminded me of my stance then. I stand corrected. I'm still hopeful that we can make 2M work transparently. I'd expect 1G to remain in the hugetlb domain for some time to come, but even those are mostly dynamic now with your hugetlb_cma feature! > So I'm glad to see this effort (and very supportive) on making hugetlb > more convenient and transparent for an end user. Thanks!