Date: Wed, 27 Sep 2023 14:37:47 -0700
From: Roman Gushchin
To: Johannes Weiner
Cc: Michal Hocko, Nhat Pham, akpm@linux-foundation.org, riel@surriel.com,
    shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org,
    lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com,
    yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
References: <20230926194949.2637078-1-nphamcs@gmail.com> <20230927184738.GC365513@cmpxchg.org>
In-Reply-To: <20230927184738.GC365513@cmpxchg.org>

On Wed, Sep 27, 2023 at 02:47:38PM -0400, Johannes Weiner wrote:
> On Wed, Sep 27, 2023 at 01:21:20PM +0200, Michal Hocko wrote:
> > On Tue 26-09-23 12:49:47, Nhat Pham wrote:
> > > Currently, hugetlb memory usage is not acounted for in the memory
> > > controller, which could lead to memory overprotection for cgroups with
> > > hugetlb-backed memory. This has been observed in our production system.
> > >
> > > This patch series rectifies this issue by charging the memcg when the
> > > hugetlb folio is allocated, and uncharging when the folio is freed.
> > > In addition, a new selftest is added to demonstrate and verify this new
> > > behavior.
> >
> > The primary reason why hugetlb is living outside of memcg (and the core
> > MM as well) is that it doesn't really fit the whole scheme. In several
> > aspects. First and the foremost it is an independently managed resource
> > with its own pool management, use and lifetime.
>
> Honestly, the simpler explanation is that few people have used hugetlb
> in regular, containerized non-HPC workloads.
>
> Hugetlb has historically been much more special, and it retains a
> specialness that warrants e.g. the hugetlb cgroup container. But it
> has also made strides with hugetlb_cma, migratability, madvise support
> etc. that allows much more on-demand use. It's no longer the case that
> you just put a static pool of memory aside during boot and only a few
> blessed applications are using it.
>
> For example, we're using hugetlb_cma very broadly with generic
> containers. The CMA region is fully usable by movable non-huge stuff
> until huge pages are allocated in it. With the hugetlb controller you
> can define a maximum number of hugetlb pages that can be used per
> container. But what if that container isn't using any? Why shouldn't
> it be allowed to use its overall memory allowance for anon and cache
> instead?

Cool, I remember proposing hugetlb memcg stats several years ago, and if
I remember correctly, at that time you were opposing it based on the idea
that huge pages are not a part of the overall memcg flow: they are not
subject to memory pressure, can't be evicted, etc. And THPs were seen as
a long-term replacement. Even though all of the above is true, hugetlb
has its niche and I don't think THPs will realistically replace it any
time soon.

So I'm glad to see this effort (and very supportive of it) to make
hugetlb more convenient and transparent for an end user.

Thanks!