Received: by 2002:a05:7412:3784:b0:e2:908c:2ebd with SMTP id jk4csp1483660rdb; Mon, 2 Oct 2023 10:50:58 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGSXx/JyGVbDQLTTrBVwXmrZQydPBjKRdlpD/2MB+DC9cKfO+UecPu2Mv4zJuylROKPTl6c X-Received: by 2002:a05:6830:188:b0:6c0:edaa:58a1 with SMTP id q8-20020a056830018800b006c0edaa58a1mr12733779ota.36.1696269057933; Mon, 02 Oct 2023 10:50:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696269057; cv=none; d=google.com; s=arc-20160816; b=uornKoVtZM5d9N3J1ZheAgoVilVpax1dfGBIZiAu5nEkqJEPKl0daQ4kaB3Q7VqgKV vSZ32RVxG4bVdoIVg034Ls23KsLyI5p6Ei8WXnVffwpYNHbw0g+JdgThpoiUGMnztfoY RafRP7KbqOEPxRfcjTH7oVcBToVS3Z9Ct+Gex9swjrDLanTbuLPFhjLhfGwzD+df2ETK rcpDFdMzGwTDKozxKKDtb4WYgVpyn+oJp7/Oeg1kZ0Hrrnduuvsd7QDshGa4WWMZGTHk GskASvLRnFFlo5Jaow1r0OEFxGK+qto97FIAY0KSuWUYmjuBrCurognYDvJ+DcigsA1f y/rw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=3flX8pvYQpulFjLzINdT1Z6OZiL8rcWR2w9QCAJvARw=; fh=TM2vs+QtB9CWJvoHiOTQBlTsmRP4CFyxdrHkUqzOXdU=; b=x2GQi/enB0HY14b84lhrJt7nA+8fduhU2fCmORsPj3/GmNtMHcVHNr2wbSHNQsLZV0 yGbe1Z4NhWJEJsgmjInyTvuU6PPqyM5P3i9wG+0b+3Pg4KHlYGTDX52jX9f6UFM4WQYa mdq3jZcjp+HtQMZWidrXoLJZlSbt1oktIVQqQ708l+l1zn47NXTPF4KKCySw+zr2nOcJ E+7d7QpuiTySJxEeQq3uOU4GpONSvwcT7sOVCnjCuDrE0Cc6E+O3qNWTbmpm6KDxFC+S ouD02JgFsqZMBjxgIApZVyx3PJtso+3eruAcGtK+MeShqbbXdDQ28Uisz7TEUEe8DGNp NRAQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=yZE5GGJ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id s20-20020a656454000000b00578db71451fsi27121373pgv.501.2023.10.02.10.50.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Oct 2023 10:50:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=yZE5GGJ4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 4FC518069F07; Mon, 2 Oct 2023 07:50:35 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237867AbjJBOud (ORCPT + 99 others); Mon, 2 Oct 2023 10:50:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237862AbjJBOuc (ORCPT ); Mon, 2 Oct 2023 10:50:32 -0400 Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94CC994 for ; Mon, 2 Oct 2023 07:50:29 -0700 (PDT) Received: by mail-qk1-x72d.google.com with SMTP id af79cd13be357-774141bb415so942180085a.3 for ; Mon, 02 Oct 2023 07:50:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1696258228; x=1696863028; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=3flX8pvYQpulFjLzINdT1Z6OZiL8rcWR2w9QCAJvARw=; b=yZE5GGJ4SlG3U2OihoNsmoYESUEUX5gx8uv32YURfDzbZcRCaiS276djCvfoUU37OM RKTBdflQLvVOia8YKXEW1tQbU9PMBN7khACVc2FzAvWomhRZ/+sKHG1bUAU0ojhRhnTt BFeth3Zp8JNz7TRbB8Z36+RGwS4ZfP94YfkmbxxFSZrVcIq72+kaBGZFvAohcrfg9NHo gYy0xUOjxvZ1WuhEhzj3vR7CGFkyMn2RJHI5rgXCxc8I2M7BdqjsL759mddEFGWx2KbU k4mcPFnyhaWBv3VpkBVa6t6qMN5v0pjhC8Rv9S82nXZkjk2z8b0Fok4Y1a4NGRb/Qezp z3xA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696258228; x=1696863028; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=3flX8pvYQpulFjLzINdT1Z6OZiL8rcWR2w9QCAJvARw=; b=odFUT+Edphduywkae93EAwL9i3hBSxcCMXxEuc+RoycPjkfP7bgPYia0POsDL0MUCe 1D7aSv21DlIISrGJ8+PNX1aL/NGCTehkGZ05tx/3ViseqivirjvhoEXQbTeaXO4qHm9M t+uOMYF+yOfJf4D8GzO0aF112J5tiynBRr1e4+k8UDWMbWPd/E5Q0biBMmeoaNmeNtjr So3f3hjpyH8DKu+ei47WbBlmnnwd0R+nQf3mvLVpZENcKsOWdH/JPdVpKM5HCTw6T3Rl DNOkB0nfaYZ3mbsGR/DKL9vqE4W4g8b3V/hmsVuIvi+XNsouIZlZHZY8EA+uz7yqXZOw Ld0A== X-Gm-Message-State: AOJu0YyHUaz/aJlpdjV9wDpZJRedtqeShpVfDSnE5Kj117NzbjYjeLz4 zaEfOlPH/DKSWw71Zj1UJZkAOA== X-Received: by 2002:a05:620a:1920:b0:773:ae67:4b96 with SMTP id bj32-20020a05620a192000b00773ae674b96mr13046574qkb.53.1696258228654; Mon, 02 Oct 2023 07:50:28 -0700 (PDT) Received: from localhost (2603-7000-0c01-2716-3012-16a2-6bc2-2937.res6.spectrum.com. [2603:7000:c01:2716:3012:16a2:6bc2:2937]) by smtp.gmail.com with ESMTPSA id ou2-20020a05620a620200b0076768dfe53esm8987922qkn.105.2023.10.02.07.50.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Oct 2023 07:50:28 -0700 (PDT) Date: Mon, 2 Oct 2023 10:50:26 -0400 From: Johannes Weiner To: Michal Hocko Cc: Nhat Pham , akpm@linux-foundation.org, riel@surriel.com, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller Message-ID: <20231002145026.GB4414@cmpxchg.org> References: <20230928005723.1709119-1-nphamcs@gmail.com> <20230928005723.1709119-2-nphamcs@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 02 Oct 2023 07:50:35 -0700 (PDT) On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote: > On Wed 27-09-23 17:57:22, Nhat Pham wrote: > > Currently, hugetlb memory usage is not acounted for in the memory > > controller, which could lead to memory overprotection for cgroups with > > hugetlb-backed memory. This has been observed in our production system. > > > > This patch rectifies this issue by charging the memcg when the hugetlb > > folio is allocated, and uncharging when the folio is freed (analogous to > > the hugetlb controller). > > This changelog is missing a lot of information. Both about the usecase > (we do not want to fish that out from archives in the future) and the > actual implementation and the reasoning behind that. > > AFAICS you have decided to charge on the hugetlb use rather than hugetlb > allocation to the pool. I suspect the underlying reasoning is that pool > pages do not belong to anybody. This is a deliberate decision and it > should be documented as such. > > It is also very important do describe subtle behavior properties that > might be rather unintuitive to users. Most notably > - there is no hugetlb pool management involved in the memcg > controller. One has to use hugetlb controller for that purpose. > Also the pre allocated pool as such doesn't belong to anybody so the > memcg host overcommit management has to consider it when configuring > hard limits. +1 > - memcg limit reclaim doesn't assist hugetlb pages allocation when > hugetlb overcommit is configured (i.e. pages are not consumed from the > pool) which means that the page allocation might disrupt workloads > from other memcgs. > - failure to charge a hugetlb page results in SIGBUS rather > than memcg oom killer. That could be the case even if the > hugetlb pool still has pages available and there is > reclaimable memory in the memcg. Are these actually true? AFAICS, regardless of whether the page comes from the pool or the buddy allocator, the memcg code will go through the regular charge path, attempt reclaim, and OOM if that fails.