Received: by 2002:a05:7412:3784:b0:e2:908c:2ebd with SMTP id jk4csp1935744rdb; Tue, 3 Oct 2023 05:54:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGeHnj0+Cv8YI/570xkLmjvr4/v8kIMbJm8lvxnWNAmZWMq7b4xr0ue7e6CP89W/v0WiUkx X-Received: by 2002:a05:6870:9581:b0:1bb:c0ee:5536 with SMTP id k1-20020a056870958100b001bbc0ee5536mr17565437oao.47.1696337691834; Tue, 03 Oct 2023 05:54:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696337691; cv=none; d=google.com; s=arc-20160816; b=nRCeTQxUoDI19zw4jyQz+NOcCkd5+Y+TbI2r/S58zSQvLvH3gTc17rmIblGtawGrMg RklBLFjaLbfuecv2jOKFhtTgQ6joYwtdp9NhfFLTyMpvnfonaye2hVsgTGit/nzzwG9y sFs2j+SS1uE9+TsVxy8eotI1U9PnRmAYmEKyELSDDRudcHRqDdqO1ZfdB7I63+bBIPh0 YGZnu5Zq47y6dPY7KMYIDmKCPq7WxCHjWR0Ak41Jwly5fPJzdT0FVJruAqGqCEI8SiK+ UihhXY9npXJ5wlFPCIrZSCyabiPb92skEC3KA9IGyKCDeC38HYnpGEgm33DkoaO9Bl/s Bw/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=JSjIWFluF7K4ottX8oDikBL846wHdq5PEeEcvlyqteQ=; fh=I/uh0PLt0Q7SD9oWHjfcRZzfVY85hnSkD2A9C1cdbOI=; b=LkxgJX8YysMjTgJmdfYF2wxYYjjjkwHGFLtR47dhj5uY92C45Y7W00Twm3U4hCANrr 90aq6DhVnCQW7xxc8e5QSgOdlGN3XTJDD4w06YVYOZvh5REwMAOoUKkxcZjIV9QUQqr+ LnHg1mGE5stPI1pYWhaAqk65sXGkbrp+LiYdW8E+CfGaU84bax+mKJbZpFvOTrVt7F0V aTkesKMiHQmbCq2CxYtqbHr0QjOcfdfpmmI5naeUDl7+5GmZWoBYiaH7sFykiwydNVI/ fqiHrkK3rJwqGAEYJO6BuJ5ZgKgPqWKrTqZf9BEAOfh4H9kje+awYUQj+V1aESYSs56L 4Bbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=IxFMC2NN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id bv128-20020a632e86000000b0056c4189d6cbsi1417447pgb.171.2023.10.03.05.54.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 05:54:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20230601.gappssmtp.com header.s=20230601 header.b=IxFMC2NN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id CCA0081896CD; Tue, 3 Oct 2023 05:54:50 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231266AbjJCMyv (ORCPT + 99 others); Tue, 3 Oct 2023 08:54:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229803AbjJCMyu (ORCPT ); Tue, 3 Oct 2023 08:54:50 -0400 Received: from mail-qt1-x836.google.com (mail-qt1-x836.google.com [IPv6:2607:f8b0:4864:20::836]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AAE0193 for ; Tue, 3 Oct 2023 05:54:46 -0700 (PDT) Received: by mail-qt1-x836.google.com with SMTP id d75a77b69052e-419787a43ebso6111091cf.1 for ; Tue, 03 Oct 2023 05:54:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20230601.gappssmtp.com; s=20230601; t=1696337686; x=1696942486; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=JSjIWFluF7K4ottX8oDikBL846wHdq5PEeEcvlyqteQ=; b=IxFMC2NNkQu10Nxr1g6V3G21TwhjXCCuyBdS9DtckCzhWW4SiZ+sEeNJGInKjVyeyN SEpCheFUd4eVTRlaLrUuE6Afdc6kGDzh3ryq4UWU6ePzz9LipJgfWwt9BYb1yxHeroW1 QhQIKFB34WoBaqSuHSL3w3xLNqqoMwGAMtrkwiFSiQ8uufDsU9dZwzDY1uhoFLdk01rj cwShMmrPN3zuTFOYwNPP3bbMoInCkLNJEGcIaCL6Owvj23Ut64iaXSxo/U7XLBQN9nIK R7vx2DpfjBSNOeyBSlvwRUbITNFK8Mh6Uiu8Z7G+O5TDJhkl/jevdJmK1nWEB71p/zWh H9iQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696337686; x=1696942486; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=JSjIWFluF7K4ottX8oDikBL846wHdq5PEeEcvlyqteQ=; b=pHGUEIaKneaXdBiaaqSdDnq01lGd9XbqZsuTxwV9mympO3NKkMPF0tU3hUFCU23IJd UPC9d2nXooFz2/rktGvU5jcly8t3lmfpNsv+KSKZJMyJ30z7qLiDhrhK17/6DsS4lrnA PXGk8Xs5lNUTN//GDepnGkkkg+GXbUoah+Mc1/VR7R/ewqYPM94PDzbIOj7tzft+eC3T P7Ddqrp3XwW8Kp8iNcY42HHmKThQ92W9Z31NCynWqtkkqHCcpd6sweHt6uc3Wm2fHUNs +rGrt5Svl8Wtox1Vxijus1LMRNPbfKQSY3jug/hKXACBElSM40IuckRKR0gYZMVFtjW8 i5dQ== X-Gm-Message-State: AOJu0YwDi2FwhzxKR7BpnLQkk1537YoWsOoyLHUHCyyMYfDMU0/bSDPQ dPqi8TZFkiz1dyycQEgE1XWz2w== X-Received: by 2002:a05:622a:1a92:b0:416:5e11:f7ec with SMTP id s18-20020a05622a1a9200b004165e11f7ecmr18798717qtc.52.1696337685795; Tue, 03 Oct 2023 05:54:45 -0700 (PDT) Received: from localhost (2603-7000-0c01-2716-3012-16a2-6bc2-2937.res6.spectrum.com. [2603:7000:c01:2716:3012:16a2:6bc2:2937]) by smtp.gmail.com with ESMTPSA id c15-20020ac84e0f000000b004197d6d97c4sm413230qtw.24.2023.10.03.05.54.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 05:54:45 -0700 (PDT) Date: Tue, 3 Oct 2023 08:54:44 -0400 From: Johannes Weiner To: Nhat Pham Cc: akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller Message-ID: <20231003125444.GB17012@cmpxchg.org> References: <20231003001828.2554080-1-nphamcs@gmail.com> <20231003001828.2554080-3-nphamcs@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231003001828.2554080-3-nphamcs@gmail.com> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 03 Oct 2023 05:54:50 -0700 (PDT) On Mon, Oct 02, 2023 at 05:18:27PM -0700, Nhat Pham wrote: > Currently, hugetlb memory usage is not acounted for in the memory > controller, which could lead to memory overprotection for cgroups with > hugetlb-backed memory. This has been observed in our production system. > > For instance, here is one of our usecases: suppose there are two 32G > containers. The machine is booted with hugetlb_cma=6G, and each > container may or may not use up to 3 gigantic page, depending on the > workload within it. The rest is anon, cache, slab, etc. We can set the > hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness. > But it is very difficult to configure memory.max to keep overall > consumption, including anon, cache, slab etc. fair. > > What we have had to resort to is to constantly poll hugetlb usage and > readjust memory.max. Similar procedure is done to other memory limits > (memory.low for e.g). However, this is rather cumbersome and buggy. > Furthermore, when there is a delay in memory limits correction, (for e.g > when hugetlb usage changes within consecutive runs of the userspace > agent), the system could be in an over/underprotected state. > > This patch rectifies this issue by charging the memcg when the hugetlb > folio is utilized, and uncharging when the folio is freed (analogous to > the hugetlb controller). Note that we do not charge when the folio is > allocated to the hugetlb pool, because at this point it is not owned by > any memcg. > > Some caveats to consider: > * This feature is only available on cgroup v2. > * There is no hugetlb pool management involved in the memory > controller. As stated above, hugetlb folios are only charged towards > the memory controller when it is used. Host overcommit management > has to consider it when configuring hard limits. > * Failure to charge towards the memcg results in SIGBUS. This could > happen even if the hugetlb pool still has pages (but the cgroup > limit is hit and reclaim attempt fails). > * When this feature is enabled, hugetlb pages contribute to memory > reclaim protection. low, min limits tuning must take into account > hugetlb memory. > * Hugetlb pages utilized while this option is not selected will not > be tracked by the memory controller (even if cgroup v2 is remounted > later on). > > Signed-off-by: Nhat Pham Acked-by: Johannes Weiner