Date: Thu, 21 Sep 2023 13:21:54 +0200
From: Michal Hocko
To: Jeremi Piotrowski
Cc: Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song,
    Greg Kroah-Hartman, stable@vger.kernel.org, patches@lists.linux.dev,
    Tejun Heo, Andrew Morton, linux-kernel@vger.kernel.org,
    regressions@lists.linux.dev, mathieu.tortuyaux@gmail.com
Subject: Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
References: <4eb47d6a-b127-4aad-af30-896c3b9505b4@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 21-09-23 12:43:05, Jeremi Piotrowski wrote:
> On 9/21/2023 9:52 AM, Michal Hocko wrote:
> > On Wed 20-09-23 14:46:52, Shakeel Butt wrote:
> >> On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko wrote:
> >>>
> >> [...]
> >>>> have a strong opinion against it. Also just to be clear we are not
> >>>> talking about full revert of 58056f77502f but just the returning of
> >>>> EOPNOTSUPP, right?
> >>>
> >>> If we allow the limit to be set without returning a failure then we
> >>> still have options 2 and 3 on how to deal with that. One of them is to
> >>> enforce the limit.
> >>>
> >>
> >> Option 3 is a partial revert of 58056f77502f where we keep the no
> >> limit enforcement and remove the EOPNOTSUPP return on write. Let's go
> >> with option 3. In addition, let's add pr_warn_once on the read of
> >> kmem.limit_in_bytes as well.
> >
> > How about this?
> > ---
>
> I'm OK with this approach. You're missing this in the patch below:
>
> // static struct cftype mem_cgroup_legacy_files[] = {
>
> +	{
> +		.name = "kmem.limit_in_bytes",
> +		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> +		.write = mem_cgroup_write,
> +		.read_u64 = mem_cgroup_read_u64,
> +	},

Of course. I lost the hunk while massaging the revert. Thanks for
spotting it. Updated version below. Btw. I've decided not to
pr_{warn,info} on the read side because, realistically, I do not think
it would help all that much. I am worried we will get stuck with this
forever because there will always be somebody stuck on unpatched
userspace.
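
For readers following the thread, here is a rough userspace model of
what option 3 means for the write side: the write is accepted, the
value is ignored, a warning is printed once, and success is returned.
This is only a sketch. All toy_* names are hypothetical stand-ins, not
kernel code, and toy_warn_once() merely imitates the print-once
behaviour of pr_warn_once().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a memcg; only the field this sketch needs. */
struct toy_memcg {
	unsigned long kmem_limit;	/* never touched by the deprecated knob */
};

/* Counts how many times the warning was actually emitted. */
static unsigned int toy_warn_count;

/* Userspace imitation of pr_warn_once(): print on the first call only. */
static void toy_warn_once(const char *msg)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	toy_warn_count++;
	fputs(msg, stderr);
}

/*
 * Model of option 3: the write succeeds (returns 0) but the requested
 * value is dropped, so no limit is ever enforced.
 */
static int toy_kmem_limit_write(struct toy_memcg *memcg, unsigned long nr_pages)
{
	assert(memcg != NULL);
	(void)nr_pages;		/* deliberately ignored */
	toy_warn_once("kmem.limit_in_bytes is deprecated; writes have no effect\n");
	return 0;
}
```

Calling toy_kmem_limit_write() repeatedly returns 0 each time, leaves
kmem_limit untouched and emits the warning only on the first call.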
---
From bb6702b698efd31f3f90f4f1dd36ffe223397bec Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 21 Sep 2023 09:38:29 +0200
Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation

This reverts commit 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
and partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes"), which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately, it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required, but the non-existent file confuses the userspace,
which fails as a result. A patch to fix this on the userspace side has
been submitted, but it is hard to predict how it will propagate through
the maze of 3rd party consumers of the software.

Now, reverting 86327e8eb94c alone is not an option because there is
another set of userspace which cannot cope with EOPNOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f
as well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well, or we keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
The latter should work for both known breaking workloads, which depend
on the existence of the file but not on hard limit enforcement.

[1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net

Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  7 +++++++
 mm/memcontrol.c                                | 18 ++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 5f502bf68fbc..ff456871bf4b 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
  memory.oom_control                  set/show oom controls.
  memory.numa_stat                    show the number of memory usage per numa
                                      node
+ memory.kmem.limit_in_bytes          Deprecated knob to set and read the kernel
+                                     memory hard limit. Kernel hard limit is not
+                                     supported since 5.16. Writing any value to
+                                     this file will not have any effect, same as
+                                     if nokmem kernel parameter was specified.
+                                     Kernel memory is still charged and reported
+                                     by memory.kmem.usage_in_bytes.
  memory.kmem.usage_in_bytes          show current kernel memory allocation
  memory.kmem.failcnt                 show the number of kernel memory usage
                                      hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..0b161705ef36 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 				   unsigned int nr_pages)
 {
+	struct page_counter *counter;
 	struct mem_cgroup *memcg;
 	int ret;
 
@@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 		goto out;
 
 	memcg_account_kmem(memcg, nr_pages);
+
+	/* There is no way to set up kmem hard limit so this operation cannot fail */
+	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+		WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
 out:
 	css_put(&memcg->css);
 
@@ -3867,6 +3872,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		case _MEMSWAP:
 			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
 			break;
+		case _KMEM:
+			pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+				     "Writing any value to this file has no effect. "
+				     "Please report your usecase to linux-mm@kvack.org if you "
+				     "depend on this functionality.\n");
+			ret = 0;
+			break;
 		case _TCP:
 			ret = memcg_update_tcp_max(memcg, nr_pages);
 			break;
@@ -5077,6 +5089,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.seq_show = memcg_numa_stat_show,
 	},
 #endif
+	{
+		.name = "kmem.limit_in_bytes",
+		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read_u64,
+	},
 	{
 		.name = "kmem.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
-- 
2.30.2

-- 
Michal Hocko
SUSE Labs
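
For the userspace side of the problem described in the changelog, a
consumer could treat the knob as optional instead of failing when the
file is missing or a write is rejected. The following is a minimal
sketch under that assumption; try_write_knob() and the knob_result
values are hypothetical names made up for illustration and are not
taken from any of the affected projects.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum knob_result {
	KNOB_WRITTEN = 0,	/* write accepted (possibly as a noop) */
	KNOB_ABSENT = 1,	/* file does not exist */
	KNOB_UNSUPPORTED = 2,	/* write rejected with EOPNOTSUPP */
	KNOB_ERROR = -1		/* any other failure */
};

/*
 * Try to write a value to a control file, distinguishing the two
 * failure modes discussed in the changelog from real errors.
 */
static enum knob_result try_write_knob(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return errno == ENOENT ? KNOB_ABSENT : KNOB_ERROR;

	ssize_t n = write(fd, value, strlen(value));
	int saved_errno = errno;	/* close() may overwrite errno */
	close(fd);

	if (n >= 0)
		return KNOB_WRITTEN;
	return saved_errno == EOPNOTSUPP ? KNOB_UNSUPPORTED : KNOB_ERROR;
}
```

A caller that only wants best-effort behaviour can then continue on
KNOB_ABSENT and KNOB_UNSUPPORTED and abort only on KNOB_ERROR.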