Date: Mon, 25 Sep 2023 09:40:04 +0200
From: Michal Hocko
To: Johannes Weiner, Andrew Morton
Cc: Jeremi Piotrowski, Shakeel Butt, Roman Gushchin, Muchun Song,
	Greg Kroah-Hartman, stable@vger.kernel.org, patches@lists.linux.dev,
	Tejun Heo, linux-kernel@vger.kernel.org, regressions@lists.linux.dev,
	mathieu.tortuyaux@gmail.com
Subject: Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
References: <20230922133017.GD124289@cmpxchg.org>
In-Reply-To: <20230922133017.GD124289@cmpxchg.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri 22-09-23 09:30:17,
Johannes Weiner wrote:
> On Thu, Sep 21, 2023 at 01:21:54PM +0200, Michal Hocko wrote:
> > @@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> >  static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >  				   unsigned int nr_pages)
> >  {
> > +	struct page_counter *counter;
> >  	struct mem_cgroup *memcg;
> >  	int ret;
> >  
> > @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> >  		goto out;
> >  
> >  	memcg_account_kmem(memcg, nr_pages);
> > +
> > +	/* There is no way to set up kmem hard limit so this operation cannot fail */
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > +		WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
> 
> This hunk doesn't look quite right.
> 
> static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
> {
> 	mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
> 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> 		if (nr_pages > 0)
> 			page_counter_charge(&memcg->kmem, nr_pages);
> 		else
> 			page_counter_uncharge(&memcg->kmem, -nr_pages);
> 	}
> }
> 
> Other than that, please add

Good point. I have missed a8c49af3be5f ("memcg: add per-memcg total
kernel memory stat") introduced in 4.18

> Acked-by: Johannes Weiner

Fixed version below. Andrew, it seems we have a good consensus for this.
Could you queue this up and send it to Linus please?
---
From 8c3cbe68bba0fe5103d8fe73a06b3608ed49bda0 Mon Sep 17 00:00:00 2001
From: Michal Hocko
Date: Thu, 21 Sep 2023 09:38:29 +0200
Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation

This reverts commit 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
and partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1].
The underlying functionality is not really required but the non-existent
file just confuses the userspace, which fails as a result. The patch to
fix this on the userspace side has been submitted but it is hard to
predict how it will propagate through the maze of 3rd party consumers of
the software.

Now, reverting 86327e8eb94c alone is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f
as well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.

Note to backporters to stable trees. a8c49af3be5f ("memcg: add
per-memcg total kernel memory stat") introduced in 4.18 has added
memcg_account_kmem so the accounting is not done by
obj_cgroup_charge_pages directly for v1 anymore. Prior kernels need to
add it explicitly (thanks to Johannes for pointing this out).

[1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net

Cc: stable
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Acked-by: Shakeel Butt
Acked-by: Johannes Weiner
Signed-off-by: Michal Hocko
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  7 +++++++
 mm/memcontrol.c                                | 14 ++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 5f502bf68fbc..ff456871bf4b 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
  memory.oom_control                  set/show oom controls.
  memory.numa_stat                    show the number of memory usage per numa
                                      node
+ memory.kmem.limit_in_bytes          Deprecated knob to set and read the kernel
+                                     memory hard limit. Kernel hard limit is not
+                                     supported since 5.16. Writing any value to
+                                     do file will not have any effect same as if
+                                     nokmem kernel parameter was specified.
+                                     Kernel memory is still charged and reported
+                                     by memory.kmem.usage_in_bytes.
  memory.kmem.usage_in_bytes          show current kernel memory allocation
  memory.kmem.failcnt                 show the number of kernel memory usage
                                      hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..63bdaab2a906 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
 static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
 				   unsigned int nr_pages)
 {
+	struct page_counter *counter;
 	struct mem_cgroup *memcg;
 	int ret;
 
@@ -3867,6 +3868,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		case _MEMSWAP:
 			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
 			break;
+		case _KMEM:
+			pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+				     "Writing any value to this file has no effect. "
+				     "Please report your usecase to linux-mm@kvack.org if you "
+				     "depend on this functionality.\n");
+			ret = 0;
+			break;
 		case _TCP:
 			ret = memcg_update_tcp_max(memcg, nr_pages);
 			break;
@@ -5077,6 +5085,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.seq_show = memcg_numa_stat_show,
 	},
 #endif
+	{
+		.name = "kmem.limit_in_bytes",
+		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+		.write = mem_cgroup_write,
+		.read_u64 = mem_cgroup_read_u64,
+	},
 	{
 		.name = "kmem.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
-- 
2.30.2

-- 
Michal Hocko
SUSE Labs