Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76FF8C05027 for ; Fri, 3 Feb 2023 15:23:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230309AbjBCPXa (ORCPT ); Fri, 3 Feb 2023 10:23:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233175AbjBCPXO (ORCPT ); Fri, 3 Feb 2023 10:23:14 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2D886AF0E6; Fri, 3 Feb 2023 07:21:12 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 44CEE34A4B; Fri, 3 Feb 2023 15:21:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1675437670; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=+71rVrs5ZtdXbUgSgq6MBdY/wG/sltNAW0iIaM9D09U=; b=k5S9UVYKf/aovkCa8B4Dl6MXE7YfOvX+foI5ZXfG/2sotHPrv7SJRxEqcpu6Q6LXB/Yne7 yf8F690NY4F0U9QNOOSmEAqkoR7wY0NbCc0oqX46KuZR9PO4nzXKkIChygBlio4kLsri57 xmr6ZvvrrXx1WRqSO0Tqp+oiDndKnQQ= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 25F211346D; Fri, 3 Feb 2023 15:21:10 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id VzHZBmYm3WP0OgAAMHmgww (envelope-from ); Fri, 03 Feb 2023 15:21:10 +0000 Date: Fri, 3 Feb 2023 16:21:09 +0100 From: Michal Hocko To: Roman Gushchin Cc: Marcelo Tosatti , Leonardo =?iso-8859-1?Q?Br=E1s?= , Johannes Weiner , Shakeel Butt , Muchun Song , Andrew Morton , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Frederic Weisbecker Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining Message-ID: References: <20230125073502.743446-1-leobras@redhat.com> <9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org OK, so this is a speculative patch. cpu_is_isolated doesn't exist yet. I have talked to Frederic off-list and he is working on an implementation. I will be offline next whole week (will be back Feb 13th) so I am sending this early but this patch cannot be merged without his one of course. I have tried to summarize the reasoning behind both approach should we ever need to revisit this approach. For now I strongly believe a simpler solution should be preferred. Roman, I have added your ack as you indicated but if you disagree with the reasoning please let me know. --- From 6d99c4d7a238809ff2eb7c362b45d002ca212c70 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Fri, 3 Feb 2023 15:54:38 +0100 Subject: [PATCH] memcg: do not drain charge pcp caches on remote isolated cpus Leonardo Bras has noticed that pcp charge cache draining might be disruptive on workloads relying on 'isolated cpus', a feature commonly used on workloads that are sensitive to interruption and context switching such as vRAN and Industrial Control Systems. There are essentially two ways how to approach the issue. We can either allow the pcp cache to be drained on a different rather than a local cpu or avoid remote flushing on isolated cpus. The current pcp charge cache is really optimized for high performance and it always relies to stick with its cpu. That means it only requires local_lock (preempt_disable on !RT) and draining is handed over to pcp WQ to drain locally again. The former solution (remote draining) would require to add an additional locking to prevent local charges from racing with the draining. This adds an atomic operation to otherwise simple arithmetic fast path in the try_charge path. Another concern is that the remote draining can cause a lock contention for the isolated workloads and therefore interfere with it indirectly via user space interfaces. Another option is to avoid draining scheduling on isolated cpus altogether. That means that those remote cpus would keep their charges even after drain_all_stock returns. This is certainly not optimal either but it shouldn't really cause any major problems. In the worst case (many isolated cpus with charges - each of them with MEMCG_CHARGE_BATCH i.e 64 page) the memory consumption of a memcg would be artificially higher than can be immediately used from other cpus. Theoretically a memcg OOM killer could be triggered pre-maturely. Currently it is not really clear whether this is a practical problem though. Tight memcg limit would be really counter productive to cpu isolated workloads pretty much by definition because any memory reclaimed induced by memcg limit could break user space timing expectations as those usually expect execution in the userspace most of the time. Also charges could be left behind on memcg removal. Any future charge on those isolated cpus will drain that pcp cache so this won't be a permanent leak. Considering cons and pros of both approaches this patch is implementing the second option and simply do not schedule remote draining if the target cpu is isolated. This solution is much more simpler. It doesn't add any new locking and it is more more predictable from the user space POV. Should the pre-mature memcg OOM become a real life problem, we can revisit this decision. Reported-by: Leonardo Bras Acked-by: Roman Gushchin Signed-off-by: Michal Hocko --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index ab457f0394ab..55e440e54504 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2357,7 +2357,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg) !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) { if (cpu == curcpu) drain_local_stock(&stock->work); - else + else if (!cpu_is_isolated(cpu)) schedule_work_on(cpu, &stock->work); } } -- 2.30.2 -- Michal Hocko SUSE Labs