Date: Fri, 25 Feb 2022 16:58:42 -0800
From: Andrew Morton
To: Shakeel Butt
Cc: Michal Koutný, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Ivan Babrou, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Daniel Dao, stable@vger.kernel.org
Subject: Re: [PATCH] memcg: async flush memcg stats from perf sensitive codepaths
Message-Id: <20220225165842.561d3a475310aeab86a2d653@linux-foundation.org>
In-Reply-To: <20220226002412.113819-1-shakeelb@google.com>
References: <20220226002412.113819-1-shakeelb@google.com>

On Fri, 25 Feb 2022 16:24:12 -0800 Shakeel Butt wrote:

> Daniel Dao has reported [1] a regression on workloads that may trigger
> a lot of refaults (anon and file). The underlying issue is that flushing
> rstat is expensive. Although rstat flushes are batched with (nr_cpus *
> MEMCG_BATCH) stat updates, it seems like there are workloads which
> genuinely do stat updates larger than the batch value within a short
> amount of time. Since the rstat flush can happen in the performance
> critical codepaths like page faults, such workloads can suffer greatly.
>
> The easiest fix for now is for performance critical codepaths to trigger
> the rstat flush asynchronously.
> This patch converts the refault codepath
> to use async rstat flush. In addition, this patch has preemptively
> converted mem_cgroup_wb_stats and shrink_node to also use the async
> rstat flush as they may also see similar performance regressions.

Gee we do this trick a lot and gee I don't like it :(

a) if we're doing too much work then we're doing too much work.
   Punting that work over to a different CPU or thread doesn't alter
   that - it in fact adds more work.

b) there's an assumption here that the flusher is able to keep up with
   the producer.  What happens if that isn't the case?  Do we simply
   wind up the deferred items until the system goes oom?

   What happens if there's a producer running on each CPU?  Can the
   flushers keep up?

   Pathologically, what happens if the producer is running
   task_is_realtime() on a single-CPU system?  Or if there's a
   task_is_realtime() producer running on every CPU?  The flusher never
   gets to run and we're dead?

An obvious fix is to limit the permissible amount of windup (to what?)
and at some point, do the flushing synchronously anyway.

Or we just don't do any of this at all and put up with the cost of the
current code.  I mean, this "fix" is kind of fake anyway, isn't it?
Pushing the 4-10ms delay onto a different CPU will just disrupt
something else which wanted to run on that CPU.  The overall effect is
to hide the impact from one particular testcase, but is the benefit
really a real one?
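
For illustration only, the "limit the windup, then flush synchronously" idea might look roughly like the userspace sketch below. It is not the kernel code and not the patch under discussion. Every name here (struct stats, MAX_WINDUP, record_updates(), run_async_flusher()) is hypothetical, and the cap of 64 is an arbitrary placeholder, since the mail itself leaves the right value open ("to what?").

```c
#include <stdbool.h>

/* Hypothetical cap on deferred updates; the thread does not say what
 * the right value would be. */
#define MAX_WINDUP 64

struct stats {
	long pending;        /* updates recorded but not yet flushed */
	long flushed_total;  /* updates already folded into the totals */
	bool async_queued;   /* a deferred flush has been requested */
	long sync_flushes;   /* times a producer had to flush inline */
};

/* Stand-in for the expensive flush: fold pending updates into the total. */
static void flush_now(struct stats *s)
{
	s->flushed_total += s->pending;
	s->pending = 0;
	s->async_queued = false;
}

/* Producer side: record n updates.  Normally the flush is deferred, but
 * once the backlog reaches MAX_WINDUP the producer pays for it inline,
 * so the backlog stays bounded even if the flusher never gets to run. */
static void record_updates(struct stats *s, long n)
{
	s->pending += n;
	if (s->pending >= MAX_WINDUP) {
		flush_now(s);
		s->sync_flushes++;
	} else {
		s->async_queued = true;  /* real code would kick a worker here */
	}
}

/* Flusher side: what the deferred worker would do when scheduled. */
static void run_async_flusher(struct stats *s)
{
	if (s->async_queued)
		flush_now(s);
}
```

With this shape a starved flusher costs the producer one inline flush per MAX_WINDUP updates instead of letting the backlog grow without limit. It does nothing about objection a): the total amount of flushing work is unchanged.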