Date: Thu, 25 May 2023 08:47:57 +0200
From: Michal Hocko
To: Marcelo Tosatti
Cc: Christoph Lameter, Aaron Tomlin, Frederic Weisbecker, Andrew Morton,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Russell King,
    Huacai Chen, Heiko Carstens, x86@kernel.org, Vlastimil Babka
Subject: Re: [PATCH v8 00/13] fold per-CPU vmstats remotely
References: <20230515180015.016409657@redhat.com>

On Wed 24-05-23 10:53:23, Marcelo Tosatti wrote:
> On Wed, May 24, 2023 at 02:51:55PM +0200, Michal Hocko wrote:
> > [Sorry for a late response but I was conferencing last two weeks and now
> > catching up]
> >
> > On Mon 15-05-23 15:00:15, Marcelo Tosatti wrote:
> > [...]
> > > v8
> > >   - Add summary of discussion on -v7 to cover letter
> >
> > Thanks this is very useful! This helps to frame the further discussion.
> >
> > I believe the most important question to answer is this in fact
> > > I think what needs to be done is to avoid new queue_work_on()
> > > users from being introduced in the tree (the number of
> > > existing ones is finite and can therefore be fixed).
> > >
> > > Agree with the criticism here, however, I can't see other
> > > options than the following:
> > >
> > > 1) Given an activity, which contains a sequence of instructions
> > >    to execute on a CPU, to change the algorithm
> > >    to execute that code remotely (therefore avoid interrupting a CPU),
> > >    or to avoid the interruption somehow (which must be dealt with
> > >    on a case-by-case basis).
> > >
> > > 2) To block that activity from happening in the first place,
> > >    for the sites where it can be blocked (that return errors to
> > >    userspace, for example).
> > >
> > > 3) Completely isolate the CPU from the kernel (off-line it).
> >
> > I agree that a reliable cpu isolation implementation needs to address
> > queue_work_on problem. And it has to do that _reliably_. This cannot be
> > achieved by an endless whack-a-mole and chasing each new instance. There
> > must be a more systematic approach. One way would be to change the
> > semantic of schedule_work_on and fail the call for an isolated CPU. The
> > caller would have a way to fall back and handle the operation by other
> > means. E.g. vmstat could simply ignore folding pcp data because an
> > imprecision shouldn't really matter. Other callers might choose to do the
> > operation remotely. This is a lot of work, no doubt about that, but it
> > is a long term maintainable solution that doesn't give you new surprises
> > with any new released kernel. There are likely other remote interfaces
> > that would need to follow that scheme.
> >
> > If the cpu isolation is not planned to be worth that time investment
> > then I do not think it is also worth reducing a highly optimized vmstat
> > code. These stats are invoked from many hot paths and per-cpu
> > implementation has been optimized for that case.
>
> It is exactly the same code, but now with a "LOCK" prefix for CMPXCHG
> instruction. Which should not cost much due to cache locking (these are
> per-CPU variables anyway).

Sorry but just a LOCK prefix for a hot path is not a serious argument.

> > If your workload would
> > like to avoid that as disturbing then you already have a quiet_vmstat
> > precedent so find a way how to use it for your workload instead.
> >
> > --
> > Michal Hocko
> > SUSE Labs
>
> OK so an alternative solution is to completely disable vmstat updates
> for isolated CPUs. Are you OK with that?

Yes, the number of events should be reasonably small and those places in
the kernel which really need a precise value need to do a per-cpu walk
anyway. IIRC /proc/vmstat et al also do accumulate pcp state.

But let me reiterate. Even with vmstat updates out of the game, you have
so many other sources of disruption that your isolated workload will be
fragile until you actually try to deal with the problem on a more
fundamental level.
--
Michal Hocko
SUSE Labs
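
For illustration only, here is a minimal userspace sketch of the scheme
discussed in this thread: a scheduling call that fails for an isolated CPU,
a vmstat-like caller that falls back by skipping the fold, and a precise
reader that walks the per-CPU deltas instead. Nothing here is kernel code or
an existing kernel API. schedule_work_on_checked(), cpu_isolated[],
fold_pcp(), fold_all_cpus() and read_precise() are hypothetical names, the
"work queue" is modeled as a direct function call, and the choice of -EBUSY
as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical isolation mask: true means "do not disturb this CPU". */
static bool cpu_isolated[NR_CPUS] = { false, false, true, true };

/* Toy model of vmstat: one global counter plus a per-CPU pending delta. */
static long global_count;
static long pcp_diff[NR_CPUS];

typedef void (*work_fn)(int cpu);

/*
 * Hypothetical variant of schedule_work_on(): instead of interrupting an
 * isolated CPU, it refuses and lets the caller decide what to do.
 * Queueing is modeled as calling fn directly.
 */
static int schedule_work_on_checked(int cpu, work_fn fn)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (cpu_isolated[cpu])
        return -EBUSY;
    fn(cpu);
    return 0;
}

/* Fold one CPU's pending delta into the global counter. */
static void fold_pcp(int cpu)
{
    global_count += pcp_diff[cpu];
    pcp_diff[cpu] = 0;
}

/*
 * Caller-side fallback for the vmstat case: when the call fails for an
 * isolated CPU, skip the fold and accept the imprecision.
 * Returns the number of CPUs that were skipped.
 */
static int fold_all_cpus(void)
{
    int skipped = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (schedule_work_on_checked(cpu, fold_pcp) == -EBUSY)
            skipped++;
    return skipped;
}

/*
 * Reader that needs a precise value: walk every per-CPU delta and add it
 * to the global counter, without modifying any per-CPU state.
 */
static long read_precise(void)
{
    long sum = global_count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += pcp_diff[cpu];
    return sum;
}
```

In this model the global counter lags behind by whatever the isolated CPUs
have accumulated, while read_precise() still returns the exact total. A
different caller could react to the same error by doing the operation
remotely rather than skipping it.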