Date: Tue, 31 Oct 2023 15:55:20 +0000 (UTC)
From: Michael Matz
To: Peter Zijlstra
cc: "Paul E. McKenney", Frederic Weisbecker, LKML, Boqun Feng,
    Joel Fernandes, Josh Triplett, Mathieu Desnoyers, Neeraj Upadhyay,
    Steven Rostedt, Uladzislau Rezki, rcu, Zqiang, "Liam R. Howlett",
    ubizjak@gmail.com
Subject: Re: [PATCH 2/4] rcu/tasks: Handle new PF_IDLE semantics
In-Reply-To: <20231031151645.GB15024@noisy.programming.kicks-ass.net>
References: <20231027144050.110601-1-frederic@kernel.org>
 <20231027144050.110601-3-frederic@kernel.org>
 <20231027192026.GG26550@noisy.programming.kicks-ass.net>
 <2a0d52a5-5c28-498a-8df7-789f020e36ed@paulmck-laptop>
 <20231027224628.GI26550@noisy.programming.kicks-ass.net>
 <200c57ce-90a7-418b-9527-602dbf64231f@paulmck-laptop>
 <20231030082138.GJ26550@noisy.programming.kicks-ass.net>
 <622438a5-4d20-4bc9-86b9-f3de55ca6cda@paulmck-laptop>
 <20231031095202.GC35651@noisy.programming.kicks-ass.net>
 <20231031151645.GB15024@noisy.programming.kicks-ass.net>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)

Hello,

On Tue, 31 Oct 2023, Peter Zijlstra wrote:

> > > For absolutely no reason :-(
> >
> > The reason is simple (and should be obvious): to adhere to the abstract
> > machine regarding volatile.  When x is volatile then x++ consists of a
> > read and a write, in this order.  The easiest way to ensure this is to
> > actually generate a read and a write instruction.  Anything else is an
> > optimization, and for each such optimization you need to actively find
> > an argument why this optimization is correct to start with (and then if
> > it's an optimization at all).
> > In this case the argument needs to somehow involve arguing that an rmw
> > instruction on x86 is in fact completely equivalent to the separate
> > instructions, from read cycle to write cycle over all pipeline stages,
> > on all implementations of x86.  I.e. that a rmw instruction is spec'ed
> > to be equivalent.
> >
> > You most probably can make that argument in this specific case, I'll
> > give you that.  But why bother to start with, in a piece of software
> > that is already fairly complex (the compiler)?  It's much easier to
> > just not do much of anything with volatile accesses at all and be
> > guaranteed correct.  Even more so as the software author, when using
> > volatile, most likely is much more interested in correct code (even
> > from an abstract machine perspective) than micro optimizations.
>
> There's a pile of situations where a RmW instruction is actively
> different vs a load-store split, esp for volatile variables that are
> explicitly expected to change asynchronously.
>
> The original RmW instruction is IRQ-safe, while the load-store version
> is not.  If an interrupt lands in between the load and store and also
> modifies the variable then the store after interrupt-return will
> over-write said modification.
>
> These are not equivalent.

Okay, then there you have it.  Namely that LLVM has a bug (but see the
next paragraph).  For volatile x, x++ _must_ expand to a separate read
and write, because the abstract machine of C says so.  If a RmW isn't
equivalent to that, then it can't be used in this situation.  If you
_have_ to use a RmW for other reasons like interrupt safety, then a
volatile variable is not the way to force this, as C simply doesn't have
that concept and hence can't talk about it.  (Of course it can't, as not
all architectures could implement such a thing, if it were required.)
(If a RmW merely gives you more guarantees than a split load-store, then
of course LLVM doesn't have a bug; but you said not-equivalent, so I'm
assuming the worst, namely that a RmW also has fewer (other) guarantees.)

> > > At least clang doesn't do this, it stays:
> > >
> > >   0403  413: 49 ff 45 00   incq 0x0(%r13)
> > >
> > > irrespective of the volatile.
> >
> > And, are you 100% sure that this is correct?  Even for x86 CPU
> > pipeline implementations that you aren't intimately knowing about? ;-)
>
> It so happens that the x86 architecture does guarantee RmW ops are
> IRQ-safe or locally atomic.  SMP/concurrent loads will observe either
> pre or post but no intermediate state as well.

So, are RMW ops a strict superset (vis-a-vis the guarantees they give)
of split load-store?  If so we can at least say that using RMW is a
valid optimization :)  Still, an optimization only.

> > But all that seems to be a side-track anyway, what's your real worry
> > with the code sequence generated by GCC?
>
> In this case it's sub-optimal code, both larger and possibly slower for
> having two memops.
>
> The reason to have volatile is because that's what Linux uses to
> dis-allow store-tearing, something that doesn't happen in this case.  A
> suitably insane but conforming compiler could compile a non-volatile
> memory increment into something insane like:
>
>	load byte-0, r1
>	increment r1
>	store r1, byte-0
>	jno done
>	load byte-1, r1
>	increment r1
>	store r1, byte-1
>	jno done
>	...
> done:
>
> We want to explicitly dis-allow this.

Yeah, I see.  Within C you don't have much choice other than volatile
for this :-/

Funny thing: on some architectures this is actually what is generated
sometimes, even if the architecture has multi-byte loads/stores.  This
came up recently on the gcc list, and the byte-per-byte sequence was
faster ;-)  (It was rather: load by bytes, form the whole value via
shifts, increment, store by bytes.)  Insane codegen for insane
micro-architectures!
> I know C has recently (2011) grown this _Atomic thing, but that has
> other problems.

Yeah.  So, hmm, I don't quite know what to say; you're between a rock
and a hard place, I guess.  You have to use volatile for its effects but
then are unhappy about its effects :)

If you can confirm the above about the validity of the optimization,
then at least there'd be a point for adding a peephole in GCC for this,
even if the current codegen isn't a bug.  But I still wouldn't hold my
breath: volatile is so ... ewww, it's best left alone.


Ciao,
Michael.