Received: by 2002:a05:6512:3d0e:0:0:0:0 with SMTP id d14csp14770lfv; Tue, 12 Apr 2022 15:19:47 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyx6iQx45WoO5nvIE1X0jChHqrWAUsl+0f+Ip9cDH3+y1dcc1Q9RTQZJfwB9XFrNB6B/yKX X-Received: by 2002:a63:f4e:0:b0:382:1e31:79e8 with SMTP id 14-20020a630f4e000000b003821e3179e8mr33039451pgp.167.1649801987669; Tue, 12 Apr 2022 15:19:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649801987; cv=none; d=google.com; s=arc-20160816; b=MbBAQpOuDdzPi5ClDMHNTPALbs6Saz5gtJZh5q+Q3uO7ZPgiCvvl1645DuLtd7jLGh BBvCR6eH+bYefrDzmhe8KGtGSciBfuamvqYcS05W5FXZ+rdGXLnRmqS/ERguQjqub0UY ffFlcfA8W/Rlyv5Xn8cyn2qhX8HOilkwo/XcQMiWuoiWN0gvpxJDd0Jo/LzwrooUXP8p N8Zpv9Sh7hr9yzrDdO63fqLqR8p5W4xsFne+MqbaFhvcRFEffUuSA3ida7Mz0DfY3fd4 +Va5QeN7+N00+aVxNmxJKkQ7o4T65xSurnTb7FCT5VztP2km+HHYAkLTEJp1ixQWTreG tgIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:reply-to:message-id :subject:cc:to:from:date:dkim-signature; bh=CspZWIDtzO+nXH0KSmq3bf0uzBgKtJ7n/mkT+XDnhIw=; b=rhC0mQD3oswfFYHrQ8JyrKO5E6GCCYYaNwj757BDIlceQuL6OOfec1QVdrX2WeXJ8Z moryuGwkwsH+dUjsSDyPdafT3jkCcgRzbcRbubY+JaV6MjdJwLKgkFtq8XctyQqcH4+9 kEOUBX3GRkrYUV1t37fMh+71SZpYDcrqKN5Bh1UqNEn+RfmQTtYgfUdTGARgc28vhRqz Fozf7t5Mlm6+GZ/FUQLUWXPonNOA8QKI59RGrlagRLmiyD0ETlLipshZM7Cafh2VUSZe z1qaq75N49AeLEaT+OrceuwUqtKalUJRGJ+6fczGO1p6/kMzGdzi9aJ+gGG6Hmq/8Z4f 5Vaw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=noOI5ZLj; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id h62-20020a638341000000b003816043f008si3884548pge.509.2022.04.12.15.19.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 12 Apr 2022 15:19:47 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=noOI5ZLj; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id DB3C3171544; Tue, 12 Apr 2022 14:01:13 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349133AbiDKSYB (ORCPT + 99 others); Mon, 11 Apr 2022 14:24:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39140 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1347575AbiDKSYA (ORCPT ); Mon, 11 Apr 2022 14:24:00 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EF4BB2AD0; Mon, 11 Apr 2022 11:21:44 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 83EE0B817FD; Mon, 11 Apr 2022 18:21:43 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4691FC385A4; Mon, 11 Apr 2022 18:21:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1649701302; bh=PiNuPNk+6l26kgsR8fDzAxyPFcRQN+JgTZLJVWwDBYE=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=noOI5ZLjGQD7hoyTo16tgnjCgJouFmSZ63y24DCLm9z2eLrf6gclO4hSIRYLxcRGy nzm5MaqsbovJtDGsDzoRWxx2KFE3gaDNMsVBxRNx2qpXmCN9TvmQfiv8iT1xHEwXOZ 0Df90tz/qQUHJ/p5FeQKXS/GALQiSu/jUR90j04kKDRRWibWJt2qYGBL4vGQVgEOCM gezfbD2EylBDJKxiHlike2ZGNVWZclIEcISmRz6Lvy2Ct+tU+5kyZ2r2qrKWIsKcaw gJcTwlFlF8QDKJaWynKcLc1fz7F4LDdnxKliEy5g9N07PCS1lgwUMDlhj6onHR4IdZ OlGFgTjC3wnuw== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id EBCF05C03AE; Mon, 11 Apr 2022 11:21:41 -0700 (PDT) Date: Mon, 11 Apr 2022 11:21:41 -0700 From: "Paul E. McKenney" To: Paul =?iso-8859-1?Q?Heidekr=FCger?= Cc: Alan Stern , Andrea Parri , Will Deacon , Peter Zijlstra , Boqun Feng , Nicholas Piggin , David Howells , Jade Alglave , Luc Maranget , Akira Yokosawa , Daniel Lustig , Joel Fernandes , linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, llvm@lists.linux.dev, Marco Elver , Charalampos Mainas , Pramod Bhatotia Subject: Re: Dangerous addr to ctrl dependency transformation in fs/nfs/delegation.c::nfs_server_return_marked_delegations()? Message-ID: <20220411182141.GZ4285@paulmck-ThinkPad-P17-Gen-1> Reply-To: paulmck@kernel.org References: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 07, 2022 at 05:12:15PM +0200, Paul Heidekr?ger wrote: > Hi all, > > work on my dependency checker tool is progressing nicely, and it is > flagging, what I believe is, a harmful addr to ctrl dependency > transformation. For context, see [1] and [2]. I'm using the Clang > compiler. > > The dependency in question runs from line 618 into the for loop > increment, i.e. the third expresion in the for loop condition, in line > 622 of fs/nfs/delegation.c::nfs_server_return_marked_delegations(). > > I did my best to reverse-engineer some pseudocode from Clang's IR for > showing what I think is going on. First, thank you very much for doing this work! > Clang's unoptimised version: > > > restart: > > if(place_holder != NULL) > > delegation = rcu_dereference(place_holder->delegation); /* 618 */ > > if(delegation != NULL) > > if(delegation != place_holder_deleg) > > delegation = list_entry_rcu(server->delegations.next, struct nfs_delegation, super_list); /* 620 */ > > > > for( ; &(delegation)->super_list != &server->delegations; delegation = list_entry_rcu(delegation->super_list.next, typeof(*(delegation)), super_list)) { /* 622 */ > > /* > > * Can continue, "goto restart" or "goto break" (after loop). > > * Can reassign "delegation", "place_holder", "place_holder_deleg". > > * "delegation" might be assigned either a value depending on > > * "delegation" itself, i.e. it is part of the dependency chain, > > * or NULL. > > * Can modifiy fields of the "nfs_delegation" struct "delegation" > > * points to. > > * Assume line 618 has been executed and line 620 hasn't. Then, > > * there exists a path s.t. "delegation" isn't reassigned NULL > > * and the for loop's increment is executed. > > */ > > } > > Clang's optimised version: > > > restart: > > if(place_holder == NULL) { > > delegation = list_entry_rcu(server->delegations.next, struct nfs_delegation, super_list); > > } else { > > cmp = rcu_dereference(place_holder->delegation); /* 618 */ > > if(cmp != NULL) { /* Transformation to ctrl dep */ > > if(cmp == place_holder_deleg) { > > delegation = place_holder_deleg; > > } else { > > delegation = list_entry_rcu(server->delegations.nex, struct nfs_delegation, super_list); > > } > > } else { > > delegation = list_entry_rcu(server->delegations.next, struct nfs_delegation, super_list); > > } > > } > > > > for( ; &(delegation)->super_list != &server->delegations; delegation = list_entry_rcu(delegation->super_list.next, typeof(*(delegation)), super_list)) { > > /* > > * At this point, "delegation" cannot depend on line 618 anymore > > * since the "rcu_dereference()" was only used for an assignment > > * to "cmp" and a subsequent comparison (ctrl dependency). > > * Therefore, the loop increment cannot depend on the > > * "rcu_dereference()" either. The dependency chain has been > > * broken. > > */ > > } > > The above is an abstraction of the following control flow path in > "nfs_server_return_marked_delegations()": > > 1. When "nfs_server_return_marked_delegations()" gets called, it has no > choice but to skip the dependency beginning in line 620, since > "place_holder" is NULL the first time round. > > 2. Now take a path until "place_holder", the condition for the > dependency beginning, becomes true and "!delegation || delegation != > place_holder_deleg", the condition for the assignment in line 620, > becomes false. Then, enter the loop again and take a path to one of the > "goto restart" statements without reassigning to "delegation". > > 3. After going back to "restart", since the condition for line 618 > became true, "rcu_dereference()" into "delegation". > > 4. Enter the for loop again, but avoid the "goto restart" statements in > the first iteration and ensure that "&(delegation)->super_list != > &server->delegations", the loop condition, remains true and "delegation" > isn't assigned NULL. > > 5. When the for loop condition is reached for the second time, the loop > increment is executed and there is an address dependency. > > Now, why would the compiler decide to assign "place_holder_deleg" to > "delegation" instead of what "rcu_dereference()" returned? Here's my > attempt at explaining. > > In the pseudo code above, i.e. in the optimised IR, the assignment of > "place_holder_deleg" is guarded by two conditions. It is executed iff > "place_holder" isn't NULL and the "rcu_dereference()" didn't return > NULL. In other words, iff "place_holder != NULL && rcu_dereference() != > NULL" holds at line 617, then "delegation == rcu_dereference() == > place_holder_deleg" must hold at line 622. Otherwise, the optimisation > would be wrong. > > Assume control flow has just reached the first if, i.e. line 617, in > source code. Since "place_holder" isn't NULL, it will execute the first > if's body and "rcu_dereference()" into "delegation" (618). Now it has > reached the second if. Per our aussmption, "rcu_dereference()" returned > something that wasn't NULL. Therefore, "!delegation", the first part of > the second if condition's OR, will be false. > > However, if we want "rcu_dereference() == delegation" to hold after the > two if's, we can't enter the second if anyway, as it will overwrite > "delegation" with a value that might not be equal to what > "rcu_dereference()" returned. So, we want the second part of the second > if condition's OR, i.e. "delegation != place_holder_deleg" to be false > as well. > > When is that the case? It is the case when "delegation == > place_holder_deleg" holds. > > So, if we want "delegation == rcu_dereference() == place_holder_deleg" > to hold after the two if's, "place_holder != NULL && rcu_dereference() > != NULL" must hold before the two if's, which is what we wanted to show > and what the compiler figured out too. > > TL;DR: it appears the compiler optimisation is plausible, yet it still > breaks the address dependency. > > For those interested, I have made the unoptimised and optimised IR CFGs > available. In the optimised one, the interesting part is the transition > from "if.end" to "if.end13". > > Unoptimised: https://raw.githubusercontent.com/gist/PBHDK/700bf7bdf968fe25c82506de58143bbe/raw/54bf2c1e1a72fb30120f7e812f05ef01ca86b78f/O0-nfs_server_return_marked_delegations.svg > > Optimised: https://raw.githubusercontent.com/gist/PBHDK/700bf7bdf968fe25c82506de58143bbe/raw/54bf2c1e1a72fb30120f7e812f05ef01ca86b78f/O2-nfs_server_return_marked_delegations.svg > > What do you think? > > Many thanks, > Paul > > [1]: https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf > [2]: https://lore.kernel.org/llvm/YXknxGFjvaB46d%2Fp@Pauls-MacBook-Pro/T/#u If I understand this correctly (rather unlikely), this stems from violating the following rule in Documentation/RCU/rcu_dereference.rst: ------------------------------------------------------------------------ - Be very careful about comparing pointers obtained from rcu_dereference() against non-NULL values. As Linus Torvalds explained, if the two pointers are equal, the compiler could substitute the pointer you are comparing against for the pointer obtained from rcu_dereference(). For example:: p = rcu_dereference(gp); if (p == &default_struct) do_default(p->a); Because the compiler now knows that the value of "p" is exactly the address of the variable "default_struct", it is free to transform this code into the following:: p = rcu_dereference(gp); if (p == &default_struct) do_default(default_struct.a); On ARM and Power hardware, the load from "default_struct.a" can now be speculated, such that it might happen before the rcu_dereference(). This could result in bugs due to misordering. However, comparisons are OK in the following cases: - The comparison was against the NULL pointer. If the compiler knows that the pointer is NULL, you had better not be dereferencing it anyway. If the comparison is non-equal, the compiler is none the wiser. Therefore, it is safe to compare pointers from rcu_dereference() against NULL pointers. - The pointer is never dereferenced after being compared. Since there are no subsequent dereferences, the compiler cannot use anything it learned from the comparison to reorder the non-existent subsequent dereferences. This sort of comparison occurs frequently when scanning RCU-protected circular linked lists. Note that if checks for being within an RCU read-side critical section are not required and the pointer is never dereferenced, rcu_access_pointer() should be used in place of rcu_dereference(). - The comparison is against a pointer that references memory that was initialized "a long time ago." The reason this is safe is that even if misordering occurs, the misordering will not affect the accesses that follow the comparison. So exactly how long ago is "a long time ago"? Here are some possibilities: - Compile time. - Boot time. - Module-init time for module code. - Prior to kthread creation for kthread code. - During some prior acquisition of the lock that we now hold. - Before mod_timer() time for a timer handler. There are many other possibilities involving the Linux kernel's wide array of primitives that cause code to be invoked at a later time. - The pointer being compared against also came from rcu_dereference(). In this case, both pointers depend on one rcu_dereference() or another, so you get proper ordering either way. That said, this situation can make certain RCU usage bugs more likely to happen. Which can be a good thing, at least if they happen during testing. An example of such an RCU usage bug is shown in the section titled "EXAMPLE OF AMPLIFIED RCU-USAGE BUG". - All of the accesses following the comparison are stores, so that a control dependency preserves the needed ordering. That said, it is easy to get control dependencies wrong. Please see the "CONTROL DEPENDENCIES" section of Documentation/memory-barriers.txt for more details. - The pointers are not equal *and* the compiler does not have enough information to deduce the value of the pointer. Note that the volatile cast in rcu_dereference() will normally prevent the compiler from knowing too much. However, please note that if the compiler knows that the pointer takes on only one of two values, a not-equal comparison will provide exactly the information that the compiler needs to deduce the value of the pointer. ------------------------------------------------------------------------ But it would be good to support this use case, for example, by having the compiler provide some way of marking the "delegation" variable as carrying a full dependency. Or did I miss a turn in here somewhere? Thanx, Paul