Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp4555460pxb; Tue, 2 Nov 2021 11:38:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy/A2FVLHl1tb9COYeHg/WUlieMSfohV2wd1w3BynIjh9IozWzRQrlcYEFcxhcClgRGQZsC X-Received: by 2002:a17:906:e85:: with SMTP id p5mr48001607ejf.159.1635878306359; Tue, 02 Nov 2021 11:38:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635878306; cv=none; d=google.com; s=arc-20160816; b=xKC+V0B+e4444niwSRKPslxCAiOtkhz/e5pd4L8+4tjHkryeWhAG58qloFt/YssPzg SuURYGcRbcKBvO0oLk0UmjHOfWSAazB21p/UNikt6eymGqVltMeedkC+T1sIpiaRvkET Uh5a2YhDs0FR38vDbb8sByAVjsNVR5HlEgMkLzd3W78vdaIuYF4dcW4OGv1iM17xMm0M j3gPGa2OatiSUcD7hC+c4cLHzKszvjyOait1kpGwWZFYxDHcVcB/1XkgJY7eFentBwk2 TsH4Hll3J4wtm/l4UI5vI/+xahTgo0rYfFX6uAPM0yZooJ8HfDn8fmZHgzQ/Rl//kBUy nrmw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=ixyvZDamoz3mWqG9V6kNCfJ1dmbGQS14heiP44M7rJ0=; b=OP2j2oY3kxHVb8umPCOrHmxHS+fQ9YENs56J5Unr1yX5qaetBkYD0Mak5GZqV/j6oH 9mShHn9t6rml9UR852pFtvouX8CnHGrn/OfUNw4uBD3q3V2m8vDhJaXmCMz3s02PpdR4 8IEa34vcXlyJQZIG6RE7xkL2MEFoZ1H2GMLUoWb358zQJkxg6HkuhKCP7LqRxH+HZP98 ICpFcId+VKOhwQRV6bnQHApZFM1m78C2/lzGx+4IqSd81bLDPP+UdbQKJHzpwvmJMRke XLb0HO99pjwEw4nKbCE9AMIZLuL/DquvtuHAbNbqvGslLzRB2Ca2vyTY7Mlb/jrcpfJa gBnA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=tum.de Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b18si33045841ede.158.2021.11.02.11.38.01; Tue, 02 Nov 2021 11:38:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=tum.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234185AbhKBSii (ORCPT + 99 others); Tue, 2 Nov 2021 14:38:38 -0400 Received: from mail-out2.in.tum.de ([131.159.0.36]:48714 "EHLO mail-out2.informatik.tu-muenchen.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230361AbhKBSih (ORCPT ); Tue, 2 Nov 2021 14:38:37 -0400 Received: from mailrelay1.rbg.tum.de (mailrelay1.in.tum.de [131.159.254.14]) by mail-out2.informatik.tu-muenchen.de (Postfix) with ESMTP id 69C4A2400D6; Tue, 2 Nov 2021 19:36:00 +0100 (CET) Received: by mailrelay1.rbg.tum.de (Postfix, from userid 112) id 6736C466; Tue, 2 Nov 2021 19:36:00 +0100 (CET) Received: from mailrelay1.rbg.tum.de (localhost [127.0.0.1]) by mailrelay1.rbg.tum.de (Postfix) with ESMTP id 460C645A; Tue, 2 Nov 2021 19:36:00 +0100 (CET) Received: from mail.in.tum.de (mailproxy.in.tum.de [IPv6:2a09:80c0::78]) by mailrelay1.rbg.tum.de (Postfix) with ESMTPS id 43455451; Tue, 2 Nov 2021 19:36:00 +0100 (CET) Received: by mail.in.tum.de (Postfix, from userid 112) id 4067A4A0620; Tue, 2 Nov 2021 19:36:00 +0100 (CET) Received: (Authenticated sender: heidekrp) by mail.in.tum.de (Postfix) with ESMTPSA id ADD754A0115; Tue, 2 Nov 2021 19:35:59 +0100 (CET) (Extended-Queue-bit tech_cgasm@fff.in.tum.de) Date: Tue, 2 Nov 2021 19:35:57 +0100 From: Paul =?iso-8859-1?Q?Heidekr=FCger?= To: Alan Stern Cc: paulmck@kernel.org, will@kernel.org, peterz@infradead.org, boqun.feng@gmail.com, parri.andrea@gmail.com, linux-kernel@vger.kernel.org, llvm@lists.linux.dev, elver@google.com, charalampos.mainas@gmail.com, pramod.bhatotia@in.tum.de Subject: Re: Potentially Broken Address Dependency via test_bit() When Compiling With Clang Message-ID: References: <20211027142720.GB1319606@rowland.harvard.edu> <20211028143446.GA1351384@rowland.harvard.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20211028143446.GA1351384@rowland.harvard.edu> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Oct 28, 2021 at 10:34:46AM -0400, Alan Stern wrote: > On Thu, Oct 28, 2021 at 02:37:47PM +0200, Paul Heidekr?ger wrote: > > On Wed, Oct 27, 2021 at 10:27:20AM -0400, Alan Stern wrote: > > > On Wed, Oct 27, 2021 at 12:19:48PM +0200, Paul Heidekr?ger wrote: > > > > > Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c: > > > > > > > > > [...] > > > > > index = READ_ONCE(ac->alist->preferred); > > > > > if (test_bit(index, &set)) > > > > > goto selected; > > > > > [...] > > > > > > > > where test_bit() expands to the following in > > > > include/asm-generic/bitops/non-atomic.h, lines 115 - 122: > > > > > > > > > static __always_inline int > > > > > arch_test_bit(unsigned int nr, const volatile unsigned long *addr) > > > > > { > > > > > return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1))); > > > > > } > > > > > #define test_bit arch_test_bit > > > > However, I can't follow the IR code. Can you please explain in ordinary > > > English how the LLVM compiler manages to lose track of this dependency? > > > > > > Alan Stern > > > > Here's what we think might be going on: > > - In 'arch_test_bit()', 'addr[BIT_WORD(nr)]' expands to 'addr[(nr) / 64]'. > > - Since 'addr' points to an 'unsigned long', any other result than '0' for > > '(nr) / 64' would be out of bounds and therefore undefined. > > - We assume LLVM is able to figure this out and use it to get rid of the > > address computation all together. > > Ah, that explains it. Yes, when set is a single unsigned long (or an > array of length 1), the address dependency is only syntactic, not > semantic. As a result, we should expect that compilers will sometimes > not preserve it. In LKMM's explanation.txt, lines 450 - 453 state: > A read event and another memory access event are linked by an address > dependency if the value obtained by the read affects the location > accessed by the other event. If we understand correctly, the ambiguity you're pointing out is that by looking at 'addr[BIT_WORD(nr)]', one could deduce an address dependency by seeing an array subscript operator, using a value which can be traced back to a 'READ_ONCE()' (syntactic). However, since 'addr' points to an 'unsigned long', the 'READ_ONCE()' never affects the location accessed in 'arch_test_bit()' as the offset computed in the subscript operator can only be, as clang identified, '0' (semantic). > The danger, of course, is that people relying on an ordering prescribed > by the LKMM may get fooled because (unbeknownst to them) the dependency > in question is not semantic. Do you think this would warrant a change to LKMM's explanation.txt to make this more explicit? > It would be great if a static checker > could detect such things -- but this would require some way for us to > inform the checker about when the code does rely on a dependency > ordering. The compiler passes we're working on will hopefully be able to do exactly that, not taking into account the programmer's intent of course. Many thanks, Paul