Received: by 2002:a25:5b86:0:0:0:0:0 with SMTP id p128csp1667792ybb; Fri, 29 Mar 2019 08:56:14 -0700 (PDT) X-Google-Smtp-Source: APXvYqxtetMZkizReCRwo8SE6+KwmaHsDH35qmpGV28IJelWR4kacMGkY90a8/IYI6R4wwfzSS/o X-Received: by 2002:a63:4343:: with SMTP id q64mr16757132pga.105.1553874973914; Fri, 29 Mar 2019 08:56:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553874973; cv=none; d=google.com; s=arc-20160816; b=To2FvxR891QxksOBh7FTtXHNQgI6Gq6sJfvh9Vi3KS7DgUrJ+5y/uYxjNAxCSBxjIi +fDHDIa2cd+QtV9Q5tJYJuzevuprLRmBvMoD+QBdDCtv2v4yzoORq/NIX/+DGeGlbZxx snXiiLV5bpv929MGxYMP6TaAE+e1UgJ1zFnq7W0uh3Vjn99vBB95bZWQjjwRajs3FjZ7 cuOTnVUaxPl5MQh/KB1UTKjOAZT6CTrD8Kg+1NmZqKLseeAm3n+W8WycyQwczJtkFqzR IhzQtU85PAgZlS00maNJjZTFEG4xy/o+rAtpdXFw4w1Z2vHuLThiKIzcFmgMe8D5QZ5B qUXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=tjkcfhc0Ep9NbQLojkG4D6Rr/nOzsyp4ydQaHH2U3DU=; b=fSxrvfc6x7NUQGix47Kvk6wlEcHXqbOB0DN3OLwAs+YMWzLTxiX239FiAgeCV8pk+z DQ6MCbBWEiizMCFFfrD8C8KpCnHfdCNCe2QiyCsw1sDRCpbcwUR+jvT+Jm5Bar4xkPi4 N0w154IypaeGwF1bvaOzIpaJYiFbEcA9pj9ZH0oIyJWh8nXLj5v7YfRIeH1C5OpR7dXi BE4kDoFLgRwqkd9aGZspQX2KkvuzGp5VyhkA9G+0SsIOrR/S7H1FeMV5aUEStmNRilZd dRNaNkETtvdRKjWs6WB77Xv9qvCFNBL5HVimpxeGvmp21zw1nSlc7kwz5QTbkkH8Rhkx XqKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=AxT8PF6T; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 22si2192760pgd.540.2019.03.29.08.55.58; Fri, 29 Mar 2019 08:56:13 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=AxT8PF6T; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729798AbfC2PzH (ORCPT + 99 others); Fri, 29 Mar 2019 11:55:07 -0400 Received: from mail-vs1-f68.google.com ([209.85.217.68]:41152 "EHLO mail-vs1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729658AbfC2PzG (ORCPT ); Fri, 29 Mar 2019 11:55:06 -0400 Received: by mail-vs1-f68.google.com with SMTP id g187so1570934vsc.8 for ; Fri, 29 Mar 2019 08:55:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=tjkcfhc0Ep9NbQLojkG4D6Rr/nOzsyp4ydQaHH2U3DU=; b=AxT8PF6TSmbJrui3ZUYKiCa5lh7j+ALt73j4h4c3hxg8++RgtqBv7M3lPxLcSQGNqa AWdx7bkTCP1H8hBLv3k2VzlHH7pasmL1Us478tQtENDYVXfWYKsdhBlq6LJmedAZA0kj YP/HK+oRnMIknrQmxDTVx3iIwouygxcSbZjRyw4neqsg03RRNNJE4Y2sVYK13w/f8bOX OB5F6KZrqWBQiQb9yXQWU/QCGZa+w57AQELD2QlD8WsQ6+7YV23bQiB1pVKfdLL1FuDr 0zDcuDoQJadtvnjj8cH8LMTGZUC0y2DhfqW2BzEWrJgWSO8tHx3yjzCJsbsuMka7itep MZUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=tjkcfhc0Ep9NbQLojkG4D6Rr/nOzsyp4ydQaHH2U3DU=; b=eFtre40zNa6xmCTvqOeBpIZEb+jVu51H2HL4eIi25lYpBHoukH90KG/mdsA+ZE91Ji fPvh2ExTKval/08eGPtj5Pb3ZrfRwMwoZU7GcL8HVxeMZWWEF94cyh4OwHS4l+HqigGL KbzAgWnhhLmsoQD5gEi2L881NI45xV2PDlcwbfXmwO2/NMeeyJh9/lZiX75sN3LsqKNP PoFAfjpqykkctO5EwykxGqIUVfkW0Q+SV9cexqXNP8SJJxrV9jQ7tHb1TJt7Bz8HJEI1 7E2Nk5qOkMFNAJsUnfXHHsjETWTsptK4uFyW65oK0j4fnf784x73lz1fTOv3rDVQq2S1 P0dg== X-Gm-Message-State: APjAAAVBZMKZZYhjZAEvWCQ7dpQ98HWkyNRtW6dntnfZhinHuQZzZoo7 W+bPam/hadBJdMkeNrcZqORR+32HtF3ePeuwg5BjPw== X-Received: by 2002:a67:eb57:: with SMTP id x23mr10794031vso.39.1553874904397; Fri, 29 Mar 2019 08:55:04 -0700 (PDT) MIME-Version: 1.0 References: <20190328162222.GO4102@linux.ibm.com> In-Reply-To: <20190328162222.GO4102@linux.ibm.com> From: Alexander Potapenko Date: Fri, 29 Mar 2019 16:54:52 +0100 Message-ID: Subject: Re: Potentially missing "memory" clobbers in bitops.h for x86 To: Paul McKenney Cc: Peter Zijlstra , "H. Peter Anvin" , Ingo Molnar , LKML , Dmitriy Vyukov , James Y Knight Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 28, 2019 at 5:22 PM Paul E. McKenney wr= ote: > > On Thu, Mar 28, 2019 at 03:14:12PM +0100, Alexander Potapenko wrote: > > Hello, > > > > arch/x86/include/asm/bitops.h defines clear_bit(nr, addr) for > > non-constant |nr| values as follows: > > > > void clear_bit(long nr, volatile unsigned long *addr) { > > asm volatile("lock; btr %1,%0" > > : "+m"(*(volatile long *)addr) > > : "Ir" (nr)); > > } > > (https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bi= tops.h#L111) > > > > According to the comments in the file, |nr| may be arbitrarily large. > > However the assembly constraints only imply that the first unsigned > > long value at |addr| is written to. > > This may result in the compiler ignoring the effect of the asm directiv= e. > > > > Consider the following example (https://godbolt.org/z/naTmjn): > > > > #include > > void clear_bit(long nr, volatile unsigned long *addr) { > > asm volatile("lock; btr %1,%0" > > : "+m"(*(volatile long *)addr) > > : "Ir" (nr)); > > } > > > > unsigned long foo() { > > unsigned long addr[2] =3D {1, 2}; > > clear_bit(65, addr); > > return addr[0] + addr[1]; > > } > > > > int main() { > > printf("foo: %lu\n", foo()); > > } > > > > Depending on the optimization level, the program may print either 1 > > (for -O0 and -O1) or 3 (for -O2 and -O3). > > This is because on higher optimization levels GCC assumes that addr[1] > > is unchanged and directly propagates the constant to the result. > > > > I suspect the definitions of clear_bit() and similar functions are > > lacking the "memory" clobber. > > But the whole file tends to be very picky about whether this clobber > > needs to be applied in each case, so in the case of a performance > > penalty we may need to consider alternative approaches to fixing this > > code. > > Is there a way of indicating a clobber only on the specific location > affected? We're unaware of such a way. > I suppose that one approach would be to calculate in C code > a pointer to the specific element of the addr[] array, which would > put the specific clobbered memory location onto the outputs list. > Or keep the current calculation, but also add "addr[nr / sizeof(long)]" > to the output list, thus telling the compiler exactly what is being > clobbered. Assuming that actually works... That works, but sometimes GCC emits the code calculating addr[nr/64] before the BTR instruction for no reason. Adding this output for clear_bit() makes around 120 kernel functions change their sizes. I only checked a handful, but some contained code like: cmovns %rdi, %rax sarq $6, %rax leaq (%rsi,%rax,8), %rax Another idea we've come up with is to cast the pointer in BITOP_ADDR() to a very big array: https://godbolt.org/z/apHvDc Making so prevents the compiler from caching the assembly inputs, but has its own drawbacks. First, it may potentially yield incorrect assumptions about the size of an object passed into the assembly (which may lead to e.g. dropping some size checks) Second, for smaller arrays these casts may result in excessive clobbers (this happens in misc_register(), where GCC decides that the input parameter |misc| may alias with the global |misc_minors| bitmap). Applying the cast to clear_bit() only changes the size of 3 kernel functions: hub_port_reset(), misc_register(), tick_broadcast_setup_oneshot(). None of them have visible semantic changes. > Of course, this would force the compiler to actually compute the > offset, which would slow things down. I have no idea whether this > would be better or worse than just using the "memory" clobber. Just adding the "memory" clobber to clear_bit() changes sizes of 5 kernel functions (the three mentioned above, plus hub_activate() and native_send_call_func_ipi()) by a small margin. This probably means the performance impact of this clobber is negligible in this case. > Thanx, Paul > -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Stra=C3=9Fe, 33 80636 M=C3=BCnchen Gesch=C3=A4ftsf=C3=BChrer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg