From: Jon Nettleton
Date: Tue, 16 Aug 2022 19:49:16 +0200
Subject: Re: [PATCH] locking/atomic: Make test_and_*_bit() ordered on failure
To: Will Deacon
Cc: Hector Martin, Peter Zijlstra, Arnd Bergmann, Ingo Molnar, Alan Stern,
    Andrea Parri, Boqun Feng, Nicholas Piggin, David Howells, Jade Alglave,
    Luc Maranget, "Paul E. McKenney", Akira Yokosawa, Daniel Lustig,
    Joel Fernandes, Mark Rutland, Jonathan Corbet, Tejun Heo,
    jirislaby@kernel.org, Marc Zyngier, Catalin Marinas, Oliver Neukum,
    Linus Torvalds, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    Asahi Linux, stable@vger.kernel.org
In-Reply-To: <20220816173654.GA11766@willie-the-truck>
References: <20220816070311.89186-1-marcan@marcan.st>
    <20220816140423.GC11202@willie-the-truck>
    <20220816173654.GA11766@willie-the-truck>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 16, 2022 at 7:38 PM Will Deacon wrote:
>
> On Tue, Aug 16, 2022 at 11:30:45PM +0900, Hector Martin wrote:
> > On 16/08/2022 23.04, Will Deacon wrote:
> > >> diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
> > >> index 3096f086b5a3..71ab4ba9c25d 100644
> > >> --- a/include/asm-generic/bitops/atomic.h
> > >> +++ b/include/asm-generic/bitops/atomic.h
> > >> @@ -39,9 +39,6 @@ arch_test_and_set_bit(unsigned int nr, volatile unsigned long *p)
> > >>  	unsigned long mask = BIT_MASK(nr);
> > >>
> > >>  	p += BIT_WORD(nr);
> > >> -	if (READ_ONCE(*p) & mask)
> > >> -		return 1;
> > >> -
> > >>  	old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
> > >>  	return !!(old & mask);
> > >>  }
> > >> @@ -53,9 +50,6 @@ arch_test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
> > >>  	unsigned long mask = BIT_MASK(nr);
> > >>
> > >>  	p += BIT_WORD(nr);
> > >> -	if (!(READ_ONCE(*p) & mask))
> > >> -		return 0;
> > >> -
> > >>  	old = arch_atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
> > >>  	return !!(old & mask);
> > >
> > > I suppose one sad thing about this is that, on arm64, we could reasonably
> > > keep the READ_ONCE() path with a DMB LD (R->RW) barrier before the return
> > > but I don't think we can express that in the Linux memory model so we
> > > end up in RmW territory every time.
> >
> > You'd need a barrier *before* the READ_ONCE(), since what we're trying
> > to prevent is a consumer from writing to the value without being able to
> > observe the writes that happened prior, while this side read the old
> > value. A barrier after the READ_ONCE() doesn't do anything, as that read
> > is the last memory operation in this thread (of the problematic sequence).
>
> Right, having gone back to your litmus test, I now realise it's the "SB"
> shape from the memory ordering terminology. It's funny because the arm64
> acquire/release instructions are RCsc and so upgrading the READ_ONCE()
> to an *arm64* acquire instruction would work for your specific case, but
> only because the preceding store is a release.
>
> > At that point, I'm not sure DMB LD / early read / LSE atomic would be
> > any faster than just always doing the LSE atomic?
>
> It depends a lot on the configuration of the system and the state of the
> relevant cacheline, but generally avoiding an RmW by introducing a barrier
> is likely to be a win. It just gets ugly here as we'd want to avoid the
> DMB in the case where we end up doing the RmW. Possibly we could do
> something funky like a test-and-test-and-test-and-set (!) where we do
> the DMB+READ_ONCE() only if the first READ_ONCE() has the bit set, but
> even just typing that is horrible and I'd _absolutely_ want to see perf
> numbers to show that it's a benefit once you start taking into account
> things like branch prediction.
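
For what it's worth, the shape Will describes would look something like
the sketch below. This is untested and purely illustrative: it reuses
the asm-generic names from the diff above (only the set variant shown),
and smp_mb() stands in for the barrier, placed before the re-read per
Hector's point that the barrier has to come before the load whose value
you act on:

static __always_inline bool
arch_test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
	unsigned long old;
	unsigned long mask = BIT_MASK(nr);

	p += BIT_WORD(nr);

	/* Test 1: the fast path is only worth trying if the bit looks set. */
	if (READ_ONCE(*p) & mask) {
		/*
		 * Test 2: re-read behind a full barrier, so the caller's
		 * earlier store is ordered before this load (the W->R
		 * ordering the SB shape requires).
		 */
		smp_mb();
		if (READ_ONCE(*p) & mask)
			return 1;
	}

	/* Test 3 and set: fall back to the fully ordered RmW. */
	old = arch_atomic_long_fetch_or(mask, (atomic_long_t *)p);
	return !!(old & mask);
}

Whether the extra branch plus barrier actually beats the unconditional
RmW is exactly the perf question Will raises above.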
>
> Anywho, since Linus has applied the patch and it should work, this is
> just an interesting aside.
>
> Will
>

It is moot if Linus has already taken the patch, but with a stock kernel
config I am still seeing a slight performance dip, although only ~1-2% in
the specific tests I was running. Sorry about the noise; I will need to
look at my kernel builder and see what went wrong when I have more time.

Cheers,
Jon
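
P.S. For anyone finding this thread in the archives: the "SB" (store
buffering) shape mentioned above is, in LKMM litmus form, roughly the
following. A generic sketch, not the exact test from the earlier thread:

C SB-sketch

{}

P0(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*x, 1);
	smp_mb();
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)

With both smp_mb()s present, herd7 reports the "exists" outcome as
unreachable; drop either barrier (roughly what the old early-exit fast
path did on the test_and_*_bit() side) and it becomes reachable.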