From: Olivier Dion
To: Peter Zijlstra
Cc: Mathieu Desnoyers,
    rnk@google.com, Alan Stern, Andrea Parri, Will Deacon, Boqun Feng,
    Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget,
    "Paul E. McKenney", Nathan Chancellor, Nick Desaulniers, Tom Rix,
    linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    gcc@gcc.gnu.org, llvm@lists.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
In-Reply-To: <20230704094627.GS4253@hirez.programming.kicks-ass.net>
Organization: EfficiOS
References: <87ttukdcow.fsf@laura>
    <20230704094627.GS4253@hirez.programming.kicks-ass.net>
Date: Fri, 07 Jul 2023 10:04:06 -0400
Message-ID: <87cz13hl7t.fsf@laura>

On Tue, 04 Jul 2023, Peter Zijlstra wrote:

> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

[...]

>> On x86-64 (gcc 13.1 -O2) we get:
>>
>>   t0():
>>           movl    $1, x(%rip)
>>           movl    $1, %eax
>>           xchgl   dummy(%rip), %eax
>>           lock orq $0, (%rsp)    ;; Redundant with previous exchange.
>>           movl    y(%rip), %eax
>>           movl    %eax, r0(%rip)
>>           ret
>>   t1():
>>           movl    $1, y(%rip)
>>           lock orq $0, (%rsp)
>>           movl    x(%rip), %eax
>>           movl    %eax, r1(%rip)
>>           ret
>
> So I would expect the compilers to do better here. It should know those
> __atomic_thread_fence() thingies are superfluous and simply not emit
> them. This could even be done as a peephole pass later, where it sees
> consecutive atomic ops and the second being a no-op.

Indeed, a peephole optimization could work for this Dekker example, if the
compiler adds the pattern for it.
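For context, here is a minimal sketch of the kind of C source that yields the t0()/t1() assembly above. It is a reconstruction, not the code from the original RFC: the variable names come from the assembly, while the use of GCC's __atomic builtins and the chosen memory orders (a relaxed exchange on `dummy` followed by a sequentially consistent fence) are assumptions.

```c
/* Hypothetical reconstruction of the store-buffering (Dekker) litmus test
 * behind the t0()/t1() listing. Names match the assembly; the memory
 * orders are an assumption, not taken from the original RFC. */

int x, y;   /* shared flags, only accessed through __atomic builtins */
int dummy;  /* target of the exchange in t0() */
int r0, r1; /* per-thread results, inspected after both threads finish */

void t0(void)
{
	__atomic_store_n(&x, 1, __ATOMIC_RELAXED);
	/* On x86-64 this becomes xchgl, which is implicitly locked and
	 * therefore already a full barrier, whatever order is requested. */
	(void)__atomic_exchange_n(&dummy, 1, __ATOMIC_RELAXED);
	/* C11 still requires this fence for the store->load ordering; on
	 * x86-64 it is emitted as "lock orq $0, (%rsp)", duplicating the
	 * barrier the xchgl already provides. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
}

void t1(void)
{
	__atomic_store_n(&y, 1, __ATOMIC_RELAXED);
	/* Here the fence is the only barrier, so it is genuinely needed. */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
}
```

When t0() and t1() run concurrently, the fences forbid the outcome r0 == 0 && r1 == 0. On a weakly ordered architecture such as AArch64 the relaxed exchange carries no barrier, so the fence in t0() is required there; only on x86-64 is it redundant. In this example the exchange and the fence sit in the same basic block, which is what makes a peephole pass feasible.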
However, AFAIK, a peephole optimization cannot be applied when the two
fences are in different basic blocks, for example when a fence is emitted
only on the success path of a compare_exchange. This limitation also
implies that the optimization cannot be done across functions or modules
(shared libraries). For example, it would be interesting to be able to
promote the acquire fence of a pthread_mutex_lock() to a full fence on
weakly ordered architectures, while avoiding a redundant fence on strongly
ordered architectures.

We know that at least Clang has such peephole optimizations for some
architecture backends. However, it seems that they do not recognize
lock-prefixed instructions as fences. AFAIK, GCC does not have that kind
of optimization.

We are also aware that some research has been done on this topic [0]. The
idea is to use partial redundancy elimination (PRE) to eliminate redundant
fences. This would work across multiple basic blocks, although the paper
focuses on intra-procedural elimination. However, it seems that the latest
work on this [1] was never completed [2].

Our proposed approach provides a means for the user to express (and
document) the intended semantics in the source code. This allows the
compiler to emit only the fences that are actually needed, without relying
on architecture-specific backend optimizations. In other words, this
applies even to unoptimized binaries.

[...]

Thanks,

Olivier

[0] https://dl.acm.org/doi/10.1145/3033019.3033021
[1] https://discourse.llvm.org/t/fence-elimination-pass-proposal/33679
[2] https://reviews.llvm.org/D5758

-- 
Olivier Dion
EfficiOS Inc.
https://www.efficios.com