From: guoren@kernel.org
To: guoren@kernel.org, arnd@arndb.de, palmer@dabbelt.com, mark.rutland@arm.com,
    will@kernel.org, peterz@infradead.org, boqun.feng@gmail.com
Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-riscv@lists.infradead.org, Guo Ren
Subject: [PATCH V2 2/3] riscv: atomic: Optimize acquire and release for AMO operations
Date: Tue, 12 Apr 2022 11:49:56 +0800
Message-Id: <20220412034957.1481088-3-guoren@kernel.org>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220412034957.1481088-1-guoren@kernel.org>
References: <20220412034957.1481088-1-guoren@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Guo Ren

The current acquire and release implementations, taken from
atomic-arch-fallback.h, use __atomic_acquire_fence() /
__atomic_release_fence(), which emit an extra "fence r,rw" /
"fence rw,w" instruction after/before the AMO instruction. RISC-V AMO
instructions can encode acquire and release semantics in the
instruction itself (the .aq/.rl bits), so using this feature gains two
benefits:

 - one fewer instruction per operation;
 - no strong "fence r,rw" / "fence rw,w" barriers, which are meant to
   protect "lr/sc" sequences.
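To make the difference concrete, here is a rough sketch (illustrative
only, not part of the diff) of what a 32-bit atomic fetch-add with
acquire semantics compiles down to before and after this patch,
assuming RISCV_ACQUIRE_BARRIER is "fence r, rw" as defined in
arch/riscv/include/asm/fence.h:

	/*
	 * Before: relaxed AMO followed by __atomic_acquire_fence()
	 *
	 *	amoadd.w	a0, a0, (a1)
	 *	fence		r, rw
	 *
	 * After: acquire semantics encoded in the AMO's .aq bit
	 *
	 *	amoadd.w.aq	a0, a0, (a1)
	 */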
Signed-off-by: Guo Ren
Signed-off-by: Guo Ren
Cc: Palmer Dabbelt
Cc: Mark Rutland
---
 arch/riscv/include/asm/atomic.h  | 64 ++++++++++++++++++++++++++++++++
 arch/riscv/include/asm/cmpxchg.h | 12 ++----
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index ac9bdf4fc404..20ce8b83bc18 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -99,6 +99,30 @@ c_type arch_atomic##prefix##_fetch_##op##_relaxed(c_type i,	\
 	return ret;							\
 }									\
 static __always_inline							\
+c_type arch_atomic##prefix##_fetch_##op##_acquire(c_type i,		\
+					      atomic##prefix##_t *v)	\
+{									\
+	register c_type ret;						\
+	__asm__ __volatile__ (						\
+		"	amo" #asm_op "." #asm_type ".aq %1, %2, %0"	\
+		: "+A" (v->counter), "=r" (ret)				\
+		: "r" (I)						\
+		: "memory");						\
+	return ret;							\
+}									\
+static __always_inline							\
+c_type arch_atomic##prefix##_fetch_##op##_release(c_type i,		\
+					      atomic##prefix##_t *v)	\
+{									\
+	register c_type ret;						\
+	__asm__ __volatile__ (						\
+		"	amo" #asm_op "." #asm_type ".rl %1, %2, %0"	\
+		: "+A" (v->counter), "=r" (ret)				\
+		: "r" (I)						\
+		: "memory");						\
+	return ret;							\
+}									\
+static __always_inline							\
 c_type arch_atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v)	\
 {									\
 	register c_type ret;						\
@@ -118,6 +142,18 @@ c_type arch_atomic##prefix##_##op##_return_relaxed(c_type i,	\
 	return arch_atomic##prefix##_fetch_##op##_relaxed(i, v) c_op I;	\
 }									\
 static __always_inline							\
+c_type arch_atomic##prefix##_##op##_return_acquire(c_type i,		\
+					      atomic##prefix##_t *v)	\
+{									\
+	return arch_atomic##prefix##_fetch_##op##_acquire(i, v) c_op I;	\
+}									\
+static __always_inline							\
+c_type arch_atomic##prefix##_##op##_return_release(c_type i,		\
+					      atomic##prefix##_t *v)	\
+{									\
+	return arch_atomic##prefix##_fetch_##op##_release(i, v) c_op I;	\
+}									\
+static __always_inline							\
 c_type arch_atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v)	\
 {									\
 	return arch_atomic##prefix##_fetch_##op(i, v) c_op I;		\
@@ -140,22 +176,38 @@ ATOMIC_OPS(sub, add, +, -i)
 #define arch_atomic_add_return_relaxed	arch_atomic_add_return_relaxed
 #define arch_atomic_sub_return_relaxed	arch_atomic_sub_return_relaxed
+#define arch_atomic_add_return_acquire	arch_atomic_add_return_acquire
+#define arch_atomic_sub_return_acquire	arch_atomic_sub_return_acquire
+#define arch_atomic_add_return_release	arch_atomic_add_return_release
+#define arch_atomic_sub_return_release	arch_atomic_sub_return_release
 #define arch_atomic_add_return		arch_atomic_add_return
 #define arch_atomic_sub_return		arch_atomic_sub_return

 #define arch_atomic_fetch_add_relaxed	arch_atomic_fetch_add_relaxed
 #define arch_atomic_fetch_sub_relaxed	arch_atomic_fetch_sub_relaxed
+#define arch_atomic_fetch_add_acquire	arch_atomic_fetch_add_acquire
+#define arch_atomic_fetch_sub_acquire	arch_atomic_fetch_sub_acquire
+#define arch_atomic_fetch_add_release	arch_atomic_fetch_add_release
+#define arch_atomic_fetch_sub_release	arch_atomic_fetch_sub_release
 #define arch_atomic_fetch_add		arch_atomic_fetch_add
 #define arch_atomic_fetch_sub		arch_atomic_fetch_sub

 #ifndef CONFIG_GENERIC_ATOMIC64
 #define arch_atomic64_add_return_relaxed	arch_atomic64_add_return_relaxed
 #define arch_atomic64_sub_return_relaxed	arch_atomic64_sub_return_relaxed
+#define arch_atomic64_add_return_acquire	arch_atomic64_add_return_acquire
+#define arch_atomic64_sub_return_acquire	arch_atomic64_sub_return_acquire
+#define arch_atomic64_add_return_release	arch_atomic64_add_return_release
+#define arch_atomic64_sub_return_release	arch_atomic64_sub_return_release
 #define arch_atomic64_add_return		arch_atomic64_add_return
 #define arch_atomic64_sub_return		arch_atomic64_sub_return

 #define arch_atomic64_fetch_add_relaxed	arch_atomic64_fetch_add_relaxed
 #define arch_atomic64_fetch_sub_relaxed	arch_atomic64_fetch_sub_relaxed
+#define arch_atomic64_fetch_add_acquire	arch_atomic64_fetch_add_acquire
+#define arch_atomic64_fetch_sub_acquire	arch_atomic64_fetch_sub_acquire
+#define arch_atomic64_fetch_add_release	arch_atomic64_fetch_add_release
+#define arch_atomic64_fetch_sub_release	arch_atomic64_fetch_sub_release
 #define arch_atomic64_fetch_add		arch_atomic64_fetch_add
 #define arch_atomic64_fetch_sub		arch_atomic64_fetch_sub
 #endif
@@ -178,6 +230,12 @@ ATOMIC_OPS(xor, xor, i)
 #define arch_atomic_fetch_and_relaxed	arch_atomic_fetch_and_relaxed
 #define arch_atomic_fetch_or_relaxed	arch_atomic_fetch_or_relaxed
 #define arch_atomic_fetch_xor_relaxed	arch_atomic_fetch_xor_relaxed
+#define arch_atomic_fetch_and_acquire	arch_atomic_fetch_and_acquire
+#define arch_atomic_fetch_or_acquire	arch_atomic_fetch_or_acquire
+#define arch_atomic_fetch_xor_acquire	arch_atomic_fetch_xor_acquire
+#define arch_atomic_fetch_and_release	arch_atomic_fetch_and_release
+#define arch_atomic_fetch_or_release	arch_atomic_fetch_or_release
+#define arch_atomic_fetch_xor_release	arch_atomic_fetch_xor_release
 #define arch_atomic_fetch_and		arch_atomic_fetch_and
 #define arch_atomic_fetch_or		arch_atomic_fetch_or
 #define arch_atomic_fetch_xor		arch_atomic_fetch_xor
@@ -186,6 +244,12 @@ ATOMIC_OPS(xor, xor, i)
 #define arch_atomic64_fetch_and_relaxed	arch_atomic64_fetch_and_relaxed
 #define arch_atomic64_fetch_or_relaxed	arch_atomic64_fetch_or_relaxed
 #define arch_atomic64_fetch_xor_relaxed	arch_atomic64_fetch_xor_relaxed
+#define arch_atomic64_fetch_and_acquire	arch_atomic64_fetch_and_acquire
+#define arch_atomic64_fetch_or_acquire	arch_atomic64_fetch_or_acquire
+#define arch_atomic64_fetch_xor_acquire	arch_atomic64_fetch_xor_acquire
+#define arch_atomic64_fetch_and_release	arch_atomic64_fetch_and_release
+#define arch_atomic64_fetch_or_release	arch_atomic64_fetch_or_release
+#define arch_atomic64_fetch_xor_release	arch_atomic64_fetch_xor_release
 #define arch_atomic64_fetch_and		arch_atomic64_fetch_and
 #define arch_atomic64_fetch_or		arch_atomic64_fetch_or
 #define arch_atomic64_fetch_xor		arch_atomic64_fetch_xor

diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 12debce235e5..1af8db92250b 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -52,16 +52,14 @@
 	switch (size) {							\
 	case 4:								\
 		__asm__ __volatile__ (					\
-			"	amoswap.w %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
+			"	amoswap.w.aq %0, %2, %1\n"		\
 			: "=r" (__ret), "+A" (*__ptr)			\
 			: "r" (__new)					\
 			: "memory");					\
 		break;							\
 	case 8:								\
 		__asm__ __volatile__ (					\
-			"	amoswap.d %0, %2, %1\n"			\
-			RISCV_ACQUIRE_BARRIER				\
+			"	amoswap.d.aq %0, %2, %1\n"		\
 			: "=r" (__ret), "+A" (*__ptr)			\
 			: "r" (__new)					\
 			: "memory");					\
@@ -87,16 +85,14 @@
 	switch (size) {							\
 	case 4:								\
 		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.w %0, %2, %1\n"			\
+			"	amoswap.w.rl %0, %2, %1\n"		\
 			: "=r" (__ret), "+A" (*__ptr)			\
 			: "r" (__new)					\
 			: "memory");					\
 		break;							\
 	case 8:								\
 		__asm__ __volatile__ (					\
-			RISCV_RELEASE_BARRIER				\
-			"	amoswap.d %0, %2, %1\n"			\
+			"	amoswap.d.rl %0, %2, %1\n"		\
 			: "=r" (__ret), "+A" (*__ptr)			\
 			: "r" (__new)					\
 			: "memory");					\
--
2.25.1
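For reference, a hypothetical caller (the function and its arguments
below are made up for illustration and are not part of this series)
showing where the new forms take effect; with this patch applied, each
call should compile to a single AMO instruction instead of an AMO plus
a separate fence:

	#include <linux/atomic.h>

	/* Hypothetical example only. */
	static int example(atomic_t *v, int *flag)
	{
		/* Maps to arch_atomic_fetch_add_acquire(): a single
		 * amoadd.w.aq, no trailing "fence r, rw". */
		int old = atomic_fetch_add_acquire(1, v);

		/* Maps to the patched __xchg_release(): a single
		 * amoswap.w.rl, no leading "fence rw, w". */
		xchg_release(flag, 0);

		return old;
	}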