Received: by 2002:a05:7412:8521:b0:e2:908c:2ebd with SMTP id t33csp1986509rdf; Mon, 6 Nov 2023 00:48:08 -0800 (PST) X-Google-Smtp-Source: AGHT+IGODWQiZMGxCNbWtRnMTeVFkayOIQvJw8lngCm4pdNL49CzoOa9DcXUkAG49WZex4jDT5Ll X-Received: by 2002:a05:6a20:12c5:b0:160:58f5:693b with SMTP id v5-20020a056a2012c500b0016058f5693bmr25397794pzg.44.1699260488239; Mon, 06 Nov 2023 00:48:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699260488; cv=none; d=google.com; s=arc-20160816; b=u9boIb7qxXceC2PL+TyVsXPN7xAXZGvHZ3DKANHX0lvsnJVufzvIkuX31hS5K+NF93 X9XV4vmCzZ3Tue77mFGqEzQbvy35XX3VM4gm2ulABGVtvrGw1Qc3wOtuwfQ9JssJkOTm 2jZkLh/+QVKHTour4WJJ27xkIYo6P5aKZX/7P7WGaNc4HyK5ySvdRhyaTVSPqkLlBlXd yBjXJ2uyuOFOM/V74fygDU7Gg5nIWbKmEkGkSnX1X53N23C6boTWdFzZI6Qc+GIz41Sz 7wMDwIDQb5De4Sc+peERcXB1+m7XsYicHACOHPBlFvhJyRxYImZBn1lkoyzANKYQPHCh fzhw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=5H38ptlFlqdZQvXWUpABZ9sqy1v2HNAbONu6jrsunHs=; fh=SfFqCZvKd/WsFaODeekaK4K28RBTkW6GipAlnKCEI4Y=; b=u5CTur9tlKcB0MnctsLsCK22X717hLoWp0f+Z4MwDaJuEAhQw4eWQpDJVAAuZ8+ay0 SM9T+UnZGCtxNhopJHE+swfHx4nB/nOYlJAW503vb4+0SjRINuP5e2ebBI81Ct5U61yy Czc0tZDqVH1OLnFAMvYXh8wCBniV/bL7U0ENCUz2jV1CIxTP7CFlqbEuNk3+6P/N/Zra lRWFQS24oZCE2J0p+PFNF5s5HlCBwsIPfi0h/VsgrfaljahZL4V6bSTmjA3X2QjZFNEB 5qrv0EdeSwhjwz/BlEynn5a0RL8hhcpf+SZICcmOgOGN//F9d1Exiio9pzRrpl3+BrZ0 JCJg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id b1-20020a170903228100b001c3fa95ca18si8242250plh.333.2023.11.06.00.48.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 00:48:08 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 4F07180B026F; Mon, 6 Nov 2023 00:48:05 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231289AbjKFIry (ORCPT + 99 others); Mon, 6 Nov 2023 03:47:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231285AbjKFIrw (ORCPT ); Mon, 6 Nov 2023 03:47:52 -0500 Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 14E74EA for ; Mon, 6 Nov 2023 00:47:47 -0800 (PST) Received: from loongson.cn (unknown [112.22.233.25]) by gateway (Coremail) with SMTP id _____8DxBfEyqEhlmEQ3AA--.43271S3; Mon, 06 Nov 2023 16:47:46 +0800 (CST) Received: from localhost.localdomain (unknown [112.22.233.25]) by localhost.localdomain (Coremail) with SMTP id AQAAf8BxE+QtqEhl0y86AA--.62988S2; Mon, 06 Nov 2023 16:47:44 +0800 (CST) From: WANG Rui To: Huacai Chen , Will Deacon , Peter Zijlstra Cc: WANG Xuerui , Boqun Feng , Mark Rutland , linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, loongson-kernel@lists.loongnix.cn, WANG Rui Subject: [PATCH] LoongArch: Relax memory ordering for atomic operations Date: Mon, 6 Nov 2023 16:47:34 +0800 Message-ID: <20231106084734.203243-1-wangrui@loongson.cn> X-Mailer: git-send-email 2.42.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: AQAAf8BxE+QtqEhl0y86AA--.62988S2 X-CM-SenderInfo: pzdqw2txl6z05rqj20fqof0/ X-Coremail-Antispam: 1Uk129KBj93XoW3Xry8XrW3Cr1xZF4UWr1fZrc_yoWDJFWfp3 y09F98tF45Xay5G3yvyan8W345Jr1YvryqqryYyr9ruFy2kwnxJ3W8XF1vvr1Utw48Ka1r Gr4jkayUWFn2vwcCm3ZEXasCq-sJn29KB7ZKAUJUUUUr529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUU9Yb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r106r15M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Jr0_Gr1l84ACjcxK6I8E87Iv67AKxVW8Jr0_Cr1UM28EF7xvwVC2z280aVCY1x0267AK xVW8Jr0_Cr1UM2kKe7AKxVWUXVWUAwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07 AIYIkI8VC2zVCFFI0UMc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWU XVWUAwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7V AKI48JMxAIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMxCIbckI1I0E14v2 6r1Y6r17MI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17 CEb7AF67AKxVWUtVW8ZwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF 0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIx AIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8JbIYCTnIWIev Ja73UjIFyTuYvjxU2G-eUUUUU X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Mon, 06 Nov 2023 00:48:05 -0800 (PST) This patch relaxes the implementation while satisfying the memory ordering requirements for atomic operations, which will help improve performance on LA664+. Unixbench with full threads (8) before after Dhrystone 2 using register variables 203910714.2 203909539.8 0.00% Double-Precision Whetstone 37930.9 37931 0.00% Execl Throughput 29431.5 29545.8 0.39% File Copy 1024 bufsize 2000 maxblocks 6645759.5 6676320 0.46% File Copy 256 bufsize 500 maxblocks 2138772.4 2144182.4 0.25% File Copy 4096 bufsize 8000 maxblocks 11640698.4 11602703 -0.33% Pipe Throughput 8849077.7 8917009.4 0.77% Pipe-based Context Switching 1255108.5 1287277.3 2.56% Process Creation 50825.9 50442.1 -0.76% Shell Scripts (1 concurrent) 25795.8 25942.3 0.57% Shell Scripts (8 concurrent) 3812.6 3835.2 0.59% System Call Overhead 9248212.6 9353348.6 1.14% ======= System Benchmarks Index Score 8076.6 8114.4 0.47% Signed-off-by: WANG Rui --- arch/loongarch/include/asm/atomic.h | 88 ++++++++++++++++++++++------- 1 file changed, 68 insertions(+), 20 deletions(-) diff --git a/arch/loongarch/include/asm/atomic.h b/arch/loongarch/include/asm/atomic.h index e27f0c72d324..99af8b3160a8 100644 --- a/arch/loongarch/include/asm/atomic.h +++ b/arch/loongarch/include/asm/atomic.h @@ -36,19 +36,19 @@ static inline void arch_atomic_##op(int i, atomic_t *v) \ { \ __asm__ __volatile__( \ - "am"#asm_op"_db.w" " $zero, %1, %0 \n" \ + "am"#asm_op".w" " $zero, %1, %0 \n" \ : "+ZB" (v->counter) \ : "r" (I) \ : "memory"); \ } -#define ATOMIC_OP_RETURN(op, I, asm_op, c_op) \ -static inline int arch_atomic_##op##_return_relaxed(int i, atomic_t *v) \ +#define ATOMIC_OP_RETURN(op, I, asm_op, c_op, mb, suffix) \ +static inline int arch_atomic_##op##_return##suffix(int i, atomic_t *v) \ { \ int result; \ \ __asm__ __volatile__( \ - "am"#asm_op"_db.w" " %1, %2, %0 \n" \ + "am"#asm_op#mb".w" " %1, %2, %0 \n" \ : "+ZB" (v->counter), "=&r" (result) \ : "r" (I) \ : "memory"); \ @@ -56,13 +56,13 @@ static inline int arch_atomic_##op##_return_relaxed(int i, atomic_t *v) \ return result c_op I; \ } -#define ATOMIC_FETCH_OP(op, I, asm_op) \ -static inline int arch_atomic_fetch_##op##_relaxed(int i, atomic_t *v) \ +#define ATOMIC_FETCH_OP(op, I, asm_op, mb, suffix) \ +static inline int arch_atomic_fetch_##op##suffix(int i, atomic_t *v) \ { \ int result; \ \ __asm__ __volatile__( \ - "am"#asm_op"_db.w" " %1, %2, %0 \n" \ + "am"#asm_op#mb".w" " %1, %2, %0 \n" \ : "+ZB" (v->counter), "=&r" (result) \ : "r" (I) \ : "memory"); \ @@ -72,29 +72,53 @@ static inline int arch_atomic_fetch_##op##_relaxed(int i, atomic_t *v) \ #define ATOMIC_OPS(op, I, asm_op, c_op) \ ATOMIC_OP(op, I, asm_op) \ - ATOMIC_OP_RETURN(op, I, asm_op, c_op) \ - ATOMIC_FETCH_OP(op, I, asm_op) + ATOMIC_OP_RETURN(op, I, asm_op, c_op, _db, ) \ + ATOMIC_OP_RETURN(op, I, asm_op, c_op, , _relaxed) \ + ATOMIC_FETCH_OP(op, I, asm_op, _db, ) \ + ATOMIC_FETCH_OP(op, I, asm_op, , _relaxed) ATOMIC_OPS(add, i, add, +) ATOMIC_OPS(sub, -i, add, +) +#define arch_atomic_add_return arch_atomic_add_return +#define arch_atomic_add_return_acquire arch_atomic_add_return +#define arch_atomic_add_return_release arch_atomic_add_return #define arch_atomic_add_return_relaxed arch_atomic_add_return_relaxed +#define arch_atomic_sub_return arch_atomic_sub_return +#define arch_atomic_sub_return_acquire arch_atomic_sub_return +#define arch_atomic_sub_return_release arch_atomic_sub_return #define arch_atomic_sub_return_relaxed arch_atomic_sub_return_relaxed +#define arch_atomic_fetch_add arch_atomic_fetch_add +#define arch_atomic_fetch_add_acquire arch_atomic_fetch_add +#define arch_atomic_fetch_add_release arch_atomic_fetch_add #define arch_atomic_fetch_add_relaxed arch_atomic_fetch_add_relaxed +#define arch_atomic_fetch_sub arch_atomic_fetch_sub +#define arch_atomic_fetch_sub_acquire arch_atomic_fetch_sub +#define arch_atomic_fetch_sub_release arch_atomic_fetch_sub #define arch_atomic_fetch_sub_relaxed arch_atomic_fetch_sub_relaxed #undef ATOMIC_OPS #define ATOMIC_OPS(op, I, asm_op) \ ATOMIC_OP(op, I, asm_op) \ - ATOMIC_FETCH_OP(op, I, asm_op) + ATOMIC_FETCH_OP(op, I, asm_op, _db, ) \ + ATOMIC_FETCH_OP(op, I, asm_op, , _relaxed) ATOMIC_OPS(and, i, and) ATOMIC_OPS(or, i, or) ATOMIC_OPS(xor, i, xor) +#define arch_atomic_fetch_and arch_atomic_fetch_and +#define arch_atomic_fetch_and_acquire arch_atomic_fetch_and +#define arch_atomic_fetch_and_release arch_atomic_fetch_and #define arch_atomic_fetch_and_relaxed arch_atomic_fetch_and_relaxed +#define arch_atomic_fetch_or arch_atomic_fetch_or +#define arch_atomic_fetch_or_acquire arch_atomic_fetch_or +#define arch_atomic_fetch_or_release arch_atomic_fetch_or #define arch_atomic_fetch_or_relaxed arch_atomic_fetch_or_relaxed +#define arch_atomic_fetch_xor arch_atomic_fetch_xor +#define arch_atomic_fetch_xor_acquire arch_atomic_fetch_xor +#define arch_atomic_fetch_xor_release arch_atomic_fetch_xor #define arch_atomic_fetch_xor_relaxed arch_atomic_fetch_xor_relaxed #undef ATOMIC_OPS @@ -172,18 +196,18 @@ static inline int arch_atomic_sub_if_positive(int i, atomic_t *v) static inline void arch_atomic64_##op(long i, atomic64_t *v) \ { \ __asm__ __volatile__( \ - "am"#asm_op"_db.d " " $zero, %1, %0 \n" \ + "am"#asm_op".d " " $zero, %1, %0 \n" \ : "+ZB" (v->counter) \ : "r" (I) \ : "memory"); \ } -#define ATOMIC64_OP_RETURN(op, I, asm_op, c_op) \ -static inline long arch_atomic64_##op##_return_relaxed(long i, atomic64_t *v) \ +#define ATOMIC64_OP_RETURN(op, I, asm_op, c_op, mb, suffix) \ +static inline long arch_atomic64_##op##_return##suffix(long i, atomic64_t *v) \ { \ long result; \ __asm__ __volatile__( \ - "am"#asm_op"_db.d " " %1, %2, %0 \n" \ + "am"#asm_op#mb".d " " %1, %2, %0 \n" \ : "+ZB" (v->counter), "=&r" (result) \ : "r" (I) \ : "memory"); \ @@ -191,13 +215,13 @@ static inline long arch_atomic64_##op##_return_relaxed(long i, atomic64_t *v) \ return result c_op I; \ } -#define ATOMIC64_FETCH_OP(op, I, asm_op) \ -static inline long arch_atomic64_fetch_##op##_relaxed(long i, atomic64_t *v) \ +#define ATOMIC64_FETCH_OP(op, I, asm_op, mb, suffix) \ +static inline long arch_atomic64_fetch_##op##suffix(long i, atomic64_t *v) \ { \ long result; \ \ __asm__ __volatile__( \ - "am"#asm_op"_db.d " " %1, %2, %0 \n" \ + "am"#asm_op#mb".d " " %1, %2, %0 \n" \ : "+ZB" (v->counter), "=&r" (result) \ : "r" (I) \ : "memory"); \ @@ -207,29 +231,53 @@ static inline long arch_atomic64_fetch_##op##_relaxed(long i, atomic64_t *v) \ #define ATOMIC64_OPS(op, I, asm_op, c_op) \ ATOMIC64_OP(op, I, asm_op) \ - ATOMIC64_OP_RETURN(op, I, asm_op, c_op) \ - ATOMIC64_FETCH_OP(op, I, asm_op) + ATOMIC64_OP_RETURN(op, I, asm_op, c_op, _db, ) \ + ATOMIC64_OP_RETURN(op, I, asm_op, c_op, , _relaxed) \ + ATOMIC64_FETCH_OP(op, I, asm_op, _db, ) \ + ATOMIC64_FETCH_OP(op, I, asm_op, , _relaxed) ATOMIC64_OPS(add, i, add, +) ATOMIC64_OPS(sub, -i, add, +) +#define arch_atomic64_add_return arch_atomic64_add_return +#define arch_atomic64_add_return_acquire arch_atomic64_add_return +#define arch_atomic64_add_return_release arch_atomic64_add_return #define arch_atomic64_add_return_relaxed arch_atomic64_add_return_relaxed +#define arch_atomic64_sub_return arch_atomic64_sub_return +#define arch_atomic64_sub_return_acquire arch_atomic64_sub_return +#define arch_atomic64_sub_return_release arch_atomic64_sub_return #define arch_atomic64_sub_return_relaxed arch_atomic64_sub_return_relaxed +#define arch_atomic64_fetch_add arch_atomic64_fetch_add +#define arch_atomic64_fetch_add_acquire arch_atomic64_fetch_add +#define arch_atomic64_fetch_add_release arch_atomic64_fetch_add #define arch_atomic64_fetch_add_relaxed arch_atomic64_fetch_add_relaxed +#define arch_atomic64_fetch_sub arch_atomic64_fetch_sub +#define arch_atomic64_fetch_sub_acquire arch_atomic64_fetch_sub +#define arch_atomic64_fetch_sub_release arch_atomic64_fetch_sub #define arch_atomic64_fetch_sub_relaxed arch_atomic64_fetch_sub_relaxed #undef ATOMIC64_OPS #define ATOMIC64_OPS(op, I, asm_op) \ ATOMIC64_OP(op, I, asm_op) \ - ATOMIC64_FETCH_OP(op, I, asm_op) + ATOMIC64_FETCH_OP(op, I, asm_op, _db, ) \ + ATOMIC64_FETCH_OP(op, I, asm_op, , _relaxed) ATOMIC64_OPS(and, i, and) ATOMIC64_OPS(or, i, or) ATOMIC64_OPS(xor, i, xor) +#define arch_atomic64_fetch_and arch_atomic64_fetch_and +#define arch_atomic64_fetch_and_acquire arch_atomic64_fetch_and +#define arch_atomic64_fetch_and_release arch_atomic64_fetch_and #define arch_atomic64_fetch_and_relaxed arch_atomic64_fetch_and_relaxed +#define arch_atomic64_fetch_or arch_atomic64_fetch_or +#define arch_atomic64_fetch_or_acquire arch_atomic64_fetch_or +#define arch_atomic64_fetch_or_release arch_atomic64_fetch_or #define arch_atomic64_fetch_or_relaxed arch_atomic64_fetch_or_relaxed +#define arch_atomic64_fetch_xor arch_atomic64_fetch_xor +#define arch_atomic64_fetch_xor_acquire arch_atomic64_fetch_xor +#define arch_atomic64_fetch_xor_release arch_atomic64_fetch_xor #define arch_atomic64_fetch_xor_relaxed arch_atomic64_fetch_xor_relaxed #undef ATOMIC64_OPS -- 2.42.1