Received: by 2002:a05:7412:a9a2:b0:e2:908c:2ebd with SMTP id o34csp2546820rdh; Sun, 29 Oct 2023 23:29:58 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEeAvfeCwAXvD3MLjYj84ogybKfTF2+eZ3L9CPJXYQf3xPa2F/MTh1CUmDSK2a89gcgESIv X-Received: by 2002:a05:6a20:3942:b0:16b:79b3:222b with SMTP id r2-20020a056a20394200b0016b79b3222bmr12744589pzg.34.1698647398356; Sun, 29 Oct 2023 23:29:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698647398; cv=none; d=google.com; s=arc-20160816; b=neqoOXhX5qcVyl1NE8bFmh/Rz0rDqr6cxdvbVs1ht5F1grYTIlGs6eu82/4T8GPw9O iptY9zdfBl3Jl27ugR9INGF4mlyzORwXaqIepcB5+Mrur39JmH7vx+ORiIMvKvkcUQE9 8oBI8KWrQVILlHmDa90c0n+YYTfQ5lGqwV3KVV9I1pSCGTHmYLaAQf7Zw4ubYKVddb6N 2TXfVrt1m6OMNF4kJYqZGL3MD5dBG3xxmT7TdYXAm16tHKjAecX+wNqRV07dXRAPVx5X 3sIQ/Fx30QR0SJf1nP/zJSHeoV6p0u/XuRHPbUfFcVyQ0AKX6adMvXnZgaow25mFEFW7 a9cQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=NwueWHFf13uaebTlfW2cTLY2knT1YjDpPn2m5f5xYuE=; fh=RkeGhuPfJLw3dOdz/TW3CR/XPxSdgTwCG50xRVzq3IQ=; b=ZTEKrNb+TDgsB5LCwOFPuOVOPbbOA+YvcChiz7h9x9OKvZ4ZQRpqc2fNFNZQaaqjml GQeiyHcFI1qcPnMIqBx2DfuHdECUgIa2rd/FO5pvU5YanCPjB8TiJAXJ/7MsJSIxEgMa gYLXhz1Mf9EDFvdkePKqu1JBZEx3SkB1jBAZK69PbG+XtnKDtnbh46BPrcj4uvdlmuGC oNK4W2PsOva4YTtwdW+om7CjaloFp21fPVDEPqrfFG8OhC6RC0Xl1AnkNYOagWPzvFeo EPv2czesN3Ifggp5T8Phw24topnUqovJe+SJp0HAveHltPmwZW4NvidkITpERoix1nmu APmw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R04ZXq8g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id e23-20020a656797000000b005704979833csi4463179pgr.855.2023.10.29.23.29.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 29 Oct 2023 23:29:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=R04ZXq8g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 312D1803E7A6; Sun, 29 Oct 2023 23:29:57 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231614AbjJ3G3v (ORCPT + 99 others); Mon, 30 Oct 2023 02:29:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231458AbjJ3G3m (ORCPT ); Mon, 30 Oct 2023 02:29:42 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6284D9; Sun, 29 Oct 2023 23:29:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698647375; x=1730183375; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=i0c97lut2Bvllo+XzHeJrjnxRIYvTu8913KH1Ax+Fms=; b=R04ZXq8gmOm0QWrlFj0E+U0hfGrO+0od8v5TsMnD4/74OrJmSD4u332Q 72INIgN0M8xJl4QZpPRviPTfVi4M1oMeRd9fB0IxstDGTdw3JQHwSy7P2 Mu15pwTsFUYHMF7CGBVh/3P+bPXifcHBwVLdLkhnQTKburlPBC7WxvPOZ IuhOuRIOLLTgrEblzKz83+QJPQaBgiWsMtqkklA16h5fjzZ4rdikYSldf uZaXItsinjVXn6QlR26esKR6saxo/wIz+1jpQE9RnyJ2Gm1m1yMzEGB6x w6QbduBKkmC3b4CxfZTWNGqcYOJX0TqOdFuuwB0kpOPGQ89wULTEeaWdb w==; X-IronPort-AV: E=McAfee;i="6600,9927,10878"; a="452278002" X-IronPort-AV: E=Sophos;i="6.03,262,1694761200"; d="scan'208";a="452278002" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Oct 2023 23:29:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10878"; a="903893081" X-IronPort-AV: E=Sophos;i="6.03,262,1694761200"; d="scan'208";a="903893081" Received: from xiao-desktop.sh.intel.com ([10.239.46.158]) by fmsmga001.fm.intel.com with ESMTP; 29 Oct 2023 23:29:31 -0700 From: Xiao Wang To: paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, ardb@kernel.org Cc: anup@brainfault.org, haicheng.li@intel.com, ajones@ventanamicro.com, yujie.liu@intel.com, charlie@rivosinc.com, linux-riscv@lists.infradead.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, Xiao Wang Subject: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension Date: Mon, 30 Oct 2023 14:39:04 +0800 Message-Id: <20231030063904.2116277-3-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030063904.2116277-1-xiao.w.wang@intel.com> References: <20231030063904.2116277-1-xiao.w.wang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Sun, 29 Oct 2023 23:29:57 -0700 (PDT) This patch leverages the alternative mechanism to dynamically optimize bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When Zbb ext is not supported by the runtime CPU, legacy implementation is used. If Zbb is supported, then the optimized variants will be selected via alternative patching. The legacy bitops support is taken from the generic C implementation as fallback. If the parameter is a build-time constant, we leverage compiler builtin to calculate the result directly, this approach is inspired by x86 bitops implementation. EFI stub runs before the kernel, so alternative mechanism should not be used there, this patch introduces a macro NO_ALTERNATIVE for this purpose. Signed-off-by: Xiao Wang --- arch/riscv/include/asm/bitops.h | 255 +++++++++++++++++++++++++- drivers/bitopstest/Kconfig | 1 + drivers/firmware/efi/libstub/Makefile | 2 +- 3 files changed, 254 insertions(+), 4 deletions(-) diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h index 3540b690944b..ef35c9ebc2ed 100644 --- a/arch/riscv/include/asm/bitops.h +++ b/arch/riscv/include/asm/bitops.h @@ -15,13 +15,262 @@ #include #include +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) #include -#include -#include #include +#include +#include + +#else +#include +#include + +#if (BITS_PER_LONG == 64) +#define CTZW "ctzw " +#define CLZW "clzw " +#elif (BITS_PER_LONG == 32) +#define CTZW "ctz " +#define CLZW "clz " +#else +#error "Unexpected BITS_PER_LONG" +#endif + +static __always_inline unsigned long variable__ffs(unsigned long word) +{ + int num; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + "ctz %0, %1\n" + ".option pop\n" + : "=r" (word) : "r" (word) :); + + return word; + +legacy: + num = 0; +#if BITS_PER_LONG == 64 + if ((word & 0xffffffff) == 0) { + num += 32; + word >>= 32; + } +#endif + if ((word & 0xffff) == 0) { + num += 16; + word >>= 16; + } + if ((word & 0xff) == 0) { + num += 8; + word >>= 8; + } + if ((word & 0xf) == 0) { + num += 4; + word >>= 4; + } + if ((word & 0x3) == 0) { + num += 2; + word >>= 2; + } + if ((word & 0x1) == 0) + num += 1; + return num; +} + +/** + * __ffs - find first set bit in a long word + * @word: The word to search + * + * Undefined if no set bit exists, so code should check against 0 first. + */ +#define __ffs(word) \ + (__builtin_constant_p(word) ? \ + (unsigned long)__builtin_ctzl(word) : \ + variable__ffs(word)) + +static __always_inline unsigned long variable__fls(unsigned long word) +{ + int num; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + "clz %0, %1\n" + ".option pop\n" + : "=r" (word) : "r" (word) :); + + return BITS_PER_LONG - 1 - word; + +legacy: + num = BITS_PER_LONG - 1; +#if BITS_PER_LONG == 64 + if (!(word & (~0ul << 32))) { + num -= 32; + word <<= 32; + } +#endif + if (!(word & (~0ul << (BITS_PER_LONG-16)))) { + num -= 16; + word <<= 16; + } + if (!(word & (~0ul << (BITS_PER_LONG-8)))) { + num -= 8; + word <<= 8; + } + if (!(word & (~0ul << (BITS_PER_LONG-4)))) { + num -= 4; + word <<= 4; + } + if (!(word & (~0ul << (BITS_PER_LONG-2)))) { + num -= 2; + word <<= 2; + } + if (!(word & (~0ul << (BITS_PER_LONG-1)))) + num -= 1; + return num; +} + +/** + * __fls - find last set bit in a long word + * @word: the word to search + * + * Undefined if no set bit exists, so code should check against 0 first. + */ +#define __fls(word) \ + (__builtin_constant_p(word) ? \ + (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) : \ + variable__fls(word)) + +static __always_inline int variable_ffs(int x) +{ + int r; + + if (!x) + return 0; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + CTZW "%0, %1\n" + ".option pop\n" + : "=r" (r) : "r" (x) :); + + return r + 1; + +legacy: + r = 1; + if (!(x & 0xffff)) { + x >>= 16; + r += 16; + } + if (!(x & 0xff)) { + x >>= 8; + r += 8; + } + if (!(x & 0xf)) { + x >>= 4; + r += 4; + } + if (!(x & 3)) { + x >>= 2; + r += 2; + } + if (!(x & 1)) { + x >>= 1; + r += 1; + } + return r; +} + +/** + * ffs - find first set bit in a word + * @x: the word to search + * + * This is defined the same way as the libc and compiler builtin ffs routines. + * + * ffs(value) returns 0 if value is 0 or the position of the first set bit if + * value is nonzero. The first (least significant) bit is at position 1. + */ +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x)) + +static __always_inline int variable_fls(unsigned int x) +{ + int r; + + if (!x) + return 0; + + asm_volatile_goto( + ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1) + : : : : legacy); + + asm volatile ( + ".option push\n" + ".option arch,+zbb\n" + CLZW "%0, %1\n" + ".option pop\n" + : "=r" (r) : "r" (x) :); + + return 32 - r; + +legacy: + r = 32; + if (!(x & 0xffff0000u)) { + x <<= 16; + r -= 16; + } + if (!(x & 0xff000000u)) { + x <<= 8; + r -= 8; + } + if (!(x & 0xf0000000u)) { + x <<= 4; + r -= 4; + } + if (!(x & 0xc0000000u)) { + x <<= 2; + r -= 2; + } + if (!(x & 0x80000000u)) { + x <<= 1; + r -= 1; + } + return r; +} + +/** + * fls - find last set bit in a word + * @x: the word to search + * + * This is defined in a similar way as ffs, but returns the position of the most + * significant set bit. + * + * fls(value) returns 0 if value is 0 or the position of the last set bit if + * value is nonzero. The last (most significant) bit is at position 32. + */ +#define fls(x) \ + (__builtin_constant_p(x) ? \ + (int)(((x) != 0) ? \ + (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \ + variable_fls(x)) + +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) */ + +#include #include #include -#include #include diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig index d0e2af4b801e..6ef6dcd41d49 100644 --- a/drivers/bitopstest/Kconfig +++ b/drivers/bitopstest/Kconfig @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only menuconfig BITOPSTEST tristate "self test for bitops optimization" + default y help Enable this to test the bitops APIs. diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile index a1157c2a7170..d68cacd4e3af 100644 --- a/drivers/firmware/efi/libstub/Makefile +++ b/drivers/firmware/efi/libstub/Makefile @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM) += -DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \ -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \ -DEFI_HAVE_STRCMP -fno-builtin -fpic \ $(call cc-option,-mno-single-pic-base) -cflags-$(CONFIG_RISCV) += -fpic +cflags-$(CONFIG_RISCV) += -fpic -DNO_ALTERNATIVE cflags-$(CONFIG_LOONGARCH) += -fpie cflags-$(CONFIG_EFI_PARAMS_FROM_FDT) += -I$(srctree)/scripts/dtc/libfdt -- 2.25.1