Date: Mon, 30 Oct 2023 19:02:04 -0700
From: Charlie Jenkins
To: "Wang, Xiao W"
Cc: "paul.walmsley@sifive.com", "palmer@dabbelt.com", "aou@eecs.berkeley.edu", "ardb@kernel.org", "anup@brainfault.org", "Li, Haicheng", "ajones@ventanamicro.com", "Liu, Yujie", "linux-riscv@lists.infradead.org", "linux-efi@vger.kernel.org", "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension
References: <20231030063904.2116277-1-xiao.w.wang@intel.com> <20231030063904.2116277-3-xiao.w.wang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 31, 2023 at 01:53:55AM +0000, Wang, Xiao W wrote:
>
>
> > -----Original Message-----
> > From: Charlie Jenkins
> > Sent: Tuesday, October 31, 2023 4:37 AM
> > To: Wang, Xiao W
> > Cc: paul.walmsley@sifive.com; palmer@dabbelt.com;
> > aou@eecs.berkeley.edu; ardb@kernel.org; anup@brainfault.org;
> > Li, Haicheng; ajones@ventanamicro.com; Liu, Yujie;
> > linux-riscv@lists.infradead.org; linux-efi@vger.kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH v4 2/2] riscv: Optimize bitops with Zbb extension
> >
> > On Mon, Oct 30, 2023 at 02:39:04PM +0800, Xiao Wang wrote:
> > > This patch leverages the alternative mechanism to dynamically optimize
> > > bitops (including __ffs, __fls, ffs, fls) with Zbb instructions. When
> > > Zbb ext is not supported by the runtime CPU, legacy implementation is
> > > used. If Zbb is supported, then the optimized variants will be selected
> > > via alternative patching.
> > >
> > > The legacy bitops support is taken from the generic C implementation as
> > > fallback.
> > >
> > > If the parameter is a build-time constant, we leverage compiler builtin to
> > > calculate the result directly, this approach is inspired by x86 bitops
> > > implementation.
> > >
> > > EFI stub runs before the kernel, so alternative mechanism should not be
> > > used there, this patch introduces a macro NO_ALTERNATIVE for this purpose.
> > >
> > > Signed-off-by: Xiao Wang
> > > ---
> > >  arch/riscv/include/asm/bitops.h       | 255 +++++++++++++++++++++++++-
> > >  drivers/bitopstest/Kconfig            |   1 +
> > >  drivers/firmware/efi/libstub/Makefile |   2 +-
> > >  3 files changed, 254 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
> > > index 3540b690944b..ef35c9ebc2ed 100644
> > > --- a/arch/riscv/include/asm/bitops.h
> > > +++ b/arch/riscv/include/asm/bitops.h
> > > @@ -15,13 +15,262 @@
> > >  #include
> > >  #include
> > >
> > > +#if !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE)
> > >  #include
> > > -#include
> > > -#include
> > >  #include
> > > +#include
> > > +#include
> > > +
> > > +#else
> > > +#include
> > > +#include
> > > +
> > > +#if (BITS_PER_LONG == 64)
> > > +#define CTZW	"ctzw "
> > > +#define CLZW	"clzw "
> > > +#elif (BITS_PER_LONG == 32)
> > > +#define CTZW	"ctz "
> > > +#define CLZW	"clz "
> > > +#else
> > > +#error "Unexpected BITS_PER_LONG"
> > > +#endif
> > > +
> > > +static __always_inline unsigned long variable__ffs(unsigned long word)
> > > +{
> > > +	int num;
> > > +
> > > +	asm_volatile_goto(
> > > +		ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1)
> > > +		: : : : legacy);
> > > +
> >
> > On this and following asm blocks, checkpatch outputs: "Lines should not
> > end with a '('".
>
> I did below check, but I got no warning.
> # ./scripts/checkpatch.pl v4-0002-riscv-Optimize-bitops-with-Zbb-extension.patch
> total: 0 errors, 0 warnings, 280 lines checked
> May I know how you do the check?
> BTW, I see arch/riscv/include/asm/jump_label.h and arch/riscv/include/asm/cpufeature.h have similar code.

I normally use the --strict flag since that is what the Patchwork server uses.
>
> >
> > > +	asm volatile (
> > > +		".option push\n"
> > > +		".option arch,+zbb\n"
> > > +		"ctz %0, %1\n"
> > > +		".option pop\n"
> > > +		: "=r" (word) : "r" (word) :);
> > > +
> > > +	return word;
> > > +
> > > +legacy:
> > > +	num = 0;
> > > +#if BITS_PER_LONG == 64
> > > +	if ((word & 0xffffffff) == 0) {
> > > +		num += 32;
> > > +		word >>= 32;
> > > +	}
> > > +#endif
> > > +	if ((word & 0xffff) == 0) {
> > > +		num += 16;
> > > +		word >>= 16;
> > > +	}
> > > +	if ((word & 0xff) == 0) {
> > > +		num += 8;
> > > +		word >>= 8;
> > > +	}
> > > +	if ((word & 0xf) == 0) {
> > > +		num += 4;
> > > +		word >>= 4;
> > > +	}
> > > +	if ((word & 0x3) == 0) {
> > > +		num += 2;
> > > +		word >>= 2;
> > > +	}
> > > +	if ((word & 0x1) == 0)
> > > +		num += 1;
> > > +	return num;
> > > +}
> > > +
> > > +/**
> > > + * __ffs - find first set bit in a long word
> > > + * @word: The word to search
> > > + *
> > > + * Undefined if no set bit exists, so code should check against 0 first.
> > > + */
> > > +#define __ffs(word)	\
> > > +	(__builtin_constant_p(word) ?	\
> > > +	 (unsigned long)__builtin_ctzl(word) :	\
> > > +	 variable__ffs(word))
> > > +
> > > +static __always_inline unsigned long variable__fls(unsigned long word)
> > > +{
> > > +	int num;
> > > +
> > > +	asm_volatile_goto(
> > > +		ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1)
> > > +		: : : : legacy);
> > > +
> > > +	asm volatile (
> > > +		".option push\n"
> > > +		".option arch,+zbb\n"
> > > +		"clz %0, %1\n"
> > > +		".option pop\n"
> > > +		: "=r" (word) : "r" (word) :);
> > > +
> > > +	return BITS_PER_LONG - 1 - word;
> > > +
> > > +legacy:
> > > +	num = BITS_PER_LONG - 1;
> > > +#if BITS_PER_LONG == 64
> > > +	if (!(word & (~0ul << 32))) {
> > > +		num -= 32;
> > > +		word <<= 32;
> > > +	}
> > > +#endif
> > > +	if (!(word & (~0ul << (BITS_PER_LONG-16)))) {
> > > +		num -= 16;
> > > +		word <<= 16;
> > > +	}
> > > +	if (!(word & (~0ul << (BITS_PER_LONG-8)))) {
> > > +		num -= 8;
> > > +		word <<= 8;
> > > +	}
> > > +	if (!(word & (~0ul << (BITS_PER_LONG-4)))) {
> > > +		num -= 4;
> > > +		word <<= 4;
> > > +	}
> > > +	if (!(word & (~0ul << (BITS_PER_LONG-2)))) {
> > > +		num -= 2;
> > > +		word <<= 2;
> > > +	}
> > > +	if (!(word & (~0ul << (BITS_PER_LONG-1))))
> > > +		num -= 1;
> > > +	return num;
> > > +}
> > > +
> > > +/**
> > > + * __fls - find last set bit in a long word
> > > + * @word: the word to search
> > > + *
> > > + * Undefined if no set bit exists, so code should check against 0 first.
> > > + */
> > > +#define __fls(word)	\
> > > +	(__builtin_constant_p(word) ?	\
> > > +	 (unsigned long)(BITS_PER_LONG - 1 - __builtin_clzl(word)) :	\
> > > +	 variable__fls(word))
> > > +
> > > +static __always_inline int variable_ffs(int x)
> > > +{
> > > +	int r;
> > > +
> > > +	if (!x)
> > > +		return 0;
> > > +
> > > +	asm_volatile_goto(
> > > +		ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1)
> > > +		: : : : legacy);
> > > +
> > > +	asm volatile (
> > > +		".option push\n"
> > > +		".option arch,+zbb\n"
> > > +		CTZW "%0, %1\n"
> > > +		".option pop\n"
> > > +		: "=r" (r) : "r" (x) :);
> > > +
> > > +	return r + 1;
> > > +
> > > +legacy:
> > > +	r = 1;
> > > +	if (!(x & 0xffff)) {
> > > +		x >>= 16;
> > > +		r += 16;
> > > +	}
> > > +	if (!(x & 0xff)) {
> > > +		x >>= 8;
> > > +		r += 8;
> > > +	}
> > > +	if (!(x & 0xf)) {
> > > +		x >>= 4;
> > > +		r += 4;
> > > +	}
> > > +	if (!(x & 3)) {
> > > +		x >>= 2;
> > > +		r += 2;
> > > +	}
> > > +	if (!(x & 1)) {
> > > +		x >>= 1;
> > > +		r += 1;
> > > +	}
> > > +	return r;
> > > +}
> > > +
> > > +/**
> > > + * ffs - find first set bit in a word
> > > + * @x: the word to search
> > > + *
> > > + * This is defined the same way as the libc and compiler builtin ffs routines.
> > > + *
> > > + * ffs(value) returns 0 if value is 0 or the position of the first set bit if
> > > + * value is nonzero. The first (least significant) bit is at position 1.
> > > + */
> > > +#define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
> > > +
> > > +static __always_inline int variable_fls(unsigned int x)
> > > +{
> > > +	int r;
> > > +
> > > +	if (!x)
> > > +		return 0;
> > > +
> > > +	asm_volatile_goto(
> > > +		ALTERNATIVE("j %l[legacy]", "nop", 0, RISCV_ISA_EXT_ZBB, 1)
> > > +		: : : : legacy);
> > > +
> > > +	asm volatile (
> > > +		".option push\n"
> > > +		".option arch,+zbb\n"
> > > +		CLZW "%0, %1\n"
> > > +		".option pop\n"
> > > +		: "=r" (r) : "r" (x) :);
> > > +
> > > +	return 32 - r;
> > > +
> > > +legacy:
> > > +	r = 32;
> > > +	if (!(x & 0xffff0000u)) {
> > > +		x <<= 16;
> > > +		r -= 16;
> > > +	}
> > > +	if (!(x & 0xff000000u)) {
> > > +		x <<= 8;
> > > +		r -= 8;
> > > +	}
> > > +	if (!(x & 0xf0000000u)) {
> > > +		x <<= 4;
> > > +		r -= 4;
> > > +	}
> > > +	if (!(x & 0xc0000000u)) {
> > > +		x <<= 2;
> > > +		r -= 2;
> > > +	}
> > > +	if (!(x & 0x80000000u)) {
> > > +		x <<= 1;
> > > +		r -= 1;
> > > +	}
> > > +	return r;
> > > +}
> > > +
> > > +/**
> > > + * fls - find last set bit in a word
> > > + * @x: the word to search
> > > + *
> > > + * This is defined in a similar way as ffs, but returns the position of the most
> > > + * significant set bit.
> > > + *
> > > + * fls(value) returns 0 if value is 0 or the position of the last set bit if
> > > + * value is nonzero. The last (most significant) bit is at position 32.
> > > + */
> > > +#define fls(x)	\
> > > +	(__builtin_constant_p(x) ?	\
> > > +	 (int)(((x) != 0) ?	\
> > > +	  (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) :	\
> > > +	 variable_fls(x))
> > > +
> >
> > Checkpath complains: "Macro argument reuse 'x' - possible side-effects"
> >
>
> Ditto.
>
> > > +#endif /* !defined(CONFIG_RISCV_ISA_ZBB) || defined(NO_ALTERNATIVE) */
> > > +
> > > +#include
> > >  #include
> > >  #include
> > > -#include
> > >
> > >  #include
> > >
> > > diff --git a/drivers/bitopstest/Kconfig b/drivers/bitopstest/Kconfig
> > > index d0e2af4b801e..6ef6dcd41d49 100644
> > > --- a/drivers/bitopstest/Kconfig
> > > +++ b/drivers/bitopstest/Kconfig
> > > @@ -1,6 +1,7 @@
> > >  # SPDX-License-Identifier: GPL-2.0-only
> > >  menuconfig BITOPSTEST
> > >  	tristate "self test for bitops optimization"
> > > +	default y
> > >  	help
> > >  	  Enable this to test the bitops APIs.
> >
> > Is this a test you wanted to add? The source code isn't included.
>
> Sorry, I mistakenly did a "git add" for my local test. Will drop it.
>
> BRs,
> Xiao
>
> >
> > - Charlie

In the next version after you remove this Kconfig file you can add my
reviewed-by signature to this series.

Reviewed-by: Charlie Jenkins

> >
> > >
> > > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > > index a1157c2a7170..d68cacd4e3af 100644
> > > --- a/drivers/firmware/efi/libstub/Makefile
> > > +++ b/drivers/firmware/efi/libstub/Makefile
> > > @@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM)	+= -DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \
> > >  				   -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \
> > >  				   -DEFI_HAVE_STRCMP -fno-builtin -fpic \
> > >  				   $(call cc-option,-mno-single-pic-base)
> > > -cflags-$(CONFIG_RISCV)	+= -fpic
> > > +cflags-$(CONFIG_RISCV)	+= -fpic -DNO_ALTERNATIVE
> > >  cflags-$(CONFIG_LOONGARCH)	+= -fpie
> > >
> > >  cflags-$(CONFIG_EFI_PARAMS_FROM_FDT)	+= -I$(srctree)/scripts/dtc/libfdt
> > > --
> > > 2.25.1
> > >
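
For reference, a minimal user-space sketch of the two ideas the quoted patch
combines for ffs/fls: a binary-search fallback for runtime values, and
dispatch to a compiler builtin when the argument is a build-time constant.
It assumes a GCC- or Clang-compatible compiler and a 32-bit int. The names
sketch_variable_ffs, sketch_variable_fls, sketch_ffs and sketch_fls are
hypothetical, and the Zbb inline assembly and ALTERNATIVE() patching are
deliberately left out, so this is not the kernel implementation.

```c
/*
 * Hypothetical stand-in for the legacy path of variable_ffs(): binary
 * search for the least significant set bit. Returns a 1-based position,
 * or 0 when no bit is set. Assumes a 32-bit int.
 */
static inline int sketch_variable_ffs(int x)
{
	unsigned int v = (unsigned int)x;	/* avoid shifting a negative int */
	int r = 1;

	if (!v)
		return 0;
	if (!(v & 0xffffu)) { v >>= 16; r += 16; }
	if (!(v & 0xffu))   { v >>= 8;  r += 8; }
	if (!(v & 0xfu))    { v >>= 4;  r += 4; }
	if (!(v & 0x3u))    { v >>= 2;  r += 2; }
	if (!(v & 0x1u))    { r += 1; }
	return r;
}

/*
 * Hypothetical stand-in for the legacy path of variable_fls(): binary
 * search for the most significant set bit. Returns a 1-based position
 * (32 for bit 31), or 0 when no bit is set.
 */
static inline int sketch_variable_fls(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
	if (!(x & 0xff000000u)) { x <<= 8;  r -= 8; }
	if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4; }
	if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2; }
	if (!(x & 0x80000000u)) { r -= 1; }
	return r;
}

/*
 * Build-time constants go to the compiler builtins so the result can be
 * folded at compile time; everything else takes the runtime helper.
 */
#define sketch_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : sketch_variable_ffs(x))

#define sketch_fls(x) \
	(__builtin_constant_p(x) ? \
	 (int)(((x) != 0) ? (sizeof(unsigned int) * 8 - __builtin_clz(x)) : 0) : \
	 sketch_variable_fls(x))
```

With these definitions, sketch_ffs(8) goes through __builtin_ffs, while a
value read at run time goes through sketch_variable_ffs(); both paths use
the same 1-based convention, and both return 0 for a zero argument.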