2023-01-11 17:31:11

by Jisheng Zhang

Subject: [PATCH v3 00/13] riscv: improve boot time isa extensions handling

Generally, RISC-V ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting. This
characteristic makes it straightforward to use a static branch to check
whether a specific ISA extension is supported, to optimize performance.

However, some ISA extensions such as SVPBMT and ZICBOM are handled
via alternative sequences.

Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."

Samuel's findings point to one of the downsides of using static
branches in C code to handle ISA extensions detected at boot time:
the static branch metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.

I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.

Hi Heiko,

I combined your code and my code into patch 1, since one of the key
patches in the merged "Allow calls in alternatives" series was rolled
back to your v1. So I added your Co-developed-by and Signed-off-by.
Thanks.

Since v2
- rebase on riscv-next
- collect Reviewed-by tag
- fix jal imm construction
- combine Heiko's code and my code for jal patching, thus add
Co-developed-by tag
- address comments from Conor

Since v1
- rebase on v6.1-rc7 + Heiko's alternative improvements[3]
- collect Reviewed-by tag
- add one patch to update jal offsets in patched alternatives
- add one patch to switch to relative alternative entries
- add patches to patch vdso

[1]https://lore.kernel.org/linux-riscv/[email protected]/
[2]https://lore.kernel.org/linux-arm-kernel/[email protected]/
[3]https://lore.kernel.org/linux-riscv/[email protected]/


Andrew Jones (1):
riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()

Jisheng Zhang (12):
riscv: fix jal offsets in patched alternatives
riscv: move riscv_noncoherent_supported() out of ZICBOM probe
riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
riscv: hwcap: make ISA extension ids can be used in asm
riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA
extensions
riscv: introduce riscv_has_extension_[un]likely()
riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
riscv: module: move find_section to module.h
riscv: switch to relative alternative entries
riscv: alternative: patch alternatives in the vDSO
riscv: cpu_relax: switch to riscv_has_extension_likely()
riscv: remove riscv_isa_ext_keys[] array and related usage

arch/riscv/errata/sifive/errata.c | 4 +-
arch/riscv/errata/thead/errata.c | 11 ++-
arch/riscv/include/asm/alternative-macros.h | 20 ++---
arch/riscv/include/asm/alternative.h | 12 +--
arch/riscv/include/asm/errata_list.h | 9 +-
arch/riscv/include/asm/hwcap.h | 97 +++++++++++----------
arch/riscv/include/asm/insn.h | 27 ++++++
arch/riscv/include/asm/module.h | 16 ++++
arch/riscv/include/asm/switch_to.h | 3 +-
arch/riscv/include/asm/vdso.h | 4 +
arch/riscv/include/asm/vdso/processor.h | 2 +-
arch/riscv/kernel/alternative.c | 52 +++++++++++
arch/riscv/kernel/cpufeature.c | 78 +++--------------
arch/riscv/kernel/module.c | 15 ----
arch/riscv/kernel/setup.c | 3 +
arch/riscv/kernel/vdso.c | 5 --
arch/riscv/kernel/vdso/vdso.lds.S | 7 ++
arch/riscv/kvm/tlb.c | 3 +-
18 files changed, 206 insertions(+), 162 deletions(-)

--
2.38.1


2023-01-11 17:31:46

by Jisheng Zhang

Subject: [PATCH v3 09/13] riscv: switch to relative alternative entries

Instead of using absolute addresses for both the old instructions and
the alternative instructions, use offsets relative to the alt_entry
values. This not only cuts the size of the alternative entry, but
also meets the prerequisite for patching alternatives in the vDSO,
since absolute alternative entries are subject to dynamic relocation,
which is incompatible with how the vDSO is built.
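
To make the size reduction concrete, here is a minimal user-space sketch
of the two layouts. It assumes a 64-bit (rv64) target and models both
`void *` and `unsigned long` as uint64_t so the numbers do not depend on
the host; the struct names alt_entry_abs and alt_entry_rel and the helper
alt_entry_bytes_saved() are made up for this illustration and are not
part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Old layout, modelled for rv64: uint64_t stands in for `void *` and
 * `unsigned long`. The original struct was __packed, so there is no
 * tail padding after the 32-bit errata_id.
 */
struct alt_entry_abs {
	uint64_t old_ptr;
	uint64_t alt_ptr;
	uint64_t vendor_id;
	uint64_t alt_len;
	uint32_t errata_id;
} __attribute__((packed));

/* New layout: two self-relative 32-bit offsets and narrower id fields. */
struct alt_entry_rel {
	int32_t  old_offset;
	int32_t  alt_offset;
	uint16_t vendor_id;
	uint16_t alt_len;
	uint32_t errata_id;
};

/* Bytes saved per alternative entry under the assumptions above. */
static size_t alt_entry_bytes_saved(void)
{
	return sizeof(struct alt_entry_abs) - sizeof(struct alt_entry_rel);
}
```

Under these assumptions an entry shrinks from 36 to 16 bytes. The new
layout is also naturally aligned, which would explain why the patch can
drop __packed.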

Signed-off-by: Jisheng Zhang <[email protected]>
---
arch/riscv/errata/sifive/errata.c | 4 +++-
arch/riscv/errata/thead/errata.c | 11 ++++++++---
arch/riscv/include/asm/alternative-macros.h | 20 ++++++++++----------
arch/riscv/include/asm/alternative.h | 12 ++++++------
arch/riscv/kernel/cpufeature.c | 8 +++++---
5 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
index 1031038423e7..0e537cdfd324 100644
--- a/arch/riscv/errata/sifive/errata.c
+++ b/arch/riscv/errata/sifive/errata.c
@@ -107,7 +107,9 @@ void __init_or_module sifive_errata_patch_func(struct alt_entry *begin,

tmp = (1U << alt->errata_id);
if (cpu_req_errata & tmp) {
- patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
+ patch_text_nosync((void *)&alt->old_offset + alt->old_offset,
+ (void *)&alt->alt_offset + alt->alt_offset,
+ alt->alt_len);
cpu_apply_errata |= tmp;
}
}
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index fac5742d1c1e..d56d76a529b5 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -87,6 +87,7 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al
struct alt_entry *alt;
u32 cpu_req_errata = thead_errata_probe(stage, archid, impid);
u32 tmp;
+ void *oldptr, *altptr;

for (alt = begin; alt < end; alt++) {
if (alt->vendor_id != THEAD_VENDOR_ID)
@@ -96,12 +97,16 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al

tmp = (1U << alt->errata_id);
if (cpu_req_errata & tmp) {
+ oldptr = (void *)&alt->old_offset + alt->old_offset;
+ altptr = (void *)&alt->alt_offset + alt->alt_offset;
+
/* On vm-alternatives, the mmu isn't running yet */
if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
- memcpy((void *)__pa_symbol(alt->old_ptr),
- (void *)__pa_symbol(alt->alt_ptr), alt->alt_len);
+ memcpy((void *)__pa_symbol(oldptr),
+ (void *)__pa_symbol(altptr),
+ alt->alt_len);
else
- patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
+ patch_text_nosync(oldptr, altptr, alt->alt_len);
}
}

diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
index 7226e2462584..3c3ca65e521b 100644
--- a/arch/riscv/include/asm/alternative-macros.h
+++ b/arch/riscv/include/asm/alternative-macros.h
@@ -7,11 +7,11 @@
#ifdef __ASSEMBLY__

.macro ALT_ENTRY oldptr newptr vendor_id errata_id new_len
- RISCV_PTR \oldptr
- RISCV_PTR \newptr
- REG_ASM \vendor_id
- REG_ASM \new_len
- .word \errata_id
+ .long \oldptr - .
+ .long \newptr - .
+ .short \vendor_id
+ .short \new_len
+ .long \errata_id
.endm

.macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
@@ -59,11 +59,11 @@
#include <linux/stringify.h>

#define ALT_ENTRY(oldptr, newptr, vendor_id, errata_id, newlen) \
- RISCV_PTR " " oldptr "\n" \
- RISCV_PTR " " newptr "\n" \
- REG_ASM " " vendor_id "\n" \
- REG_ASM " " newlen "\n" \
- ".word " errata_id "\n"
+ ".long ((" oldptr ") - .) \n" \
+ ".long ((" newptr ") - .) \n" \
+ ".short " vendor_id "\n" \
+ ".short " newlen "\n" \
+ ".long " errata_id "\n"

#define ALT_NEW_CONTENT(vendor_id, errata_id, enable, new_c) \
".if " __stringify(enable) " == 1\n" \
diff --git a/arch/riscv/include/asm/alternative.h b/arch/riscv/include/asm/alternative.h
index 1bd4027d34ca..b6050a235f50 100644
--- a/arch/riscv/include/asm/alternative.h
+++ b/arch/riscv/include/asm/alternative.h
@@ -31,12 +31,12 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
int patch_offset);

struct alt_entry {
- void *old_ptr; /* address of original instruciton or data */
- void *alt_ptr; /* address of replacement instruction or data */
- unsigned long vendor_id; /* cpu vendor id */
- unsigned long alt_len; /* The replacement size */
- unsigned int errata_id; /* The errata id */
-} __packed;
+ s32 old_offset; /* offset relative to original instruciton or data */
+ s32 alt_offset; /* offset relative to replacement instruction or data */
+ u16 vendor_id; /* cpu vendor id */
+ u16 alt_len; /* The replacement size */
+ u32 errata_id; /* The errata id */
+};

struct errata_checkfunc_id {
unsigned long vendor_id;
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 6db8b31d9149..c394cde2560b 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -280,6 +280,7 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
unsigned int stage)
{
struct alt_entry *alt;
+ void *oldptr, *altptr;

if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
return;
@@ -293,12 +294,13 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
continue;
}

+ oldptr = (void *)&alt->old_offset + alt->old_offset;
+ altptr = (void *)&alt->alt_offset + alt->alt_offset;
if (!__riscv_isa_extension_available(NULL, alt->errata_id))
continue;

- patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
- riscv_alternative_fix_offsets(alt->old_ptr, alt->alt_len,
- alt->old_ptr - alt->alt_ptr);
+ patch_text_nosync(oldptr, altptr, alt->alt_len);
+ riscv_alternative_fix_offsets(oldptr, alt->alt_len, oldptr - altptr);
}
}
#endif
--
2.38.1

2023-01-11 17:40:00

by Jisheng Zhang

Subject: [PATCH v3 11/13] riscv: cpu_relax: switch to riscv_has_extension_likely()

Switch cpu_relax() from a static branch to the new helper
riscv_has_extension_likely().

Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
---
arch/riscv/include/asm/vdso/processor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/vdso/processor.h b/arch/riscv/include/asm/vdso/processor.h
index fa70cfe507aa..edf0e25e43d1 100644
--- a/arch/riscv/include/asm/vdso/processor.h
+++ b/arch/riscv/include/asm/vdso/processor.h
@@ -10,7 +10,7 @@

static inline void cpu_relax(void)
{
- if (!static_branch_likely(&riscv_isa_ext_keys[RISCV_ISA_EXT_KEY_ZIHINTPAUSE])) {
+ if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZIHINTPAUSE)) {
#ifdef __riscv_muldiv
int dummy;
/* In lieu of a halt instruction, induce a long-latency stall. */
--
2.38.1

2023-01-11 17:56:58

by Jisheng Zhang

Subject: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

riscv_cpufeature_patch_func() currently only scans a limited set of
cpufeatures, explicitly defined with macros. Extend it to probe for all
ISA extensions.
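
The resulting control flow can be sketched in plain user-space C as
below. This is only an illustration of the loop's filtering logic, not
the kernel code: EXT_MAX, struct entry, ext_available() and
count_patched() are hypothetical stand-ins, the ISA bitmap is reduced to
a single 64-bit word, and the actual patching is replaced by a counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXT_MAX 64	/* hypothetical stand-in for RISCV_ISA_EXT_MAX */

/* Only the fields the loop inspects. */
struct entry {
	uint16_t vendor_id;
	uint32_t errata_id;	/* reused as the ISA extension id */
};

/* Stand-in for __riscv_isa_extension_available(): test one bitmap bit. */
static bool ext_available(uint64_t isa_bitmap, uint32_t id)
{
	return (isa_bitmap >> id) & 1;
}

/* Returns how many entries in [begin, end) would be patched. */
static size_t count_patched(const struct entry *begin,
			    const struct entry *end, uint64_t isa_bitmap)
{
	size_t patched = 0;

	for (const struct entry *alt = begin; alt < end; alt++) {
		if (alt->vendor_id != 0)
			continue;	/* vendor errata are handled elsewhere */
		if (alt->errata_id >= EXT_MAX)
			continue;	/* the kernel emits a WARN here */
		if (!ext_available(isa_bitmap, alt->errata_id))
			continue;	/* extension absent: keep old code */
		patched++;		/* kernel: patch text, then fix offsets */
	}
	return patched;
}
```

The point of the change is visible in the third check: any id below the
maximum is looked up directly, so no per-extension probe function or
CPUFEATURE_* macro is needed.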

Signed-off-by: Jisheng Zhang <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
---
arch/riscv/include/asm/errata_list.h | 9 ++--
arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
2 files changed, 11 insertions(+), 61 deletions(-)

diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4180312d2a70..274c6f889602 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -7,6 +7,7 @@

#include <asm/alternative.h>
#include <asm/csr.h>
+#include <asm/hwcap.h>
#include <asm/vendorid_list.h>

#ifdef CONFIG_ERRATA_SIFIVE
@@ -22,10 +23,6 @@
#define ERRATA_THEAD_NUMBER 3
#endif

-#define CPUFEATURE_SVPBMT 0
-#define CPUFEATURE_ZICBOM 1
-#define CPUFEATURE_NUMBER 2
-
#ifdef __ASSEMBLY__

#define ALT_INSN_FAULT(x) \
@@ -55,7 +52,7 @@ asm(ALTERNATIVE("sfence.vma %0", "sfence.vma", SIFIVE_VENDOR_ID, \
#define ALT_SVPBMT(_val, prot) \
asm(ALTERNATIVE_2("li %0, 0\t\nnop", \
"li %0, %1\t\nslli %0,%0,%3", 0, \
- CPUFEATURE_SVPBMT, CONFIG_RISCV_ISA_SVPBMT, \
+ RISCV_ISA_EXT_SVPBMT, CONFIG_RISCV_ISA_SVPBMT, \
"li %0, %2\t\nslli %0,%0,%4", THEAD_VENDOR_ID, \
ERRATA_THEAD_PBMT, CONFIG_ERRATA_THEAD_PBMT) \
: "=r"(_val) \
@@ -129,7 +126,7 @@ asm volatile(ALTERNATIVE_2( \
"add a0, a0, %0\n\t" \
"2:\n\t" \
"bltu a0, %2, 3b\n\t" \
- "nop", 0, CPUFEATURE_ZICBOM, CONFIG_RISCV_ISA_ZICBOM, \
+ "nop", 0, RISCV_ISA_EXT_ZICBOM, CONFIG_RISCV_ISA_ZICBOM, \
"mv a0, %1\n\t" \
"j 2f\n\t" \
"3:\n\t" \
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 37e8c5e69754..6db8b31d9149 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -275,58 +275,11 @@ void __init riscv_fill_hwcap(void)
}

#ifdef CONFIG_RISCV_ALTERNATIVE
-static bool __init_or_module cpufeature_probe_svpbmt(unsigned int stage)
-{
- if (!IS_ENABLED(CONFIG_RISCV_ISA_SVPBMT))
- return false;
-
- if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
- return false;
-
- return riscv_isa_extension_available(NULL, SVPBMT);
-}
-
-static bool __init_or_module cpufeature_probe_zicbom(unsigned int stage)
-{
- if (!IS_ENABLED(CONFIG_RISCV_ISA_ZICBOM))
- return false;
-
- if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
- return false;
-
- if (!riscv_isa_extension_available(NULL, ZICBOM))
- return false;
-
- return true;
-}
-
-/*
- * Probe presence of individual extensions.
- *
- * This code may also be executed before kernel relocation, so we cannot use
- * addresses generated by the address-of operator as they won't be valid in
- * this context.
- */
-static u32 __init_or_module cpufeature_probe(unsigned int stage)
-{
- u32 cpu_req_feature = 0;
-
- if (cpufeature_probe_svpbmt(stage))
- cpu_req_feature |= BIT(CPUFEATURE_SVPBMT);
-
- if (cpufeature_probe_zicbom(stage))
- cpu_req_feature |= BIT(CPUFEATURE_ZICBOM);
-
- return cpu_req_feature;
-}
-
void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
struct alt_entry *end,
unsigned int stage)
{
- u32 cpu_req_feature = cpufeature_probe(stage);
struct alt_entry *alt;
- u32 tmp;

if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
return;
@@ -334,18 +287,18 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
for (alt = begin; alt < end; alt++) {
if (alt->vendor_id != 0)
continue;
- if (alt->errata_id >= CPUFEATURE_NUMBER) {
- WARN(1, "This feature id:%d is not in kernel cpufeature list",
+ if (alt->errata_id >= RISCV_ISA_EXT_MAX) {
+ WARN(1, "This extension id:%d is not in ISA extension list",
alt->errata_id);
continue;
}

- tmp = (1U << alt->errata_id);
- if (cpu_req_feature & tmp) {
- patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
- riscv_alternative_fix_offsets(alt->old_ptr, alt->alt_len,
- alt->old_ptr - alt->alt_ptr);
- }
+ if (!__riscv_isa_extension_available(NULL, alt->errata_id))
+ continue;
+
+ patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
+ riscv_alternative_fix_offsets(alt->old_ptr, alt->alt_len,
+ alt->old_ptr - alt->alt_ptr);
}
}
#endif
--
2.38.1

2023-01-11 18:34:53

by Andrew Jones

Subject: Re: [PATCH v3 09/13] riscv: switch to relative alternative entries

On Thu, Jan 12, 2023 at 01:10:23AM +0800, Jisheng Zhang wrote:
> Instead of using absolute addresses for both the old instructions and
> the alternative instructions, use offsets relative to the alt_entry
> values. So this not only cuts the size of the alternative entry, but
> also meets the prerequisite for patching alternatives in the vDSO,
> since absolute alternative entries are subject to dynamic relocation,
> which is incompatible with the vDSO building.
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> arch/riscv/errata/sifive/errata.c | 4 +++-
> arch/riscv/errata/thead/errata.c | 11 ++++++++---
> arch/riscv/include/asm/alternative-macros.h | 20 ++++++++++----------
> arch/riscv/include/asm/alternative.h | 12 ++++++------
> arch/riscv/kernel/cpufeature.c | 8 +++++---
> 5 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
> index 1031038423e7..0e537cdfd324 100644
> --- a/arch/riscv/errata/sifive/errata.c
> +++ b/arch/riscv/errata/sifive/errata.c
> @@ -107,7 +107,9 @@ void __init_or_module sifive_errata_patch_func(struct alt_entry *begin,
>
> tmp = (1U << alt->errata_id);
> if (cpu_req_errata & tmp) {
> - patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> + patch_text_nosync((void *)&alt->old_offset + alt->old_offset,
> + (void *)&alt->alt_offset + alt->alt_offset,

I was hoping to see Conor's macro suggestion show up in this version.

> + alt->alt_len);
> cpu_apply_errata |= tmp;
> }
> }
> diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> index fac5742d1c1e..d56d76a529b5 100644
> --- a/arch/riscv/errata/thead/errata.c
> +++ b/arch/riscv/errata/thead/errata.c
> @@ -87,6 +87,7 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al
> struct alt_entry *alt;
> u32 cpu_req_errata = thead_errata_probe(stage, archid, impid);
> u32 tmp;
> + void *oldptr, *altptr;
>
> for (alt = begin; alt < end; alt++) {
> if (alt->vendor_id != THEAD_VENDOR_ID)
> @@ -96,12 +97,16 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al
>
> tmp = (1U << alt->errata_id);
> if (cpu_req_errata & tmp) {
> + oldptr = (void *)&alt->old_offset + alt->old_offset;
> + altptr = (void *)&alt->alt_offset + alt->alt_offset;
> +
> /* On vm-alternatives, the mmu isn't running yet */
> if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> - memcpy((void *)__pa_symbol(alt->old_ptr),
> - (void *)__pa_symbol(alt->alt_ptr), alt->alt_len);
> + memcpy((void *)__pa_symbol(oldptr),
> + (void *)__pa_symbol(altptr),
> + alt->alt_len);
> else
> - patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> + patch_text_nosync(oldptr, altptr, alt->alt_len);
> }
> }
>
> diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
> index 7226e2462584..3c3ca65e521b 100644
> --- a/arch/riscv/include/asm/alternative-macros.h
> +++ b/arch/riscv/include/asm/alternative-macros.h
> @@ -7,11 +7,11 @@
> #ifdef __ASSEMBLY__
>
> .macro ALT_ENTRY oldptr newptr vendor_id errata_id new_len
> - RISCV_PTR \oldptr
> - RISCV_PTR \newptr
> - REG_ASM \vendor_id
> - REG_ASM \new_len
> - .word \errata_id
> + .long \oldptr - .
> + .long \newptr - .
> + .short \vendor_id
> + .short \new_len
> + .long \errata_id

nit: I like .2byte and .4byte since I always have to double check how many
bytes .long is.

> .endm
>
> .macro ALT_NEW_CONTENT vendor_id, errata_id, enable = 1, new_c : vararg
> @@ -59,11 +59,11 @@
> #include <linux/stringify.h>
>
> #define ALT_ENTRY(oldptr, newptr, vendor_id, errata_id, newlen) \
> - RISCV_PTR " " oldptr "\n" \
> - RISCV_PTR " " newptr "\n" \
> - REG_ASM " " vendor_id "\n" \
> - REG_ASM " " newlen "\n" \
> - ".word " errata_id "\n"
> + ".long ((" oldptr ") - .) \n" \
> + ".long ((" newptr ") - .) \n" \
> + ".short " vendor_id "\n" \
> + ".short " newlen "\n" \
> + ".long " errata_id "\n"
>
> #define ALT_NEW_CONTENT(vendor_id, errata_id, enable, new_c) \
> ".if " __stringify(enable) " == 1\n" \
> diff --git a/arch/riscv/include/asm/alternative.h b/arch/riscv/include/asm/alternative.h
> index 1bd4027d34ca..b6050a235f50 100644
> --- a/arch/riscv/include/asm/alternative.h
> +++ b/arch/riscv/include/asm/alternative.h
> @@ -31,12 +31,12 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> int patch_offset);
>
> struct alt_entry {
> - void *old_ptr; /* address of original instruciton or data */
> - void *alt_ptr; /* address of replacement instruction or data */
> - unsigned long vendor_id; /* cpu vendor id */
> - unsigned long alt_len; /* The replacement size */
> - unsigned int errata_id; /* The errata id */
> -} __packed;
> + s32 old_offset; /* offset relative to original instruciton or data */
^
instruction

(The typo was already there, but, IMO, we can fix something like that
while touching it.)

> + s32 alt_offset; /* offset relative to replacement instruction or data */
> + u16 vendor_id; /* cpu vendor id */
> + u16 alt_len; /* The replacement size */
> + u32 errata_id; /* The errata id */
> +};
>
> struct errata_checkfunc_id {
> unsigned long vendor_id;
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 6db8b31d9149..c394cde2560b 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,7 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> unsigned int stage)
> {
> struct alt_entry *alt;
> + void *oldptr, *altptr;
>
> if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
> return;
> @@ -293,12 +294,13 @@ void __init_or_module riscv_cpufeature_patch_func(struct alt_entry *begin,
> continue;
> }
>
> + oldptr = (void *)&alt->old_offset + alt->old_offset;
> + altptr = (void *)&alt->alt_offset + alt->alt_offset;
> if (!__riscv_isa_extension_available(NULL, alt->errata_id))
> continue;
>
> - patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> - riscv_alternative_fix_offsets(alt->old_ptr, alt->alt_len,
> - alt->old_ptr - alt->alt_ptr);
> + patch_text_nosync(oldptr, altptr, alt->alt_len);
> + riscv_alternative_fix_offsets(oldptr, alt->alt_len, oldptr - altptr);
> }
> }
> #endif
> --
> 2.38.1

Besides preferring a macro and the nits, LGTM

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew

2023-01-12 00:01:11

by Heiko Stuebner

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

Hi Jisheng.

Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> riscv_cpufeature_patch_func() currently only scans a limited set of
> cpufeatures, explicitly defined with macros. Extend it to probe for all
> ISA extensions.
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Heiko Stuebner <[email protected]>
> ---
> arch/riscv/include/asm/errata_list.h | 9 ++--
> arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> 2 files changed, 11 insertions(+), 61 deletions(-)

Hmm, I do see a fairly big caveat with this, and would like to take
back my Reviewed-by for now.


With this change we would limit the patchable cpufeatures to actual
RISC-V extensions. But cpufeatures can also be soft features, such as
how efficiently the core handles unaligned accesses.

See Palmer's series [0].


Also this essentially codifies that each ALTERNATIVE can only ever
be attached to exactly one extension.

But contrary to vendor-errata, it is very likely that we will need
combinations of different extensions for some alternatives in the future.

In my optimization quest, I found that it's actually pretty neat to
convert the errata-id for cpufeatures to a bitfield [1], because then it's
possible to just combine extensions into said bitfield [2]:

ALTERNATIVE_2("nop",
"j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
"j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)

[the additional field there models a "not" component]

So I really feel this would limit us quite a bit.
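
The matching rule implied by the bitfield plus "not" component can be
sketched as follows. This is a guess at the semantics from the example
above, not the code from the linked commits: the bit assignments and the
alt_matches() helper are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the two features in the example. */
#define FEAT_ZBB		(1u << 0)
#define FEAT_FAST_UNALIGNED	(1u << 1)

/*
 * An alternative applies when every required feature is present and
 * none of the features in the "not" mask is present.
 */
static bool alt_matches(uint32_t have, uint32_t require, uint32_t forbid)
{
	return (have & require) == require && (have & forbid) == 0;
}
```

With this rule, a core with both Zbb and fast unaligned access matches
only the first alternative in the example, while a core with Zbb alone
matches only the second.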


Heiko



[0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=riscv-hwprobe-v1&id=510c491cb9d87dcbdc91c63558dc704968723240
[1] https://github.com/mmind/linux-riscv/commit/f57a896122ee7e666692079320fc35829434cf96
[2] https://github.com/mmind/linux-riscv/commit/8cef615dab0c00ad68af2651ee5b93d06be17f27#diff-194cb8a86f9fb9b03683295f21c8f46b456a9f94737f01726ddbcbb9e3aace2cR12


2023-01-12 09:43:29

by Andrew Jones

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> Hi Jisheng.
>
> Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > riscv_cpufeature_patch_func() currently only scans a limited set of
> > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > ISA extensions.
> >
> > Signed-off-by: Jisheng Zhang <[email protected]>
> > Reviewed-by: Andrew Jones <[email protected]>
> > Reviewed-by: Heiko Stuebner <[email protected]>
> > ---
> > arch/riscv/include/asm/errata_list.h | 9 ++--
> > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > 2 files changed, 11 insertions(+), 61 deletions(-)
>
> hmmm ... I do see a somewhat big caveat for this.
> and would like to take back my Reviewed-by for now
>
>
> With this change we would limit the patchable cpufeatures to actual
> riscv extensions. But cpufeatures can also be soft features like
> how performant the core handles unaligned accesses.

I agree that this needs to be addressed and Jisheng also raised this
yesterday here [*]. It seems we need the concept of cpufeatures, which
may be extensions or non-extensions.

[*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/

>
> See Palmer's series [0].
>
>
> Also this essentially codifies that each ALTERNATIVE can only ever
> be attached to exactly one extension.
>
> But contrary to vendor-errata, it is very likely that we will need
> combinations of different extensions for some alternatives in the future.

One possible approach may be to combine extensions/non-extensions at boot
time into pseudo-cpufeatures. Then, alternatives can continue attaching to
a single "feature". (I'm not saying that's a better approach than the
bitmap, I'm just suggesting it as something else to consider.)
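
As a rough illustration of the pseudo-cpufeature idea, the sketch below
folds raw inputs into single-id features once at boot. It is only one
possible reading of the suggestion: the RAW_* bits, the pseudo_feature
enum and derive_pseudo_features() are all hypothetical names invented
for this example.

```c
#include <stdint.h>

/* Raw inputs discovered at boot (hypothetical bit assignments). */
#define RAW_EXT_ZBB		(1u << 0)	/* a real ISA extension */
#define RAW_FAST_UNALIGNED	(1u << 1)	/* a soft property, not an extension */

/* Pseudo-cpufeature ids that a single alternative could attach to. */
enum pseudo_feature {
	PSEUDO_ZBB_FAST_UNALIGNED,
	PSEUDO_ZBB_SLOW_UNALIGNED,
	PSEUDO_FEATURE_MAX,
};

/* Combine the raw inputs into one bit per pseudo-feature. */
static uint32_t derive_pseudo_features(uint32_t raw)
{
	uint32_t pseudo = 0;

	if (raw & RAW_EXT_ZBB) {
		if (raw & RAW_FAST_UNALIGNED)
			pseudo |= 1u << PSEUDO_ZBB_FAST_UNALIGNED;
		else
			pseudo |= 1u << PSEUDO_ZBB_SLOW_UNALIGNED;
	}
	return pseudo;
}
```

The trade-off against the bitmap approach is that every combination an
alternative needs must be given its own id up front, but the patching
loop keeps comparing against exactly one id per entry.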

Thanks,
drew

>
> In my optimization quest, I found that it's actually pretty neat to
> convert the errata-id for cpufeatures to a bitfield [1], because then it's
> possible to just combine extensions into said bitfield [2]:
>
> ALTERNATIVE_2("nop",
> "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
>
> [the additional field there models a "not" component]
>
> So I really feel this would limit us quite a bit.
>
>
> Heiko
>
>
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=riscv-hwprobe-v1&id=510c491cb9d87dcbdc91c63558dc704968723240
> [1] https://github.com/mmind/linux-riscv/commit/f57a896122ee7e666692079320fc35829434cf96
> [2] https://github.com/mmind/linux-riscv/commit/8cef615dab0c00ad68af2651ee5b93d06be17f27#diff-194cb8a86f9fb9b03683295f21c8f46b456a9f94737f01726ddbcbb9e3aace2cR12
>
>

2023-01-12 22:23:53

by Conor Dooley

Subject: Re: [PATCH v3 09/13] riscv: switch to relative alternative entries

Hey Jisheng,

On Thu, Jan 12, 2023 at 01:10:23AM +0800, Jisheng Zhang wrote:
> Instead of using absolute addresses for both the old instructions and
> the alternative instructions, use offsets relative to the alt_entry
> values. So this not only cuts the size of the alternative entry, but
> also meets the prerequisite for patching alternatives in the vDSO,
> since absolute alternative entries are subject to dynamic relocation,
> which is incompatible with the vDSO building.
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> arch/riscv/errata/sifive/errata.c | 4 +++-
> arch/riscv/errata/thead/errata.c | 11 ++++++++---
> arch/riscv/include/asm/alternative-macros.h | 20 ++++++++++----------
> arch/riscv/include/asm/alternative.h | 12 ++++++------
> arch/riscv/kernel/cpufeature.c | 8 +++++---
> 5 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/arch/riscv/errata/sifive/errata.c b/arch/riscv/errata/sifive/errata.c
> index 1031038423e7..0e537cdfd324 100644
> --- a/arch/riscv/errata/sifive/errata.c
> +++ b/arch/riscv/errata/sifive/errata.c
> @@ -107,7 +107,9 @@ void __init_or_module sifive_errata_patch_func(struct alt_entry *begin,
>
> tmp = (1U << alt->errata_id);
> if (cpu_req_errata & tmp) {
> - patch_text_nosync(alt->old_ptr, alt->alt_ptr, alt->alt_len);
> + patch_text_nosync((void *)&alt->old_offset + alt->old_offset,
> + (void *)&alt->alt_offset + alt->alt_offset,
> + alt->alt_len);

I left a comment on v2 that went unanswered:
https://lore.kernel.org/all/Y4+3nJ53nvmmc8+z@spud/

The TL;DR is that I would like you to create a macro for this so that
this messy operation is done in a central location, with a nice comment
explaining the offsets. If my "analysis" there was correct, feel free to
use it as a starting point for said comment.

The macro would then reduce the above to something like:
patch_text_nosync(ALT_OFFSET_ADDRESS(alt->old_offset),
ALT_OFFSET_ADDRESS(alt->alt_offset),
alt->alt_len);

Which I think is easier to understand since this "concept" will show up
in several places & is less intuitive than what we currently have.
Nothing beats having this stuff well explained in the codebase IMO.
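
A minimal user-space sketch of such a macro is shown below, assuming the
offset is stored relative to the address of the offset field itself, as
the `\oldptr - .` expressions in ALT_ENTRY suggest. The macro body and
the alt_set_target() helper are illustrative assumptions, not code from
the series.

```c
#include <stdint.h>

struct alt_entry {
	int32_t  old_offset;	/* target address minus &old_offset */
	int32_t  alt_offset;	/* target address minus &alt_offset */
	uint16_t vendor_id;
	uint16_t alt_len;
	uint32_t errata_id;
};

/*
 * Each offset is relative to the location of the field that holds it,
 * so the absolute address is "where the field lives" plus "what the
 * field contains". The argument must be an lvalue such as
 * alt->old_offset.
 */
#define ALT_OFFSET_ADDRESS(field) \
	((void *)((char *)&(field) + (field)))

/* Hypothetical inverse, playing the role of the assembler's "ptr - .". */
static void alt_set_target(int32_t *field, const void *target)
{
	*field = (int32_t)((const char *)target - (const char *)field);
}
```

Because the two offsets live at different addresses, the same target
yields different stored values in old_offset and alt_offset, which is
why the macro has to take the field itself rather than a plain integer.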

> diff --git a/arch/riscv/include/asm/alternative.h b/arch/riscv/include/asm/alternative.h
> index 1bd4027d34ca..b6050a235f50 100644
> --- a/arch/riscv/include/asm/alternative.h
> +++ b/arch/riscv/include/asm/alternative.h
> @@ -31,12 +31,12 @@ void riscv_alternative_fix_offsets(void *alt_ptr, unsigned int len,
> int patch_offset);
>
> struct alt_entry {
> - void *old_ptr; /* address of original instruciton or data */
> - void *alt_ptr; /* address of replacement instruction or data */
> - unsigned long vendor_id; /* cpu vendor id */
> - unsigned long alt_len; /* The replacement size */
> - unsigned int errata_id; /* The errata id */
> -} __packed;
> + s32 old_offset; /* offset relative to original instruciton or data */
> + s32 alt_offset; /* offset relative to replacement instruction or data */

This wording is better, but you should fix the "instruciton" typo while
you are in the area.

> + u16 vendor_id; /* cpu vendor id */
> + u16 alt_len; /* The replacement size */
> + u32 errata_id; /* The errata id */
> +};

Thanks,
Conor.



2023-01-12 23:01:33

by Conor Dooley

Subject: Re: [PATCH v3 11/13] riscv: cpu_relax: switch to riscv_has_extension_likely()

On Thu, Jan 12, 2023 at 01:10:25AM +0800, Jisheng Zhang wrote:
> Switch cpu_relax() from static branch to the new helper
> riscv_has_extension_likely()
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> Reviewed-by: Andrew Jones <[email protected]>
> Reviewed-by: Heiko Stuebner <[email protected]>
> Reviewed-by: Guo Ren <[email protected]>

With the same caveat here as with fpu, may as well join the
posse once more...
Reviewed-by: Conor Dooley <[email protected]>

> ---
> arch/riscv/include/asm/vdso/processor.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/vdso/processor.h b/arch/riscv/include/asm/vdso/processor.h
> index fa70cfe507aa..edf0e25e43d1 100644
> --- a/arch/riscv/include/asm/vdso/processor.h
> +++ b/arch/riscv/include/asm/vdso/processor.h
> @@ -10,7 +10,7 @@
>
> static inline void cpu_relax(void)
> {
> - if (!static_branch_likely(&riscv_isa_ext_keys[RISCV_ISA_EXT_KEY_ZIHINTPAUSE])) {
> + if (!riscv_has_extension_likely(RISCV_ISA_EXT_ZIHINTPAUSE)) {
> #ifdef __riscv_muldiv
> int dummy;
> /* In lieu of a halt instruction, induce a long-latency stall. */
> --
> 2.38.1
>
>



2023-01-13 15:46:42

by Conor Dooley

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

On Thu, Jan 12, 2023 at 10:21:36AM +0100, Andrew Jones wrote:
> On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> > Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > > riscv_cpufeature_patch_func() currently only scans a limited set of
> > > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > > ISA extensions.
> > >
> > > Signed-off-by: Jisheng Zhang <[email protected]>
> > > Reviewed-by: Andrew Jones <[email protected]>
> > > Reviewed-by: Heiko Stuebner <[email protected]>
> > > ---
> > > arch/riscv/include/asm/errata_list.h | 9 ++--
> > > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > > 2 files changed, 11 insertions(+), 61 deletions(-)
> >
> > hmmm ... I do see a somewhat big caveat for this.
> > and would like to take back my Reviewed-by for now
> >
> >
> > With this change we would limit the patchable cpufeatures to actual
> > riscv extensions. But cpufeatures can also be soft features like
> > how performant the core handles unaligned accesses.
>
> I agree that this needs to be addressed and Jisheng also raised this
> yesterday here [*]. It seems we need the concept of cpufeatures, which
> may be extensions or non-extensions.
>
> [*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/
>
> > See Palmer's series [0].
> >
> >
> > Also this essentially codifies that each ALTERNATIVE can only ever
> > be attached to exactly one extension.
> >
> > But contrary to vendor-errata, it is very likely that we will need
> > combinations of different extensions for some alternatives in the future.
>
> One possible approach may be to combine extensions/non-extensions at boot
> time into pseudo-cpufeatures. Then, alternatives can continue attaching to
> a single "feature". (I'm not saying that's a better approach than the
> bitmap, I'm just suggesting it as something else to consider.)


> > ALTERNATIVE_2("nop",
> > "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> > "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> >
> > [the additional field there models a "not" component]

Since we're discussing theoretical implementations, and it's a little hard
to picture all that they entail in my head, I might be making a fool of
myself here with assumptions...

Heiko's suggestion sounded along the lines of: keep probing individual
"features" as we are now. Features in this case being the presence of
the extension or non-extension capability. And then in the alternative,
make all of the decisions about which to apply.
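
For illustration, a minimal sketch of what "making the decision in the alternative" could look like with the bitfield idea quoted above. Everything here is an assumption for the sake of the example: the bit values, the `struct alt_req` layout and the `alt_req_matches()` helper are hypothetical and are not taken from Heiko's patches; only the CPUFEATURE_ZBB and CPUFEATURE_FAST_UNALIGNED names come from the quoted macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; only the names appear in the thread. */
#define CPUFEATURE_ZBB            (1u << 0)
#define CPUFEATURE_FAST_UNALIGNED (1u << 1)

/* Hypothetical per-variant requirement: bits that must be set, and the
 * "not" component, i.e. bits that must be clear. */
struct alt_req {
	uint32_t required;
	uint32_t excluded;
};

/* A variant applies only if every required bit is present and no
 * excluded bit is present. */
static bool alt_req_matches(const struct alt_req *req, uint32_t cpu_features)
{
	/* A bit that is both required and excluded could never match. */
	assert((req->required & req->excluded) == 0);

	return (cpu_features & req->required) == req->required &&
	       (cpu_features & req->excluded) == 0;
}

/* The two variants from the quoted ALTERNATIVE_2 example. */
static const struct alt_req strcmp_zbb_unaligned = {
	.required = CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED,
	.excluded = 0,
};

static const struct alt_req strcmp_zbb = {
	.required = CPUFEATURE_ZBB,
	.excluded = CPUFEATURE_FAST_UNALIGNED,
};
```

With the "not" mask in place the two variants can never match on the same hart, which is what lets the decision live entirely in the alternative entry.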

Drew's suggestion would have significantly more defined CPUFEATURE_FOOs,
but would offload the decision making about which extensions or non-
extension capabilities constitute a feature to regular old cpufeature
code. However, the order of precedence would remain in the alt macro, as
it does now.

I think I am just a wee bit biased, but adding the complexity somewhere
other than alternative macros seems a wise choice, especially as we are
likely to find that complexity increases over time?

The other thing that came to mind, and maybe this is just looking for
holes where they don't exist (or are not worth addressing), is that
order of precedence.
I can imagine that, in some cases, the order of precedence is not a
constant per pseudo-cpufeature, but determined by implementation of
the capabilities that comprise it?

If my assumption/understanding holds, moving decision making out of the
alternative seems like it would better provision for scenarios like
that? I dunno, maybe that is whatever the corollary of "premature
optimisation" is for this discussion.

That's my unsolicited € 0.02, hopefully I wasn't off-base with the
assumptions I made.

Heiko, I figure you've got some sort of WIP stuff for this anyway since
you're interested in the fast unaligned? How close are you to posting any
of that?

While I think of it, w.r.t. extension versus (pseudo)cpufeature etc
naming, it may make sense to call the functions that this series adds
in patch 6 has_cpufeature_{un,}likely(), no matter what decision gets
made here?
IMO using cpufeature seems to make more sense for a general use API that
may be used later on for the likes of unaligned access, even if
initially it is not used for anything other than extensions.

Thanks,
Conor.



2023-01-14 20:42:49

by Conor Dooley

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

Hello again!

On Fri, Jan 13, 2023 at 03:18:59PM +0000, Conor Dooley wrote:
> On Thu, Jan 12, 2023 at 10:21:36AM +0100, Andrew Jones wrote:
> > On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> > > Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > > > riscv_cpufeature_patch_func() currently only scans a limited set of
> > > > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > > > ISA extensions.
> > > >
> > > > Signed-off-by: Jisheng Zhang <[email protected]>
> > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > Reviewed-by: Heiko Stuebner <[email protected]>
> > > > ---
> > > > arch/riscv/include/asm/errata_list.h | 9 ++--
> > > > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > > > 2 files changed, 11 insertions(+), 61 deletions(-)
> > >
> > > hmmm ... I do see a somewhat big caveat for this.
> > > and would like to take back my Reviewed-by for now
> > >
> > >
> > > With this change we would limit the patchable cpufeatures to actual
> > > riscv extensions. But cpufeatures can also be soft features like
> > > how performant the core handles unaligned accesses.
> >
> > I agree that this needs to be addressed and Jisheng also raised this
> > yesterday here [*]. It seems we need the concept of cpufeatures, which
> > may be extensions or non-extensions.
> >
> > [*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/
> >
> > > See Palmer's series [0].
> > >
> > >
> > > Also this essentially codifies that each ALTERNATIVE can only ever
> > > be attached to exactly one extension.
> > >
> > > But contrary to vendor-errata, it is very likely that we will need
> > > combinations of different extensions for some alternatives in the future.
> >
> > One possible approach may be to combine extensions/non-extensions at boot
> > time into pseudo-cpufeatures. Then, alternatives can continue attaching to
> > a single "feature". (I'm not saying that's a better approach than the
> > bitmap, I'm just suggesting it as something else to consider.)
>
>
> > > ALTERNATIVE_2("nop",
> > > "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> > > "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> > >
> > > [the additional field there models a "not" component]
>
> Since we're discussing theoretical implementations, and it's a little hard
> to picture all that they entail in my head, I might be making a fool of
> myself here with assumptions...
>
> Heiko's suggestion sounded along the lines of: keep probing individual
> "features" as we are now. Features in this case being the presence of
> the extension or non-extension capability. And then in the alternative,
> make all of the decisions about which to apply.
>
> Drew's suggestion would have significantly more defined CPUFEATURE_FOOs,
> but would offload the decision making about which extensions or non-
> extension capabilities constitute a feature to regular old cpufeature
> code. However, the order of precedence would remain in the alt macro, as
> it does now.
>
> I think I am just a wee bit biased, but adding the complexity somewhere
> other than alternative macros seems a wise choice, especially as we are
> likely to find that complexity increases over time?
>
> The other thing that came to mind, and maybe this is just looking for
> holes where they don't exist (or are not worth addressing), is that
> order of precedence.
> I can imagine that, in some cases, the order of precedence is not a
> constant per pseudo-cpufeature, but determined by implementation of
> the capabilities that comprise it?

Having spent longer than I maybe should've looking at your patches
Heiko, given it's a Saturday evening, the precedence stuff is still
sticking out to me..

For Zbb & fast unaligned, the order may be non-controversial, but in
the general case I don't see how it can be true that the order of
precedence for variants is a constant.

Creating pseudo cpufeatures as Drew suggested does seem like a way to
extract complexity from the alternatives themselves (which I think is a
good thing) but at the expense of eating up cpu_req_feature bits...
By itself, it doesn't help with precedence, but it may better allow us
to deal with some of the precedence in cpufeature.c, since the
alternative would operate based on the pseudo cpufeature rather than on
the individual capabilities that the pseudo cpufeature depends on.

I tried to come up with a suggestion for what to do about precedence,
but everything I thought up felt a bit horrific tbh.
The thing that fits the current model best is just allowing cpu vendors
to add, yet more, "errata" that pick the variant that works best for
their implementation... Although I'd be worried about ballooning some of
these ALT_FOO macros out to a massive degree with that sort of approach.

> If my assumption/understanding holds, moving decision making out of the
> alternative seems like it would better provision for scenarios like
> that? I dunno, maybe that is whatever the corollary of "premature
> optimisation" is for this discussion.
>
> That's my unsolicited € 0.02, hopefully I wasn't off-base with the
> assumptions I made.

The order in which an alternative is added to the macro does matter,
right? At least, that's how I thought it worked and hope I've not had
an incorrect interpretation there all along... It wasn't until I started
reading your patch and couldn't understand why you had a construct that
looked like

if (zbb && !fast_unaligned)
    ...
else if (zbb && fast_unaligned)
    ...

rather than just inverting the order and dropping the !fast_unaligned
that I realised I might have a gap in my understanding after all..
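
To make the ordering question concrete, here is a small user-space sketch. It assumes a patching model in which the entries of a multi-variant alternative are walked in order and every matching entry patches the site, so the last match wins; that model is an assumption of this example and should be checked against the actual patching code. All names (`struct alt_entry`, `apply_alternatives()`, the FEAT_* bits) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define FEAT_ZBB            (1u << 0)
#define FEAT_FAST_UNALIGNED (1u << 1)

enum variant { VARIANT_GENERIC, VARIANT_ZBB, VARIANT_ZBB_UNALIGNED };

struct alt_entry {
	uint32_t required;   /* bits that must be set */
	uint32_t excluded;   /* bits that must be clear ("not" component) */
	enum variant variant;
};

/* Walk the entries in order. Every matching entry overwrites the site,
 * so the last matching entry determines the result. */
static enum variant apply_alternatives(const struct alt_entry *entries,
				       size_t n, uint32_t features)
{
	enum variant site = VARIANT_GENERIC;

	assert(entries != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if ((features & entries[i].required) != entries[i].required)
			continue;
		if (features & entries[i].excluded)
			continue;
		site = entries[i].variant;
	}
	return site;
}

/* Order-dependent: correct only because the more specific entry is last. */
static const struct alt_entry ordered[] = {
	{ FEAT_ZBB, 0, VARIANT_ZBB },
	{ FEAT_ZBB | FEAT_FAST_UNALIGNED, 0, VARIANT_ZBB_UNALIGNED },
};

/* Same entries swapped: the plain Zbb variant overwrites the better one. */
static const struct alt_entry swapped[] = {
	{ FEAT_ZBB | FEAT_FAST_UNALIGNED, 0, VARIANT_ZBB_UNALIGNED },
	{ FEAT_ZBB, 0, VARIANT_ZBB },
};

/* With an explicit "not" mask the entries are mutually exclusive, so
 * their order no longer matters. */
static const struct alt_entry exclusive[] = {
	{ FEAT_ZBB | FEAT_FAST_UNALIGNED, 0, VARIANT_ZBB_UNALIGNED },
	{ FEAT_ZBB, FEAT_FAST_UNALIGNED, VARIANT_ZBB },
};
```

Under this assumed model, dropping the !fast_unaligned term is only safe if the more specific variant is listed last; keeping the term makes the result independent of the order.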

> Heiko, I figure you've got some sort of WIP stuff for this anyway since
> you're interested in the fast unaligned? How close are you to posting any
> of that?
>
> While I think of it, w.r.t. extension versus (pseudo)cpufeature etc
> naming, it may make sense to call the functions that this series adds
> in patch 6 has_cpufeature_{un,}likely(), no matter what decision gets
> made here?
> IMO using cpufeature seems to make more sense for a general use API that
> may be used later on for the likes of unaligned access, even if
> initially it is not used for anything other than extensions.



2023-01-15 14:20:37

by Jisheng Zhang

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

On Thu, Jan 12, 2023 at 10:21:36AM +0100, Andrew Jones wrote:
> On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> > Hi Jisheng.
> >
> > Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > > riscv_cpufeature_patch_func() currently only scans a limited set of
> > > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > > ISA extensions.
> > >
> > > Signed-off-by: Jisheng Zhang <[email protected]>
> > > Reviewed-by: Andrew Jones <[email protected]>
> > > Reviewed-by: Heiko Stuebner <[email protected]>
> > > ---
> > > arch/riscv/include/asm/errata_list.h | 9 ++--
> > > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > > 2 files changed, 11 insertions(+), 61 deletions(-)
> >
> > hmmm ... I do see a somewhat big caveat for this.
> > and would like to take back my Reviewed-by for now
> >
> >
> > With this change we would limit the patchable cpufeatures to actual
> > riscv extensions. But cpufeatures can also be soft features like
> > how performant the core handles unaligned accesses.
>
> I agree that this needs to be addressed and Jisheng also raised this
> yesterday here [*]. It seems we need the concept of cpufeatures, which
> may be extensions or non-extensions.
>
> [*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/
>
> >
> > See Palmer's series [0].
> >
> >
> > Also this essentially codifies that each ALTERNATIVE can only ever
> > be attached to exactly one extension.
> >
> > But contrary to vendor-errata, it is very likely that we will need
> > combinations of different extensions for some alternatives in the future.
>
> One possible approach may be to combine extensions/non-extensions at boot
> time into pseudo-cpufeatures. Then, alternatives can continue attaching to
> a single "feature". (I'm not saying that's a better approach than the
> bitmap, I'm just suggesting it as something else to consider.)

When switching pgtable_l4_enabled to a static key for the first time, I
suggested a bitmap for cpufeatures which covers both ISA extensions
and non-extension but CPU-related features [1],
but it was rejected at that time. It seems we need to revisit the idea.

[1] https://lore.kernel.org/linux-riscv/[email protected]/
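
As a rough illustration of such a bitmap, the sketch below puts ISA extensions and non-extension features into one ID space and tests them the same way. The enum values, the `struct cpufeature_bitmap` type and both helpers are hypothetical names invented for this example; they are not the kernel's existing ISA bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical ID space: ISA extensions first, then non-extension
 * ("soft") CPU-related features. */
enum cpufeature_id {
	CPUFEATURE_EXT_ZBB,
	CPUFEATURE_EXT_ZICBOM,
	CPUFEATURE_EXT_SVPBMT,
	CPUFEATURE_SOFT_FAST_UNALIGNED,
	CPUFEATURE_SOFT_PGTABLE_L4,
	CPUFEATURE_MAX,
};

#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct cpufeature_bitmap {
	unsigned long words[BITMAP_WORDS(CPUFEATURE_MAX)];
};

/* Mark a feature as present, whether it is an extension or not. */
static void cpufeature_set(struct cpufeature_bitmap *map, enum cpufeature_id id)
{
	assert(id < CPUFEATURE_MAX);
	map->words[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
}

/* Query a feature; callers do not need to know which kind it is. */
static bool cpufeature_test(const struct cpufeature_bitmap *map,
			    enum cpufeature_id id)
{
	assert(id < CPUFEATURE_MAX);
	return (map->words[id / BITS_PER_WORD] >> (id % BITS_PER_WORD)) & 1UL;
}
```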

>
> Thanks,
> drew
>
> >
> > In my optimization quest, I found that it's actually pretty neat to
> > convert the errata-id for cpufeatures to a bitfield [1], because then it's
> > possible to just combine extensions into said bitfield [2]:
> >
> > ALTERNATIVE_2("nop",
> > "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> > "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> >
> > [the additional field there models a "not" component]
> >
> > So I really feel this would limit us quite a bit.
> >
> >
> > Heiko
> >
> >
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=riscv-hwprobe-v1&id=510c491cb9d87dcbdc91c63558dc704968723240
> > [1] https://github.com/mmind/linux-riscv/commit/f57a896122ee7e666692079320fc35829434cf96
> > [2] https://github.com/mmind/linux-riscv/commit/8cef615dab0c00ad68af2651ee5b93d06be17f27#diff-194cb8a86f9fb9b03683295f21c8f46b456a9f94737f01726ddbcbb9e3aace2cR12
> >
> >

2023-01-15 14:45:54

by Jisheng Zhang

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> Hi Jisheng.

Hi Heiko,

>
> Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > riscv_cpufeature_patch_func() currently only scans a limited set of
> > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > ISA extensions.
> >
> > Signed-off-by: Jisheng Zhang <[email protected]>
> > Reviewed-by: Andrew Jones <[email protected]>
> > Reviewed-by: Heiko Stuebner <[email protected]>
> > ---
> > arch/riscv/include/asm/errata_list.h | 9 ++--
> > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > 2 files changed, 11 insertions(+), 61 deletions(-)
>
> hmmm ... I do see a somewhat big caveat for this.
> and would like to take back my Reviewed-by for now
>
>
> With this change we would limit the patchable cpufeatures to actual
> riscv extensions. But cpufeatures can also be soft features like
> how performant the core handles unaligned accesses.

Besides Drew's comments and my reply a few minutes ago, here is
what I think: I agree with you that "cpufeatures can also be soft
features", which I called cpu related features, but currently we
don't have an urgent case for that. SV48 and SV57 are extensions now,
as Jessica pointed out[1], so I planned to send a v7 to apply the
alternative mechanism for SV48/SV57, and I think we still have time to
revisit "expanding cpufeatures to cover soft features". But that
needs to be addressed in another improvement series.

[1] https://lore.kernel.org/linux-riscv/[email protected]/

>
> See Palmer's series [0].
>
>
> Also this essentially codifies that each ALTERNATIVE can only ever
> be attached to exactly one extension.
>
> But contrary to vendor-errata, it is very likely that we will need
> combinations of different extensions for some alternatives in the future.
>
> In my optimization quest, I found that it's actually pretty neat to
> convert the errata-id for cpufeatures to a bitfield [1], because then it's
> possible to just combine extensions into said bitfield [2]:
>
> ALTERNATIVE_2("nop",
> "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
>
> [the additional field there models a "not" component]
>
> So I really feel this would limit us quite a bit.
>
>
> Heiko
>
>
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=riscv-hwprobe-v1&id=510c491cb9d87dcbdc91c63558dc704968723240
> [1] https://github.com/mmind/linux-riscv/commit/f57a896122ee7e666692079320fc35829434cf96
> [2] https://github.com/mmind/linux-riscv/commit/8cef615dab0c00ad68af2651ee5b93d06be17f27#diff-194cb8a86f9fb9b03683295f21c8f46b456a9f94737f01726ddbcbb9e3aace2cR12
>
>

2023-01-18 22:47:51

by Conor Dooley

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

Hey!

I guess here is the right place to follow up on all of this stuff...

On Sat, Jan 14, 2023 at 08:32:37PM +0000, Conor Dooley wrote:
> On Fri, Jan 13, 2023 at 03:18:59PM +0000, Conor Dooley wrote:
> > On Thu, Jan 12, 2023 at 10:21:36AM +0100, Andrew Jones wrote:
> > > On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> > > > Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > > > > riscv_cpufeature_patch_func() currently only scans a limited set of
> > > > > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > > > > ISA extensions.
> > > > >
> > > > > Signed-off-by: Jisheng Zhang <[email protected]>
> > > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > > Reviewed-by: Heiko Stuebner <[email protected]>
> > > > > ---
> > > > > arch/riscv/include/asm/errata_list.h | 9 ++--
> > > > > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > > > > 2 files changed, 11 insertions(+), 61 deletions(-)
> > > >
> > > > hmmm ... I do see a somewhat big caveat for this.
> > > > and would like to take back my Reviewed-by for now
> > > >
> > > >
> > > > With this change we would limit the patchable cpufeatures to actual
> > > > riscv extensions. But cpufeatures can also be soft features like
> > > > how performant the core handles unaligned accesses.
> > >
> > > I agree that this needs to be addressed and Jisheng also raised this
> > > yesterday here [*]. It seems we need the concept of cpufeatures, which
> > > may be extensions or non-extensions.
> > >
> > > [*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/
> > >
> > > > See Palmer's series [0].
> > > >
> > > >
> > > > Also this essentially codifies that each ALTERNATIVE can only ever
> > > > be attached to exactly one extension.
> > > >
> > > > But contrary to vendor-errata, it is very likely that we will need
> > > > combinations of different extensions for some alternatives in the future.
> > >
> > > One possible approach may be to combine extensions/non-extensions at boot
> > > time into pseudo-cpufeatures. Then, alternatives can continue attaching to
> > > a single "feature". (I'm not saying that's a better approach than the
> > > bitmap, I'm just suggesting it as something else to consider.)
> >
> >
> > > > ALTERNATIVE_2("nop",
> > > > "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> > > > "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> > > >
> > > > [the additional field there models a "not" component]
> >
> > Since we're discussing theoretical implementations, and it's a little hard
> > to picture all that they entail in my head, I might be making a fool of
> > myself here with assumptions...
> >
> > Heiko's suggestion sounded along the lines of: keep probing individual
> > "features" as we are now. Features in this case being the presence of
> > the extension or non-extension capability. And then in the alternative,
> > make all of the decisions about which to apply.
> >
> > Drew's suggestion would have significantly more defined CPUFEATURE_FOOs,
> > but would offload the decision making about which extensions or non-
> > extension capabilities constitute a feature to regular old cpufeature
> > code. However, the order of precedence would remain in the alt macro, as
> > it does now.
> >
> > I think I am just a wee bit biased, but adding the complexity somewhere
> > other than alternative macros seems a wise choice, especially as we are
> > likely to find that complexity increases over time?
> >
> > The other thing that came to mind, and maybe this is just looking for
> > holes where they don't exist (or are not worth addressing), is that
> > order of precedence.
> > I can imagine that, in some cases, the order of precedence is not a
> > > constant per pseudo-cpufeature, but determined by implementation of
> > the capabilities that comprise it?
>
> Having spent longer than I maybe should've looking at your patches
> Heiko, given it's a Saturday evening, the precedence stuff is still
> sticking out to me..
>
> For Zbb & fast unaligned, the order may be non-controversial, but in
> the general case I don't see how it can be true that the order of
> precedence for variants is a constant.
>
> Creating pseudo cpufeatures as Drew suggested does seem like a way to
> extract complexity from the alternatives themselves (which I think is a
> good thing) but at the expense of eating up cpu_req_feature bits...
> By itself, it doesn't help with precedence, but it may better allow us
> to deal with some of the precedence in cpufeature.c, since the
> alternative would operate based on the pseudo cpufeature rather than on
> the individual capabilities that the pseudo cpufeature depends on.
>
> I tried to come up with a suggestion for what to do about precedence,
> but everything I thought up felt a bit horrific tbh.
> The thing that fits the current model best is just allowing cpu vendors
> to add, yet more, "errata" that pick the variant that works best for
> their implementation... Although I'd be worried about ballooning some of
> these ALT_FOO macros out to a massive degree with that sort of approach.
>
> > If my assumption/understanding holds, moving decision making out of the
> > alternative seems like it would better provision for scenarios like
> > that? I dunno, maybe that is whatever the corollary of "premature
> > optimisation" is for this discussion.
> >
> > That's my unsolicited € 0.02, hopefully I wasn't off-base with the
> > assumptions I made.
>
> The order in which an alternative is added to the macro does matter,
> right? At least, that's how I thought it worked and hope I've not had
> an incorrect interpretation there all along... It wasn't until I started
> reading your patch and couldn't understand why you had a construct that
> looked like
>
> if (zbb && !fast_unaligned)
> ...
> else if (zbb && fast_unaligned)
> ...
>
> rather than just inverting the order and dropping the !fast_unaligned
> that I realised I might have a gap in my understanding after all..
>
> > Heiko, I figure you've got some sort of WIP stuff for this anyway since
> > you're interested in the fast unaligned? How close are you to posting any
> > of that?
> >
> > While I think of it, w.r.t. extension versus (pseudo)cpufeature etc
> > naming, it may make sense to call the functions that this series adds
> > in patch 6 has_cpufeature_{un,}likely(), no matter what decision gets
> > made here?
> > IMO using cpufeature seems to make more sense for a general use API that
> > may be used later on for the likes of unaligned access, even if
> > initially it is not used for anything other than extensions.

Today at [1] we talked a bit about the various bits going on here.
I'll attempt to summarise what I remember, but I meant to do this
several hours ago and am likely to make a hames of it.

For Zbb/unaligned stuff, the sentiment was along the lines of there
needing to be a performance benefit to justify the inclusion.
Some of us have HW that is (allegedly) capable of Zbb, and, if that's the
case, will give it a go.
I think it was similar for unaligned, since there is nothing yet that
supports this behaviour, we should wait until a benefit is demonstrable.

On the subject of grouping extension/non-extension capabilities into a
single cpufeature, Palmer mentioned that GCC does something similar,
for the likes of the Ventana vendor extensions, that are unlikely to be
present in isolation.
Those are (or were?) probed as a group of extensions rather than
individually.
I think it was said it'd make sense for us to unify extensions that will
only ever appear together into a single pseudo cpufeature too.
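
A minimal sketch of that grouping, assuming the pseudo cpufeature is derived once at boot from the individually probed bits. The group table, the bit names and `derive_pseudo_features()` are all hypothetical and only illustrate the idea; nothing here reflects how GCC or the kernel actually implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical individually probed capabilities. */
#define FEAT_VENDOR_X_A     (1u << 0)
#define FEAT_VENDOR_X_B     (1u << 1)
#define FEAT_OTHER          (1u << 2)

/* Hypothetical pseudo cpufeature standing for "A and B together". */
#define FEAT_PSEUDO_VENDOR_X (1u << 16)

struct feature_group {
	uint32_t members;  /* all of these must be present */
	uint32_t pseudo;   /* bit to set when they are */
};

/* Run once after probing: add the pseudo bit for every group whose
 * members are all present, so alternatives can test a single bit. */
static uint32_t derive_pseudo_features(uint32_t probed,
				       const struct feature_group *groups,
				       size_t n)
{
	uint32_t features = probed;

	for (size_t i = 0; i < n; i++) {
		/* An empty group would match unconditionally. */
		assert(groups[i].members != 0);
		if ((probed & groups[i].members) == groups[i].members)
			features |= groups[i].pseudo;
	}
	return features;
}

static const struct feature_group vendor_groups[] = {
	{ FEAT_VENDOR_X_A | FEAT_VENDOR_X_B, FEAT_PSEUDO_VENDOR_X },
};
```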

For the bitfield approach versus creating pseudo cpufeatures discussion
& how to deal with that in alternatives etc, I'm a bit less sure what the
outcome was.
IIRC, nothing concrete was said about either approach, but maybe it was
implied that we should do as GCC does, only grouping things that won't
ever be seen apart.
Figuring that out seems to have been punted down the road, as the
inclusion of our only current example of this (Zbb + unaligned) is
dependent on hardware showing up that actually benefits from it.

The plan then seemed to be to press ahead with this series & test the
benefits of the Zbb str* functions on Zbb-capable hardware before making
a decision there.

Hopefully I wasn't too far off with that summary...

Thanks,
Conor.

[1] https://lore.kernel.org/linux-riscv/mhng-775d4068-6c1e-48a4-a1dc-b4a76ff26bb3@palmer-ri-x1c9a/



2023-01-19 09:48:50

by Andrew Jones

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

On Wed, Jan 18, 2023 at 09:54:46PM +0000, Conor Dooley wrote:
> Hey!
>
> I guess here is the right place to follow up on all of this stuff...
>
> On Sat, Jan 14, 2023 at 08:32:37PM +0000, Conor Dooley wrote:
> > On Fri, Jan 13, 2023 at 03:18:59PM +0000, Conor Dooley wrote:
> > > On Thu, Jan 12, 2023 at 10:21:36AM +0100, Andrew Jones wrote:
> > > > On Thu, Jan 12, 2023 at 12:29:57AM +0100, Heiko Stübner wrote:
> > > > > Am Mittwoch, 11. Januar 2023, 18:10:19 CET schrieb Jisheng Zhang:
> > > > > > riscv_cpufeature_patch_func() currently only scans a limited set of
> > > > > > cpufeatures, explicitly defined with macros. Extend it to probe for all
> > > > > > ISA extensions.
> > > > > >
> > > > > > Signed-off-by: Jisheng Zhang <[email protected]>
> > > > > > Reviewed-by: Andrew Jones <[email protected]>
> > > > > > Reviewed-by: Heiko Stuebner <[email protected]>
> > > > > > ---
> > > > > > arch/riscv/include/asm/errata_list.h | 9 ++--
> > > > > > arch/riscv/kernel/cpufeature.c | 63 ++++------------------------
> > > > > > 2 files changed, 11 insertions(+), 61 deletions(-)
> > > > >
> > > > > hmmm ... I do see a somewhat big caveat for this.
> > > > > and would like to take back my Reviewed-by for now
> > > > >
> > > > >
> > > > > With this change we would limit the patchable cpufeatures to actual
> > > > > riscv extensions. But cpufeatures can also be soft features like
> > > > > how performant the core handles unaligned accesses.
> > > >
> > > > I agree that this needs to be addressed and Jisheng also raised this
> > > > yesterday here [*]. It seems we need the concept of cpufeatures, which
> > > > may be extensions or non-extensions.
> > > >
> > > > [*] https://lore.kernel.org/all/Y77xyNPNqnFQUqAx@xhacker/
> > > >
> > > > > See Palmer's series [0].
> > > > >
> > > > >
> > > > > Also this essentially codifies that each ALTERNATIVE can only ever
> > > > > be attached to exactly one extension.
> > > > >
> > > > > But contrary to vendor-errata, it is very likely that we will need
> > > > > combinations of different extensions for some alternatives in the future.
> > > >
> > > > One possible approach may be to combine extensions/non-extensions at boot
> > > > time into pseudo-cpufeatures. Then, alternatives can continue attaching to
> > > > a single "feature". (I'm not saying that's a better approach than the
> > > > bitmap, I'm just suggesting it as something else to consider.)
> > >
> > >
> > > > > ALTERNATIVE_2("nop",
> > > > > "j strcmp_zbb_unaligned", 0, CPUFEATURE_ZBB | CPUFEATURE_FAST_UNALIGNED, 0, CONFIG_RISCV_ISA_ZBB,
> > > > > "j variant_zbb", 0, CPUFEATURE_ZBB, CPUFEATURE_FAST_UNALIGNED, CONFIG_RISCV_ISA_ZBB)
> > > > >
> > > > > [the additional field there models a "not" component]
> > >
> > > Since we're discussing theoretical implementations, and it's a little hard
> > > to picture all that they entail in my head, I might be making a fool of
> > > myself here with assumptions...
> > >
> > > Heiko's suggestion sounded along the lines of: keep probing individual
> > > "features" as we are now. Features in this case being the presence of
> > > the extension or non-extension capability. And then in the alternative,
> > > make all of the decisions about which to apply.
> > >
> > > Drew's suggestion would have significantly more defined CPUFEATURE_FOOs,
> > > but would offload the decision making about which extensions or non-
> > > extension capabilities constitute a feature to regular old cpufeature
> > > code. However, the order of precedence would remain in the alt macro, as
> > > it does now.
> > >
> > > I think I am just a wee bit biased, but adding the complexity somewhere
> > > other than alternative macros seems a wise choice, especially as we are
> > > likely to find that complexity increases over time?
> > >
> > > The other thing that came to mind, and maybe this is just looking for
> > > holes where they don't exist (or are not worth addressing), is that
> > > order of precedence.
> > > I can imagine that, in some cases, the order of precedence is not a
> > > > constant per pseudo-cpufeature, but determined by implementation of
> > > the capabilities that comprise it?
> >
> > Having spent longer than I maybe should've looking at your patches
> > Heiko, given it's a Saturday evening, the precedence stuff is still
> > sticking out to me..
> >
> > For Zbb & fast unaligned, the order may be non-controversial, but in
> > the general case I don't see how it can be true that the order of
> > precedence for variants is a constant.
> >
> > Creating pseudo cpufeatures as Drew suggested does seem like a way to
> > extract complexity from the alternatives themselves (which I think is a
> > good thing) but at the expense of eating up cpu_req_feature bits...
> > By itself, it doesn't help with precedence, but it may better allow us
> > to deal with some of the precedence in cpufeature.c, since the
> > alternative would operate based on the pseudo cpufeature rather than on
> > the individual capabilities that the pseudo cpufeature depends on.
> >
> > I tried to come up with a suggestion for what to do about precedence,
> > but everything I thought up felt a bit horrific tbh.
> > The thing that fits the current model best is just allowing cpu vendors
> > to add, yet more, "errata" that pick the variant that works best for
> > their implementation... Although I'd be worried about ballooning some of
> > these ALT_FOO macros out to a massive degree with that sort of approach.
> >
> > > If my assumption/understanding holds, moving decision making out of the
> > > alternative seems like it would better provision for scenarios like
> > > that? I dunno, maybe that is whatever the corollary of "premature
> > > optimisation" is for this discussion.
> > >
> > > That's my unsolicited € 0.02, hopefully I wasn't off-base with the
> > > assumptions I made.
> >
> > The order in which an alternative is added to the macro does matter,
> > right? At least, that's how I thought it worked and hope I've not had
> > an incorrect interpretation there all along... It wasn't until I started
> > reading your patch and couldn't understand why you had a construct that
> > looked like
> >
> > if (zbb && !fast_unaligned)
> > ...
> > else if (zbb && fast_unaligned)
> > ...
> >
> > rather than just inverting the order and dropping the !fast_unaligned
> > that I realised I might have a gap in my understanding after all..
> >
> > > Heiko, I figure you've got some sort of WIP stuff for this anyway since
> > > you're interested in the fast unaligned? How close are you to posting any
> > > of that?
> > >
> > > While I think of it, w.r.t. extension versus (pseudo)cpufeature etc
> > > naming, it may make sense to call the functions that this series adds
> > > in patch 6 has_cpufeature_{un,}likely(), no matter what decision gets
> > > made here?
> > > IMO using cpufeature seems to make more sense for a general use API that
> > > may be used later on for the likes of unaligned access, even if
> > > initially it is not used for anything other than extensions.
>
> Today at [1] we talked a bit about the various bits going on here.
> I'll attempt to summarise what I remember, but I meant to do this
> several hours ago and am likely to make a hames of it.
>
> For Zbb/unaligned stuff, the sentiment was along the lines of there
> needing to be a performance benefit to justify the inclusion.
> Some of us have HW that is (allegedly) capable of Zbb, and, if that's the
> case, will give it a go.
> I think it was similar for unaligned, since there is nothing yet that
> supports this behaviour, we should wait until a benefit is demonstrable.
>
> On the subject of grouping extension/non-extension capabilities into a
> single cpufeature, Palmer mentioned that GCC does something similar,
> for the likes of the Ventana vendor extensions, that are unlikely to be
> present in isolation.
> Those are (or were?) probed as a group of extensions rather than
> individually.
> I think it was said it'd make sense for us to unify extensions that will
> only ever appear together into a single pseudo cpufeature too.
>
> For the bitfield approach versus creating pseudo cpufeatures discussion
> & how to deal with that in alternatives etc, I'm a bit less sure what the
> outcome was.
> IIRC, nothing concrete was said about either approach, but maybe it was
> implied that we should do as GCC does, only grouping things that won't
> ever be seen apart.
> Figuring that out seems to have been punted down the road, as the
> inclusion of our only current example of this (Zbb + unaligned) is
> dependent on hardware showing up that actually benefits from it.
>
> The plan then seemed to be to press ahead with this series & test the
> benefits of the Zbb str* functions on Zbb-capable hardware before making
> a decision there.
>
> Hopefully I wasn't too far off with that summary...

This matches my recollection. Thanks for the summary, Conor.

drew

>
> Thanks,
> Conor.
>
> 1 - https://lore.kernel.org/linux-riscv/mhng-775d4068-6c1e-48a4-a1dc-b4a76ff26bb3@palmer-ri-x1c9a/


2023-01-19 22:37:52

by Conor Dooley

Subject: Re: [PATCH v3 05/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions

Me again!

On Thu, Jan 19, 2023 at 09:29:03AM +0100, Andrew Jones wrote:
> On Wed, Jan 18, 2023 at 09:54:46PM +0000, Conor Dooley wrote:
> > Hey!
> >
> > I guess here is the right place to follow up on all of this stuff...
> >
> > On Sat, Jan 14, 2023 at 08:32:37PM +0000, Conor Dooley wrote:

> > Today at [1] we talked a bit about the various bits going on here.
> > I'll attempt to summarise what I remember, but I meant to do this
> > several hours ago and am likely to make a hames of it.
> >
> > For Zbb/unaligned stuff, the sentiment was along the lines of there
> > needing to be a performance benefit to justify the inclusion.
> > Some of us have HW that is (allegedly) capable of Zbb, and, if that's the

I did some very, very basic testing today. Ethernet is still a no-go on
my VisionFive 2 board, but the SD card works at least, so I can run
whatever Zbb code people want & we can see how it goes!

At the very least, it is capable of executing the instructions that were
used in Appendix A. I didn't try to do anything else, because I am lazy
and if there were some pre-existing test programs I didn't want to go
and write out a bunch of asm myself!

impid appears to be 0x4210427, if that means anything to anyone!

> > case, will give it a go.
> > I think it was similar for unaligned, since there is nothing yet that
> > supports this behaviour, we should wait until a benefit is demonstrable.
> >
> > On the subject of grouping extension/non-extension capabilities into a
> > single cpufeature, Palmer mentioned that GCC does something similar,
> > for the likes of the Ventana vendor extensions, that are unlikely to be
> > present in isolation.

Jess pointed out on IRC that GCC doesn't support XVentanaCondOps
so maybe there was a mixup there. I don't think that really matters
though, as the point stands regardless of whether it was in GCC or not.

> > Those are (or were?) probed as a group of extensions rather than
> > individually.
> > I think it was said it'd make sense for us to unify extensions that will
> > only ever appear together into a single pseudo cpufeature too.
> >
> > For the bitfield approach versus creating pseudo cpufeatures discussion
> > & how to deal with that in alternatives etc, I'm a bit less sure what the
> > outcome was.
> > IIRC, nothing concrete was said about either approach, but maybe it was
> > implied that we should do as GCC does, only grouping things that won't
> > ever be seen apart.
> > Figuring that out seems to have been punted down the road, as the
> > inclusion of our only current example of this (Zbb + unaligned) is
> > dependent on hardware showing up that actually benefits from it.
> >
> > The plan then seemed to be to press ahead with this series & test the
> > benefits of the Zbb str* functions on Zbb-capable hardware before making
> > a decision there.
> >
> > Hopefully I wasn't too far off with that summary...
>
> This matches my recollection. Thanks for the summary, Conor.

Cool, thanks.

