2024-03-02 01:45:50

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v6 0/4] riscv: Use Kconfig to set unaligned access speed

If the hardware unaligned access speed is known at compile time, it is
possible to avoid running the unaligned access speed probe to speedup
boot-time.

Signed-off-by: Charlie Jenkins <[email protected]>
---
Changes in v6:
- Consolidate Kconfig into 4 options (probe, emulated, slow,
efficient)
- Change the behavior of "emulated" to allow hwprobe to return "slow" if
unaligned accesses are not emulated by the kernel
- With this consolidation, check_unaligned_access_emulated is able to be
moved back into the original file (traps_misaligned.c)
- Link to v5: https://lore.kernel.org/r/20240227-disable_misaligned_probe_config-v5-0-b6853846e27a@rivosinc.com

Changes in v5:
- Clarify Kconfig options from Conor's feedback
- Use "unaligned" instead of "misaligned" in introduced file/function.
This is a bit hard to standardize because the riscv manual says
"misaligned" but the existing Linux configs say "unaligned".
- Link to v4: https://lore.kernel.org/r/20240216-disable_misaligned_probe_config-v4-0-dc01e581c0ac@rivosinc.com

Changes in v4:
- Add additional Kconfig options for the other unaligned access speeds
- Link to v3: https://lore.kernel.org/r/20240202-disable_misaligned_probe_config-v3-0-c44f91f03bb6@rivosinc.com

Changes in v3:
- Revert change to csum (Eric)
- Change ifndefs for ifdefs (Eric)
- Change config in Makefile (Elliot/Eric)
- Link to v2: https://lore.kernel.org/r/20240201-disable_misaligned_probe_config-v2-0-77c368bed7b2@rivosinc.com

Changes in v2:
- Move around definitions to reduce ifdefs (Clément)
- Make RISCV_MISALIGNED depend on !HAVE_EFFICIENT_UNALIGNED_ACCESS
(Clément)
- Link to v1: https://lore.kernel.org/r/20240131-disable_misaligned_probe_config-v1-0-98d155e9cda8@rivosinc.com

---
Charlie Jenkins (4):
riscv: lib: Introduce has_fast_unaligned_access function
riscv: Only check online cpus for emulated accesses
riscv: Decouple emulated unaligned accesses from access speed
riscv: Set unaligned access speed at compile time

arch/riscv/Kconfig | 58 ++++--
arch/riscv/include/asm/cpufeature.h | 31 ++--
arch/riscv/kernel/Makefile | 4 +-
arch/riscv/kernel/cpufeature.c | 255 --------------------------
arch/riscv/kernel/sys_hwprobe.c | 21 +++
arch/riscv/kernel/traps_misaligned.c | 22 +--
arch/riscv/kernel/unaligned_access_speed.c | 282 +++++++++++++++++++++++++++++
arch/riscv/lib/csum.c | 7 +-
8 files changed, 383 insertions(+), 297 deletions(-)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240131-disable_misaligned_probe_config-043aea375f93
--
- Charlie



2024-03-02 01:46:13

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v6 2/4] riscv: Only check online cpus for emulated accesses

The unaligned access checker only sets valid values for online cpus.
Check for these values on online cpus rather than on present cpus.

Signed-off-by: Charlie Jenkins <[email protected]>
Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe")
---
arch/riscv/kernel/traps_misaligned.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index 8ded225e8c5b..c2ed4e689bf9 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -632,7 +632,7 @@ void unaligned_emulation_finish(void)
* accesses emulated since tasks requesting such control can run on any
* CPU.
*/
- for_each_present_cpu(cpu) {
+ for_each_online_cpu(cpu) {
if (per_cpu(misaligned_access_speed, cpu) !=
RISCV_HWPROBE_MISALIGNED_EMULATED) {
return;

--
2.43.2


2024-03-02 01:46:30

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v6 3/4] riscv: Decouple emulated unaligned accesses from access speed

Detecting if a system traps into the kernel on an unaligned access
can be performed separately from checking the speed of unaligned
accesses. This decoupling will make it possible to selectively enable
or disable each of these checks as is done in the following patch.

Signed-off-by: Charlie Jenkins <[email protected]>
---
arch/riscv/include/asm/cpufeature.h | 2 +-
arch/riscv/kernel/cpufeature.c | 25 +++++++++++++++++++++----
arch/riscv/kernel/traps_misaligned.c | 20 +++++++-------------
3 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 466e1f591919..6fec91845aa0 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -37,7 +37,7 @@ void riscv_user_isa_enable(void);

#ifdef CONFIG_RISCV_MISALIGNED
bool unaligned_ctl_available(void);
-bool check_unaligned_access_emulated(int cpu);
+bool check_unaligned_access_emulated_all_cpus(void);
void unaligned_emulation_finish(void);
#else
static inline bool unaligned_ctl_available(void)
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 7878cddccc0d..abb3a2f53106 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -719,7 +719,8 @@ static int check_unaligned_access(void *param)
void *src;
long speed = RISCV_HWPROBE_MISALIGNED_SLOW;

- if (check_unaligned_access_emulated(cpu))
+ if (IS_ENABLED(CONFIG_RISCV_MISALIGNED) &&
+ per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
return 0;

/* Make an unaligned destination buffer. */
@@ -896,8 +897,8 @@ static int riscv_offline_cpu(unsigned int cpu)
return 0;
}

-/* Measure unaligned access on all CPUs present at boot in parallel. */
-static int check_unaligned_access_all_cpus(void)
+/* Measure unaligned access speed on all CPUs present at boot in parallel. */
+static int check_unaligned_access_speed_all_cpus(void)
{
unsigned int cpu;
unsigned int cpu_count = num_possible_cpus();
@@ -935,7 +936,6 @@ static int check_unaligned_access_all_cpus(void)
riscv_online_cpu, riscv_offline_cpu);

out:
- unaligned_emulation_finish();
for_each_cpu(cpu, cpu_online_mask) {
if (bufs[cpu])
__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
@@ -945,6 +945,23 @@ static int check_unaligned_access_all_cpus(void)
return 0;
}

+#ifdef CONFIG_RISCV_MISALIGNED
+static int check_unaligned_access_all_cpus(void)
+{
+ bool all_cpus_emulated = check_unaligned_access_emulated_all_cpus();
+
+ if (!all_cpus_emulated)
+ return check_unaligned_access_speed_all_cpus();
+
+ return 0;
+}
+#else
+static int check_unaligned_access_all_cpus(void)
+{
+ return check_unaligned_access_speed_all_cpus();
+}
+#endif
+
arch_initcall(check_unaligned_access_all_cpus);

void riscv_user_isa_enable(void)
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index c2ed4e689bf9..1e3cec3f5d93 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -596,7 +596,7 @@ int handle_misaligned_store(struct pt_regs *regs)
return 0;
}

-bool check_unaligned_access_emulated(int cpu)
+static bool check_unaligned_access_emulated(int cpu)
{
long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
unsigned long tmp_var, tmp_val;
@@ -623,22 +623,16 @@ bool check_unaligned_access_emulated(int cpu)
return misaligned_emu_detected;
}

-void unaligned_emulation_finish(void)
+bool check_unaligned_access_emulated_all_cpus(void)
{
int cpu;

- /*
- * We can only support PR_UNALIGN controls if all CPUs have misaligned
- * accesses emulated since tasks requesting such control can run on any
- * CPU.
- */
- for_each_online_cpu(cpu) {
- if (per_cpu(misaligned_access_speed, cpu) !=
- RISCV_HWPROBE_MISALIGNED_EMULATED) {
- return;
- }
- }
+ for_each_online_cpu(cpu)
+ if (check_unaligned_access_emulated(cpu))
+ return false;
+
unaligned_ctl = true;
+ return true;
}

bool unaligned_ctl_available(void)

--
2.43.2


2024-03-02 01:46:41

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v6 1/4] riscv: lib: Introduce has_fast_unaligned_access function

Create has_fast_unaligned_access to avoid needing to explicitly check
the fast_misaligned_access_speed_key static key.

Signed-off-by: Charlie Jenkins <[email protected]>
Reviewed-by: Evan Green <[email protected]>
---
arch/riscv/include/asm/cpufeature.h | 11 ++++++++---
arch/riscv/kernel/cpufeature.c | 6 +++---
arch/riscv/lib/csum.c | 7 ++-----
3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 5a626ed2c47a..466e1f591919 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -1,6 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
/*
- * Copyright 2022-2023 Rivos, Inc
+ * Copyright 2022-2024 Rivos, Inc
*/

#ifndef _ASM_CPUFEATURE_H
@@ -53,6 +53,13 @@ static inline bool check_unaligned_access_emulated(int cpu)
static inline void unaligned_emulation_finish(void) {}
#endif

+DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
+
+static __always_inline bool has_fast_unaligned_accesses(void)
+{
+ return static_branch_likely(&fast_unaligned_access_speed_key);
+}
+
unsigned long riscv_get_elf_hwcap(void);

struct riscv_isa_ext_data {
@@ -135,6 +142,4 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
return __riscv_isa_extension_available(hart_isa[cpu].isa, ext);
}

-DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
-
#endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 89920f84d0a3..7878cddccc0d 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -810,14 +810,14 @@ static void check_unaligned_access_nonboot_cpu(void *param)
check_unaligned_access(pages[cpu]);
}

-DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);

static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
{
if (cpumask_weight(mask) == weight)
- static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key);
+ static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key);
else
- static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key);
+ static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key);
}

static void set_unaligned_access_static_branches_except_cpu(int cpu)
diff --git a/arch/riscv/lib/csum.c b/arch/riscv/lib/csum.c
index af3df5274ccb..7178e0acfa22 100644
--- a/arch/riscv/lib/csum.c
+++ b/arch/riscv/lib/csum.c
@@ -3,7 +3,7 @@
* Checksum library
*
* Influenced by arch/arm64/lib/csum.c
- * Copyright (C) 2023 Rivos Inc.
+ * Copyright (C) 2023-2024 Rivos Inc.
*/
#include <linux/bitops.h>
#include <linux/compiler.h>
@@ -318,10 +318,7 @@ unsigned int do_csum(const unsigned char *buff, int len)
* branches. The largest chunk of overlap was delegated into the
* do_csum_common function.
*/
- if (static_branch_likely(&fast_misaligned_access_speed_key))
- return do_csum_no_alignment(buff, len);
-
- if (((unsigned long)buff & OFFSET_MASK) == 0)
+ if (has_fast_unaligned_accesses() || (((unsigned long)buff & OFFSET_MASK) == 0))
return do_csum_no_alignment(buff, len);

return do_csum_with_alignment(buff, len);

--
2.43.2


2024-03-02 01:46:50

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v6 4/4] riscv: Set unaligned access speed at compile time

Introduce Kconfig options to set the kernel unaligned access support.
These options provide a non-portable alternative to the runtime
unaligned access probe.

To support this, the unaligned access probing code is moved into it's
own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT
option.

Signed-off-by: Charlie Jenkins <[email protected]>
---
arch/riscv/Kconfig | 58 ++++--
arch/riscv/include/asm/cpufeature.h | 26 +--
arch/riscv/kernel/Makefile | 4 +-
arch/riscv/kernel/cpufeature.c | 272 ----------------------------
arch/riscv/kernel/sys_hwprobe.c | 21 +++
arch/riscv/kernel/traps_misaligned.c | 2 +
arch/riscv/kernel/unaligned_access_speed.c | 282 +++++++++++++++++++++++++++++
7 files changed, 369 insertions(+), 296 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bffbd869a068..60b6de35599d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -688,27 +688,61 @@ config THREAD_SIZE_ORDER
affects irq stack size, which is equal to thread stack size.

config RISCV_MISALIGNED
- bool "Support misaligned load/store traps for kernel and userspace"
+ bool
select SYSCTL_ARCH_UNALIGN_ALLOW
- default y
help
- Say Y here if you want the kernel to embed support for misaligned
- load/store for both kernel and userspace. When disable, misaligned
- accesses will generate SIGBUS in userspace and panic in kernel.
+ Embed support for misaligned load/store for both kernel and userspace.
+ When disabled, misaligned accesses will generate SIGBUS in userspace
+ and panic in kernel.
+
+choice
+ prompt "Unaligned Accesses Support"
+ default RISCV_PROBE_UNALIGNED_ACCESS
+ help
+ This selects the hardware support for unaligned accesses. This
+ information is used by the kernel to perform optimizations. It is also
+ exposed to user space via the hwprobe syscall. The hardware will be
+ probed at boot by default.
+
+config RISCV_PROBE_UNALIGNED_ACCESS
+ bool "Probe for hardware unaligned access support"
+ select RISCV_MISALIGNED
+ help
+ During boot, the kernel will run a series of tests to determine the
+ speed of unaligned accesses. This probing will dynamically determine
+ the speed of unaligned accesses on the boot hardware. The kernel will
+ also check if unaligned memory accesses will trap into the kernel and
+ handle such traps accordingly.
+
+config RISCV_EMULATED_UNALIGNED_ACCESS
+ bool "Assume the system expects emulated unaligned memory accesses"
+ select RISCV_MISALIGNED
+ help
+ Assume that the system expects unaligned memory accesses to be
+ emulated. The kernel will check if unaligned memory accesses will
+ trap into the kernel and handle such traps accordingly.
+
+config RISCV_SLOW_UNALIGNED_ACCESS
+ bool "Assume the system supports slow unaligned memory accesses"
+ depends on NONPORTABLE
+ help
+ Assume that the system supports slow unaligned memory accesses. The
+ kernel may not be able to run at all on systems that do not support
+ unaligned memory accesses.

config RISCV_EFFICIENT_UNALIGNED_ACCESS
- bool "Assume the CPU supports fast unaligned memory accesses"
+ bool "Assume the system supports fast unaligned memory accesses"
depends on NONPORTABLE
select DCACHE_WORD_ACCESS if MMU
select HAVE_EFFICIENT_UNALIGNED_ACCESS
help
- Say Y here if you want the kernel to assume that the CPU supports
- efficient unaligned memory accesses. When enabled, this option
- improves the performance of the kernel on such CPUs. However, the
- kernel will run much more slowly, or will not be able to run at all,
- on CPUs that do not support efficient unaligned memory accesses.
+ Assume that the system supports fast unaligned memory accesses. When
+ enabled, this option improves the performance of the kernel on such
+ systems. However, the kernel will run much more slowly, or will not
+ be able to run at all, on systems that do not support efficient
+ unaligned memory accesses.

- If unsure what to do here, say N.
+endchoice

endmenu # "Platform type"

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 6fec91845aa0..357972cd8f82 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -28,37 +28,41 @@ struct riscv_isainfo {

DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);

-DECLARE_PER_CPU(long, misaligned_access_speed);
-
/* Per-cpu ISA extensions. */
extern struct riscv_isainfo hart_isa[NR_CPUS];

void riscv_user_isa_enable(void);

-#ifdef CONFIG_RISCV_MISALIGNED
-bool unaligned_ctl_available(void);
+#if defined(CONFIG_RISCV_MISALIGNED)
bool check_unaligned_access_emulated_all_cpus(void);
void unaligned_emulation_finish(void);
+bool unaligned_ctl_available(void);
+DECLARE_PER_CPU(long, misaligned_access_speed);
#else
static inline bool unaligned_ctl_available(void)
{
return false;
}
-
-static inline bool check_unaligned_access_emulated(int cpu)
-{
- return false;
-}
-
-static inline void unaligned_emulation_finish(void) {}
#endif

+#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);

static __always_inline bool has_fast_unaligned_accesses(void)
{
return static_branch_likely(&fast_unaligned_access_speed_key);
}
+#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+static __always_inline bool has_fast_unaligned_accesses(void)
+{
+ return true;
+}
+#else
+static __always_inline bool has_fast_unaligned_accesses(void)
+{
+ return false;
+}
+#endif

unsigned long riscv_get_elf_hwcap(void);

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index f71910718053..c8085126a6f9 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -38,7 +38,6 @@ extra-y += vmlinux.lds
obj-y += head.o
obj-y += soc.o
obj-$(CONFIG_RISCV_ALTERNATIVE) += alternative.o
-obj-y += copy-unaligned.o
obj-y += cpu.o
obj-y += cpufeature.o
obj-y += entry.o
@@ -62,6 +61,9 @@ obj-y += tests/
obj-$(CONFIG_MMU) += vdso.o vdso/

obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
+obj-$(CONFIG_RISCV_MISALIGNED) += unaligned_access_speed.o
+obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o
+
obj-$(CONFIG_FPU) += fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index abb3a2f53106..319670af5704 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -11,7 +11,6 @@
#include <linux/cpu.h>
#include <linux/cpuhotplug.h>
#include <linux/ctype.h>
-#include <linux/jump_label.h>
#include <linux/log2.h>
#include <linux/memory.h>
#include <linux/module.h>
@@ -21,20 +20,12 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwcap.h>
-#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/processor.h>
#include <asm/vector.h>

-#include "copy-unaligned.h"
-
#define NUM_ALPHA_EXTS ('z' - 'a' + 1)

-#define MISALIGNED_ACCESS_JIFFIES_LG2 1
-#define MISALIGNED_BUFFER_SIZE 0x4000
-#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
-#define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
-
unsigned long elf_hwcap __read_mostly;

/* Host ISA bitmap */
@@ -43,11 +34,6 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
/* Per-cpu ISA extensions. */
struct riscv_isainfo hart_isa[NR_CPUS];

-/* Performance information */
-DEFINE_PER_CPU(long, misaligned_access_speed);
-
-static cpumask_t fast_misaligned_access;
-
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -706,264 +692,6 @@ unsigned long riscv_get_elf_hwcap(void)
return hwcap;
}

-static int check_unaligned_access(void *param)
-{
- int cpu = smp_processor_id();
- u64 start_cycles, end_cycles;
- u64 word_cycles;
- u64 byte_cycles;
- int ratio;
- unsigned long start_jiffies, now;
- struct page *page = param;
- void *dst;
- void *src;
- long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
-
- if (IS_ENABLED(CONFIG_RISCV_MISALIGNED) &&
- per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
- return 0;
-
- /* Make an unaligned destination buffer. */
- dst = (void *)((unsigned long)page_address(page) | 0x1);
- /* Unalign src as well, but differently (off by 1 + 2 = 3). */
- src = dst + (MISALIGNED_BUFFER_SIZE / 2);
- src += 2;
- word_cycles = -1ULL;
- /* Do a warmup. */
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- preempt_disable();
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
-
- /*
- * For a fixed amount of time, repeatedly try the function, and take
- * the best time in cycles as the measurement.
- */
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- /* Ensure the CSR read can't reorder WRT to the copy. */
- mb();
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- /* Ensure the copy ends before the end time is snapped. */
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < word_cycles)
- word_cycles = end_cycles - start_cycles;
- }
-
- byte_cycles = -1ULL;
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
-
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- mb();
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < byte_cycles)
- byte_cycles = end_cycles - start_cycles;
- }
-
- preempt_enable();
-
- /* Don't divide by zero. */
- if (!word_cycles || !byte_cycles) {
- pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
- cpu);
-
- return 0;
- }
-
- if (word_cycles < byte_cycles)
- speed = RISCV_HWPROBE_MISALIGNED_FAST;
-
- ratio = div_u64((byte_cycles * 100), word_cycles);
- pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
- cpu,
- ratio / 100,
- ratio % 100,
- (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
-
- per_cpu(misaligned_access_speed, cpu) = speed;
-
- /*
- * Set the value of fast_misaligned_access of a CPU. These operations
- * are atomic to avoid race conditions.
- */
- if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
- cpumask_set_cpu(cpu, &fast_misaligned_access);
- else
- cpumask_clear_cpu(cpu, &fast_misaligned_access);
-
- return 0;
-}
-
-static void check_unaligned_access_nonboot_cpu(void *param)
-{
- unsigned int cpu = smp_processor_id();
- struct page **pages = param;
-
- if (smp_processor_id() != 0)
- check_unaligned_access(pages[cpu]);
-}
-
-DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
-
-static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
-{
- if (cpumask_weight(mask) == weight)
- static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key);
- else
- static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key);
-}
-
-static void set_unaligned_access_static_branches_except_cpu(int cpu)
-{
- /*
- * Same as set_unaligned_access_static_branches, except excludes the
- * given CPU from the result. When a CPU is hotplugged into an offline
- * state, this function is called before the CPU is set to offline in
- * the cpumask, and thus the CPU needs to be explicitly excluded.
- */
-
- cpumask_t fast_except_me;
-
- cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
- cpumask_clear_cpu(cpu, &fast_except_me);
-
- modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
-}
-
-static void set_unaligned_access_static_branches(void)
-{
- /*
- * This will be called after check_unaligned_access_all_cpus so the
- * result of unaligned access speed for all CPUs will be available.
- *
- * To avoid the number of online cpus changing between reading
- * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
- * held before calling this function.
- */
-
- cpumask_t fast_and_online;
-
- cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
-
- modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
-}
-
-static int lock_and_set_unaligned_access_static_branch(void)
-{
- cpus_read_lock();
- set_unaligned_access_static_branches();
- cpus_read_unlock();
-
- return 0;
-}
-
-arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
-
-static int riscv_online_cpu(unsigned int cpu)
-{
- static struct page *buf;
-
- /* We are already set since the last check */
- if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
- goto exit;
-
- buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
- if (!buf) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- return -ENOMEM;
- }
-
- check_unaligned_access(buf);
- __free_pages(buf, MISALIGNED_BUFFER_ORDER);
-
-exit:
- set_unaligned_access_static_branches();
-
- return 0;
-}
-
-static int riscv_offline_cpu(unsigned int cpu)
-{
- set_unaligned_access_static_branches_except_cpu(cpu);
-
- return 0;
-}
-
-/* Measure unaligned access speed on all CPUs present at boot in parallel. */
-static int check_unaligned_access_speed_all_cpus(void)
-{
- unsigned int cpu;
- unsigned int cpu_count = num_possible_cpus();
- struct page **bufs = kzalloc(cpu_count * sizeof(struct page *),
- GFP_KERNEL);
-
- if (!bufs) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- return 0;
- }
-
- /*
- * Allocate separate buffers for each CPU so there's no fighting over
- * cache lines.
- */
- for_each_cpu(cpu, cpu_online_mask) {
- bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
- if (!bufs[cpu]) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- goto out;
- }
- }
-
- /* Check everybody except 0, who stays behind to tend jiffies. */
- on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
- /* Check core 0. */
- smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
-
- /*
- * Setup hotplug callbacks for any new CPUs that come online or go
- * offline.
- */
- cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
- riscv_online_cpu, riscv_offline_cpu);
-
-out:
- for_each_cpu(cpu, cpu_online_mask) {
- if (bufs[cpu])
- __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
- }
-
- kfree(bufs);
- return 0;
-}
-
-#ifdef CONFIG_RISCV_MISALIGNED
-static int check_unaligned_access_all_cpus(void)
-{
- bool all_cpus_emulated = check_unaligned_access_emulated_all_cpus();
-
- if (!all_cpus_emulated)
- return check_unaligned_access_speed_all_cpus();
-
- return 0;
-}
-#else
-static int check_unaligned_access_all_cpus(void)
-{
- return check_unaligned_access_speed_all_cpus();
-}
-#endif
-
-arch_initcall(check_unaligned_access_all_cpus);
-
void riscv_user_isa_enable(void)
{
if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index a7c56b41efd2..dad02f5faec3 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -147,8 +147,10 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
return (pair.value & ext);
}

+#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
static u64 hwprobe_misaligned(const struct cpumask *cpus)
{
+ return RISCV_HWPROBE_MISALIGNED_FAST;
int cpu;
u64 perf = -1ULL;

@@ -169,6 +171,25 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)

return perf;
}
+#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ if (unaligned_ctl_available())
+ return RISCV_HWPROBE_MISALIGNED_EMULATED;
+ else
+ return RISCV_HWPROBE_MISALIGNED_SLOW;
+}
+#elif defined(CONFIG_RISCV_SLOW_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_SLOW;
+}
+#elif defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_FAST;
+}
+#endif

static void hwprobe_one_pair(struct riscv_hwprobe *pair,
const struct cpumask *cpus)
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index 1e3cec3f5d93..6153158adcb7 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -413,7 +413,9 @@ int handle_misaligned_load(struct pt_regs *regs)

perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS, 1, regs, addr);

+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
*this_cpu_ptr(&misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_EMULATED;
+#endif

if (!unaligned_enabled)
return -1;
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
new file mode 100644
index 000000000000..52264ea4f0bd
--- /dev/null
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2024 Rivos Inc.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/jump_label.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/types.h>
+#include <asm/cpufeature.h>
+#include <asm/hwprobe.h>
+
+#include "copy-unaligned.h"
+
+#define MISALIGNED_ACCESS_JIFFIES_LG2 1
+#define MISALIGNED_BUFFER_SIZE 0x4000
+#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
+#define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
+
+DEFINE_PER_CPU(long, misaligned_access_speed);
+
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
+static cpumask_t fast_misaligned_access;
+static int check_unaligned_access(void *param)
+{
+ int cpu = smp_processor_id();
+ u64 start_cycles, end_cycles;
+ u64 word_cycles;
+ u64 byte_cycles;
+ int ratio;
+ unsigned long start_jiffies, now;
+ struct page *page = param;
+ void *dst;
+ void *src;
+ long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
+
+ if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
+ return 0;
+
+ /* Make an unaligned destination buffer. */
+ dst = (void *)((unsigned long)page_address(page) | 0x1);
+ /* Unalign src as well, but differently (off by 1 + 2 = 3). */
+ src = dst + (MISALIGNED_BUFFER_SIZE / 2);
+ src += 2;
+ word_cycles = -1ULL;
+ /* Do a warmup. */
+ __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ preempt_disable();
+ start_jiffies = jiffies;
+ while ((now = jiffies) == start_jiffies)
+ cpu_relax();
+
+ /*
+ * For a fixed amount of time, repeatedly try the function, and take
+ * the best time in cycles as the measurement.
+ */
+ while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+ start_cycles = get_cycles64();
+ /* Ensure the CSR read can't reorder WRT to the copy. */
+ mb();
+ __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ /* Ensure the copy ends before the end time is snapped. */
+ mb();
+ end_cycles = get_cycles64();
+ if ((end_cycles - start_cycles) < word_cycles)
+ word_cycles = end_cycles - start_cycles;
+ }
+
+ byte_cycles = -1ULL;
+ __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ start_jiffies = jiffies;
+ while ((now = jiffies) == start_jiffies)
+ cpu_relax();
+
+ while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+ start_cycles = get_cycles64();
+ mb();
+ __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ mb();
+ end_cycles = get_cycles64();
+ if ((end_cycles - start_cycles) < byte_cycles)
+ byte_cycles = end_cycles - start_cycles;
+ }
+
+ preempt_enable();
+
+ /* Don't divide by zero. */
+ if (!word_cycles || !byte_cycles) {
+ pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
+ cpu);
+
+ return 0;
+ }
+
+ if (word_cycles < byte_cycles)
+ speed = RISCV_HWPROBE_MISALIGNED_FAST;
+
+ ratio = div_u64((byte_cycles * 100), word_cycles);
+ pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+ cpu,
+ ratio / 100,
+ ratio % 100,
+ (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
+
+ per_cpu(misaligned_access_speed, cpu) = speed;
+
+ /*
+ * Set the value of fast_misaligned_access of a CPU. These operations
+ * are atomic to avoid race conditions.
+ */
+ if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
+ cpumask_set_cpu(cpu, &fast_misaligned_access);
+ else
+ cpumask_clear_cpu(cpu, &fast_misaligned_access);
+
+ return 0;
+}
+
+static void check_unaligned_access_nonboot_cpu(void *param)
+{
+ unsigned int cpu = smp_processor_id();
+ struct page **pages = param;
+
+ if (smp_processor_id() != 0)
+ check_unaligned_access(pages[cpu]);
+}
+
+DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
+
+static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
+{
+ if (cpumask_weight(mask) == weight)
+ static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key);
+ else
+ static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key);
+}
+
+static void set_unaligned_access_static_branches_except_cpu(int cpu)
+{
+ /*
+ * Same as set_unaligned_access_static_branches, except excludes the
+ * given CPU from the result. When a CPU is hotplugged into an offline
+ * state, this function is called before the CPU is set to offline in
+ * the cpumask, and thus the CPU needs to be explicitly excluded.
+ */
+
+ cpumask_t fast_except_me;
+
+ cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
+ cpumask_clear_cpu(cpu, &fast_except_me);
+
+ modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
+}
+
+static void set_unaligned_access_static_branches(void)
+{
+ /*
+ * This will be called after check_unaligned_access_all_cpus so the
+ * result of unaligned access speed for all CPUs will be available.
+ *
+ * To avoid the number of online cpus changing between reading
+ * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
+ * held before calling this function.
+ */
+
+ cpumask_t fast_and_online;
+
+ cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
+
+ modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
+}
+
+static int lock_and_set_unaligned_access_static_branch(void)
+{
+ cpus_read_lock();
+ set_unaligned_access_static_branches();
+ cpus_read_unlock();
+
+ return 0;
+}
+
+arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
+
+static int riscv_online_cpu(unsigned int cpu)
+{
+ static struct page *buf;
+
+ /* We are already set since the last check */
+ if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
+ goto exit;
+
+ buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
+ if (!buf) {
+ pr_warn("Allocation failure, not measuring misaligned performance\n");
+ return -ENOMEM;
+ }
+
+ check_unaligned_access(buf);
+ __free_pages(buf, MISALIGNED_BUFFER_ORDER);
+
+exit:
+ set_unaligned_access_static_branches();
+
+ return 0;
+}
+
+static int riscv_offline_cpu(unsigned int cpu)
+{
+ set_unaligned_access_static_branches_except_cpu(cpu);
+
+ return 0;
+}
+
+/* Measure unaligned access speed on all CPUs present at boot in parallel. */
+static int check_unaligned_access_speed_all_cpus(void)
+{
+ unsigned int cpu;
+ unsigned int cpu_count = num_possible_cpus();
+ struct page **bufs = kzalloc(cpu_count * sizeof(struct page *),
+ GFP_KERNEL);
+
+ if (!bufs) {
+ pr_warn("Allocation failure, not measuring misaligned performance\n");
+ return 0;
+ }
+
+ /*
+ * Allocate separate buffers for each CPU so there's no fighting over
+ * cache lines.
+ */
+ for_each_cpu(cpu, cpu_online_mask) {
+ bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
+ if (!bufs[cpu]) {
+ pr_warn("Allocation failure, not measuring misaligned performance\n");
+ goto out;
+ }
+ }
+
+ /* Check everybody except 0, who stays behind to tend jiffies. */
+ on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
+
+ /* Check core 0. */
+ smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+
+ /*
+ * Setup hotplug callbacks for any new CPUs that come online or go
+ * offline.
+ */
+ cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
+ riscv_online_cpu, riscv_offline_cpu);
+
+out:
+ for_each_cpu(cpu, cpu_online_mask) {
+ if (bufs[cpu])
+ __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
+ }
+
+ kfree(bufs);
+ return 0;
+}
+
+static int check_unaligned_access_all_cpus(void)
+{
+ bool all_cpus_emulated = check_unaligned_access_emulated_all_cpus();
+
+ if (!all_cpus_emulated)
+ return check_unaligned_access_speed_all_cpus();
+
+ return 0;
+}
+#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
+static int check_unaligned_access_all_cpus(void)
+{
+ check_unaligned_access_emulated_all_cpus();
+
+ return 0;
+}
+#endif
+
+arch_initcall(check_unaligned_access_all_cpus);

--
2.43.2


2024-03-06 13:13:54

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] riscv: Only check online cpus for emulated accesses

On Fri, Mar 01, 2024 at 05:45:33PM -0800, Charlie Jenkins wrote:
> The unaligned access checker only sets valid values for online cpus.
> Check for these values on online cpus rather than on present cpus.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe")

Reviewed-by: Conor Dooley <[email protected]>

Cheers,
Conor.

> ---
> arch/riscv/kernel/traps_misaligned.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
> index 8ded225e8c5b..c2ed4e689bf9 100644
> --- a/arch/riscv/kernel/traps_misaligned.c
> +++ b/arch/riscv/kernel/traps_misaligned.c
> @@ -632,7 +632,7 @@ void unaligned_emulation_finish(void)
> * accesses emulated since tasks requesting such control can run on any
> * CPU.
> */
> - for_each_present_cpu(cpu) {
> + for_each_online_cpu(cpu) {
> if (per_cpu(misaligned_access_speed, cpu) !=
> RISCV_HWPROBE_MISALIGNED_EMULATED) {
> return;

I went looking to see what the practical differences were between
"present" and "possible", cos I'd never really seen much code using
"present". Turns out present and possible are essentially the same on
riscv. TIL.


Attachments:
(No filename) (1.29 kB)
signature.asc (235.00 B)
Download all attachments

2024-03-06 13:22:09

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] riscv: Decouple emulated unaligned accesses from access speed

On Fri, Mar 01, 2024 at 05:45:34PM -0800, Charlie Jenkins wrote:

> -void unaligned_emulation_finish(void)
> +bool check_unaligned_access_emulated_all_cpus(void)
> {
> int cpu;
>
> - /*
> - * We can only support PR_UNALIGN controls if all CPUs have misaligned
> - * accesses emulated since tasks requesting such control can run on any
> - * CPU.
> - */

Why was this comment removed? This patch doesn't change the situations
in which PR_UNALIGN is allowed, right?

> - for_each_online_cpu(cpu) {
> - if (per_cpu(misaligned_access_speed, cpu) !=
> - RISCV_HWPROBE_MISALIGNED_EMULATED) {
> - return;
> - }
> - }
> + for_each_online_cpu(cpu)
> + if (check_unaligned_access_emulated(cpu))
> + return false;
> +
> unaligned_ctl = true;
> + return true;
> }


Attachments:
(No filename) (806.00 B)
signature.asc (235.00 B)
Download all attachments

2024-03-06 16:19:45

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v6 4/4] riscv: Set unaligned access speed at compile time

Hey,

On Fri, Mar 01, 2024 at 05:45:35PM -0800, Charlie Jenkins wrote:
> Introduce Kconfig options to set the kernel unaligned access support.
> These options provide a non-portable alternative to the runtime
> unaligned access probe.
>
> To support this, the unaligned access probing code is moved into it's
> own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT
> option.
>
> Signed-off-by: Charlie Jenkins <[email protected]>
> ---
> arch/riscv/Kconfig | 58 ++++--
> arch/riscv/include/asm/cpufeature.h | 26 +--
> arch/riscv/kernel/Makefile | 4 +-
> arch/riscv/kernel/cpufeature.c | 272 ----------------------------
> arch/riscv/kernel/sys_hwprobe.c | 21 +++
> arch/riscv/kernel/traps_misaligned.c | 2 +
> arch/riscv/kernel/unaligned_access_speed.c | 282 +++++++++++++++++++++++++++++
> 7 files changed, 369 insertions(+), 296 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index bffbd869a068..60b6de35599d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -688,27 +688,61 @@ config THREAD_SIZE_ORDER
> affects irq stack size, which is equal to thread stack size.
>
> config RISCV_MISALIGNED
> - bool "Support misaligned load/store traps for kernel and userspace"
> + bool
> select SYSCTL_ARCH_UNALIGN_ALLOW
> - default y
> help
> - Say Y here if you want the kernel to embed support for misaligned
> - load/store for both kernel and userspace. When disable, misaligned
> - accesses will generate SIGBUS in userspace and panic in kernel.
> + Embed support for misaligned load/store for both kernel and userspace.
> + When disabled, misaligned accesses will generate SIGBUS in userspace
> + and panic in kernel.

"in the kernel".

> +
> +choice
> + prompt "Unaligned Accesses Support"
> + default RISCV_PROBE_UNALIGNED_ACCESS
> + help

> + This selects the hardware support for unaligned accesses. This

"This determines what level of support for..."

> + information is used by the kernel to perform optimizations. It is also
> + exposed to user space via the hwprobe syscall. The hardware will be
> + probed at boot by default.
> +
> +config RISCV_PROBE_UNALIGNED_ACCESS
> + bool "Probe for hardware unaligned access support"
> + select RISCV_MISALIGNED
> + help
> + During boot, the kernel will run a series of tests to determine the
> + speed of unaligned accesses. This probing will dynamically determine
> + the speed of unaligned accesses on the boot hardware.

"on the underlying system"?

> The kernel will
> + also check if unaligned memory accesses will trap into the kernel and
> + handle such traps accordingly.

I think I would phrase this to be more understandable to users. I think
we need to explain why it would trap and what we will do. Maybe
something like: "if unaligned memory accesses trap into the kernel as
they are not supported by the system, the kernel will emulate the
unaligned accesses to preserve the UABI".

> +config RISCV_EMULATED_UNALIGNED_ACCESS
> + bool "Assume the system expects emulated unaligned memory accesses"
> + select RISCV_MISALIGNED
> + help
> + Assume that the system expects unaligned memory accesses to be
> + emulated. The kernel will check if unaligned memory accesses will
> + trap into the kernel and handle such traps accordingly.

I guess the same suggestion applies here, but I think the description
here isn't quite accurate. This option is basically the same as above,
but without the speed test, right? It doesn't actually assume emulation
is required at all, in fact the assumption we make is that if the
hardware supports unaligned access that access is slow.

I think I'd do:
```
boot "Emulate unaligned access where system support is missing"
help
If unaligned accesses trap into the kernel as they are not supported
by the system, the kernel will emulate the unaligned accesses to
preserve the UABI. When the underlying system does support unaligned
accesses, probing at boot is not done and unaligned accesses are
assumed to be slow.

> +config RISCV_SLOW_UNALIGNED_ACCESS
> + bool "Assume the system supports slow unaligned memory accesses"
> + depends on NONPORTABLE
> + help
> + Assume that the system supports slow unaligned memory accesses. The
> + kernel may not be able to run at all on systems that do not support
> + unaligned memory accesses.

...and userspace programs cannot use unaligned access either, I think
that is worth mentioning.

>
> config RISCV_EFFICIENT_UNALIGNED_ACCESS
> - bool "Assume the CPU supports fast unaligned memory accesses"
> + bool "Assume the system supports fast unaligned memory accesses"
> depends on NONPORTABLE
> select DCACHE_WORD_ACCESS if MMU
> select HAVE_EFFICIENT_UNALIGNED_ACCESS
> help
> - Say Y here if you want the kernel to assume that the CPU supports
> - efficient unaligned memory accesses. When enabled, this option
> - improves the performance of the kernel on such CPUs. However, the
> - kernel will run much more slowly, or will not be able to run at all,
> - on CPUs that do not support efficient unaligned memory accesses.
> + Assume that the system supports fast unaligned memory accesses. When
> + enabled, this option improves the performance of the kernel on such
> + systems. However, the kernel will run much more slowly, or will not
> + be able to run at all, on systems that do not support efficient
> + unaligned memory accesses.
>
> - If unsure what to do here, say N.
> +endchoice
>
> endmenu # "Platform type"

> +#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
> DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
>
> static __always_inline bool has_fast_unaligned_accesses(void)
> {
> return static_branch_likely(&fast_unaligned_access_speed_key);
> }
> +#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> +static __always_inline bool has_fast_unaligned_accesses(void)
> +{
> + return true;
> +}
> +#else
> +static __always_inline bool has_fast_unaligned_accesses(void)
> +{
> + return false;
> +}
> +#endif

These tree could just be one function with if(IS_ENABLED), whatever code
gets made dead should be optimised out.

> diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
> index a7c56b41efd2..dad02f5faec3 100644
> --- a/arch/riscv/kernel/sys_hwprobe.c
> +++ b/arch/riscv/kernel/sys_hwprobe.c
> @@ -147,8 +147,10 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
> return (pair.value & ext);
> }
>
> +#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
> static u64 hwprobe_misaligned(const struct cpumask *cpus)
> {
> + return RISCV_HWPROBE_MISALIGNED_FAST;

This hack is still here.

> int cpu;
> u64 perf = -1ULL;
>
> @@ -169,6 +171,25 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)
>
> return perf;
> }

> +#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + if (unaligned_ctl_available())
> + return RISCV_HWPROBE_MISALIGNED_EMULATED;
> + else
> + return RISCV_HWPROBE_MISALIGNED_SLOW;
> +}
> +#elif defined(CONFIG_RISCV_SLOW_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_SLOW;
> +}
> +#elif defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_FAST;
> +}
> +#endif

Same applies to these three functions.

Thanks,
Conor.


Attachments:
(No filename) (7.62 kB)
signature.asc (235.00 B)
Download all attachments

2024-03-06 18:28:46

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] riscv: Decouple emulated unaligned accesses from access speed

On Wed, Mar 06, 2024 at 01:19:37PM +0000, Conor Dooley wrote:
> On Fri, Mar 01, 2024 at 05:45:34PM -0800, Charlie Jenkins wrote:
>
> > -void unaligned_emulation_finish(void)
> > +bool check_unaligned_access_emulated_all_cpus(void)
> > {
> > int cpu;
> >
> > - /*
> > - * We can only support PR_UNALIGN controls if all CPUs have misaligned
> > - * accesses emulated since tasks requesting such control can run on any
> > - * CPU.
> > - */
>
> Why was this comment removed? This patch doesn't change the situations
> in which PR_UNALIGN is allowed, right?

I'll add it back, that was unintentional. Thank you.

- Charlie

>
> > - for_each_online_cpu(cpu) {
> > - if (per_cpu(misaligned_access_speed, cpu) !=
> > - RISCV_HWPROBE_MISALIGNED_EMULATED) {
> > - return;
> > - }
> > - }
> > + for_each_online_cpu(cpu)
> > + if (check_unaligned_access_emulated(cpu))
> > + return false;
> > +
> > unaligned_ctl = true;
> > + return true;
> > }
>



2024-03-06 18:39:13

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v6 4/4] riscv: Set unaligned access speed at compile time

On Wed, Mar 06, 2024 at 04:19:33PM +0000, Conor Dooley wrote:
> Hey,
>
> On Fri, Mar 01, 2024 at 05:45:35PM -0800, Charlie Jenkins wrote:
> > Introduce Kconfig options to set the kernel unaligned access support.
> > These options provide a non-portable alternative to the runtime
> > unaligned access probe.
> >
> > To support this, the unaligned access probing code is moved into it's
> > own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT
> > option.
> >
> > Signed-off-by: Charlie Jenkins <[email protected]>
> > ---
> > arch/riscv/Kconfig | 58 ++++--
> > arch/riscv/include/asm/cpufeature.h | 26 +--
> > arch/riscv/kernel/Makefile | 4 +-
> > arch/riscv/kernel/cpufeature.c | 272 ----------------------------
> > arch/riscv/kernel/sys_hwprobe.c | 21 +++
> > arch/riscv/kernel/traps_misaligned.c | 2 +
> > arch/riscv/kernel/unaligned_access_speed.c | 282 +++++++++++++++++++++++++++++
> > 7 files changed, 369 insertions(+), 296 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index bffbd869a068..60b6de35599d 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -688,27 +688,61 @@ config THREAD_SIZE_ORDER
> > affects irq stack size, which is equal to thread stack size.
> >
> > config RISCV_MISALIGNED
> > - bool "Support misaligned load/store traps for kernel and userspace"
> > + bool
> > select SYSCTL_ARCH_UNALIGN_ALLOW
> > - default y
> > help
> > - Say Y here if you want the kernel to embed support for misaligned
> > - load/store for both kernel and userspace. When disable, misaligned
> > - accesses will generate SIGBUS in userspace and panic in kernel.
> > + Embed support for misaligned load/store for both kernel and userspace.
> > + When disabled, misaligned accesses will generate SIGBUS in userspace
> > + and panic in kernel.
>
> "in the kernel".
>
> > +
> > +choice
> > + prompt "Unaligned Accesses Support"
> > + default RISCV_PROBE_UNALIGNED_ACCESS
> > + help
>
> > + This selects the hardware support for unaligned accesses. This
>
> "This determines what level of support for..."
>
> > + information is used by the kernel to perform optimizations. It is also
> > + exposed to user space via the hwprobe syscall. The hardware will be
> > + probed at boot by default.
> > +
> > +config RISCV_PROBE_UNALIGNED_ACCESS
> > + bool "Probe for hardware unaligned access support"
> > + select RISCV_MISALIGNED
> > + help
> > + During boot, the kernel will run a series of tests to determine the
> > + speed of unaligned accesses. This probing will dynamically determine
> > + the speed of unaligned accesses on the boot hardware.
>
> "on the underlying system"?
>
> > The kernel will
> > + also check if unaligned memory accesses will trap into the kernel and
> > + handle such traps accordingly.
>
> I think I would phrase this to be more understandable to users. I think
> we need to explain why it would trap and what we will do. Maybe
> something like: "if unaligned memory accesses trap into the kernel as
> they are not supported by the system, the kernel will emulate the
> unaligned accesses to preserve the UABI".
>
> > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > + bool "Assume the system expects emulated unaligned memory accesses"
> > + select RISCV_MISALIGNED
> > + help
> > + Assume that the system expects unaligned memory accesses to be
> > + emulated. The kernel will check if unaligned memory accesses will
> > + trap into the kernel and handle such traps accordingly.
>
> I guess the same suggestion applies here, but I think the description
> here isn't quite accurate. This option is basically the same as above,
> but without the speed test, right? It doesn't actually assume emulation
> is required at all, in fact the assumption we make is that if the
> hardware supports unaligned access that access is slow.
>
> I think I'd do:
> ```
> boot "Emulate unaligned access where system support is missing"
> help
> If unaligned accesses trap into the kernel as they are not supported
> by the system, the kernel will emulate the unaligned accesses to
> preserve the UABI. When the underlying system does support unaligned
> accesses, probing at boot is not done and unaligned accesses are
> assumed to be slow.

Great suggestions thank you. I think I will change up the second
sentence a little bit to be "When the underlying system does support
unaligned accesses, the unaligned accesses are assumed to be slow."

>
> > +config RISCV_SLOW_UNALIGNED_ACCESS
> > + bool "Assume the system supports slow unaligned memory accesses"
> > + depends on NONPORTABLE
> > + help
> > + Assume that the system supports slow unaligned memory accesses. The
> > + kernel may not be able to run at all on systems that do not support
> > + unaligned memory accesses.
>
> ...and userspace programs cannot use unaligned access either, I think
> that is worth mentioning.
>
> >
> > config RISCV_EFFICIENT_UNALIGNED_ACCESS
> > - bool "Assume the CPU supports fast unaligned memory accesses"
> > + bool "Assume the system supports fast unaligned memory accesses"
> > depends on NONPORTABLE
> > select DCACHE_WORD_ACCESS if MMU
> > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > help
> > - Say Y here if you want the kernel to assume that the CPU supports
> > - efficient unaligned memory accesses. When enabled, this option
> > - improves the performance of the kernel on such CPUs. However, the
> > - kernel will run much more slowly, or will not be able to run at all,
> > - on CPUs that do not support efficient unaligned memory accesses.
> > + Assume that the system supports fast unaligned memory accesses. When
> > + enabled, this option improves the performance of the kernel on such
> > + systems. However, the kernel will run much more slowly, or will not
> > + be able to run at all, on systems that do not support efficient
> > + unaligned memory accesses.
> >
> > - If unsure what to do here, say N.
> > +endchoice
> >
> > endmenu # "Platform type"
>
> > +#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
> > DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
> >
> > static __always_inline bool has_fast_unaligned_accesses(void)
> > {
> > return static_branch_likely(&fast_unaligned_access_speed_key);
> > }
> > +#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> > +static __always_inline bool has_fast_unaligned_accesses(void)
> > +{
> > + return true;
> > +}
> > +#else
> > +static __always_inline bool has_fast_unaligned_accesses(void)
> > +{
> > + return false;
> > +}
> > +#endif
>
> These tree could just be one function with if(IS_ENABLED), whatever code
> gets made dead should be optimised out.
>

Sure, will do.

> > diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
> > index a7c56b41efd2..dad02f5faec3 100644
> > --- a/arch/riscv/kernel/sys_hwprobe.c
> > +++ b/arch/riscv/kernel/sys_hwprobe.c
> > @@ -147,8 +147,10 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
> > return (pair.value & ext);
> > }
> >
> > +#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
> > static u64 hwprobe_misaligned(const struct cpumask *cpus)
> > {
> > + return RISCV_HWPROBE_MISALIGNED_FAST;
>
> This hack is still here.

Oh no! I removed it locally but it snuck back in...

>
> > int cpu;
> > u64 perf = -1ULL;
> >
> > @@ -169,6 +171,25 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)
> >
> > return perf;
> > }
>
> > +#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
> > +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> > +{
> > + if (unaligned_ctl_available())
> > + return RISCV_HWPROBE_MISALIGNED_EMULATED;
> > + else
> > + return RISCV_HWPROBE_MISALIGNED_SLOW;
> > +}
> > +#elif defined(CONFIG_RISCV_SLOW_UNALIGNED_ACCESS)
> > +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> > +{
> > + return RISCV_HWPROBE_MISALIGNED_SLOW;
> > +}
> > +#elif defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
> > +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> > +{
> > + return RISCV_HWPROBE_MISALIGNED_FAST;
> > +}
> > +#endif
>
> Same applies to these three functions.
>
> Thanks,
> Conor.

Thank you, I will send out a new version shortly.

- Charlie


2024-03-06 18:39:23

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v6 4/4] riscv: Set unaligned access speed at compile time

On Wed, Mar 06, 2024 at 10:32:34AM -0800, Charlie Jenkins wrote:
> On Wed, Mar 06, 2024 at 04:19:33PM +0000, Conor Dooley wrote:
> > On Fri, Mar 01, 2024 at 05:45:35PM -0800, Charlie Jenkins wrote:

> > > +
> > > +choice
> > > + prompt "Unaligned Accesses Support"
> > > + default RISCV_PROBE_UNALIGNED_ACCESS
> > > + help
> >
> > > + This selects the hardware support for unaligned accesses. This
> >
> > "This determines what level of support for..."

Reading this back, s/what/the/

> > > + information is used by the kernel to perform optimizations. It is also
> > > + exposed to user space via the hwprobe syscall. The hardware will be
> > > + probed at boot by default.


Attachments:
(No filename) (705.00 B)
signature.asc (235.00 B)
Download all attachments