2024-02-27 23:16:01

by Charlie Jenkins

[permalink] [raw]
Subject: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

Introduce Kconfig options to set the kernel unaligned access support.
These options provide a non-portable alternative to the runtime
unaligned access probe.

To support this, the unaligned access probing code is moved into it's
own file and gated behind a new RISCV_PROBE_UNALIGNED_ACCESS_SUPPORT
option.

Signed-off-by: Charlie Jenkins <[email protected]>
---
arch/riscv/Kconfig | 47 +++++-
arch/riscv/include/asm/cpufeature.h | 32 +++-
arch/riscv/kernel/Makefile | 6 +-
arch/riscv/kernel/cpufeature.c | 255 ------------------------------
arch/riscv/kernel/probe_emulated_access.c | 64 ++++++++
arch/riscv/kernel/sys_hwprobe.c | 23 +++
arch/riscv/kernel/traps_misaligned.c | 54 +------
7 files changed, 163 insertions(+), 318 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bffbd869a068..ad0a9c1f8802 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -690,25 +690,58 @@ config THREAD_SIZE_ORDER
config RISCV_MISALIGNED
bool "Support misaligned load/store traps for kernel and userspace"
select SYSCTL_ARCH_UNALIGN_ALLOW
+ depends on RISCV_PROBE_UNALIGNED_ACCESS || RISCV_EMULATED_UNALIGNED_ACCESS
default y
help
Say Y here if you want the kernel to embed support for misaligned
load/store for both kernel and userspace. When disable, misaligned
accesses will generate SIGBUS in userspace and panic in kernel.

+choice
+ prompt "Unaligned Accesses Support"
+ default RISCV_PROBE_UNALIGNED_ACCESS
+ help
+ This selects the hardware support for unaligned accesses. This
+ information is used by the kernel to perform optimizations. It is also
+ exposed to user space via the hwprobe syscall. The hardware will be
+ probed at boot by default.
+
+config RISCV_PROBE_UNALIGNED_ACCESS
+ bool "Probe for hardware unaligned access support"
+ help
+ During boot, the kernel will run a series of tests to determine the
+ speed of unaligned accesses. This probing will dynamically determine
+ the speed of unaligned accesses on the boot hardware.
+
+config RISCV_EMULATED_UNALIGNED_ACCESS
+ bool "Assume the system expects emulated unaligned memory accesses"
+ help
+ Assume that the system expects emulated unaligned memory accesses.
+ When enabled, this option notifies the kernel and userspace that
+ unaligned memory accesses will be emulated by either the kernel or
+ firmware.
+
+config RISCV_SLOW_UNALIGNED_ACCESS
+ bool "Assume the system supports slow unaligned memory accesses"
+ depends on NONPORTABLE
+ help
+ Assume that the system supports slow unaligned memory accesses. The
+ kernel may not be able to run at all on systems that do not support
+ unaligned memory accesses.
+
config RISCV_EFFICIENT_UNALIGNED_ACCESS
- bool "Assume the CPU supports fast unaligned memory accesses"
+ bool "Assume the system supports fast unaligned memory accesses"
depends on NONPORTABLE
select DCACHE_WORD_ACCESS if MMU
select HAVE_EFFICIENT_UNALIGNED_ACCESS
help
- Say Y here if you want the kernel to assume that the CPU supports
- efficient unaligned memory accesses. When enabled, this option
- improves the performance of the kernel on such CPUs. However, the
- kernel will run much more slowly, or will not be able to run at all,
- on CPUs that do not support efficient unaligned memory accesses.
+ Assume that the system supports fast unaligned memory accesses. When
+ enabled, this option improves the performance of the kernel on such
+ systems. However, the kernel will run much more slowly, or will not
+ be able to run at all, on systems that do not support efficient
+ unaligned memory accesses.

- If unsure what to do here, say N.
+endchoice

endmenu # "Platform type"

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 466e1f591919..cd166b0e52c4 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -28,17 +28,29 @@ struct riscv_isainfo {

DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);

-DECLARE_PER_CPU(long, misaligned_access_speed);
-
/* Per-cpu ISA extensions. */
extern struct riscv_isainfo hart_isa[NR_CPUS];

void riscv_user_isa_enable(void);

-#ifdef CONFIG_RISCV_MISALIGNED
+#if defined(CONFIG_RISCV_MISALIGNED)
+#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
bool unaligned_ctl_available(void);
bool check_unaligned_access_emulated(int cpu);
void unaligned_emulation_finish(void);
+#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
+static inline bool unaligned_ctl_available(void)
+{
+ return true;
+}
+
+static inline bool check_unaligned_access_emulated(int cpu)
+{
+ return true;
+}
+
+static inline void unaligned_emulation_finish(void) {}
+#endif
#else
static inline bool unaligned_ctl_available(void)
{
@@ -53,12 +65,26 @@ static inline bool check_unaligned_access_emulated(int cpu)
static inline void unaligned_emulation_finish(void) {}
#endif

+#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
+DECLARE_PER_CPU(long, misaligned_access_speed);
+
DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);

static __always_inline bool has_fast_unaligned_accesses(void)
{
return static_branch_likely(&fast_unaligned_access_speed_key);
}
+#elif defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+static __always_inline bool has_fast_unaligned_accesses(void)
+{
+ return true;
+}
+#else
+static __always_inline bool has_fast_unaligned_accesses(void)
+{
+ return false;
+}
+#endif

unsigned long riscv_get_elf_hwcap(void);

diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index f71910718053..3355ef87684a 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -38,7 +38,6 @@ extra-y += vmlinux.lds
obj-y += head.o
obj-y += soc.o
obj-$(CONFIG_RISCV_ALTERNATIVE) += alternative.o
-obj-y += copy-unaligned.o
obj-y += cpu.o
obj-y += cpufeature.o
obj-y += entry.o
@@ -62,6 +61,11 @@ obj-y += tests/
obj-$(CONFIG_MMU) += vdso.o vdso/

obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
+ifeq ($(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS), y)
+obj-y += copy-unaligned.o
+obj-y += unaligned_access_speed.o
+obj-$(CONFIG_RISCV_MISALIGNED) += probe_emulated_access.o
+endif
obj-$(CONFIG_FPU) += fpu.o
obj-$(CONFIG_RISCV_ISA_V) += vector.o
obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 7878cddccc0d..319670af5704 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -11,7 +11,6 @@
#include <linux/cpu.h>
#include <linux/cpuhotplug.h>
#include <linux/ctype.h>
-#include <linux/jump_label.h>
#include <linux/log2.h>
#include <linux/memory.h>
#include <linux/module.h>
@@ -21,20 +20,12 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwcap.h>
-#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/processor.h>
#include <asm/vector.h>

-#include "copy-unaligned.h"
-
#define NUM_ALPHA_EXTS ('z' - 'a' + 1)

-#define MISALIGNED_ACCESS_JIFFIES_LG2 1
-#define MISALIGNED_BUFFER_SIZE 0x4000
-#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
-#define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
-
unsigned long elf_hwcap __read_mostly;

/* Host ISA bitmap */
@@ -43,11 +34,6 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
/* Per-cpu ISA extensions. */
struct riscv_isainfo hart_isa[NR_CPUS];

-/* Performance information */
-DEFINE_PER_CPU(long, misaligned_access_speed);
-
-static cpumask_t fast_misaligned_access;
-
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -706,247 +692,6 @@ unsigned long riscv_get_elf_hwcap(void)
return hwcap;
}

-static int check_unaligned_access(void *param)
-{
- int cpu = smp_processor_id();
- u64 start_cycles, end_cycles;
- u64 word_cycles;
- u64 byte_cycles;
- int ratio;
- unsigned long start_jiffies, now;
- struct page *page = param;
- void *dst;
- void *src;
- long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
-
- if (check_unaligned_access_emulated(cpu))
- return 0;
-
- /* Make an unaligned destination buffer. */
- dst = (void *)((unsigned long)page_address(page) | 0x1);
- /* Unalign src as well, but differently (off by 1 + 2 = 3). */
- src = dst + (MISALIGNED_BUFFER_SIZE / 2);
- src += 2;
- word_cycles = -1ULL;
- /* Do a warmup. */
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- preempt_disable();
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
-
- /*
- * For a fixed amount of time, repeatedly try the function, and take
- * the best time in cycles as the measurement.
- */
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- /* Ensure the CSR read can't reorder WRT to the copy. */
- mb();
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- /* Ensure the copy ends before the end time is snapped. */
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < word_cycles)
- word_cycles = end_cycles - start_cycles;
- }
-
- byte_cycles = -1ULL;
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
-
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- mb();
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < byte_cycles)
- byte_cycles = end_cycles - start_cycles;
- }
-
- preempt_enable();
-
- /* Don't divide by zero. */
- if (!word_cycles || !byte_cycles) {
- pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
- cpu);
-
- return 0;
- }
-
- if (word_cycles < byte_cycles)
- speed = RISCV_HWPROBE_MISALIGNED_FAST;
-
- ratio = div_u64((byte_cycles * 100), word_cycles);
- pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
- cpu,
- ratio / 100,
- ratio % 100,
- (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow");
-
- per_cpu(misaligned_access_speed, cpu) = speed;
-
- /*
- * Set the value of fast_misaligned_access of a CPU. These operations
- * are atomic to avoid race conditions.
- */
- if (speed == RISCV_HWPROBE_MISALIGNED_FAST)
- cpumask_set_cpu(cpu, &fast_misaligned_access);
- else
- cpumask_clear_cpu(cpu, &fast_misaligned_access);
-
- return 0;
-}
-
-static void check_unaligned_access_nonboot_cpu(void *param)
-{
- unsigned int cpu = smp_processor_id();
- struct page **pages = param;
-
- if (smp_processor_id() != 0)
- check_unaligned_access(pages[cpu]);
-}
-
-DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
-
-static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
-{
- if (cpumask_weight(mask) == weight)
- static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key);
- else
- static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key);
-}
-
-static void set_unaligned_access_static_branches_except_cpu(int cpu)
-{
- /*
- * Same as set_unaligned_access_static_branches, except excludes the
- * given CPU from the result. When a CPU is hotplugged into an offline
- * state, this function is called before the CPU is set to offline in
- * the cpumask, and thus the CPU needs to be explicitly excluded.
- */
-
- cpumask_t fast_except_me;
-
- cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask);
- cpumask_clear_cpu(cpu, &fast_except_me);
-
- modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1);
-}
-
-static void set_unaligned_access_static_branches(void)
-{
- /*
- * This will be called after check_unaligned_access_all_cpus so the
- * result of unaligned access speed for all CPUs will be available.
- *
- * To avoid the number of online cpus changing between reading
- * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be
- * held before calling this function.
- */
-
- cpumask_t fast_and_online;
-
- cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask);
-
- modify_unaligned_access_branches(&fast_and_online, num_online_cpus());
-}
-
-static int lock_and_set_unaligned_access_static_branch(void)
-{
- cpus_read_lock();
- set_unaligned_access_static_branches();
- cpus_read_unlock();
-
- return 0;
-}
-
-arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
-
-static int riscv_online_cpu(unsigned int cpu)
-{
- static struct page *buf;
-
- /* We are already set since the last check */
- if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
- goto exit;
-
- buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
- if (!buf) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- return -ENOMEM;
- }
-
- check_unaligned_access(buf);
- __free_pages(buf, MISALIGNED_BUFFER_ORDER);
-
-exit:
- set_unaligned_access_static_branches();
-
- return 0;
-}
-
-static int riscv_offline_cpu(unsigned int cpu)
-{
- set_unaligned_access_static_branches_except_cpu(cpu);
-
- return 0;
-}
-
-/* Measure unaligned access on all CPUs present at boot in parallel. */
-static int check_unaligned_access_all_cpus(void)
-{
- unsigned int cpu;
- unsigned int cpu_count = num_possible_cpus();
- struct page **bufs = kzalloc(cpu_count * sizeof(struct page *),
- GFP_KERNEL);
-
- if (!bufs) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- return 0;
- }
-
- /*
- * Allocate separate buffers for each CPU so there's no fighting over
- * cache lines.
- */
- for_each_cpu(cpu, cpu_online_mask) {
- bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
- if (!bufs[cpu]) {
- pr_warn("Allocation failure, not measuring misaligned performance\n");
- goto out;
- }
- }
-
- /* Check everybody except 0, who stays behind to tend jiffies. */
- on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
- /* Check core 0. */
- smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
-
- /*
- * Setup hotplug callbacks for any new CPUs that come online or go
- * offline.
- */
- cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
- riscv_online_cpu, riscv_offline_cpu);
-
-out:
- unaligned_emulation_finish();
- for_each_cpu(cpu, cpu_online_mask) {
- if (bufs[cpu])
- __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
- }
-
- kfree(bufs);
- return 0;
-}
-
-arch_initcall(check_unaligned_access_all_cpus);
-
void riscv_user_isa_enable(void)
{
if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
diff --git a/arch/riscv/kernel/probe_emulated_access.c b/arch/riscv/kernel/probe_emulated_access.c
new file mode 100644
index 000000000000..5f65b31c7558
--- /dev/null
+++ b/arch/riscv/kernel/probe_emulated_access.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright 2024 Rivos Inc.
+ */
+
+#include <linux/cache.h>
+#include <linux/cpumask.h>
+#include <linux/types.h>
+#include <linux/percpu-defs.h>
+#include <linux/printk.h>
+#include <asm/cpufeature.h>
+#include <asm/hwprobe.h>
+
+static bool unaligned_ctl __read_mostly;
+
+bool check_unaligned_access_emulated(int cpu)
+{
+ long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
+ unsigned long tmp_var, tmp_val;
+ bool misaligned_emu_detected;
+
+ *mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
+
+ __asm__ __volatile__ (
+ " "REG_L" %[tmp], 1(%[ptr])\n"
+ : [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory");
+
+ misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED);
+ /*
+ * If unaligned_ctl is already set, this means that we detected that all
+ * CPUS uses emulated misaligned access at boot time. If that changed
+ * when hotplugging the new cpu, this is something we don't handle.
+ */
+ if (unlikely(unaligned_ctl && !misaligned_emu_detected)) {
+ pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n");
+ while (true)
+ cpu_relax();
+ }
+
+ return misaligned_emu_detected;
+}
+
+void unaligned_emulation_finish(void)
+{
+ int cpu;
+
+ /*
+ * We can only support PR_UNALIGN controls if all CPUs have misaligned
+ * accesses emulated since tasks requesting such control can run on any
+ * CPU.
+ */
+ for_each_present_cpu(cpu) {
+ if (per_cpu(misaligned_access_speed, cpu) !=
+ RISCV_HWPROBE_MISALIGNED_EMULATED) {
+ return;
+ }
+ }
+ unaligned_ctl = true;
+}
+
+bool unaligned_ctl_available(void)
+{
+ return unaligned_ctl;
+}
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index a7c56b41efd2..69df8eaad9e7 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -147,8 +147,10 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
return (pair.value & ext);
}

+#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
static u64 hwprobe_misaligned(const struct cpumask *cpus)
{
+ return RISCV_HWPROBE_MISALIGNED_FAST;
int cpu;
u64 perf = -1ULL;

@@ -169,6 +171,27 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)

return perf;
}
+#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_EMULATED;
+}
+#elif defined(CONFIG_RISCV_SLOW_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_SLOW;
+}
+#elif defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_FAST;
+}
+#elif defined(CONFIG_RISCV_UNSUPPORTED_UNALIGNED_ACCESS)
+static u64 hwprobe_misaligned(const struct cpumask *cpus)
+{
+ return RISCV_HWPROBE_MISALIGNED_UNSUPPORTED;
+}
+#endif

static void hwprobe_one_pair(struct riscv_hwprobe *pair,
const struct cpumask *cpus)
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index 8ded225e8c5b..ba6763dd9895 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -398,8 +398,6 @@ union reg_data {
u64 data_u64;
};

-static bool unaligned_ctl __read_mostly;
-
/* sysctl hooks */
int unaligned_enabled __read_mostly = 1; /* Enabled by default */

@@ -413,7 +411,9 @@ int handle_misaligned_load(struct pt_regs *regs)

perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS, 1, regs, addr);

+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
*this_cpu_ptr(&misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_EMULATED;
+#endif

if (!unaligned_enabled)
return -1;
@@ -595,53 +595,3 @@ int handle_misaligned_store(struct pt_regs *regs)

return 0;
}
-
-bool check_unaligned_access_emulated(int cpu)
-{
- long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
- unsigned long tmp_var, tmp_val;
- bool misaligned_emu_detected;
-
- *mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
-
- __asm__ __volatile__ (
- " "REG_L" %[tmp], 1(%[ptr])\n"
- : [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory");
-
- misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED);
- /*
- * If unaligned_ctl is already set, this means that we detected that all
- * CPUS uses emulated misaligned access at boot time. If that changed
- * when hotplugging the new cpu, this is something we don't handle.
- */
- if (unlikely(unaligned_ctl && !misaligned_emu_detected)) {
- pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n");
- while (true)
- cpu_relax();
- }
-
- return misaligned_emu_detected;
-}
-
-void unaligned_emulation_finish(void)
-{
- int cpu;
-
- /*
- * We can only support PR_UNALIGN controls if all CPUs have misaligned
- * accesses emulated since tasks requesting such control can run on any
- * CPU.
- */
- for_each_present_cpu(cpu) {
- if (per_cpu(misaligned_access_speed, cpu) !=
- RISCV_HWPROBE_MISALIGNED_EMULATED) {
- return;
- }
- }
- unaligned_ctl = true;
-}
-
-bool unaligned_ctl_available(void)
-{
- return unaligned_ctl;
-}

--
2.43.2



2024-02-28 10:43:42

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Tue, Feb 27, 2024 at 03:13:14PM -0800, Charlie Jenkins wrote:
> diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
> index a7c56b41efd2..69df8eaad9e7 100644
> --- a/arch/riscv/kernel/sys_hwprobe.c
> +++ b/arch/riscv/kernel/sys_hwprobe.c
> @@ -147,8 +147,10 @@ static bool hwprobe_ext0_has(const struct cpumask *cpus, unsigned long ext)
> return (pair.value & ext);
> }
>
> +#if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)
> static u64 hwprobe_misaligned(const struct cpumask *cpus)
> {

> + return RISCV_HWPROBE_MISALIGNED_FAST;

This looks like a left over testing hack.

> int cpu;
> u64 perf = -1ULL;
>
> @@ -169,6 +171,27 @@ static u64 hwprobe_misaligned(const struct cpumask *cpus)
>
> return perf;
> }
> +#elif defined(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_EMULATED;
> +}
> +#elif defined(CONFIG_RISCV_SLOW_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_SLOW;
> +}
> +#elif defined(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_FAST;
> +}
> +#elif defined(CONFIG_RISCV_UNSUPPORTED_UNALIGNED_ACCESS)
> +static u64 hwprobe_misaligned(const struct cpumask *cpus)
> +{
> + return RISCV_HWPROBE_MISALIGNED_UNSUPPORTED;
> +}
> +#endif

I'm curious why we need multiple definitions of this here. My first
question is why we cannot have
if (IS_ENABLED(foo))
return bar;
inside the existing function for these cases.

My second question would be why fiddle with this in hwprobe at all?
I was specifically wondering why the kernel would not track this
information in the per-cpu variable misaligned_access_speed and the
syscall code could be left in the dark about how this is implemented.

>
> static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> const struct cpumask *cpus)
> diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
> index 8ded225e8c5b..ba6763dd9895 100644
> --- a/arch/riscv/kernel/traps_misaligned.c
> +++ b/arch/riscv/kernel/traps_misaligned.c
> @@ -398,8 +398,6 @@ union reg_data {
> u64 data_u64;
> };
>
> -static bool unaligned_ctl __read_mostly;
> -
> /* sysctl hooks */
> int unaligned_enabled __read_mostly = 1; /* Enabled by default */
>
> @@ -413,7 +411,9 @@ int handle_misaligned_load(struct pt_regs *regs)
>
> perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS, 1, regs, addr);
>
> +#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS

if (IS_ENABLED()), no?
But perhaps more interestingly - why is this being set here at all?
My understanding of the emulated stuff was that if the in-kernel
emulation was enabled, it would be used unless the CPU itself supported
fast accesses. I would expect this to be advertised to userspace once
the system has booted, not set the first time that a cpu emulates an
access,
I feel like I need to take a look at how this has been implemented, I
never really looked too closely at it and how it is enabled does not
match my expectations.

Cheers,
Conor.

> *this_cpu_ptr(&misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_EMULATED;
> +#endif
>
> if (!unaligned_enabled)
> return -1;
> @@ -595,53 +595,3 @@ int handle_misaligned_store(struct pt_regs *regs)
>
> return 0;
> }
> -
> -bool check_unaligned_access_emulated(int cpu)
> -{
> - long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
> - unsigned long tmp_var, tmp_val;
> - bool misaligned_emu_detected;
> -
> - *mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
> -
> - __asm__ __volatile__ (
> - " "REG_L" %[tmp], 1(%[ptr])\n"
> - : [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory");
> -
> - misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED);
> - /*
> - * If unaligned_ctl is already set, this means that we detected that all
> - * CPUS uses emulated misaligned access at boot time. If that changed
> - * when hotplugging the new cpu, this is something we don't handle.
> - */
> - if (unlikely(unaligned_ctl && !misaligned_emu_detected)) {
> - pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n");
> - while (true)
> - cpu_relax();
> - }
> -
> - return misaligned_emu_detected;
> -}
> -
> -void unaligned_emulation_finish(void)
> -{
> - int cpu;
> -
> - /*
> - * We can only support PR_UNALIGN controls if all CPUs have misaligned
> - * accesses emulated since tasks requesting such control can run on any
> - * CPU.
> - */
> - for_each_present_cpu(cpu) {
> - if (per_cpu(misaligned_access_speed, cpu) !=
> - RISCV_HWPROBE_MISALIGNED_EMULATED) {
> - return;
> - }
> - }
> - unaligned_ctl = true;
> -}
> -
> -bool unaligned_ctl_available(void)
> -{
> - return unaligned_ctl;
> -}
>
> --
> 2.43.2
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv


Attachments:
(No filename) (5.11 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-29 12:37:56

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Tue, Feb 27, 2024 at 03:13:14PM -0800, Charlie Jenkins wrote:

> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index bffbd869a068..ad0a9c1f8802 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,25 +690,58 @@ config THREAD_SIZE_ORDER
> config RISCV_MISALIGNED
> bool "Support misaligned load/store traps for kernel and userspace"
> select SYSCTL_ARCH_UNALIGN_ALLOW
> + depends on RISCV_PROBE_UNALIGNED_ACCESS || RISCV_EMULATED_UNALIGNED_ACCESS
> default y
> help
> Say Y here if you want the kernel to embed support for misaligned
> load/store for both kernel and userspace. When disable, misaligned
> accesses will generate SIGBUS in userspace and panic in kernel.
>
> +choice
> + prompt "Unaligned Accesses Support"
> + default RISCV_PROBE_UNALIGNED_ACCESS
> + help
> + This selects the hardware support for unaligned accesses. This
> + information is used by the kernel to perform optimizations. It is also
> + exposed to user space via the hwprobe syscall. The hardware will be
> + probed at boot by default.
> +
> +config RISCV_PROBE_UNALIGNED_ACCESS
> + bool "Probe for hardware unaligned access support"
> + help
> + During boot, the kernel will run a series of tests to determine the
> + speed of unaligned accesses. This probing will dynamically determine
> + the speed of unaligned accesses on the boot hardware.
> +
> +config RISCV_EMULATED_UNALIGNED_ACCESS
> + bool "Assume the system expects emulated unaligned memory accesses"
> + help
> + Assume that the system expects emulated unaligned memory accesses.
> + When enabled, this option notifies the kernel and userspace that
> + unaligned memory accesses will be emulated by either the kernel or
> + firmware.
> +
> +config RISCV_SLOW_UNALIGNED_ACCESS
> + bool "Assume the system supports slow unaligned memory accesses"
> + depends on NONPORTABLE
> + help
> + Assume that the system supports slow unaligned memory accesses. The
> + kernel may not be able to run at all on systems that do not support
> + unaligned memory accesses.
> +
> config RISCV_EFFICIENT_UNALIGNED_ACCESS
> - bool "Assume the CPU supports fast unaligned memory accesses"
> + bool "Assume the system supports fast unaligned memory accesses"
> depends on NONPORTABLE
> select DCACHE_WORD_ACCESS if MMU
> select HAVE_EFFICIENT_UNALIGNED_ACCESS
> help
> - Say Y here if you want the kernel to assume that the CPU supports
> - efficient unaligned memory accesses. When enabled, this option
> - improves the performance of the kernel on such CPUs. However, the
> - kernel will run much more slowly, or will not be able to run at all,
> - on CPUs that do not support efficient unaligned memory accesses.
> + Assume that the system supports fast unaligned memory accesses. When
> + enabled, this option improves the performance of the kernel on such
> + systems. However, the kernel will run much more slowly, or will not
> + be able to run at all, on systems that do not support efficient
> + unaligned memory accesses.
>
> - If unsure what to do here, say N.
> +endchoice

Thinking about this some more, you've got 6 different options here:

1 probed with no emulation available (choice set to probe + RISCV_MISALIGNED=n)
2 probe with in-kernel emulation available (choice set to probe + RISCV_MISALIGNED=y)
3 in-kernel emulation only (choice set to emulated + RISCV_MISALIGNED=y)
4 no in-kernel emulation but report emulated (choice set to emulated + RISCV_MISALIGNED=n)
5 slow unaligned (choice set to slow)
6 fast unaligned (choice set to fast)

Out of these, only 2 and 3 are portable options, since 1, 4 and 5 will
cause uabi issues if the CPUs or firmware does not support unaligned
access & 6 will not run in the same circumstances.

My first thought here was about the motivation for the patch and what it
has resulted in. Being able to support HAVE_EFFICIENT_ALIGNED_ACCESS is
pretty nice, but it then seems like beyond that we are introducing
configuration for configurations sake, without looking at what the
resultant kernels will be useful for. Having 6 different options for how
the kernel can be configured in this way seems excessive and I don't
really get why some of them are even useful.

Take for example situation 4. Unless I have misunderstood the Kconfig
options above, if you configure a kernel in that way, it will always
report as emulated, but there is no emulation provided. This just seems
like a option that's only purpose is setting a hwprobe value, which is
a dog wagging the tail situation to me.

The other thing is about what options are actually marked as
NONPORTABLE, given it is done in the choice option - but whether or not
something is actually non-portable actually depends on whether or not
the in-kernel emulator exists.

I think what I would do here is simplify this quite a bit, starting by
making RISCV_MISALIGNED an internal option that users cannot enable but
is selected by the PORTABLE choice options. I would then re-work the
choice options a bit. My 4 would be:

1 probe: probe at boot time, falling back to emulated if not performant
2 emulated: always emulate it in the kernel
3 slow: don't probe or emulate in the kernel
4 fast: Your current fast option

1 & 2 select RISCV_UNALIGNED and are portable because they contain the
emulated support and thus satisfy the UABI constaints.
3 & 4 are marked NONPORTABLE. I think 3 will run on systems that don't
support unaligned accesses but it will have UABI problems. 4 for the
reason mentioned in the Kconfig option above.

I think that that gives you 4 meaningful options, what do you think?

Cheers,
Conor.


Attachments:
(No filename) (5.65 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-29 14:49:04

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

Clement,

On Wed, Feb 28, 2024 at 10:43:20AM +0000, Conor Dooley wrote:
> > @@ -413,7 +411,9 @@ int handle_misaligned_load(struct pt_regs *regs)
> >
> > perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS, 1, regs, addr);
> >
> > +#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
>
> if (IS_ENABLED()), no?
> But perhaps more interestingly - why is this being set here at all?
> My understanding of the emulated stuff was that if the in-kernel
> emulation was enabled, it would be used unless the CPU itself supported
> fast accesses.

> I would expect this to be advertised to userspace once
> the system has booted, not set the first time that a cpu emulates an
> access,
> I feel like I need to take a look at how this has been implemented, I
> never really looked too closely at it and how it is enabled does not
> match my expectations.

I did look at this more closely and I understand why this is being done.
In the vast majority of situations I think it is going to work perfectly
fine, since that first access is going to be during the boot time
probing.
.
> > *this_cpu_ptr(&misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_EMULATED;
> > +#endif
> >
> > if (!unaligned_enabled)
> > return -1;

This is in the diff, so I am commenting here. Firstly, why does this
return -1 and not a regular negative errno?

Secondly, this looks pretty suspect to me - if the sysctl is used to
disable the emulation things like hwprobe will still report that access
is emulated if during boot it was enabled.
I think that if this condition is hit, we actually need to change the
per-cpu variable to set the alignment to unsupported, given the
emulation will (at present & if my understanding is correct) only be
enabled when there's not underlying support for unaligned accesses.

Thoughts?

Cheers,
Conor.


Attachments:
(No filename) (1.81 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-29 18:37:35

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Thu, Feb 29, 2024 at 12:26:37PM +0000, Conor Dooley wrote:
> On Tue, Feb 27, 2024 at 03:13:14PM -0800, Charlie Jenkins wrote:
>
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index bffbd869a068..ad0a9c1f8802 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,25 +690,58 @@ config THREAD_SIZE_ORDER
> > config RISCV_MISALIGNED
> > bool "Support misaligned load/store traps for kernel and userspace"
> > select SYSCTL_ARCH_UNALIGN_ALLOW
> > + depends on RISCV_PROBE_UNALIGNED_ACCESS || RISCV_EMULATED_UNALIGNED_ACCESS
> > default y
> > help
> > Say Y here if you want the kernel to embed support for misaligned
> > load/store for both kernel and userspace. When disable, misaligned
> > accesses will generate SIGBUS in userspace and panic in kernel.
> >
> > +choice
> > + prompt "Unaligned Accesses Support"
> > + default RISCV_PROBE_UNALIGNED_ACCESS
> > + help
> > + This selects the hardware support for unaligned accesses. This
> > + information is used by the kernel to perform optimizations. It is also
> > + exposed to user space via the hwprobe syscall. The hardware will be
> > + probed at boot by default.
> > +
> > +config RISCV_PROBE_UNALIGNED_ACCESS
> > + bool "Probe for hardware unaligned access support"
> > + help
> > + During boot, the kernel will run a series of tests to determine the
> > + speed of unaligned accesses. This probing will dynamically determine
> > + the speed of unaligned accesses on the boot hardware.
> > +
> > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > + bool "Assume the system expects emulated unaligned memory accesses"
> > + help
> > + Assume that the system expects emulated unaligned memory accesses.
> > + When enabled, this option notifies the kernel and userspace that
> > + unaligned memory accesses will be emulated by either the kernel or
> > + firmware.
> > +
> > +config RISCV_SLOW_UNALIGNED_ACCESS
> > + bool "Assume the system supports slow unaligned memory accesses"
> > + depends on NONPORTABLE
> > + help
> > + Assume that the system supports slow unaligned memory accesses. The
> > + kernel may not be able to run at all on systems that do not support
> > + unaligned memory accesses.
> > +
> > config RISCV_EFFICIENT_UNALIGNED_ACCESS
> > - bool "Assume the CPU supports fast unaligned memory accesses"
> > + bool "Assume the system supports fast unaligned memory accesses"
> > depends on NONPORTABLE
> > select DCACHE_WORD_ACCESS if MMU
> > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > help
> > - Say Y here if you want the kernel to assume that the CPU supports
> > - efficient unaligned memory accesses. When enabled, this option
> > - improves the performance of the kernel on such CPUs. However, the
> > - kernel will run much more slowly, or will not be able to run at all,
> > - on CPUs that do not support efficient unaligned memory accesses.
> > + Assume that the system supports fast unaligned memory accesses. When
> > + enabled, this option improves the performance of the kernel on such
> > + systems. However, the kernel will run much more slowly, or will not
> > + be able to run at all, on systems that do not support efficient
> > + unaligned memory accesses.
> >
> > - If unsure what to do here, say N.
> > +endchoice
>
> Thinking about this some more, you've got 6 different options here:
>
> 1 probed with no emulation available (choice set to probe + RISCV_MISALIGNED=n)
> 2 probe with in-kernel emulation available (choice set to probe + RISCV_MISALIGNED=y)
> 3 in-kernel emulation only (choice set to emulated + RISCV_MISALIGNED=y)
> 4 no in-kernel emulation but report emulated (choice set to emulated + RISCV_MISALIGNED=n)
> 5 slow unaligned (choice set to slow)
> 6 fast unaligned (choice set to fast)
>
> Out of these, only 2 and 3 are portable options, since 1, 4 and 5 will
> cause uabi issues if the CPUs or firmware does not support unaligned
> access & 6 will not run in the same circumstances.
>
> My first thought here was about the motivation for the patch and what it
> has resulted in. Being able to support HAVE_EFFICIENT_ALIGNED_ACCESS is
> pretty nice, but it then seems like beyond that we are introducing
> configuration for configurations sake, without looking at what the
> resultant kernels will be useful for. Having 6 different options for how
> the kernel can be configured in this way seems excessive and I don't
> really get why some of them are even useful.
>
> Take for example situation 4. Unless I have misunderstood the Kconfig
> options above, if you configure a kernel in that way, it will always
> report as emulated, but there is no emulation provided. This just seems
> like a option that's only purpose is setting a hwprobe value, which is
> a dog wagging the tail situation to me.

This goes back to my earlier comment that it would make sense for
somebody to select "emulated" even if the unaligned address is being
emulated by firmware. However, there may be no users for this and if
needed we can add that in the future.

>
> The other thing is about what options are actually marked as
> NONPORTABLE, given it is done in the choice option - but whether or not
> something is actually non-portable actually depends on whether or not
> the in-kernel emulator exists.
>
> I think what I would do here is simplify this quite a bit, starting by
> making RISCV_MISALIGNED an internal option that users cannot enable but
> is selected by the PORTABLE choice options. I would then re-work the
> choice options a bit. My 4 would be:
>
> 1 probe: probe at boot time, falling back to emulated if not performant
> 2 emulated: always emulate it in the kernel
> 3 slow: don't probe or emulate in the kernel
> 4 fast: Your current fast option

Emulated doesn't mean that the kernel will always emulate the unaligned
accesses. It means that the kernel has the ability to emulate them. It
will only emulate them if the CPU traps on unaligned accesses. Kernel
code can choose to forcefully align an address it thinks may cause an
unaligned access, but that's slightly different from emulated.

Emulated is much slower than "slow", so it seems like it unfairly
penalizes "slow" to group the options for firmware emulated with a CPU
that actually just does slow unaligned accesses.

- Charlie

>
> 1 & 2 select RISCV_UNALIGNED and are portable because they contain the
> emulated support and thus satisfy the UABI constaints.
> 3 & 4 are marked NONPORTABLE. I think 3 will run on systems that don't
> support unaligned accesses but it will have UABI problems. 4 for the
> reason mentioned in the Kconfig option above.
>
> I think that that gives you 4 meaningful options, what do you think?
>
> Cheers,
> Conor.



2024-02-29 18:49:45

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Thu, Feb 29, 2024 at 10:37:20AM -0800, Charlie Jenkins wrote:
> On Thu, Feb 29, 2024 at 12:26:37PM +0000, Conor Dooley wrote:
> > On Tue, Feb 27, 2024 at 03:13:14PM -0800, Charlie Jenkins wrote:
> >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index bffbd869a068..ad0a9c1f8802 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -690,25 +690,58 @@ config THREAD_SIZE_ORDER
> > > config RISCV_MISALIGNED
> > > bool "Support misaligned load/store traps for kernel and userspace"
> > > select SYSCTL_ARCH_UNALIGN_ALLOW
> > > + depends on RISCV_PROBE_UNALIGNED_ACCESS || RISCV_EMULATED_UNALIGNED_ACCESS
> > > default y
> > > help
> > > Say Y here if you want the kernel to embed support for misaligned
> > > load/store for both kernel and userspace. When disable, misaligned
> > > accesses will generate SIGBUS in userspace and panic in kernel.
> > >
> > > +choice
> > > + prompt "Unaligned Accesses Support"
> > > + default RISCV_PROBE_UNALIGNED_ACCESS
> > > + help
> > > + This selects the hardware support for unaligned accesses. This
> > > + information is used by the kernel to perform optimizations. It is also
> > > + exposed to user space via the hwprobe syscall. The hardware will be
> > > + probed at boot by default.
> > > +
> > > +config RISCV_PROBE_UNALIGNED_ACCESS
> > > + bool "Probe for hardware unaligned access support"
> > > + help
> > > + During boot, the kernel will run a series of tests to determine the
> > > + speed of unaligned accesses. This probing will dynamically determine
> > > + the speed of unaligned accesses on the boot hardware.
> > > +
> > > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > > + bool "Assume the system expects emulated unaligned memory accesses"
> > > + help
> > > + Assume that the system expects emulated unaligned memory accesses.
> > > + When enabled, this option notifies the kernel and userspace that
> > > + unaligned memory accesses will be emulated by either the kernel or
> > > + firmware.
> > > +
> > > +config RISCV_SLOW_UNALIGNED_ACCESS
> > > + bool "Assume the system supports slow unaligned memory accesses"
> > > + depends on NONPORTABLE
> > > + help
> > > + Assume that the system supports slow unaligned memory accesses. The
> > > + kernel may not be able to run at all on systems that do not support
> > > + unaligned memory accesses.
> > > +
> > > config RISCV_EFFICIENT_UNALIGNED_ACCESS
> > > - bool "Assume the CPU supports fast unaligned memory accesses"
> > > + bool "Assume the system supports fast unaligned memory accesses"
> > > depends on NONPORTABLE
> > > select DCACHE_WORD_ACCESS if MMU
> > > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > > help
> > > - Say Y here if you want the kernel to assume that the CPU supports
> > > - efficient unaligned memory accesses. When enabled, this option
> > > - improves the performance of the kernel on such CPUs. However, the
> > > - kernel will run much more slowly, or will not be able to run at all,
> > > - on CPUs that do not support efficient unaligned memory accesses.
> > > + Assume that the system supports fast unaligned memory accesses. When
> > > + enabled, this option improves the performance of the kernel on such
> > > + systems. However, the kernel will run much more slowly, or will not
> > > + be able to run at all, on systems that do not support efficient
> > > + unaligned memory accesses.
> > >
> > > - If unsure what to do here, say N.
> > > +endchoice
> >
> > Thinking about this some more, you've got 6 different options here:
> >
> > 1 probed with no emulation available (choice set to probe + RISCV_MISALIGNED=n)
> > 2 probe with in-kernel emulation available (choice set to probe + RISCV_MISALIGNED=y)
> > 3 in-kernel emulation only (choice set to emulated + RISCV_MISALIGNED=y)
> > 4 no in-kernel emulation but report emulated (choice set to emulated + RISCV_MISALIGNED=n)
> > 5 slow unaligned (choice set to slow)
> > 6 fast unaligned (choice set to fast)
> >
> > Out of these, only 2 and 3 are portable options, since 1, 4 and 5 will
> > cause uabi issues if the CPUs or firmware does not support unaligned
> > access & 6 will not run in the same circumstances.
> >
> > My first thought here was about the motivation for the patch and what it
> > has resulted in. Being able to support HAVE_EFFICIENT_ALIGNED_ACCESS is
> > pretty nice, but it then seems like beyond that we are introducing
> > configuration for configurations sake, without looking at what the
> > resultant kernels will be useful for. Having 6 different options for how
> > the kernel can be configured in this way seems excessive and I don't
> > really get why some of them are even useful.
> >
> > Take for example situation 4. Unless I have misunderstood the Kconfig
> > options above, if you configure a kernel in that way, it will always
> > report as emulated, but there is no emulation provided. This just seems
> > like a option that's only purpose is setting a hwprobe value, which is
> > a dog wagging the tail situation to me.
>
> This goes back to my earlier comment that it would make sense for
> somebody to select "emulated" even if the unaligned address is being
> emulated by firmware. However, there may be no users for this and if
> needed we can add that in the future.

I don't think that is going to provide any useful information, give you
will never be able to know if slow means "slow but done by the cpu" or
"slow but emulated by the firmware".

> > The other thing is about what options are actually marked as
> > NONPORTABLE, given it is done in the choice option - but whether or not
> > something is actually non-portable actually depends on whether or not
> > the in-kernel emulator exists.
> >
> > I think what I would do here is simplify this quite a bit, starting by
> > making RISCV_MISALIGNED an internal option that users cannot enable but
> > is selected by the PORTABLE choice options. I would then re-work the
> > choice options a bit. My 4 would be:
> >
> > 1 probe: probe at boot time, falling back to emulated if not performant
> > 2 emulated: always emulate it in the kernel
> > 3 slow: don't probe or emulate in the kernel
> > 4 fast: Your current fast option
>
> Emulated doesn't mean that the kernel will always emulate the unaligned
> accesses. It means that the kernel has the ability to emulate them. It
> will only emulate them if the CPU traps on unaligned accesses. Kernel
> code can choose to forcefully align an address it thinks may cause an
> unaligned access, but that's slightly different from emulated.

Sure, make option 2 "don't probe at boot time, emulate it in the kernel
if we trap". I suppose in this case though, to get a correct output in
hwprobe you'd have to still attempt an unaligned access at boot time to
see if you trap but it will not perform the speed test?

> Emulated is much slower than "slow", so it seems like it unfairly
> penalizes "slow" to group the options for firmware emulated with a CPU
> that actually just does slow unaligned accesses.

As I said in my previous email, "slow" has a huge variance and may not
be meaningfully faster. I wonder if it might actually even be slower on
some systems that emulate it in firmware?

> > 1 & 2 select RISCV_UNALIGNED and are portable because they contain the
> > emulated support and thus satisfy the UABI constaints.
> > 3 & 4 are marked NONPORTABLE. I think 3 will run on systems that don't
> > support unaligned accesses but it will have UABI problems. 4 for the
> > reason mentioned in the Kconfig option above.
> >
> > I think that that gives you 4 meaningful options, what do you think?


Attachments:
(No filename) (7.65 kB)
signature.asc (235.00 B)
Download all attachments

2024-02-29 19:23:31

by Charlie Jenkins

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Thu, Feb 29, 2024 at 06:49:33PM +0000, Conor Dooley wrote:
> On Thu, Feb 29, 2024 at 10:37:20AM -0800, Charlie Jenkins wrote:
> > On Thu, Feb 29, 2024 at 12:26:37PM +0000, Conor Dooley wrote:
> > > On Tue, Feb 27, 2024 at 03:13:14PM -0800, Charlie Jenkins wrote:
> > >
> > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > index bffbd869a068..ad0a9c1f8802 100644
> > > > --- a/arch/riscv/Kconfig
> > > > +++ b/arch/riscv/Kconfig
> > > > @@ -690,25 +690,58 @@ config THREAD_SIZE_ORDER
> > > > config RISCV_MISALIGNED
> > > > bool "Support misaligned load/store traps for kernel and userspace"
> > > > select SYSCTL_ARCH_UNALIGN_ALLOW
> > > > + depends on RISCV_PROBE_UNALIGNED_ACCESS || RISCV_EMULATED_UNALIGNED_ACCESS
> > > > default y
> > > > help
> > > > Say Y here if you want the kernel to embed support for misaligned
> > > > load/store for both kernel and userspace. When disable, misaligned
> > > > accesses will generate SIGBUS in userspace and panic in kernel.
> > > >
> > > > +choice
> > > > + prompt "Unaligned Accesses Support"
> > > > + default RISCV_PROBE_UNALIGNED_ACCESS
> > > > + help
> > > > + This selects the hardware support for unaligned accesses. This
> > > > + information is used by the kernel to perform optimizations. It is also
> > > > + exposed to user space via the hwprobe syscall. The hardware will be
> > > > + probed at boot by default.
> > > > +
> > > > +config RISCV_PROBE_UNALIGNED_ACCESS
> > > > + bool "Probe for hardware unaligned access support"
> > > > + help
> > > > + During boot, the kernel will run a series of tests to determine the
> > > > + speed of unaligned accesses. This probing will dynamically determine
> > > > + the speed of unaligned accesses on the boot hardware.
> > > > +
> > > > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > > > + bool "Assume the system expects emulated unaligned memory accesses"
> > > > + help
> > > > + Assume that the system expects emulated unaligned memory accesses.
> > > > + When enabled, this option notifies the kernel and userspace that
> > > > + unaligned memory accesses will be emulated by either the kernel or
> > > > + firmware.
> > > > +
> > > > +config RISCV_SLOW_UNALIGNED_ACCESS
> > > > + bool "Assume the system supports slow unaligned memory accesses"
> > > > + depends on NONPORTABLE
> > > > + help
> > > > + Assume that the system supports slow unaligned memory accesses. The
> > > > + kernel may not be able to run at all on systems that do not support
> > > > + unaligned memory accesses.
> > > > +
> > > > config RISCV_EFFICIENT_UNALIGNED_ACCESS
> > > > - bool "Assume the CPU supports fast unaligned memory accesses"
> > > > + bool "Assume the system supports fast unaligned memory accesses"
> > > > depends on NONPORTABLE
> > > > select DCACHE_WORD_ACCESS if MMU
> > > > select HAVE_EFFICIENT_UNALIGNED_ACCESS
> > > > help
> > > > - Say Y here if you want the kernel to assume that the CPU supports
> > > > - efficient unaligned memory accesses. When enabled, this option
> > > > - improves the performance of the kernel on such CPUs. However, the
> > > > - kernel will run much more slowly, or will not be able to run at all,
> > > > - on CPUs that do not support efficient unaligned memory accesses.
> > > > + Assume that the system supports fast unaligned memory accesses. When
> > > > + enabled, this option improves the performance of the kernel on such
> > > > + systems. However, the kernel will run much more slowly, or will not
> > > > + be able to run at all, on systems that do not support efficient
> > > > + unaligned memory accesses.
> > > >
> > > > - If unsure what to do here, say N.
> > > > +endchoice
> > >
> > > Thinking about this some more, you've got 6 different options here:
> > >
> > > 1 probed with no emulation available (choice set to probe + RISCV_MISALIGNED=n)
> > > 2 probe with in-kernel emulation available (choice set to probe + RISCV_MISALIGNED=y)
> > > 3 in-kernel emulation only (choice set to emulated + RISCV_MISALIGNED=y)
> > > 4 no in-kernel emulation but report emulated (choice set to emulated + RISCV_MISALIGNED=n)
> > > 5 slow unaligned (choice set to slow)
> > > 6 fast unaligned (choice set to fast)
> > >
> > > Out of these, only 2 and 3 are portable options, since 1, 4 and 5 will
> > > cause uabi issues if the CPUs or firmware does not support unaligned
> > > access & 6 will not run in the same circumstances.
> > >
> > > My first thought here was about the motivation for the patch and what it
> > > has resulted in. Being able to support HAVE_EFFICIENT_ALIGNED_ACCESS is
> > > pretty nice, but it then seems like beyond that we are introducing
> > > configuration for configurations sake, without looking at what the
> > > resultant kernels will be useful for. Having 6 different options for how
> > > the kernel can be configured in this way seems excessive and I don't
> > > really get why some of them are even useful.
> > >
> > > Take for example situation 4. Unless I have misunderstood the Kconfig
> > > options above, if you configure a kernel in that way, it will always
> > > report as emulated, but there is no emulation provided. This just seems
> > > like a option that's only purpose is setting a hwprobe value, which is
> > > a dog wagging the tail situation to me.
> >
> > This goes back to my earlier comment that it would make sense for
> > somebody to select "emulated" even if the unaligned address is being
> > emulated by firmware. However, there may be no users for this and if
> > needed we can add that in the future.
>
> I don't think that is going to provide any useful information, give you
> will never be able to know if slow means "slow but done by the cpu" or
> "slow but emulated by the firmware".

Okay makes sense. If somebody has a usecase in the future it would be
easy to change without breaking any UABI.

>
> > > The other thing is about what options are actually marked as
> > > NONPORTABLE, given it is done in the choice option - but whether or not
> > > something is actually non-portable actually depends on whether or not
> > > the in-kernel emulator exists.
> > >
> > > I think what I would do here is simplify this quite a bit, starting by
> > > making RISCV_MISALIGNED an internal option that users cannot enable but
> > > is selected by the PORTABLE choice options. I would then re-work the
> > > choice options a bit. My 4 would be:
> > >
> > > 1 probe: probe at boot time, falling back to emulated if not performant
> > > 2 emulated: always emulate it in the kernel
> > > 3 slow: don't probe or emulate in the kernel
> > > 4 fast: Your current fast option
> >
> > Emulated doesn't mean that the kernel will always emulate the unaligned
> > accesses. It means that the kernel has the ability to emulate them. It
> > will only emulate them if the CPU traps on unaligned accesses. Kernel
> > code can choose to forcefully align an address it thinks may cause an
> > unaligned access, but that's slightly different from emulated.
>
> Sure, make option 2 "don't probe at boot time, emulate it in the kernel
> if we trap". I suppose in this case though, to get a correct output in
> hwprobe you'd have to still attempt an unaligned access at boot time to
> see if you trap but it will not perform the speed test?

Are you trying to cover the case here that the kernel is compiled as
"emulate unaligned accesses" but the kernel isn't actually needed to
emulate unaligned accesses? Seems like if the kernel is compiled as
such it would make sense to report emulated with the assumption that if
the kernel isn't emulating it, something else is.

>
> > Emulated is much slower than "slow", so it seems like it unfairly
> > penalizes "slow" to group the options for firmware emulated with a CPU
> > that actually just does slow unaligned accesses.
>
> As I said in my previous email, "slow" has a huge variance and may not
> be meaningfully faster. I wonder if it might actually even be slower on
> some systems that emulate it in firmware?

I have no real data and can only speculate. Your 4 options seem to cover
all of the "meaningful" use cases so I can make the change in a new
version.

- Charlie

>
> > > 1 & 2 select RISCV_UNALIGNED and are portable because they contain the
> > > emulated support and thus satisfy the UABI constaints.
> > > 3 & 4 are marked NONPORTABLE. I think 3 will run on systems that don't
> > > support unaligned accesses but it will have UABI problems. 4 for the
> > > reason mentioned in the Kconfig option above.
> > >
> > > I think that that gives you 4 meaningful options, what do you think?
>



2024-03-01 09:45:17

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] riscv: Set unaligned access speed at compile time

On Thu, Feb 29, 2024 at 11:23:08AM -0800, Charlie Jenkins wrote:
> > > > 1 probe: probe at boot time, falling back to emulated if not performant
> > > > 2 emulated: always emulate it in the kernel
> > > > 3 slow: don't probe or emulate in the kernel
> > > > 4 fast: Your current fast option
> > >
> > > Emulated doesn't mean that the kernel will always emulate the unaligned
> > > accesses. It means that the kernel has the ability to emulate them. It
> > > will only emulate them if the CPU traps on unaligned accesses. Kernel
> > > code can choose to forcefully align an address it thinks may cause an
> > > unaligned access, but that's slightly different from emulated.
> >
> > Sure, make option 2 "don't probe at boot time, emulate it in the kernel
> > if we trap". I suppose in this case though, to get a correct output in
> > hwprobe you'd have to still attempt an unaligned access at boot time to
> > see if you trap but it will not perform the speed test?
>
> Are you trying to cover the case here that the kernel is compiled as
> "emulate unaligned accesses" but the kernel isn't actually needed to
> emulate unaligned accesses?

Nope, the case "don't probe at boot time, emulate it in the kernel if
we trap" which is replacing 2 above.

> Seems like if the kernel is compiled as
> such it would make sense to report emulated with the assumption that if
> the kernel isn't emulating it, something else is.

Or maybe nothing is emulating it, we don't know. Feels to me like it
should report slow by default, given that's the option you can infer the
least information from, and then report emulated on trap.


Attachments:
(No filename) (1.61 kB)
signature.asc (235.00 B)
Download all attachments