2023-02-06 20:15:21

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 0/6] RISC-V Hardware Probing User Interface


These are very much up for discussion, as it's a pretty big new user
interface and it's quite a bit different from how we've historically
done things: this isn't just providing an ISA string to userspace, this
has its own format for providing information to userspace.

There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.

Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.

The actual user interface is a syscall. I'm not really sure that's the
right way to go about this, but it makes for flexible prototying.
Various other approaches have been talked about like making HWCAP2 a
pointer, having a VDSO routine, or exposing this via sysfs. Those seem
like generally reasonable approaches, but I've yet to figure out a way
to get the general case working without a syscall as that's the only way
I've come up with to deal with the heterogenous CPU case. Happy to hear
if someone has a better idea, though, as I don't really want to add a
syscall if we can avoid it.

An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].

[1] https://public-inbox.org/libc-alpha/[email protected]/T/#t

Changes in v2:
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.

Evan Green (4):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface

Palmer Dabbelt (2):
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
dt-bindings: Add RISC-V misaligned access performance

.../devicetree/bindings/riscv/cpus.yaml | 15 ++
Documentation/riscv/hwprobe.rst | 66 ++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/include/asm/cpufeature.h | 23 +++
arch/riscv/include/asm/hwprobe.h | 13 ++
arch/riscv/include/asm/smp.h | 9 +
arch/riscv/include/asm/syscall.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 35 ++++
arch/riscv/include/uapi/asm/unistd.h | 8 +
arch/riscv/kernel/cpu.c | 11 +-
arch/riscv/kernel/cpufeature.c | 31 ++-
arch/riscv/kernel/sys_riscv.c | 192 +++++++++++++++++-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 ++++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 89 ++++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 ++
tools/testing/selftests/riscv/libc.S | 46 +++++
18 files changed, 613 insertions(+), 10 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/libc.S

--
2.25.1



2023-02-06 20:15:30

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 1/6] RISC-V: Move struct riscv_cpuinfo to new header

In preparation for tracking and exposing microarchitectural details to
userspace (like whether or not unaligned accesses are fast), move the
riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
need to be used by more than just cpu.c.

Signed-off-by: Evan Green <[email protected]>
---

(no changes since v1)

arch/riscv/include/asm/cpufeature.h | 21 +++++++++++++++++++++
arch/riscv/kernel/cpu.c | 8 ++------
2 files changed, 23 insertions(+), 6 deletions(-)
create mode 100644 arch/riscv/include/asm/cpufeature.h

diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
new file mode 100644
index 000000000000..66c251d98290
--- /dev/null
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022 Rivos, Inc
+ */
+
+#ifndef _ASM_CPUFEATURE_H
+#define _ASM_CPUFEATURE_H
+
+/*
+ * These are probed via a device_initcall(), via either the SBI or directly
+ * from the cooresponding CSRs.
+ */
+struct riscv_cpuinfo {
+ unsigned long mvendorid;
+ unsigned long marchid;
+ unsigned long mimpid;
+};
+
+DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+
+#endif
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 1b9a5a66e55a..684e5419d37d 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -7,6 +7,7 @@
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/of.h>
+#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/hwcap.h>
#include <asm/sbi.h>
@@ -70,12 +71,7 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
return -1;
}

-struct riscv_cpuinfo {
- unsigned long mvendorid;
- unsigned long marchid;
- unsigned long mimpid;
-};
-static DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);

unsigned long riscv_cached_mvendorid(unsigned int cpu_id)
{
--
2.25.1


2023-02-06 20:15:39

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

We don't have enough space for these all in ELF_HWCAP{,2} and there's no
system call that quite does this, so let's just provide an arch-specific
one to probe for hardware capabilities. This currently just provides
m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
the future.

Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>

---

Changes in v2:
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.


---
Documentation/riscv/hwprobe.rst | 37 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/include/asm/hwprobe.h | 13 +++
arch/riscv/include/asm/syscall.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 25 +++++
arch/riscv/include/uapi/asm/unistd.h | 8 ++
arch/riscv/kernel/cpu.c | 3 +-
arch/riscv/kernel/sys_riscv.c | 146 +++++++++++++++++++++++++-
8 files changed, 234 insertions(+), 2 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
new file mode 100644
index 000000000000..97771090e972
--- /dev/null
+++ b/Documentation/riscv/hwprobe.rst
@@ -0,0 +1,37 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+RISC-V Hardware Probing Interface
+---------------------------------
+
+The RISC-V hardware probing interface is based around a single syscall, which
+is defined in <asm/hwprobe.h>::
+
+ struct riscv_hwprobe {
+ __s64 key;
+ __u64 value;
+ };
+
+ long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
+ size_t cpu_count, cpu_set_t *cpus,
+ unsigned long flags);
+
+The arguments are split into three groups: an array of key-value pairs, a CPU
+set, and some flags. The key-value pairs are supplied with a count. Userspace
+must prepopulate the key field for each element, and the kernel will fill in the
+value if the key is recognized. If a key is unknown to the kernel, its key
+field will be cleared to -1, and its value set to 0. The CPU set is defined by
+CPU_SET(3), the indicated features will be supported on all CPUs in the set.
+Usermode can supply NULL for cpus and 0 for cpu_count as a shortcut for all
+online CPUs. There are currently no flags, this value must be zero for future
+compatibility.
+
+On success 0 is returned, on failure a negative error code is returned.
+
+The following keys are defined:
+
+* :RISCV_HWPROBE_KEY_MVENDORID:: Contains the value of :mvendorid:, as per the
+ ISA specifications.
+* :RISCV_HWPROBE_KEY_MARCHID:: Contains the value of :marchid:, as per the ISA
+ specifications.
+* :RISCV_HWPROBE_KEY_MIMPLID:: Contains the value of :mimplid:, as per the ISA
+ specifications.
diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
index 2e5b18fbb145..175a91db0200 100644
--- a/Documentation/riscv/index.rst
+++ b/Documentation/riscv/index.rst
@@ -7,6 +7,7 @@ RISC-V architecture

boot-image-header
vm-layout
+ hwprobe
patch-acceptance
uabi

diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
new file mode 100644
index 000000000000..08d1c3bdd78a
--- /dev/null
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2022 Rivos, Inc
+ */
+
+#ifndef _ASM_HWPROBE_H
+#define _ASM_HWPROBE_H
+
+#include <uapi/asm/hwprobe.h>
+
+#define RISCV_HWPROBE_MAX_KEY 2
+
+#endif
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 384a63b86420..78a6302ef711 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -75,4 +75,7 @@ static inline int syscall_get_arch(struct task_struct *task)
}

asmlinkage long sys_riscv_flush_icache(uintptr_t, uintptr_t, uintptr_t);
+
+asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, uintptr_t,
+ uintptr_t, uintptr_t);
#endif /* _ASM_RISCV_SYSCALL_H */
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
new file mode 100644
index 000000000000..591802047460
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2022 Rivos, Inc
+ */
+
+#ifndef _UAPI_ASM_HWPROBE_H
+#define _UAPI_ASM_HWPROBE_H
+
+#include <linux/types.h>
+
+/*
+ * Interface for probing hardware capabilities from userspace, see
+ * Documentation/riscv/hwprobe.rst for more information.
+ */
+struct riscv_hwprobe {
+ __s64 key;
+ __u64 value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID 0
+#define RISCV_HWPROBE_KEY_MARCHID 1
+#define RISCV_HWPROBE_KEY_MIMPID 2
+/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
+
+#endif
diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
index 73d7cdd2ec49..37d47302322a 100644
--- a/arch/riscv/include/uapi/asm/unistd.h
+++ b/arch/riscv/include/uapi/asm/unistd.h
@@ -43,3 +43,11 @@
#define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
#endif
__SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
+
+/*
+ * Allows userspace to probe
+ */
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe (__NR_arch_specific_syscall + 14)
+#endif
+__SYSCALL(__NR_riscv_hwprobe, sys_riscv_hwprobe)
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 684e5419d37d..d0fb3567cc3d 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -4,15 +4,16 @@
*/

#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/of.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/hwcap.h>
+#include <asm/pgtable.h>
#include <asm/sbi.h>
#include <asm/smp.h>
-#include <asm/pgtable.h>

/*
* Returns the hart ID of the given device tree node, or -ENODEV if the node
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 5d3f2fbeb33c..868a12384f5a 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -6,8 +6,11 @@
*/

#include <linux/syscalls.h>
-#include <asm/unistd.h>
#include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
+#include <asm/hwprobe.h>
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
#include <asm-generic/mman-common.h>

static long riscv_sys_mmap(unsigned long addr, unsigned long len,
@@ -69,3 +72,144 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end,

return 0;
}
+
+/*
+ * The hwprobe interface, for allowing userspace to probe to see which features
+ * are supported by the hardware. See Documentation/riscv/hwprobe.rst for more
+ * details.
+ */
+static int set_hwprobe(struct riscv_hwprobe __user *pair, u64 val)
+{
+ long ret;
+
+ ret = put_user(val, &pair->value);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static long hwprobe_mid(struct riscv_hwprobe __user *pair, size_t key,
+ cpumask_t *cpus)
+{
+ long cpu, id;
+ bool first, valid;
+
+ first = true;
+ valid = false;
+ for_each_cpu(cpu, cpus) {
+ struct riscv_cpuinfo *ci = per_cpu_ptr(&riscv_cpuinfo, cpu);
+ long cpu_id;
+
+ switch (key) {
+ case RISCV_HWPROBE_KEY_MVENDORID:
+ cpu_id = ci->mvendorid;
+ break;
+ case RISCV_HWPROBE_KEY_MIMPID:
+ cpu_id = ci->mimpid;
+ break;
+ case RISCV_HWPROBE_KEY_MARCHID:
+ cpu_id = ci->marchid;
+ break;
+ }
+
+ if (first) {
+ id = cpu_id;
+ valid = true;
+ }
+
+ if (id != cpu_id)
+ valid = false;
+ }
+
+ /*
+ * put_user() returns 0 on success, so use 1 to indicate it wasn't
+ * called and we should skip having incremented the output.
+ */
+ if (!valid)
+ return 1;
+
+ return set_hwprobe(pair, id);
+}
+
+static
+long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
+ long cpu_count, unsigned long __user *cpus_user,
+ unsigned long flags)
+{
+ size_t out;
+ s64 key;
+ long ret;
+ struct cpumask cpus;
+
+ /* Check the reserved flags. */
+ if (flags != 0)
+ return -EINVAL;
+
+ /*
+ * The only supported values must be the same on all CPUs. Allow
+ * userspace to specify NULL and 0 as a shortcut to all online CPUs.
+ */
+ cpumask_clear(&cpus);
+ if ((cpu_count == 0) && (cpus_user == NULL)) {
+ cpumask_copy(&cpus, cpu_online_mask);
+ } else {
+ if (cpu_count > cpumask_size())
+ cpu_count = cpumask_size();
+ ret = copy_from_user(&cpus, cpus_user, cpu_count);
+ if (!ret)
+ return -EFAULT;
+
+ /*
+ * Userspace must provide at least one online CPU, without that there's
+ * no way to define what is supported.
+ */
+ cpumask_and(&cpus, &cpus, cpu_online_mask);
+ if (cpumask_empty(&cpus))
+ return -EINVAL;
+ }
+
+ for (out = 0; out < pair_count; out++, pairs++) {
+ long ret;
+
+ if (get_user(key, &pairs->key))
+ return -EFAULT;
+
+ switch (key) {
+ case RISCV_HWPROBE_KEY_MVENDORID:
+ case RISCV_HWPROBE_KEY_MARCHID:
+ case RISCV_HWPROBE_KEY_MIMPID:
+ ret = hwprobe_mid(pairs, key, &cpus);
+ break;
+
+ /*
+ * For forward compatibility, unknown keys don't fail the whole
+ * call, but get their element key set to -1 and value set to 0
+ * indicating they're unrecognized.
+ */
+ default:
+ ret = put_user(-1, &pairs->key);
+ if (ret < 0)
+ return ret;
+
+ ret = set_hwprobe(pairs, 0);
+ if (ret)
+ return ret;
+
+ break;
+ }
+
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+
+}
+
+SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count,
+ uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags)
+{
+ return do_riscv_hwprobe((void __user *)pairs, pair_count, cpu_count,
+ (void __user *)cpus, flags);
+}
--
2.25.1


2023-02-06 20:15:44

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 3/6] RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA

From: Palmer Dabbelt <[email protected]>

We have an implicit set of base behaviors that userspace depends on,
which are mostly defined in various ISA specifications.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---

(no changes since v1)

Documentation/riscv/hwprobe.rst | 16 ++++++++++++++++
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/uapi/asm/hwprobe.h | 6 +++++-
arch/riscv/kernel/sys_riscv.c | 23 +++++++++++++++++++++++
4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 97771090e972..ce186967861f 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -35,3 +35,19 @@ The following keys are defined:
specifications.
* :RISCV_HWPROBE_KEY_MIMPLID:: Contains the value of :mimplid:, as per the ISA
specifications.
+* :RISCV_HWPROBE_KEY_BASE_BEHAVIOR:: A bitmask containing the base user-visible
+ behavior that this kernel supports. The following base user ABIs are defined:
+ * :RISCV_HWPROBE_BASE_BEHAVIOR_IMA:: Support for rv32ima or rv64ima, as
+ defined by version 2.2 of the user ISA and version 1.10 of the privileged
+ ISA, with the following known exceptions (more exceptions may be added,
+ but only if it can be demonstrated that the user ABI is not broken):
+ * The :fence.i: instruction cannot be directly executed by userspace
+ programs (it may still be executed in userspace via a
+ kernel-controlled mechanism such as the vDSO).
+* :RISCV_HWPROBE_KEY_IMA_EXT_0:: A bitmask containing the extensions that are
+ compatible with the :RISCV_HWPROBE_BASE_BEHAVIOR_IMA: base system behavior.
+ * :RISCV_HWPROBE_IMA_FD:: The F and D extensions are supported, as defined
+ by commit cd20cee ("FMIN/FMAX now implement minimumNumber/maximumNumber,
+ not minNum/maxNum") of the RISC-V ISA manual.
+ * :RISCV_HWPROBE_IMA_C:: The C extension is supported, as defined by
+ version 2.2 of the RISC-V ISA manual.
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 08d1c3bdd78a..7e52f1e1fe10 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -8,6 +8,6 @@

#include <uapi/asm/hwprobe.h>

-#define RISCV_HWPROBE_MAX_KEY 2
+#define RISCV_HWPROBE_MAX_KEY 4

#endif
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index 591802047460..ce39d6e74103 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -20,6 +20,10 @@ struct riscv_hwprobe {
#define RISCV_HWPROBE_KEY_MVENDORID 0
#define RISCV_HWPROBE_KEY_MARCHID 1
#define RISCV_HWPROBE_KEY_MIMPID 2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
+#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_IMA_FD (1 << 0)
+#define RISCV_HWPROBE_IMA_C (1 << 1)
/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
-
#endif
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 868a12384f5a..74e0d72c877d 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -9,6 +9,7 @@
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
+#include <asm/switch_to.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm-generic/mman-common.h>
@@ -182,6 +183,28 @@ long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
ret = hwprobe_mid(pairs, key, &cpus);
break;

+ /*
+ * The kernel already assumes that the base single-letter ISA
+ * extensions are supported on all harts, and only supports the
+ * IMA base, so just cheat a bit here and tell that to
+ * userspace.
+ */
+ case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
+ ret = set_hwprobe(pairs, RISCV_HWPROBE_BASE_BEHAVIOR_IMA);
+ break;
+
+ case RISCV_HWPROBE_KEY_IMA_EXT_0:
+ {
+ u64 val = 0;
+
+ if (has_fpu())
+ val |= RISCV_HWPROBE_IMA_FD;
+ if (elf_hwcap & RISCV_ISA_EXT_c)
+ val |= RISCV_HWPROBE_IMA_C;
+ ret = set_hwprobe(pairs, val);
+ }
+ break;
+
/*
* For forward compatibility, unknown keys don't fail the whole
* call, but get their element key set to -1 and value set to 0
--
2.25.1


2023-02-06 20:15:55

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

From: Palmer Dabbelt <[email protected]>

This key allows device trees to specify the performance of misaligned
accesses to main memory regions from each CPU in the system.

Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---

(no changes since v1)

Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
index c6720764e765..2c09bd6f2927 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -85,6 +85,21 @@ properties:
$ref: "/schemas/types.yaml#/definitions/string"
pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$

+ riscv,misaligned-access-performance:
+ description:
+ Identifies the performance of misaligned memory accesses to main memory
+ regions. There are three flavors of unaligned access performance: "emulated"
+ means that misaligned accesses are emulated via software and thus
+ extremely slow, "slow" means that misaligned accesses are supported by
+ hardware but still slower that aligned accesses sequences, and "fast"
+ means that misaligned accesses are as fast or faster than the
+ cooresponding aligned accesses sequences.
+ $ref: "/schemas/types.yaml#/definitions/string"
+ enum:
+ - emulated
+ - slow
+ - fast
+
# RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
timebase-frequency: false

--
2.25.1


2023-02-06 20:16:01

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 5/6] RISC-V: hwprobe: Support probing of misaligned access performance

This allows userspace to select various routines to use based on the
performance of misaligned access on the target hardware.

Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>

---

Changes in v2:
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints

Documentation/riscv/hwprobe.rst | 13 +++++++++++
arch/riscv/include/asm/cpufeature.h | 2 ++
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/asm/smp.h | 9 ++++++++
arch/riscv/include/uapi/asm/hwprobe.h | 6 ++++++
arch/riscv/kernel/cpufeature.c | 31 +++++++++++++++++++++++++--
arch/riscv/kernel/sys_riscv.c | 23 ++++++++++++++++++++
7 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index ce186967861f..0dc75e83e127 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -51,3 +51,16 @@ The following keys are defined:
not minNum/maxNum") of the RISC-V ISA manual.
* :RISCV_HWPROBE_IMA_C:: The C extension is supported, as defined by
version 2.2 of the RISC-V ISA manual.
+* :RISCV_HWPROBE_KEY_PERF_0:: A bitmask that contains performance information
+ about the selected set of processors.
+ * :RISCV_HWPROBE_MISALIGNED_UNKNOWN:: The performance of misaligned
+ accesses is unknown.
+ * :RISCV_HWPROBE_MISALIGNED_EMULATED:: Misaligned accesses are emulated via
+ software, either in or below the kernel. These accesses are always
+ extremely slow.
+ * :RISCV_HWPROBE_MISALIGNED_SLOW:: Misaligned accesses are supported in
+ hardware, but are slower than the cooresponding aligned accesses
+ sequences.
+ * :RISCV_HWPROBE_MISALIGNED_FAST:: Misaligned accesses are supported in
+ hardware and are faster than the cooresponding aligned accesses
+ sequences.
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 66c251d98290..ac51a9e6387a 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -18,4 +18,6 @@ struct riscv_cpuinfo {

DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);

+DECLARE_PER_CPU(long, misaligned_access_speed);
+
#endif
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7e52f1e1fe10..4e45e33015bc 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -8,6 +8,6 @@

#include <uapi/asm/hwprobe.h>

-#define RISCV_HWPROBE_MAX_KEY 4
+#define RISCV_HWPROBE_MAX_KEY 5

#endif
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 3831b638ecab..6c1759091e44 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -26,6 +26,15 @@ struct riscv_ipi_ops {
*/
extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu]
+static inline long hartid_to_cpuid_map(unsigned long hartid)
+{
+ long i;
+
+ for (i = 0; i < NR_CPUS; ++i)
+ if (cpuid_to_hartid_map(i) == hartid)
+ return i;
+ return -1;
+}

/* print IPI stats */
void show_ipi_stats(struct seq_file *p, int prec);
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index ce39d6e74103..5d55e2da2b1f 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -25,5 +25,11 @@ struct riscv_hwprobe {
#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
#define RISCV_HWPROBE_IMA_FD (1 << 0)
#define RISCV_HWPROBE_IMA_C (1 << 1)
+#define RISCV_HWPROBE_KEY_CPUPERF_0 5
+#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
+#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
+#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
+#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
+#define RISCV_HWPROBE_MISALIGNED_MASK (3 << 0)
/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
#endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 93e45560af30..12af6f7a2f53 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -14,8 +14,10 @@
#include <linux/of.h>
#include <asm/alternative.h>
#include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
#include <asm/errata_list.h>
#include <asm/hwcap.h>
+#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -32,6 +34,9 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
DEFINE_STATIC_KEY_ARRAY_FALSE(riscv_isa_ext_keys, RISCV_ISA_EXT_KEY_MAX);
EXPORT_SYMBOL(riscv_isa_ext_keys);

+/* Performance information */
+DEFINE_PER_CPU(long, misaligned_access_speed);
+
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -89,11 +94,11 @@ static bool riscv_isa_extension_check(int id)
void __init riscv_fill_hwcap(void)
{
struct device_node *node;
- const char *isa;
+ const char *isa, *misaligned;
char print_str[NUM_ALPHA_EXTS + 1];
int i, j, rc;
unsigned long isa2hwcap[26] = {0};
- unsigned long hartid;
+ unsigned long hartid, cpu;

isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
@@ -246,6 +251,28 @@ void __init riscv_fill_hwcap(void)
bitmap_copy(riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
else
bitmap_and(riscv_isa, riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
+
+ /*
+ * Check for the performance of misaligned accesses.
+ */
+ cpu = hartid_to_cpuid_map(hartid);
+ if (cpu < 0)
+ continue;
+
+ if (!of_property_read_string(node, "riscv,misaligned-access-performance",
+ &misaligned)) {
+ if (strcmp(misaligned, "emulated") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_EMULATED;
+
+ if (strcmp(misaligned, "slow") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_SLOW;
+
+ if (strcmp(misaligned, "fast") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_FAST;
+ }
}

/* We don't support systems with F but without D, so mask those out
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 74e0d72c877d..73d937c54f4e 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -133,6 +133,25 @@ static long hwprobe_mid(struct riscv_hwprobe __user *pair, size_t key,
return set_hwprobe(pair, id);
}

+static long hwprobe_misaligned(cpumask_t *cpus)
+{
+ long cpu, perf = -1;
+
+ for_each_cpu(cpu, cpus) {
+ long this_perf = per_cpu(misaligned_access_speed, cpu);
+
+ if (perf == -1)
+ perf = this_perf;
+
+ if (perf != this_perf)
+ perf = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
+ }
+
+ if (perf == -1)
+ return RISCV_HWPROBE_MISALIGNED_UNKNOWN;
+ return perf;
+}
+
static
long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
long cpu_count, unsigned long __user *cpus_user,
@@ -205,6 +224,10 @@ long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
}
break;

+ case RISCV_HWPROBE_KEY_CPUPERF_0:
+ ret = set_hwprobe(pairs, hwprobe_misaligned(&cpus));
+ break;
+
/*
* For forward compatibility, unknown keys don't fail the whole
* call, but get their element key set to -1 and value set to 0
--
2.25.1


2023-02-06 20:16:12

by Evan Green

[permalink] [raw]
Subject: [PATCH v2 6/6] selftests: Test the new RISC-V hwprobe interface

This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well. The build
stuff looks pretty straight-forward, but there's also a tiny C library
to avoid coupling this to any userspace implementation.

Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>

---

Changes in v2:
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.


---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 ++++++++++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +++
.../testing/selftests/riscv/hwprobe/hwprobe.c | 89 +++++++++++++++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +++
tools/testing/selftests/riscv/libc.S | 46 ++++++++++
6 files changed, 216 insertions(+)
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/libc.S

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 41b649452560..a599ef726310 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -62,6 +62,7 @@ TARGETS += pstore
TARGETS += ptrace
TARGETS += openat2
TARGETS += resctrl
+TARGETS += riscv
TARGETS += rlimits
TARGETS += rseq
TARGETS += rtc
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
new file mode 100644
index 000000000000..32a72902d045
--- /dev/null
+++ b/tools/testing/selftests/riscv/Makefile
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: GPL-2.0
+# Originally tools/testing/arm64/Makefile
+
+# When ARCH not overridden for crosscompiling, lookup machine
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),riscv))
+RISCV_SUBTARGETS ?= hwprobe
+else
+RISCV_SUBTARGETS :=
+endif
+
+CFLAGS := -Wall -O2 -g
+
+# A proper top_srcdir is needed by KSFT(lib.mk)
+top_srcdir = $(realpath ../../../../)
+
+# Additional include paths needed by kselftest.h and local headers
+CFLAGS += -I$(top_srcdir)/tools/testing/selftests/
+
+CFLAGS += $(KHDR_INCLUDES)
+
+export CFLAGS
+export top_srcdir
+
+all:
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ mkdir -p $$BUILD_TARGET; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+install: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+run_tests: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+# Avoid any output on non riscv on emit_tests
+emit_tests: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+clean:
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+.PHONY: all clean install run_tests emit_tests
diff --git a/tools/testing/selftests/riscv/hwprobe/Makefile b/tools/testing/selftests/riscv/hwprobe/Makefile
new file mode 100644
index 000000000000..614501584803
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 ARM Limited
+# Originally tools/testing/arm64/abi/Makefile
+
+TEST_GEN_PROGS := hwprobe
+
+include ../../lib.mk
+
+$(OUTPUT)/hwprobe: hwprobe.c ../libc.S sys_hwprobe.S
+ $(CC) -o$@ $(CFLAGS) $(LDFLAGS) -nostdlib $^
diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.c b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
new file mode 100644
index 000000000000..ddfb61de2938
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/hwprobe.h>
+
+/*
+ * Rather than relying on having a new enough libc to define this, just do it
+ * ourselves. This way we don't need to be coupled to a new-enough libc to
+ * contain the call.
+ */
+long riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
+ long cpu_count, unsigned long *cpus, unsigned long flags);
+
+int main(int argc, char **argv)
+{
+ struct riscv_hwprobe pairs[8];
+ unsigned long cpus;
+ long out;
+
+ /* Fake the CPU_SET ops. */
+ cpus = -1;
+
+ /*
+ * Just run a basic test: pass enough pairs to get up to the base
+ * behavior, and then check to make sure it's sane.
+ */
+ for (long i = 0; i < 8; i++)
+ pairs[i].key = i;
+ out = riscv_hwprobe(pairs, 8, 1, &cpus, 0);
+ if (out != 0)
+ return -1;
+ for (long i = 0; i < 4; ++i) {
+ /* Fail if the kernel claims not to recognize a base key. */
+ if ((i < 4) && (pairs[i].key != i))
+ return -2;
+
+ if (pairs[i].key != RISCV_HWPROBE_KEY_BASE_BEHAVIOR)
+ continue;
+
+ if (pairs[i].value & RISCV_HWPROBE_BASE_BEHAVIOR_IMA)
+ continue;
+
+ return -3;
+ }
+
+ /*
+ * This should also work with a NULL CPU set, but should not work
+ * with an improperly supplied CPU set.
+ */
+ out = riscv_hwprobe(pairs, 8, 0, 0, 0);
+ if (out != 0)
+ return -4;
+
+ out = riscv_hwprobe(pairs, 8, 0, &cpus, 0);
+ if (out == 0)
+ return -5;
+
+ out = riscv_hwprobe(pairs, 8, 1, 0, 0);
+ if (out == 0)
+ return -6;
+
+ /*
+ * Check that keys work by providing one that we know exists, and
+ * checking to make sure the resultig pair is what we asked for.
+ */
+ pairs[0].key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR;
+ out = riscv_hwprobe(pairs, 1, 1, &cpus, 0);
+ if (out != 0)
+ return -7;
+ if (pairs[0].key != RISCV_HWPROBE_KEY_BASE_BEHAVIOR)
+ return -8;
+
+ /*
+ * Check that an unknown key gets overwritten with -1,
+ * but doesn't block elements after it.
+ */
+ pairs[0].key = 0x5555;
+ pairs[1].key = 1;
+ pairs[1].value = 0xAAAA;
+ out = riscv_hwprobe(pairs, 2, 0, 0, 0);
+ if (out != 0)
+ return -9;
+
+ if (pairs[0].key != -1)
+ return -10;
+
+ if ((pairs[1].key != 1) || (pairs[1].value == 0xAAAA))
+ return -11;
+
+ return 0;
+}
diff --git a/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S b/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
new file mode 100644
index 000000000000..ed8d28863b27
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2022 Rivos, Inc */
+
+.text
+.global riscv_hwprobe
+riscv_hwprobe:
+ # Put __NR_riscv_hwprobe in the syscall number register, then just shim
+ # back the kernel's return. This doesn't do any sort of errno
+ # handling, the caller can deal with it.
+ li a7, 258
+ ecall
+ ret
diff --git a/tools/testing/selftests/riscv/libc.S b/tools/testing/selftests/riscv/libc.S
new file mode 100644
index 000000000000..1041bbea9b6b
--- /dev/null
+++ b/tools/testing/selftests/riscv/libc.S
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2022 Rivos, Inc */
+/* A C library */
+
+#if __riscv_xlen == 64
+# define REG_S sd
+#else
+# define REG_S sw
+#endif
+
+.text
+.global _start
+_start:
+.option push
+.option norelax
+ la gp, __global_pointer$
+.option pop
+
+ la sp, stack
+
+ la t0, heap
+ la t1, brk
+ REG_S t0, 0(t1)
+
+ li a0, 0
+ li a1, 0
+
+ call main
+
+ li a7, 93
+ ecall
+
+1:
+ j 1b
+
+.data
+brk:
+ .long 0
+
+.global heap
+heap:
+.rep 65536
+.byte 0
+.endr
+.global stack
+stack:
--
2.25.1


2023-02-06 21:11:53

by Jessica Clarke

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] RISC-V Hardware Probing User Interface

On 6 Feb 2023, at 20:14, Evan Green <[email protected]> wrote:
>
>
> These are very much up for discussion, as it's a pretty big new user
> interface and it's quite a bit different from how we've historically
> done things: this isn't just providing an ISA string to userspace, this
> has its own format for providing information to userspace.
>
> There's been a bunch of off-list discussions about this, including at
> Plumbers. The original plan was to do something involving providing an
> ISA string to userspace, but ISA strings just aren't sufficient for a
> stable ABI any more: in order to parse an ISA string users need the
> version of the specifications that the string is written to, the version
> of each extension (sometimes at a finer granularity than the RISC-V
> releases/versions encode), and the expected use case for the ISA string
> (ie, is it a U-mode or M-mode string). That's a lot of complexity to
> try and keep ABI compatible and it's probably going to continue to grow,
> as even if there's no more complexity in the specifications we'll have
> to deal with the various ISA string parsing oddities that end up all
> over userspace.
>
> Instead this patch set takes a very different approach and provides a set
> of key/value pairs that encode various bits about the system. The big
> advantage here is that we can clearly define what these mean so we can
> ensure ABI stability, but it also allows us to encode information that's
> unlikely to ever appear in an ISA string (see the misaligned access
> performance, for example). The resulting interface looks a lot like
> what arm64 and x86 do, and will hopefully fit well into something like
> ACPI in the future.
>
> The actual user interface is a syscall. I'm not really sure that's the
> right way to go about this, but it makes for flexible prototying.
> Various other approaches have been talked about like making HWCAP2 a
> pointer, having a VDSO routine, or exposing this via sysfs. Those seem
> like generally reasonable approaches, but I've yet to figure out a way
> to get the general case working without a syscall as that's the only way
> I've come up with to deal with the heterogenous CPU case. Happy to hear
> if someone has a better idea, though, as I don't really want to add a
> syscall if we can avoid it.

Please work with https://github.com/riscv-non-isa/riscv-c-api-doc as
it’s crucial we have a portable standard interface for applications to
query this information that works on OSes other than Linux. This can be
backed by whatever you want, whether a syscall, magic VDSO thing,
sysfs, etc, but it’s key that the exposed interface outside of libc is
not Linux-specific otherwise we’re going to get fragmentation in this
space.

I would encourage figuring out the right shape for the exposed
interface first before continuing to refine details of how that
information gets communicated between the kernel and libc.

Jess

> An example series in glibc exposing this syscall and using it in an
> ifunc selector for memcpy can be found at [1].
>
> [1] https://public-inbox.org/libc-alpha/[email protected]/T/#t
>
> Changes in v2:
> - Changed the interface to look more like poll(). Rather than supplying
> key_offset and getting back an array of values with numerically
> contiguous keys, have the user pre-fill the key members of the array,
> and the kernel will fill in the corresponding values. For any key it
> doesn't recognize, it will set the key of that element to -1. This
> allows usermode to quickly ask for exactly the elements it cares
> about, and not get bogged down in a back and forth about newer keys
> that older kernels might not recognize. In other words, the kernel
> can communicate that it doesn't recognize some of the keys while
> still providing the data for the keys it does know.
> - Added a shortcut to the cpuset parameters that if a size of 0 and
> NULL is provided for the CPU set, the kernel will use a cpu mask of
> all online CPUs. This is convenient because I suspect most callers
> will only want to act on a feature if it's supported on all CPUs, and
> it's a headache to dynamically allocate an array of all 1s, not to
> mention a waste to have the kernel loop over all of the offline bits.
> - Fixed logic error in if(of_property_read_string...) that caused crash
> - Include cpufeature.h in cpufeature.h to avoid undeclared variable
> warning.
> - Added a _MASK define
> - Fix random checkpatch complaints
> - Updated the selftests to the new API and added some more.
> - Fixed indentation, comments in .S, and general checkpatch complaints.
>
> Evan Green (4):
> RISC-V: Move struct riscv_cpuinfo to new header
> RISC-V: Add a syscall for HW probing
> RISC-V: hwprobe: Support probing of misaligned access performance
> selftests: Test the new RISC-V hwprobe interface
>
> Palmer Dabbelt (2):
> RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
> dt-bindings: Add RISC-V misaligned access performance
>
> .../devicetree/bindings/riscv/cpus.yaml | 15 ++
> Documentation/riscv/hwprobe.rst | 66 ++++++
> Documentation/riscv/index.rst | 1 +
> arch/riscv/include/asm/cpufeature.h | 23 +++
> arch/riscv/include/asm/hwprobe.h | 13 ++
> arch/riscv/include/asm/smp.h | 9 +
> arch/riscv/include/asm/syscall.h | 3 +
> arch/riscv/include/uapi/asm/hwprobe.h | 35 ++++
> arch/riscv/include/uapi/asm/unistd.h | 8 +
> arch/riscv/kernel/cpu.c | 11 +-
> arch/riscv/kernel/cpufeature.c | 31 ++-
> arch/riscv/kernel/sys_riscv.c | 192 +++++++++++++++++-
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/riscv/Makefile | 58 ++++++
> .../testing/selftests/riscv/hwprobe/Makefile | 10 +
> .../testing/selftests/riscv/hwprobe/hwprobe.c | 89 ++++++++
> .../selftests/riscv/hwprobe/sys_hwprobe.S | 12 ++
> tools/testing/selftests/riscv/libc.S | 46 +++++
> 18 files changed, 613 insertions(+), 10 deletions(-)
> create mode 100644 Documentation/riscv/hwprobe.rst
> create mode 100644 arch/riscv/include/asm/cpufeature.h
> create mode 100644 arch/riscv/include/asm/hwprobe.h
> create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
> create mode 100644 tools/testing/selftests/riscv/Makefile
> create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
> create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
> create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
> create mode 100644 tools/testing/selftests/riscv/libc.S
>
> --
> 2.25.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv


2023-02-06 21:29:07

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] selftests: Test the new RISC-V hwprobe interface

On Mon, Feb 06, 2023 at 12:14:55PM -0800, Evan Green wrote:

> +int main(int argc, char **argv)
> +{

> --- /dev/null
> +++ b/tools/testing/selftests/riscv/libc.S

> +.global _start
> +_start:
> +.option push
> +.option norelax
> + la gp, __global_pointer$
> +.option pop
> +
> + la sp, stack
> +
> + la t0, heap
> + la t1, brk
> + REG_S t0, 0(t1)
> +
> + li a0, 0
> + li a1, 0
> +
> + call main

This looks like it's just a standard program entry but I don't speak
RISC-V asm so I might be missing something. If that's the case might it
make sense to use nolibc here?


Attachments:
(No filename) (570.00 B)
signature.asc (488.00 B)
Download all attachments

2023-02-06 21:49:52

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance


On Mon, 06 Feb 2023 12:14:53 -0800, Evan Green wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This key allows device trees to specify the performance of misaligned
> accesses to main memory regions from each CPU in the system.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> (no changes since v1)
>
> Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:
./Documentation/devicetree/bindings/riscv/cpus.yaml:91:72: [error] syntax error: mapping values are not allowed here (syntax)

dtschema/dtc warnings/errors:
make[1]: *** Deleting file 'Documentation/devicetree/bindings/riscv/cpus.example.dts'
Documentation/devicetree/bindings/riscv/cpus.yaml:91:72: mapping values are not allowed here
make[1]: *** [Documentation/devicetree/bindings/Makefile:26: Documentation/devicetree/bindings/riscv/cpus.example.dts] Error 1
make[1]: *** Waiting for unfinished jobs....
./Documentation/devicetree/bindings/riscv/cpus.yaml:91:72: mapping values are not allowed here
/builds/robherring/dt-review-ci/linux/Documentation/devicetree/bindings/riscv/cpus.yaml: ignoring, error parsing file
make: *** [Makefile:1508: dt_binding_check] Error 2

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/[email protected]

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.


2023-02-06 22:34:01

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] RISC-V Hardware Probing User Interface

Hey Evan,
Having been talking to Palmer about this series at FOSDEM,
it was a very pleasant surprise to see this when I saw this in my inbox when I landed back home.
I do very much intend reviewing this, but...

On 6 February 2023 20:14:49 GMT, Evan Green <[email protected]> wrote:
>
>These are very much up for discussion, as it's a pretty big new user
>interface and it's quite a bit different from how we've historically
>done things: this isn't just providing an ISA string to userspace, this
>has its own format for providing information to userspace.
>
>There's been a bunch of off-list discussions about this, including at
>Plumbers. The original plan was to do something involving providing an
>ISA string to userspace, but ISA strings just aren't sufficient for a
>stable ABI any more: in order to parse an ISA string users need the
>version of the specifications that the string is written to, the version
>of each extension (sometimes at a finer granularity than the RISC-V
>releases/versions encode), and the expected use case for the ISA string
>(ie, is it a U-mode or M-mode string). That's a lot of complexity to
>try and keep ABI compatible and it's probably going to continue to grow,
>as even if there's no more complexity in the specifications we'll have
>to deal with the various ISA string parsing oddities that end up all
>over userspace.
>
>Instead this patch set takes a very different approach and provides a set
>of key/value pairs that encode various bits about the system. The big
>advantage here is that we can clearly define what these mean so we can
>ensure ABI stability, but it also allows us to encode information that's
>unlikely to ever appear in an ISA string (see the misaligned access
>performance, for example). The resulting interface looks a lot like
>what arm64 and x86 do, and will hopefully fit well into something like
>ACPI in the future.
>
>The actual user interface is a syscall. I'm not really sure that's the
>right way to go about this, but it makes for flexible prototying.
>Various other approaches have been talked about like making HWCAP2 a
>pointer, having a VDSO routine, or exposing this via sysfs. Those seem
>like generally reasonable approaches, but I've yet to figure out a way

This all looks to be the same cover message as Palmer submitted with the v1,
so, as I'd mentioned to him the other day, I'd like to do a bit
of an investigation into the sysfs approach drew suggested
on the v1.
So, if it's a little bit before you hear - I've certainly not forgotten about the series!

Thanks,
Conor.

>to get the general case working without a syscall as that's the only way
>I've come up with to deal with the heterogenous CPU case. Happy to hear
>if someone has a better idea, though, as I don't really want to add a
>syscall if we can avoid it.
>
>An example series in glibc exposing this syscall and using it in an
>ifunc selector for memcpy can be found at [1].
>
>[1] https://public-inbox.org/libc-alpha/[email protected]/T/#t
>
>Changes in v2:
> - Changed the interface to look more like poll(). Rather than supplying
> key_offset and getting back an array of values with numerically
> contiguous keys, have the user pre-fill the key members of the array,
> and the kernel will fill in the corresponding values. For any key it
> doesn't recognize, it will set the key of that element to -1. This
> allows usermode to quickly ask for exactly the elements it cares
> about, and not get bogged down in a back and forth about newer keys
> that older kernels might not recognize. In other words, the kernel
> can communicate that it doesn't recognize some of the keys while
> still providing the data for the keys it does know.
> - Added a shortcut to the cpuset parameters that if a size of 0 and
> NULL is provided for the CPU set, the kernel will use a cpu mask of
> all online CPUs. This is convenient because I suspect most callers
> will only want to act on a feature if it's supported on all CPUs, and
> it's a headache to dynamically allocate an array of all 1s, not to
> mention a waste to have the kernel loop over all of the offline bits.
> - Fixed logic error in if(of_property_read_string...) that caused crash
> - Include cpufeature.h in cpufeature.h to avoid undeclared variable
> warning.
> - Added a _MASK define
> - Fix random checkpatch complaints
> - Updated the selftests to the new API and added some more.
> - Fixed indentation, comments in .S, and general checkpatch complaints.
>
>Evan Green (4):
> RISC-V: Move struct riscv_cpuinfo to new header
> RISC-V: Add a syscall for HW probing
> RISC-V: hwprobe: Support probing of misaligned access performance
> selftests: Test the new RISC-V hwprobe interface
>
>Palmer Dabbelt (2):
> RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
> dt-bindings: Add RISC-V misaligned access performance
>
> .../devicetree/bindings/riscv/cpus.yaml | 15 ++
> Documentation/riscv/hwprobe.rst | 66 ++++++
> Documentation/riscv/index.rst | 1 +
> arch/riscv/include/asm/cpufeature.h | 23 +++
> arch/riscv/include/asm/hwprobe.h | 13 ++
> arch/riscv/include/asm/smp.h | 9 +
> arch/riscv/include/asm/syscall.h | 3 +
> arch/riscv/include/uapi/asm/hwprobe.h | 35 ++++
> arch/riscv/include/uapi/asm/unistd.h | 8 +
> arch/riscv/kernel/cpu.c | 11 +-
> arch/riscv/kernel/cpufeature.c | 31 ++-
> arch/riscv/kernel/sys_riscv.c | 192 +++++++++++++++++-
> tools/testing/selftests/Makefile | 1 +
> tools/testing/selftests/riscv/Makefile | 58 ++++++
> .../testing/selftests/riscv/hwprobe/Makefile | 10 +
> .../testing/selftests/riscv/hwprobe/hwprobe.c | 89 ++++++++
> .../selftests/riscv/hwprobe/sys_hwprobe.S | 12 ++
> tools/testing/selftests/riscv/libc.S | 46 +++++
> 18 files changed, 613 insertions(+), 10 deletions(-)
> create mode 100644 Documentation/riscv/hwprobe.rst
> create mode 100644 arch/riscv/include/asm/cpufeature.h
> create mode 100644 arch/riscv/include/asm/hwprobe.h
> create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
> create mode 100644 tools/testing/selftests/riscv/Makefile
> create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
> create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
> create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
> create mode 100644 tools/testing/selftests/riscv/libc.S
>

2023-02-06 22:48:03

by Heinrich Schuchardt

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] RISC-V Hardware Probing User Interface

On 2/6/23 22:11, Jessica Clarke wrote:
> On 6 Feb 2023, at 20:14, Evan Green <[email protected]> wrote:
>>
>>
>> These are very much up for discussion, as it's a pretty big new user
>> interface and it's quite a bit different from how we've historically
>> done things: this isn't just providing an ISA string to userspace, this
>> has its own format for providing information to userspace.
>>
>> There's been a bunch of off-list discussions about this, including at
>> Plumbers. The original plan was to do something involving providing an
>> ISA string to userspace, but ISA strings just aren't sufficient for a
>> stable ABI any more: in order to parse an ISA string users need the
>> version of the specifications that the string is written to, the version
>> of each extension (sometimes at a finer granularity than the RISC-V
>> releases/versions encode), and the expected use case for the ISA string
>> (ie, is it a U-mode or M-mode string). That's a lot of complexity to
>> try and keep ABI compatible and it's probably going to continue to grow,
>> as even if there's no more complexity in the specifications we'll have
>> to deal with the various ISA string parsing oddities that end up all
>> over userspace.
>>
>> Instead this patch set takes a very different approach and provides a set
>> of key/value pairs that encode various bits about the system. The big
>> advantage here is that we can clearly define what these mean so we can
>> ensure ABI stability, but it also allows us to encode information that's
>> unlikely to ever appear in an ISA string (see the misaligned access
>> performance, for example). The resulting interface looks a lot like
>> what arm64 and x86 do, and will hopefully fit well into something like
>> ACPI in the future.
>>
>> The actual user interface is a syscall. I'm not really sure that's the
>> right way to go about this, but it makes for flexible prototying.
>> Various other approaches have been talked about like making HWCAP2 a
>> pointer, having a VDSO routine, or exposing this via sysfs. Those seem
>> like generally reasonable approaches, but I've yet to figure out a way
>> to get the general case working without a syscall as that's the only way
>> I've come up with to deal with the heterogenous CPU case. Happy to hear
>> if someone has a better idea, though, as I don't really want to add a
>> syscall if we can avoid it.

Operating systems tend to reschedule threads moving them between harts.
New threads may be created by processes at any time.

It is not clear to me what information the syscall shall convey in the
heterogeneous case. I see the following alternatives:

* The syscall describes the current hart.
* The syscall provides individual properties of all harts.
* The syscall provides a set of properties that is valid for any hart on
which the thread might be scheduled.
* The syscall provides a set of properties that is valid for any hart
that any thread of the current process might be scheduled to.

Describing only the current hart would not be helpful as the thread
might be rescheduled to a hart with a smaller set of available extensions.

Describing the properties of all harts would not be helpful if the
thread has no control to which hart it is scheduled.

Processes that don't control scheduling would most benefit from a
guaranteed set of properties valid for all threads of the process.

Processes that take control of scheduling would probably want
information about all harts.

Best regards

Heinrich

>
> Please work with https://github.com/riscv-non-isa/riscv-c-api-doc as
> it’s crucial we have a portable standard interface for applications to
> query this information that works on OSes other than Linux. This can be
> backed by whatever you want, whether a syscall, magic VDSO thing,
> sysfs, etc, but it’s key that the exposed interface outside of libc is
> not Linux-specific otherwise we’re going to get fragmentation in this
> space.
>
> I would encourage figuring out the right shape for the exposed
> interface first before continuing to refine details of how that
> information gets communicated between the kernel and libc.
>
> Jess
>
>> An example series in glibc exposing this syscall and using it in an
>> ifunc selector for memcpy can be found at [1].
>>
>> [1] https://public-inbox.org/libc-alpha/[email protected]/T/#t
>>
>> Changes in v2:
>> - Changed the interface to look more like poll(). Rather than supplying
>> key_offset and getting back an array of values with numerically
>> contiguous keys, have the user pre-fill the key members of the array,
>> and the kernel will fill in the corresponding values. For any key it
>> doesn't recognize, it will set the key of that element to -1. This
>> allows usermode to quickly ask for exactly the elements it cares
>> about, and not get bogged down in a back and forth about newer keys
>> that older kernels might not recognize. In other words, the kernel
>> can communicate that it doesn't recognize some of the keys while
>> still providing the data for the keys it does know.
>> - Added a shortcut to the cpuset parameters that if a size of 0 and
>> NULL is provided for the CPU set, the kernel will use a cpu mask of
>> all online CPUs. This is convenient because I suspect most callers
>> will only want to act on a feature if it's supported on all CPUs, and
>> it's a headache to dynamically allocate an array of all 1s, not to
>> mention a waste to have the kernel loop over all of the offline bits.
>> - Fixed logic error in if(of_property_read_string...) that caused crash
>> - Include cpufeature.h in cpufeature.h to avoid undeclared variable
>> warning.
>> - Added a _MASK define
>> - Fix random checkpatch complaints
>> - Updated the selftests to the new API and added some more.
>> - Fixed indentation, comments in .S, and general checkpatch complaints.
>>
>> Evan Green (4):
>> RISC-V: Move struct riscv_cpuinfo to new header
>> RISC-V: Add a syscall for HW probing
>> RISC-V: hwprobe: Support probing of misaligned access performance
>> selftests: Test the new RISC-V hwprobe interface
>>
>> Palmer Dabbelt (2):
>> RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
>> dt-bindings: Add RISC-V misaligned access performance
>>
>> .../devicetree/bindings/riscv/cpus.yaml | 15 ++
>> Documentation/riscv/hwprobe.rst | 66 ++++++
>> Documentation/riscv/index.rst | 1 +
>> arch/riscv/include/asm/cpufeature.h | 23 +++
>> arch/riscv/include/asm/hwprobe.h | 13 ++
>> arch/riscv/include/asm/smp.h | 9 +
>> arch/riscv/include/asm/syscall.h | 3 +
>> arch/riscv/include/uapi/asm/hwprobe.h | 35 ++++
>> arch/riscv/include/uapi/asm/unistd.h | 8 +
>> arch/riscv/kernel/cpu.c | 11 +-
>> arch/riscv/kernel/cpufeature.c | 31 ++-
>> arch/riscv/kernel/sys_riscv.c | 192 +++++++++++++++++-
>> tools/testing/selftests/Makefile | 1 +
>> tools/testing/selftests/riscv/Makefile | 58 ++++++
>> .../testing/selftests/riscv/hwprobe/Makefile | 10 +
>> .../testing/selftests/riscv/hwprobe/hwprobe.c | 89 ++++++++
>> .../selftests/riscv/hwprobe/sys_hwprobe.S | 12 ++
>> tools/testing/selftests/riscv/libc.S | 46 +++++
>> 18 files changed, 613 insertions(+), 10 deletions(-)
>> create mode 100644 Documentation/riscv/hwprobe.rst
>> create mode 100644 arch/riscv/include/asm/cpufeature.h
>> create mode 100644 arch/riscv/include/asm/hwprobe.h
>> create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
>> create mode 100644 tools/testing/selftests/riscv/Makefile
>> create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
>> create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
>> create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
>> create mode 100644 tools/testing/selftests/riscv/libc.S
>>
>> --
>> 2.25.1
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> [email protected]
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>

2023-02-07 06:13:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.

Ick, this is exactly what sysfs is designed to export in a sane way.
Why not just use that instead? The "key" would be the filename, and the
value the value read from the filename. If the key is not present, the
file is not present and it's obvious what is happening, no fancy parsing
and ABI issues at all.

Bonus is that you will also properly document all valid key/value pairs
in Documentation/ABI/ when you do this, so it reinforces what the code
should be doing correctly.

thanks,

greg k-h

2023-02-07 06:32:31

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

Hey Evan, Greg,


On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
>On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
>> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
>> system call that quite does this, so let's just provide an arch-specific
>> one to probe for hardware capabilities. This currently just provides
>> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
>> the future.
>
>Ick, this is exactly what sysfs is designed to export in a sane way.
>Why not just use that instead? The "key" would be the filename, and the
>value the value read from the filename. If the key is not present, the
>file is not present and it's obvious what is happening, no fancy parsing
>and ABI issues at all.

https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/

This is the sysfs interface that I mentioned drew
suggested on the v1.
I think it fits ~perfectly with what Greg is suggesting too.

>
>Bonus is that you will also properly document all valid key/value pairs
>in Documentation/ABI/ when you do this, so it reinforces what the code
>should be doing correctly.
>
>thanks,
>
>greg k-h

2023-02-07 07:04:02

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] RISC-V: hwprobe: Support probing of misaligned access performance

Hi Evan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on shuah-kselftest/next]
[also build test ERROR on shuah-kselftest/fixes robh/for-next soc/for-next linus/master v6.2-rc7]
[cannot apply to next-20230207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230207-041746
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link: https://lore.kernel.org/r/20230206201455.1790329-6-evan%40rivosinc.com
patch subject: [PATCH v2 5/6] RISC-V: hwprobe: Support probing of misaligned access performance
config: riscv-randconfig-s052-20230206 (https://download.01.org/0day-ci/archive/20230207/[email protected]/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# https://github.com/intel-lab-lkp/linux/commit/9e77be253d64809a98f31c17abd12761dd9fad8e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230207-041746
git checkout 9e77be253d64809a98f31c17abd12761dd9fad8e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=riscv SHELL=/bin/bash arch/riscv/kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

arch/riscv/kernel/cpufeature.c: In function 'riscv_fill_hwcap':
>> arch/riscv/kernel/cpufeature.c:258:23: error: implicit declaration of function 'hartid_to_cpuid_map' [-Werror=implicit-function-declaration]
258 | cpu = hartid_to_cpuid_map(hartid);
| ^~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors


vim +/hartid_to_cpuid_map +258 arch/riscv/kernel/cpufeature.c

93
94 void __init riscv_fill_hwcap(void)
95 {
96 struct device_node *node;
97 const char *isa, *misaligned;
98 char print_str[NUM_ALPHA_EXTS + 1];
99 int i, j, rc;
100 unsigned long isa2hwcap[26] = {0};
101 unsigned long hartid, cpu;
102
103 isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
104 isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
105 isa2hwcap['a' - 'a'] = COMPAT_HWCAP_ISA_A;
106 isa2hwcap['f' - 'a'] = COMPAT_HWCAP_ISA_F;
107 isa2hwcap['d' - 'a'] = COMPAT_HWCAP_ISA_D;
108 isa2hwcap['c' - 'a'] = COMPAT_HWCAP_ISA_C;
109
110 elf_hwcap = 0;
111
112 bitmap_zero(riscv_isa, RISCV_ISA_EXT_MAX);
113
114 for_each_of_cpu_node(node) {
115 unsigned long this_hwcap = 0;
116 DECLARE_BITMAP(this_isa, RISCV_ISA_EXT_MAX);
117 const char *temp;
118
119 rc = riscv_of_processor_hartid(node, &hartid);
120 if (rc < 0)
121 continue;
122
123 if (of_property_read_string(node, "riscv,isa", &isa)) {
124 pr_warn("Unable to find \"riscv,isa\" devicetree entry\n");
125 continue;
126 }
127
128 temp = isa;
129 #if IS_ENABLED(CONFIG_32BIT)
130 if (!strncmp(isa, "rv32", 4))
131 isa += 4;
132 #elif IS_ENABLED(CONFIG_64BIT)
133 if (!strncmp(isa, "rv64", 4))
134 isa += 4;
135 #endif
136 /* The riscv,isa DT property must start with rv64 or rv32 */
137 if (temp == isa)
138 continue;
139 bitmap_zero(this_isa, RISCV_ISA_EXT_MAX);
140 for (; *isa; ++isa) {
141 const char *ext = isa++;
142 const char *ext_end = isa;
143 bool ext_long = false, ext_err = false;
144
145 switch (*ext) {
146 case 's':
147 /**
148 * Workaround for invalid single-letter 's' & 'u'(QEMU).
149 * No need to set the bit in riscv_isa as 's' & 'u' are
150 * not valid ISA extensions. It works until multi-letter
151 * extension starting with "Su" appears.
152 */
153 if (ext[-1] != '_' && ext[1] == 'u') {
154 ++isa;
155 ext_err = true;
156 break;
157 }
158 fallthrough;
159 case 'x':
160 case 'z':
161 ext_long = true;
162 /* Multi-letter extension must be delimited */
163 for (; *isa && *isa != '_'; ++isa)
164 if (unlikely(!islower(*isa)
165 && !isdigit(*isa)))
166 ext_err = true;
167 /* Parse backwards */
168 ext_end = isa;
169 if (unlikely(ext_err))
170 break;
171 if (!isdigit(ext_end[-1]))
172 break;
173 /* Skip the minor version */
174 while (isdigit(*--ext_end))
175 ;
176 if (ext_end[0] != 'p'
177 || !isdigit(ext_end[-1])) {
178 /* Advance it to offset the pre-decrement */
179 ++ext_end;
180 break;
181 }
182 /* Skip the major version */
183 while (isdigit(*--ext_end))
184 ;
185 ++ext_end;
186 break;
187 default:
188 if (unlikely(!islower(*ext))) {
189 ext_err = true;
190 break;
191 }
192 /* Find next extension */
193 if (!isdigit(*isa))
194 break;
195 /* Skip the minor version */
196 while (isdigit(*++isa))
197 ;
198 if (*isa != 'p')
199 break;
200 if (!isdigit(*++isa)) {
201 --isa;
202 break;
203 }
204 /* Skip the major version */
205 while (isdigit(*++isa))
206 ;
207 break;
208 }
209 if (*isa != '_')
210 --isa;
211
212 #define SET_ISA_EXT_MAP(name, bit) \
213 do { \
214 if ((ext_end - ext == sizeof(name) - 1) && \
215 !memcmp(ext, name, sizeof(name) - 1) && \
216 riscv_isa_extension_check(bit)) \
217 set_bit(bit, this_isa); \
218 } while (false) \
219
220 if (unlikely(ext_err))
221 continue;
222 if (!ext_long) {
223 int nr = *ext - 'a';
224
225 if (riscv_isa_extension_check(nr)) {
226 this_hwcap |= isa2hwcap[nr];
227 set_bit(nr, this_isa);
228 }
229 } else {
230 SET_ISA_EXT_MAP("sscofpmf", RISCV_ISA_EXT_SSCOFPMF);
231 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
232 SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
233 SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
234 SET_ISA_EXT_MAP("sstc", RISCV_ISA_EXT_SSTC);
235 SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
236 }
237 #undef SET_ISA_EXT_MAP
238 }
239
240 /*
241 * All "okay" hart should have same isa. Set HWCAP based on
242 * common capabilities of every "okay" hart, in case they don't
243 * have.
244 */
245 if (elf_hwcap)
246 elf_hwcap &= this_hwcap;
247 else
248 elf_hwcap = this_hwcap;
249
250 if (bitmap_empty(riscv_isa, RISCV_ISA_EXT_MAX))
251 bitmap_copy(riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
252 else
253 bitmap_and(riscv_isa, riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
254
255 /*
256 * Check for the performance of misaligned accesses.
257 */
> 258 cpu = hartid_to_cpuid_map(hartid);
259 if (cpu < 0)
260 continue;
261
262 if (!of_property_read_string(node, "riscv,misaligned-access-performance",
263 &misaligned)) {
264 if (strcmp(misaligned, "emulated") == 0)
265 per_cpu(misaligned_access_speed, cpu) =
266 RISCV_HWPROBE_MISALIGNED_EMULATED;
267
268 if (strcmp(misaligned, "slow") == 0)
269 per_cpu(misaligned_access_speed, cpu) =
270 RISCV_HWPROBE_MISALIGNED_SLOW;
271
272 if (strcmp(misaligned, "fast") == 0)
273 per_cpu(misaligned_access_speed, cpu) =
274 RISCV_HWPROBE_MISALIGNED_FAST;
275 }
276 }
277
278 /* We don't support systems with F but without D, so mask those out
279 * here. */
280 if ((elf_hwcap & COMPAT_HWCAP_ISA_F) && !(elf_hwcap & COMPAT_HWCAP_ISA_D)) {
281 pr_info("This kernel does not support systems with F but not D\n");
282 elf_hwcap &= ~COMPAT_HWCAP_ISA_F;
283 }
284
285 memset(print_str, 0, sizeof(print_str));
286 for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
287 if (riscv_isa[0] & BIT_MASK(i))
288 print_str[j++] = (char)('a' + i);
289 pr_info("riscv: base ISA extensions %s\n", print_str);
290
291 memset(print_str, 0, sizeof(print_str));
292 for (i = 0, j = 0; i < NUM_ALPHA_EXTS; i++)
293 if (elf_hwcap & BIT_MASK(i))
294 print_str[j++] = (char)('a' + i);
295 pr_info("riscv: ELF capabilities %s\n", print_str);
296
297 for_each_set_bit(i, riscv_isa, RISCV_ISA_EXT_MAX) {
298 j = riscv_isa_ext2key(i);
299 if (j >= 0)
300 static_branch_enable(&riscv_isa_ext_keys[j]);
301 }
302 }
303

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-02-07 17:06:00

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This key allows device trees to specify the performance of misaligned
> accesses to main memory regions from each CPU in the system.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> (no changes since v1)
>
> Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
> index c6720764e765..2c09bd6f2927 100644
> --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> @@ -85,6 +85,21 @@ properties:
> $ref: "/schemas/types.yaml#/definitions/string"
> pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
>
> + riscv,misaligned-access-performance:
> + description:
> + Identifies the performance of misaligned memory accesses to main memory
> + regions. There are three flavors of unaligned access performance: "emulated"
> + means that misaligned accesses are emulated via software and thus
> + extremely slow, "slow" means that misaligned accesses are supported by
> + hardware but still slower that aligned accesses sequences, and "fast"
> + means that misaligned accesses are as fast or faster than the
> + cooresponding aligned accesses sequences.
> + $ref: "/schemas/types.yaml#/definitions/string"
> + enum:
> + - emulated
> + - slow
> + - fast

I don't think this belongs in DT. (I'm not sure about a userspace
interface either.)

Can't this be tested and determined at runtime? Do misaligned accesses
and compare the performance. We already do this for things like memcpy
or crypto implementation selection.

Rob

2023-02-07 23:16:52

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

Hi Evan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on shuah-kselftest/next]
[also build test WARNING on shuah-kselftest/fixes robh/for-next soc/for-next linus/master v6.2-rc7 next-20230207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230207-041746
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link: https://lore.kernel.org/r/20230206201455.1790329-3-evan%40rivosinc.com
patch subject: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing
reproduce:
# https://github.com/intel-lab-lkp/linux/commit/4a91a9ca5d81029b702e6e74c8f2015cf50af0ae
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230207-041746
git checkout 4a91a9ca5d81029b702e6e74c8f2015cf50af0ae
make menuconfig
# enable CONFIG_COMPILE_TEST, CONFIG_WARN_MISSING_DOCUMENTS, CONFIG_WARN_ABI_ERRORS
make htmldocs

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> Documentation/riscv/hwprobe.rst:33: WARNING: Field list ends without a blank line; unexpected unindent.

vim +33 Documentation/riscv/hwprobe.rst

31
32 * :RISCV_HWPROBE_KEY_MVENDORID:: Contains the value of :mvendorid:, as per the
> 33 ISA specifications.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-02-08 12:45:27

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

From: Rob Herring
> Sent: 07 February 2023 17:06
>
> On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> > From: Palmer Dabbelt <[email protected]>
> >
> > This key allows device trees to specify the performance of misaligned
> > accesses to main memory regions from each CPU in the system.
> >
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
> > ---
> >
> > (no changes since v1)
> >
> > Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml
> b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > index c6720764e765..2c09bd6f2927 100644
> > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > @@ -85,6 +85,21 @@ properties:
> > $ref: "/schemas/types.yaml#/definitions/string"
> > pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
> >
> > + riscv,misaligned-access-performance:
> > + description:
> > + Identifies the performance of misaligned memory accesses to main memory
> > + regions. There are three flavors of unaligned access performance: "emulated"
> > + means that misaligned accesses are emulated via software and thus
> > + extremely slow, "slow" means that misaligned accesses are supported by
> > + hardware but still slower that aligned accesses sequences, and "fast"
> > + means that misaligned accesses are as fast or faster than the
> > + cooresponding aligned accesses sequences.
> > + $ref: "/schemas/types.yaml#/definitions/string"
> > + enum:
> > + - emulated
> > + - slow
> > + - fast
>
> I don't think this belongs in DT. (I'm not sure about a userspace
> interface either.)
>
> Can't this be tested and determined at runtime? Do misaligned accesses
> and compare the performance. We already do this for things like memcpy
> or crypto implementation selection.

There is also an long discussion about misaligned accesses
for loooongarch.

Basically if you want to run a common kernel (and userspace)
you have to default to compiling everything with -mno-stict-align
so that the compiler generates byte accesses for anything
marked 'packed' (etc).

Run-time tests can optimise some hot-spots.

In any case 'slow' is probably pointless - unless the accesses
take more than 1 or 2 extra cycles.

Oh, and you really never, ever want to emulate them.

Technically misaligned reads on (some) x86-64 cpu are slower
than aligned ones, but the difference is marginal.
I've measured two 64bit misaligned reads every clock.
But it is consistently slower by much less than one clock
per cache line.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2023-02-09 16:51:28

by Palmer Dabbelt

[permalink] [raw]
Subject: RE: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

On Wed, 08 Feb 2023 04:45:10 PST (-0800), [email protected] wrote:
> From: Rob Herring
>> Sent: 07 February 2023 17:06
>>
>> On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
>> > From: Palmer Dabbelt <[email protected]>
>> >
>> > This key allows device trees to specify the performance of misaligned
>> > accesses to main memory regions from each CPU in the system.
>> >
>> > Signed-off-by: Palmer Dabbelt <[email protected]>
>> > Signed-off-by: Evan Green <[email protected]>
>> > ---
>> >
>> > (no changes since v1)
>> >
>> > Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
>> > 1 file changed, 15 insertions(+)
>> >
>> > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml
>> b/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > index c6720764e765..2c09bd6f2927 100644
>> > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
>> > @@ -85,6 +85,21 @@ properties:
>> > $ref: "/schemas/types.yaml#/definitions/string"
>> > pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
>> >
>> > + riscv,misaligned-access-performance:
>> > + description:
>> > + Identifies the performance of misaligned memory accesses to main memory
>> > + regions. There are three flavors of unaligned access performance: "emulated"
>> > + means that misaligned accesses are emulated via software and thus
>> > + extremely slow, "slow" means that misaligned accesses are supported by
>> > + hardware but still slower that aligned accesses sequences, and "fast"
>> > + means that misaligned accesses are as fast or faster than the
>> > + cooresponding aligned accesses sequences.
>> > + $ref: "/schemas/types.yaml#/definitions/string"
>> > + enum:
>> > + - emulated
>> > + - slow
>> > + - fast
>>
>> I don't think this belongs in DT. (I'm not sure about a userspace
>> interface either.)

[Kind of answered below.]

>> Can't this be tested and determined at runtime? Do misaligned accesses
>> and compare the performance. We already do this for things like memcpy
>> or crypto implementation selection.

We've had a history of broken firmware emulation of misaligned accesses
wreaking havoc. We don't run into concrete bugs there because we avoid
misaligned accesses as much as possible in the kernel, but I'd be
worried that we'd trigger a lot of these when probing for misaligned
accesses.

> There is also an long discussion about misaligned accesses
> for loooongarch.
>
> Basically if you want to run a common kernel (and userspace)
> you have to default to compiling everything with -mno-stict-align
> so that the compiler generates byte accesses for anything
> marked 'packed' (etc).
>
> Run-time tests can optimise some hot-spots.
>
> In any case 'slow' is probably pointless - unless the accesses
> take more than 1 or 2 extra cycles.

[Also below.]

> Oh, and you really never, ever want to emulate them.

Unfortunately we're kind of stuck with this one: the specs used to
require that misaligned accesses were supported and thus there's a bunch
of firmwares that emulate them (and various misaligned accesses spread
around, though they're kind of a mess). The specs no longer require
this support, but just dropping it from firmware will break binaries.

There's been some vague plans to dig out of this, but it'd require some
sort of firmware interface additions in order to turn off the emulation
and that's going to take a while. As it stands we've got a bunch of
users that just want to know when they can emit misaligned accesses.

> Technically misaligned reads on (some) x86-64 cpu are slower
> than aligned ones, but the difference is marginal.
> I've measured two 64bit misaligned reads every clock.
> But it is consistently slower by much less than one clock
> per cache line.

The "fast" case is explicitly written to catch that flavor of
implementation.

The "slow" one is a bit vaguer, but the general idea is to catch
implementations that end up with some sort of pipeline flush on
misaligned accesses. We've got a lot of very small in-order processors
in RISC-V land, and while I haven't gotten around to benchmarking them
all my guess is that the spec requirement for support ended up with some
simple implementations.

FWIW: I checked the c906 RTL and it's setting some exception-related
info on misaligned accesses, but I'd need to actually benchmark on to
know for sure and they're kind of a headache to deal with.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2023-02-09 16:56:42

by Palmer Dabbelt

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] RISC-V Hardware Probing User Interface

On Mon, 06 Feb 2023 14:47:35 PST (-0800), [email protected] wrote:
> On 2/6/23 22:11, Jessica Clarke wrote:
>> On 6 Feb 2023, at 20:14, Evan Green <[email protected]> wrote:
>>>
>>>
>>> These are very much up for discussion, as it's a pretty big new user
>>> interface and it's quite a bit different from how we've historically
>>> done things: this isn't just providing an ISA string to userspace, this
>>> has its own format for providing information to userspace.
>>>
>>> There's been a bunch of off-list discussions about this, including at
>>> Plumbers. The original plan was to do something involving providing an
>>> ISA string to userspace, but ISA strings just aren't sufficient for a
>>> stable ABI any more: in order to parse an ISA string users need the
>>> version of the specifications that the string is written to, the version
>>> of each extension (sometimes at a finer granularity than the RISC-V
>>> releases/versions encode), and the expected use case for the ISA string
>>> (ie, is it a U-mode or M-mode string). That's a lot of complexity to
>>> try and keep ABI compatible and it's probably going to continue to grow,
>>> as even if there's no more complexity in the specifications we'll have
>>> to deal with the various ISA string parsing oddities that end up all
>>> over userspace.
>>>
>>> Instead this patch set takes a very different approach and provides a set
>>> of key/value pairs that encode various bits about the system. The big
>>> advantage here is that we can clearly define what these mean so we can
>>> ensure ABI stability, but it also allows us to encode information that's
>>> unlikely to ever appear in an ISA string (see the misaligned access
>>> performance, for example). The resulting interface looks a lot like
>>> what arm64 and x86 do, and will hopefully fit well into something like
>>> ACPI in the future.
>>>
>>> The actual user interface is a syscall. I'm not really sure that's the
>>> right way to go about this, but it makes for flexible prototying.
>>> Various other approaches have been talked about like making HWCAP2 a
>>> pointer, having a VDSO routine, or exposing this via sysfs. Those seem
>>> like generally reasonable approaches, but I've yet to figure out a way
>>> to get the general case working without a syscall as that's the only way
>>> I've come up with to deal with the heterogenous CPU case. Happy to hear
>>> if someone has a better idea, though, as I don't really want to add a
>>> syscall if we can avoid it.
>
> Operating systems tend to reschedule threads moving them between harts.
> New threads may be created by processes at any time.
>
> It is not clear to me what information the syscall shall convey in the
> heterogeneous case. I see the following alternatives:
>
> * The syscall describes the current hart.
> * The syscall provides individual properties of all harts.
> * The syscall provides a set of properties that is valid for any hart on
> which the thread might be scheduled.
> * The syscall provides a set of properties that is valid for any hart
> that any thread of the current process might be scheduled to.
>
> Describing only the current hart would not be helpful as the thread
> might be rescheduled to a hart with a smaller set of available extensions.
>
> Describing the properties of all harts would not be helpful if the
> thread has no control to which hart it is scheduled.
>
> Processes that don't control scheduling would most benefit from a
> guaranteed set of properties valid for all threads of the process.
>
> Processes that take control of scheduling would probably want
> information about all harts.

There's a cpu_set_t argument. We tried to answer this via the
Documentation patch. It's just the single sentence

The CPU set is defined by CPU_SET(3), the indicated features will be
supported on all CPUs in the set.

so maybe it needs beefing up... Do you mind commenting on the doc diff,
if you've got any ideas as to how to word it better? That way anyone
else reviewing the docs will see too.

> Best regards
>
> Heinrich
>
>>
>> Please work with https://github.com/riscv-non-isa/riscv-c-api-doc as
>> it’s crucial we have a portable standard interface for applications to
>> query this information that works on OSes other than Linux. This can be
>> backed by whatever you want, whether a syscall, magic VDSO thing,
>> sysfs, etc, but it’s key that the exposed interface outside of libc is
>> not Linux-specific otherwise we’re going to get fragmentation in this
>> space.
>>
>> I would encourage figuring out the right shape for the exposed
>> interface first before continuing to refine details of how that
>> information gets communicated between the kernel and libc.
>>
>> Jess
>>
>>> An example series in glibc exposing this syscall and using it in an
>>> ifunc selector for memcpy can be found at [1].
>>>
>>> [1] https://public-inbox.org/libc-alpha/[email protected]/T/#t
>>>
>>> Changes in v2:
>>> - Changed the interface to look more like poll(). Rather than supplying
>>> key_offset and getting back an array of values with numerically
>>> contiguous keys, have the user pre-fill the key members of the array,
>>> and the kernel will fill in the corresponding values. For any key it
>>> doesn't recognize, it will set the key of that element to -1. This
>>> allows usermode to quickly ask for exactly the elements it cares
>>> about, and not get bogged down in a back and forth about newer keys
>>> that older kernels might not recognize. In other words, the kernel
>>> can communicate that it doesn't recognize some of the keys while
>>> still providing the data for the keys it does know.
>>> - Added a shortcut to the cpuset parameters that if a size of 0 and
>>> NULL is provided for the CPU set, the kernel will use a cpu mask of
>>> all online CPUs. This is convenient because I suspect most callers
>>> will only want to act on a feature if it's supported on all CPUs, and
>>> it's a headache to dynamically allocate an array of all 1s, not to
>>> mention a waste to have the kernel loop over all of the offline bits.
>>> - Fixed logic error in if(of_property_read_string...) that caused crash
>>> - Include cpufeature.h in cpufeature.h to avoid undeclared variable
>>> warning.
>>> - Added a _MASK define
>>> - Fix random checkpatch complaints
>>> - Updated the selftests to the new API and added some more.
>>> - Fixed indentation, comments in .S, and general checkpatch complaints.
>>>
>>> Evan Green (4):
>>> RISC-V: Move struct riscv_cpuinfo to new header
>>> RISC-V: Add a syscall for HW probing
>>> RISC-V: hwprobe: Support probing of misaligned access performance
>>> selftests: Test the new RISC-V hwprobe interface
>>>
>>> Palmer Dabbelt (2):
>>> RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
>>> dt-bindings: Add RISC-V misaligned access performance
>>>
>>> .../devicetree/bindings/riscv/cpus.yaml | 15 ++
>>> Documentation/riscv/hwprobe.rst | 66 ++++++
>>> Documentation/riscv/index.rst | 1 +
>>> arch/riscv/include/asm/cpufeature.h | 23 +++
>>> arch/riscv/include/asm/hwprobe.h | 13 ++
>>> arch/riscv/include/asm/smp.h | 9 +
>>> arch/riscv/include/asm/syscall.h | 3 +
>>> arch/riscv/include/uapi/asm/hwprobe.h | 35 ++++
>>> arch/riscv/include/uapi/asm/unistd.h | 8 +
>>> arch/riscv/kernel/cpu.c | 11 +-
>>> arch/riscv/kernel/cpufeature.c | 31 ++-
>>> arch/riscv/kernel/sys_riscv.c | 192 +++++++++++++++++-
>>> tools/testing/selftests/Makefile | 1 +
>>> tools/testing/selftests/riscv/Makefile | 58 ++++++
>>> .../testing/selftests/riscv/hwprobe/Makefile | 10 +
>>> .../testing/selftests/riscv/hwprobe/hwprobe.c | 89 ++++++++
>>> .../selftests/riscv/hwprobe/sys_hwprobe.S | 12 ++
>>> tools/testing/selftests/riscv/libc.S | 46 +++++
>>> 18 files changed, 613 insertions(+), 10 deletions(-)
>>> create mode 100644 Documentation/riscv/hwprobe.rst
>>> create mode 100644 arch/riscv/include/asm/cpufeature.h
>>> create mode 100644 arch/riscv/include/asm/hwprobe.h
>>> create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
>>> create mode 100644 tools/testing/selftests/riscv/Makefile
>>> create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
>>> create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
>>> create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
>>> create mode 100644 tools/testing/selftests/riscv/libc.S
>>>
>>> --
>>> 2.25.1
>>>
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> [email protected]
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>

2023-02-09 17:10:19

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
>
> Hey Evan, Greg,
>
>
> On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
> >On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> >> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> >> system call that quite does this, so let's just provide an arch-specific
> >> one to probe for hardware capabilities. This currently just provides
> >> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> >> the future.
> >
> >Ick, this is exactly what sysfs is designed to export in a sane way.
> >Why not just use that instead? The "key" would be the filename, and the
> >value the value read from the filename. If the key is not present, the
> >file is not present and it's obvious what is happening, no fancy parsing
> >and ABI issues at all.
>
> https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
>
> This is the sysfs interface that I mentioned drew
> suggested on the v1.
> I think it fits ~perfectly with what Greg is suggesting too.

Whoops, I'll admit I missed that comment when I reviewed the feedback
from v1. I spent some time thinking about sysfs. The problem is this
interface will be needed in places like very early program startup. If
we're trying to use this in places like the ifunc selector to decide
which memcpy to use, having to go open and read a fistful of files is
going to be complex that early, and rough on performance.

Really this is data that would go great in the aux vector, except
there's probably too much of it to justify preparing and copying into
every new process. You could point the aux vector into a vDSO data
area. This has the advantage of great performance and no syscall, but
has the disadvantages of making that data ABI, and requiring it all to
be known up front (eg the kernel can't compute any answers on the
fly).

After discussions with Palmer, my plan for the next version is to move
this into a vDSO function plus a syscall. Private vDSO data will be
prepped with common answers for the "all CPUs" case, avoiding the need
for a syscall in most cases and making this fast. Since the data is
hidden behind the vdso function, it's not ABI, which is a plus. Then
the vdso function can fall back to the syscall for cases with exotic
CPU masks or keys that are unknown/expensive to compute at runtime.

-Evan

2023-02-09 17:13:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
> On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
> >
> > Hey Evan, Greg,
> >
> >
> > On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
> > >On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> > >> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > >> system call that quite does this, so let's just provide an arch-specific
> > >> one to probe for hardware capabilities. This currently just provides
> > >> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > >> the future.
> > >
> > >Ick, this is exactly what sysfs is designed to export in a sane way.
> > >Why not just use that instead? The "key" would be the filename, and the
> > >value the value read from the filename. If the key is not present, the
> > >file is not present and it's obvious what is happening, no fancy parsing
> > >and ABI issues at all.
> >
> > https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
> >
> > This is the sysfs interface that I mentioned drew
> > suggested on the v1.
> > I think it fits ~perfectly with what Greg is suggesting too.
>
> Whoops, I'll admit I missed that comment when I reviewed the feedback
> from v1. I spent some time thinking about sysfs. The problem is this
> interface will be needed in places like very early program startup. If
> we're trying to use this in places like the ifunc selector to decide
> which memcpy to use, having to go open and read a fistful of files is
> going to be complex that early, and rough on performance.

How is it going to be any different on "performance" than a syscall? Or
complex? It should be almost identical overall as this is all in-ram
and not any real I/o is happening. You are limited only by the speed of
your cpu.

> Really this is data that would go great in the aux vector, except
> there's probably too much of it to justify preparing and copying into
> every new process. You could point the aux vector into a vDSO data
> area. This has the advantage of great performance and no syscall, but
> has the disadvantages of making that data ABI, and requiring it all to
> be known up front (eg the kernel can't compute any answers on the
> fly).
>
> After discussions with Palmer, my plan for the next version is to move
> this into a vDSO function plus a syscall. Private vDSO data will be
> prepped with common answers for the "all CPUs" case, avoiding the need
> for a syscall in most cases and making this fast. Since the data is
> hidden behind the vdso function, it's not ABI, which is a plus. Then
> the vdso function can fall back to the syscall for cases with exotic
> CPU masks or keys that are unknown/expensive to compute at runtime.

I still think that's wrong, as you are wanting a set of key/values here,
which is exactly what sysfs is designed for.

Please benchmark this first. Heck, if you don't like the
open/read/close syscall overhead, use my readfile() syscall patch that I
keep proposing every 6 months or so to remove that overhead. That would
be a good reason to get that code accepted finally :)

thanks,

greg k-h

2023-02-09 17:22:16

by Jessica Clarke

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On 9 Feb 2023, at 17:13, Greg KH <[email protected]> wrote:
> On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
>> On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
>>>
>>> Hey Evan, Greg,
>>>
>>>
>>> On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
>>>> On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
>>>>> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
>>>>> system call that quite does this, so let's just provide an arch-specific
>>>>> one to probe for hardware capabilities. This currently just provides
>>>>> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
>>>>> the future.
>>>>
>>>> Ick, this is exactly what sysfs is designed to export in a sane way.
>>>> Why not just use that instead? The "key" would be the filename, and the
>>>> value the value read from the filename. If the key is not present, the
>>>> file is not present and it's obvious what is happening, no fancy parsing
>>>> and ABI issues at all.
>>>
>>> https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
>>>
>>> This is the sysfs interface that I mentioned drew
>>> suggested on the v1.
>>> I think it fits ~perfectly with what Greg is suggesting too.
>>
>> Whoops, I'll admit I missed that comment when I reviewed the feedback
>> from v1. I spent some time thinking about sysfs. The problem is this
>> interface will be needed in places like very early program startup. If
>> we're trying to use this in places like the ifunc selector to decide
>> which memcpy to use, having to go open and read a fistful of files is
>> going to be complex that early, and rough on performance.
>
> How is it going to be any different on "performance" than a syscall? Or
> complex? It should be almost identical overall as this is all in-ram
> and not any real I/o is happening. You are limited only by the speed of
> your cpu.
>
>> Really this is data that would go great in the aux vector, except
>> there's probably too much of it to justify preparing and copying into
>> every new process. You could point the aux vector into a vDSO data
>> area. This has the advantage of great performance and no syscall, but
>> has the disadvantages of making that data ABI, and requiring it all to
>> be known up front (eg the kernel can't compute any answers on the
>> fly).
>>
>> After discussions with Palmer, my plan for the next version is to move
>> this into a vDSO function plus a syscall. Private vDSO data will be
>> prepped with common answers for the "all CPUs" case, avoiding the need
>> for a syscall in most cases and making this fast. Since the data is
>> hidden behind the vdso function, it's not ABI, which is a plus. Then
>> the vdso function can fall back to the syscall for cases with exotic
>> CPU masks or keys that are unknown/expensive to compute at runtime.
>
> I still think that's wrong, as you are wanting a set of key/values here,
> which is exactly what sysfs is designed for.

But this needs to be a RISC-V standard interface that can be programmed
against, not something tied to highly Linux-specific things like sysfs.
You’re free to implement that interface with sysfs, but exposing that
as *the* interface to use would be terrible for portability.

Jess

> Please benchmark this first. Heck, if you don't like the
> open/read/close syscall overhead, use my readfile() syscall patch that I
> keep proposing every 6 months or so to remove that overhead. That would
> be a good reason to get that code accepted finally :)
>
> thanks,
>
> greg k-h
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv


2023-02-09 18:42:34

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Thu, Feb 9, 2023 at 9:13 AM Greg KH <[email protected]> wrote:
>
> On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
> > On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
> > >
> > > Hey Evan, Greg,
> > >
> > >
> > > On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
> > > >On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> > > >> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > > >> system call that quite does this, so let's just provide an arch-specific
> > > >> one to probe for hardware capabilities. This currently just provides
> > > >> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > > >> the future.
> > > >
> > > >Ick, this is exactly what sysfs is designed to export in a sane way.
> > > >Why not just use that instead? The "key" would be the filename, and the
> > > >value the value read from the filename. If the key is not present, the
> > > >file is not present and it's obvious what is happening, no fancy parsing
> > > >and ABI issues at all.
> > >
> > > https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
> > >
> > > This is the sysfs interface that I mentioned drew
> > > suggested on the v1.
> > > I think it fits ~perfectly with what Greg is suggesting too.
> >
> > Whoops, I'll admit I missed that comment when I reviewed the feedback
> > from v1. I spent some time thinking about sysfs. The problem is this
> > interface will be needed in places like very early program startup. If
> > we're trying to use this in places like the ifunc selector to decide
> > which memcpy to use, having to go open and read a fistful of files is
> > going to be complex that early, and rough on performance.
>
> How is it going to be any different on "performance" than a syscall? Or
> complex? It should be almost identical overall as this is all in-ram
> and not any real I/o is happening. You are limited only by the speed of
> your cpu.

At best sysfs is 1 syscall per key, whereas this version of the
interface lets you query all the keys you're interested in with a
single syscall. With the
proposed vdso version, we'd be down to ~0 syscalls for most queries.
The complexity aspect is mostly a reference to having to do a bunch of
open/read/parse/close operations at a time when mem* operations are
still being set up. Since this is something that may get run on every
program invocation, it seems worth it to be able to get fast and
simple queries even if it's a slightly separated interface.


-Evan

2023-02-09 18:45:19

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] selftests: Test the new RISC-V hwprobe interface

On Mon, Feb 6, 2023 at 1:27 PM Mark Brown <[email protected]> wrote:
>
> On Mon, Feb 06, 2023 at 12:14:55PM -0800, Evan Green wrote:
>
> > +int main(int argc, char **argv)
> > +{
>
> > --- /dev/null
> > +++ b/tools/testing/selftests/riscv/libc.S
>
> > +.global _start
> > +_start:
> > +.option push
> > +.option norelax
> > + la gp, __global_pointer$
> > +.option pop
> > +
> > + la sp, stack
> > +
> > + la t0, heap
> > + la t1, brk
> > + REG_S t0, 0(t1)
> > +
> > + li a0, 0
> > + li a1, 0
> > +
> > + call main
>
> This looks like it's just a standard program entry but I don't speak
> RISC-V asm so I might be missing something. If that's the case might it
> make sense to use nolibc here?

I think I can just remove this file entirely along with -nostdlib, and
just let the compiler add in this glue.
-Evan

2023-02-10 06:48:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Thu, Feb 09, 2023 at 05:22:09PM +0000, Jessica Clarke wrote:
> On 9 Feb 2023, at 17:13, Greg KH <[email protected]> wrote:
> > On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
> >> On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
> >>>
> >>> Hey Evan, Greg,
> >>>
> >>>
> >>> On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
> >>>> On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> >>>>> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> >>>>> system call that quite does this, so let's just provide an arch-specific
> >>>>> one to probe for hardware capabilities. This currently just provides
> >>>>> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> >>>>> the future.
> >>>>
> >>>> Ick, this is exactly what sysfs is designed to export in a sane way.
> >>>> Why not just use that instead? The "key" would be the filename, and the
> >>>> value the value read from the filename. If the key is not present, the
> >>>> file is not present and it's obvious what is happening, no fancy parsing
> >>>> and ABI issues at all.
> >>>
> >>> https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
> >>>
> >>> This is the sysfs interface that I mentioned drew
> >>> suggested on the v1.
> >>> I think it fits ~perfectly with what Greg is suggesting too.
> >>
> >> Whoops, I'll admit I missed that comment when I reviewed the feedback
> >> from v1. I spent some time thinking about sysfs. The problem is this
> >> interface will be needed in places like very early program startup. If
> >> we're trying to use this in places like the ifunc selector to decide
> >> which memcpy to use, having to go open and read a fistful of files is
> >> going to be complex that early, and rough on performance.
> >
> > How is it going to be any different on "performance" than a syscall? Or
> > complex? It should be almost identical overall as this is all in-ram
> > and not any real I/o is happening. You are limited only by the speed of
> > your cpu.
> >
> >> Really this is data that would go great in the aux vector, except
> >> there's probably too much of it to justify preparing and copying into
> >> every new process. You could point the aux vector into a vDSO data
> >> area. This has the advantage of great performance and no syscall, but
> >> has the disadvantages of making that data ABI, and requiring it all to
> >> be known up front (eg the kernel can't compute any answers on the
> >> fly).
> >>
> >> After discussions with Palmer, my plan for the next version is to move
> >> this into a vDSO function plus a syscall. Private vDSO data will be
> >> prepped with common answers for the "all CPUs" case, avoiding the need
> >> for a syscall in most cases and making this fast. Since the data is
> >> hidden behind the vdso function, it's not ABI, which is a plus. Then
> >> the vdso function can fall back to the syscall for cases with exotic
> >> CPU masks or keys that are unknown/expensive to compute at runtime.
> >
> > I still think that's wrong, as you are wanting a set of key/values here,
> > which is exactly what sysfs is designed for.
>
> But this needs to be a RISC-V standard interface that can be programmed
> against, not something tied to highly Linux-specific things like sysfs.
> You’re free to implement that interface with sysfs, but exposing that
> as *the* interface to use would be terrible for portability.

A vdso and a new kernel syscall is also a highly Linux-specific thing,
so I do not understand the objection here at all. You're going to have
to wrap all of this up in some sort of common userspace library code
anyway, and that will have to handle all of the different operating
system implementations.

Also, frankly, I don't care about non-Linux implementations, so that
isn't a valid argument here :)

thanks,

greg k-h

2023-02-10 06:50:13

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Thu, Feb 09, 2023 at 10:41:51AM -0800, Evan Green wrote:
> On Thu, Feb 9, 2023 at 9:13 AM Greg KH <[email protected]> wrote:
> >
> > On Thu, Feb 09, 2023 at 09:09:16AM -0800, Evan Green wrote:
> > > On Mon, Feb 6, 2023 at 10:32 PM Conor Dooley <[email protected]> wrote:
> > > >
> > > > Hey Evan, Greg,
> > > >
> > > >
> > > > On 7 February 2023 06:13:39 GMT, Greg KH <[email protected]> wrote:
> > > > >On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> > > > >> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > > > >> system call that quite does this, so let's just provide an arch-specific
> > > > >> one to probe for hardware capabilities. This currently just provides
> > > > >> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > > > >> the future.
> > > > >
> > > > >Ick, this is exactly what sysfs is designed to export in a sane way.
> > > > >Why not just use that instead? The "key" would be the filename, and the
> > > > >value the value read from the filename. If the key is not present, the
> > > > >file is not present and it's obvious what is happening, no fancy parsing
> > > > >and ABI issues at all.
> > > >
> > > > https://lore.kernel.org/linux-riscv/20221201160614.xpomlqq2fzpzfmcm@kamzik/
> > > >
> > > > This is the sysfs interface that I mentioned drew
> > > > suggested on the v1.
> > > > I think it fits ~perfectly with what Greg is suggesting too.
> > >
> > > Whoops, I'll admit I missed that comment when I reviewed the feedback
> > > from v1. I spent some time thinking about sysfs. The problem is this
> > > interface will be needed in places like very early program startup. If
> > > we're trying to use this in places like the ifunc selector to decide
> > > which memcpy to use, having to go open and read a fistful of files is
> > > going to be complex that early, and rough on performance.
> >
> > How is it going to be any different on "performance" than a syscall? Or
> > complex? It should be almost identical overall as this is all in-ram
> > and not any real I/o is happening. You are limited only by the speed of
> > your cpu.
>
> At best sysfs is 1 syscall per key, whereas this version of the
> interface lets you query all the keys you're interested in with a
> single syscall. With the
> proposed vdso version, we'd be down to ~0 syscalls for most queries.
> The complexity aspect is mostly a reference to having to do a bunch of
> open/read/parse/close operations at a time when mem* operations are
> still being set up. Since this is something that may get run on every
> program invocation, it seems worth it to be able to get fast and
> simple queries even if it's a slightly separated interface.

I'd be interested in the real benchmark numbers and seeing the userspace
and kernel code before arguing this too much.

Again, ignoring the lessons of the past is generally considered an
unwise decision, but hey, you do you, it's something that you are going
to have to maintain for 40+ years going forward, not me :)

good luck!

greg k-h

2023-02-14 21:27:25

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This key allows device trees to specify the performance of misaligned
> accesses to main memory regions from each CPU in the system.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> (no changes since v1)
>
> Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
> index c6720764e765..2c09bd6f2927 100644
> --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> @@ -85,6 +85,21 @@ properties:
> $ref: "/schemas/types.yaml#/definitions/string"
> pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
>
> + riscv,misaligned-access-performance:
> + description:
> + Identifies the performance of misaligned memory accesses to main memory
> + regions. There are three flavors of unaligned access performance: "emulated"

Is the performance: emulated the source of the dt_binding_check issues?
And the fix is as simple as:
- description:
+ description: |
?

> + means that misaligned accesses are emulated via software and thus
> + extremely slow, "slow" means that misaligned accesses are supported by
> + hardware but still slower that aligned accesses sequences, and "fast"
> + means that misaligned accesses are as fast or faster than the
> + cooresponding aligned accesses sequences.
> + $ref: "/schemas/types.yaml#/definitions/string"
> + enum:
> + - emulated
> + - slow
> + - fast
> +
> # RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
> timebase-frequency: false
>
> --
> 2.25.1
>


Attachments:
(No filename) (1.87 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-14 21:38:53

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] RISC-V: Move struct riscv_cpuinfo to new header

On Mon, Feb 06, 2023 at 12:14:50PM -0800, Evan Green wrote:
> In preparation for tracking and exposing microarchitectural details to
> userspace (like whether or not unaligned accesses are fast), move the
> riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
> need to be used by more than just cpu.c.
>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> (no changes since v1)

Really? I don't recall seeing this patch in v1? ;)

>
> arch/riscv/include/asm/cpufeature.h | 21 +++++++++++++++++++++
> arch/riscv/kernel/cpu.c | 8 ++------
> 2 files changed, 23 insertions(+), 6 deletions(-)
> create mode 100644 arch/riscv/include/asm/cpufeature.h
>
> diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> new file mode 100644
> index 000000000000..66c251d98290
> --- /dev/null
> +++ b/arch/riscv/include/asm/cpufeature.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright 2022 Rivos, Inc
> + */
> +
> +#ifndef _ASM_CPUFEATURE_H
> +#define _ASM_CPUFEATURE_H
> +
> +/*
> + * These are probed via a device_initcall(), via either the SBI or directly
> + * from the cooresponding CSRs.

May as well fix the typo here while we are moving the code & a respin is
required anyway.

I'm sure we'll need this patch regardless of approach, so:
Reviewed-by: Conor Dooley <[email protected]>

Cheers,
Conor.


Attachments:
(No filename) (1.39 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-14 21:57:57

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] RISC-V: Move struct riscv_cpuinfo to new header

On Tue, Feb 14, 2023 at 1:38 PM Conor Dooley <[email protected]> wrote:
>
> On Mon, Feb 06, 2023 at 12:14:50PM -0800, Evan Green wrote:
> > In preparation for tracking and exposing microarchitectural details to
> > userspace (like whether or not unaligned accesses are fast), move the
> > riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
> > need to be used by more than just cpu.c.
> >
> > Signed-off-by: Evan Green <[email protected]>
> > ---
> >
> > (no changes since v1)
>
> Really? I don't recall seeing this patch in v1? ;)

My bad, that's an artifact of the tool I'm using (patman) to help me
with patch formatting.

>
> >
> > arch/riscv/include/asm/cpufeature.h | 21 +++++++++++++++++++++
> > arch/riscv/kernel/cpu.c | 8 ++------
> > 2 files changed, 23 insertions(+), 6 deletions(-)
> > create mode 100644 arch/riscv/include/asm/cpufeature.h
> >
> > diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
> > new file mode 100644
> > index 000000000000..66c251d98290
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/cpufeature.h
> > @@ -0,0 +1,21 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright 2022 Rivos, Inc
> > + */
> > +
> > +#ifndef _ASM_CPUFEATURE_H
> > +#define _ASM_CPUFEATURE_H
> > +
> > +/*
> > + * These are probed via a device_initcall(), via either the SBI or directly
> > + * from the cooresponding CSRs.
>
> May as well fix the typo here while we are moving the code & a respin is
> required anyway.

Will do.

>
> I'm sure we'll need this patch regardless of approach, so:
> Reviewed-by: Conor Dooley <[email protected]>

Thank you!

2023-02-14 23:51:41

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

Hey Evan,

Just as a preface, I'm reviewing this lot from a position of ignorance
on what glibc wants so I'll refrain from commenting on call itself.
I figure that since the commentary has kinda died on the sysfs front &
Palmer seems to be still into the syscall stuff, that we're pushing on
with this approach...

On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
>
> ---
>
> Changes in v2:
> - Changed the interface to look more like poll(). Rather than supplying
> key_offset and getting back an array of values with numerically
> contiguous keys, have the user pre-fill the key members of the array,
> and the kernel will fill in the corresponding values. For any key it
> doesn't recognize, it will set the key of that element to -1. This
> allows usermode to quickly ask for exactly the elements it cares
> about, and not get bogged down in a back and forth about newer keys
> that older kernels might not recognize. In other words, the kernel
> can communicate that it doesn't recognize some of the keys while
> still providing the data for the keys it does know.
> - Added a shortcut to the cpuset parameters that if a size of 0 and
> NULL is provided for the CPU set, the kernel will use a cpu mask of
> all online CPUs. This is convenient because I suspect most callers
> will only want to act on a feature if it's supported on all CPUs, and
> it's a headache to dynamically allocate an array of all 1s, not to
> mention a waste to have the kernel loop over all of the offline bits.
>
>
> ---
> Documentation/riscv/hwprobe.rst | 37 +++++++
> Documentation/riscv/index.rst | 1 +
> arch/riscv/include/asm/hwprobe.h | 13 +++
> arch/riscv/include/asm/syscall.h | 3 +
> arch/riscv/include/uapi/asm/hwprobe.h | 25 +++++
> arch/riscv/include/uapi/asm/unistd.h | 8 ++
> arch/riscv/kernel/cpu.c | 3 +-
> arch/riscv/kernel/sys_riscv.c | 146 +++++++++++++++++++++++++-
> 8 files changed, 234 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/riscv/hwprobe.rst
> create mode 100644 arch/riscv/include/asm/hwprobe.h
> create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> new file mode 100644
> index 000000000000..97771090e972
> --- /dev/null
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -0,0 +1,37 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +RISC-V Hardware Probing Interface
> +---------------------------------
> +
> +The RISC-V hardware probing interface is based around a single syscall, which
> +is defined in <asm/hwprobe.h>::
> +
> + struct riscv_hwprobe {
> + __s64 key;
> + __u64 value;
> + };
> +
> + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> + size_t cpu_count, cpu_set_t *cpus,
> + unsigned long flags);
> +
> +The arguments are split into three groups: an array of key-value pairs, a CPU
> +set, and some flags. The key-value pairs are supplied with a count. Userspace
> +must prepopulate the key field for each element, and the kernel will fill in the
> +value if the key is recognized. If a key is unknown to the kernel, its key
> +field will be cleared to -1, and its value set to 0. The CPU set is defined by
> +CPU_SET(3), the indicated features will be supported on all CPUs in the set.
> +Usermode can supply NULL for cpus and 0 for cpu_count as a shortcut for all
> +online CPUs. There are currently no flags, this value must be zero for future
> +compatibility.
> +
> +On success 0 is returned, on failure a negative error code is returned.
> +
> +The following keys are defined:
> +
> +* :RISCV_HWPROBE_KEY_MVENDORID:: Contains the value of :mvendorid:, as per the
> + ISA specifications.

"per the ISA specifications" sounds like dangerous wording to me! ;)

> +* :RISCV_HWPROBE_KEY_MARCHID:: Contains the value of :marchid:, as per the ISA
> + specifications.
> +* :RISCV_HWPROBE_KEY_MIMPLID:: Contains the value of :mimplid:, as per the ISA
> + specifications.

> diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> new file mode 100644
> index 000000000000..591802047460
> --- /dev/null
> +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * Copyright 2022 Rivos, Inc
> + */
> +
> +#ifndef _UAPI_ASM_HWPROBE_H
> +#define _UAPI_ASM_HWPROBE_H
> +
> +#include <linux/types.h>
> +
> +/*
> + * Interface for probing hardware capabilities from userspace, see
> + * Documentation/riscv/hwprobe.rst for more information.
> + */
> +struct riscv_hwprobe {
> + __s64 key;
> + __u64 value;
> +};
> +
> +#define RISCV_HWPROBE_KEY_MVENDORID 0
> +#define RISCV_HWPROBE_KEY_MARCHID 1
> +#define RISCV_HWPROBE_KEY_MIMPID 2
> +/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */

Can't wait for that to get forgotten!

> diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
> index 73d7cdd2ec49..37d47302322a 100644
> --- a/arch/riscv/include/uapi/asm/unistd.h
> +++ b/arch/riscv/include/uapi/asm/unistd.h
> @@ -43,3 +43,11 @@
> #define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
> #endif
> __SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
> +
> +/*
> + * Allows userspace to probe

That comment looks chopped off midway through.

> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 5d3f2fbeb33c..868a12384f5a 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -6,8 +6,11 @@
> */
>
> #include <linux/syscalls.h>
> -#include <asm/unistd.h>
> #include <asm/cacheflush.h>
> +#include <asm/cpufeature.h>
> +#include <asm/hwprobe.h>
> +#include <asm/uaccess.h>
> +#include <asm/unistd.h>
> #include <asm-generic/mman-common.h>
>
> static long riscv_sys_mmap(unsigned long addr, unsigned long len,
> @@ -69,3 +72,144 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end,
>
> return 0;
> }
> +
> +/*
> + * The hwprobe interface, for allowing userspace to probe to see which features
> + * are supported by the hardware. See Documentation/riscv/hwprobe.rst for more
> + * details.
> + */
> +static int set_hwprobe(struct riscv_hwprobe __user *pair, u64 val)
> +{
> + long ret;
> +
> + ret = put_user(val, &pair->value);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static long hwprobe_mid(struct riscv_hwprobe __user *pair, size_t key,

ngl, it took me far too long to figure out that "mid" was not some
shortening of "middle". I don't have any suggestions though, other than
using cpu_id to match your variable and I think it becomes clearer by the
end of the series anyway when the access alignment stuff appears.

> + cpumask_t *cpus)
> +{
> + long cpu, id;
> + bool first, valid;
> +
> + first = true;
> + valid = false;

Just make that
boot first = true, valid = false;
no?

> + for_each_cpu(cpu, cpus) {
> + struct riscv_cpuinfo *ci = per_cpu_ptr(&riscv_cpuinfo, cpu);
> + long cpu_id;
> +
> + switch (key) {
> + case RISCV_HWPROBE_KEY_MVENDORID:
> + cpu_id = ci->mvendorid;

Are you intentionally avoiding using riscv_cached_mfooid()?

> + break;
> + case RISCV_HWPROBE_KEY_MIMPID:
> + cpu_id = ci->mimpid;
> + break;
> + case RISCV_HWPROBE_KEY_MARCHID:
> + cpu_id = ci->marchid;
> + break;
> + }
> +
> + if (first) {
> + id = cpu_id;
> + valid = true;

So, we only end up in this function if there are CPUs in the set.
Does that mean we can assume validity by default and just define valid =
true from the get go?

> + }
> +
> + if (id != cpu_id)
> + valid = false;
> + }
> +
> + /*
> + * put_user() returns 0 on success, so use 1 to indicate it wasn't
> + * called and we should skip having incremented the output.
> + */
> + if (!valid)
> + return 1;

I'm not sure why you're returning 1 here. If the id is deemed to be
invalid, why aren't we treating it as a "real" error.
TL;DR I don't understand the comment explaining this.

> +
> + return set_hwprobe(pair, id);
> +}
> +
> +static
> +long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,

The files not wrapped at 80 anyway, so why not put this on the same line
as `static`?

> + long cpu_count, unsigned long __user *cpus_user,
> + unsigned long flags)
> +{
> + size_t out;
> + s64 key;
> + long ret;
> + struct cpumask cpus;
> +
> + /* Check the reserved flags. */
> + if (flags != 0)
> + return -EINVAL;
> +
> + /*
> + * The only supported values must be the same on all CPUs. Allow

"The only supported values" doesn't really make much sense. Is that
intended to be read as meaning that we only report the features
supported by all CPUs in the set?

> + * userspace to specify NULL and 0 as a shortcut to all online CPUs.
> + */
> + cpumask_clear(&cpus);
> + if ((cpu_count == 0) && (cpus_user == NULL)) {

Is this not just `if (!cpu_count && !cpus_user)`?

> + cpumask_copy(&cpus, cpu_online_mask);
> + } else {
> + if (cpu_count > cpumask_size())
> + cpu_count = cpumask_size();

nit: newline here please, helps my poor auld eyes out on the days
they're not doing too well!

Cheers,
Conor.

> + ret = copy_from_user(&cpus, cpus_user, cpu_count);
> + if (!ret)
> + return -EFAULT;
> +
> + /*
> + * Userspace must provide at least one online CPU, without that there's
> + * no way to define what is supported.
> + */
> + cpumask_and(&cpus, &cpus, cpu_online_mask);
> + if (cpumask_empty(&cpus))
> + return -EINVAL;
> + }
> +
> + for (out = 0; out < pair_count; out++, pairs++) {
> + long ret;
> +
> + if (get_user(key, &pairs->key))
> + return -EFAULT;
> +
> + switch (key) {
> + case RISCV_HWPROBE_KEY_MVENDORID:
> + case RISCV_HWPROBE_KEY_MARCHID:
> + case RISCV_HWPROBE_KEY_MIMPID:
> + ret = hwprobe_mid(pairs, key, &cpus);
> + break;
> +
> + /*
> + * For forward compatibility, unknown keys don't fail the whole
> + * call, but get their element key set to -1 and value set to 0
> + * indicating they're unrecognized.
> + */
> + default:
> + ret = put_user(-1, &pairs->key);
> + if (ret < 0)
> + return ret;
> +
> + ret = set_hwprobe(pairs, 0);
> + if (ret)
> + return ret;
> +
> + break;
> + }
> +
> + if (ret < 0)
> + return ret;
> + }
> +
> + return 0;
> +
> +}
> +
> +SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count,
> + uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags)
> +{
> + return do_riscv_hwprobe((void __user *)pairs, pair_count, cpu_count,
> + (void __user *)cpus, flags);
> +}
> --
> 2.25.1
>


Attachments:
(No filename) (10.93 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-15 08:04:19

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Tue, Feb 14, 2023 at 11:51:17PM +0000, Conor Dooley wrote:
> Hey Evan,
>
> Just as a preface, I'm reviewing this lot from a position of ignorance
> on what glibc wants so I'll refrain from commenting on call itself.
> I figure that since the commentary has kinda died on the sysfs front &
> Palmer seems to be still into the syscall stuff, that we're pushing on
> with this approach...

If I can find time (that's a big if, atm) I'd be willing to do a sysfs
prototype just for comparison/benchmarking purposes, even if the chances
it gets merged are low. But, time...

Thanks,
drew

2023-02-15 09:57:59

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Mon, Feb 6, 2023, at 21:14, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>

I'm not sure I understand the problem with
AT_HWCAP. While the bits in AT_HWCAP and AT_HWCAP2
are limited, I don't see us running out of new
AT_* words to use for additional bits. Presumably
the kernel would already have to know about the
name of each supported HW feature and could assign
a unique bit number to them.

Arnd

2023-02-15 20:54:43

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Tue, Feb 14, 2023 at 3:51 PM Conor Dooley <[email protected]> wrote:
>
> Hey Evan,
>
> Just as a preface, I'm reviewing this lot from a position of ignorance
> on what glibc wants so I'll refrain from commenting on call itself.
> I figure that since the commentary has kinda died on the sysfs front &
> Palmer seems to be still into the syscall stuff, that we're pushing on
> with this approach...

Yep, sounds good.

>
> On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:
> > We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > system call that quite does this, so let's just provide an arch-specific
> > one to probe for hardware capabilities. This currently just provides
> > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > the future.
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
> >
> > ---
> >
> > Changes in v2:
> > - Changed the interface to look more like poll(). Rather than supplying
> > key_offset and getting back an array of values with numerically
> > contiguous keys, have the user pre-fill the key members of the array,
> > and the kernel will fill in the corresponding values. For any key it
> > doesn't recognize, it will set the key of that element to -1. This
> > allows usermode to quickly ask for exactly the elements it cares
> > about, and not get bogged down in a back and forth about newer keys
> > that older kernels might not recognize. In other words, the kernel
> > can communicate that it doesn't recognize some of the keys while
> > still providing the data for the keys it does know.
> > - Added a shortcut to the cpuset parameters that if a size of 0 and
> > NULL is provided for the CPU set, the kernel will use a cpu mask of
> > all online CPUs. This is convenient because I suspect most callers
> > will only want to act on a feature if it's supported on all CPUs, and
> > it's a headache to dynamically allocate an array of all 1s, not to
> > mention a waste to have the kernel loop over all of the offline bits.
> >
> >
> > ---
> > Documentation/riscv/hwprobe.rst | 37 +++++++
> > Documentation/riscv/index.rst | 1 +
> > arch/riscv/include/asm/hwprobe.h | 13 +++
> > arch/riscv/include/asm/syscall.h | 3 +
> > arch/riscv/include/uapi/asm/hwprobe.h | 25 +++++
> > arch/riscv/include/uapi/asm/unistd.h | 8 ++
> > arch/riscv/kernel/cpu.c | 3 +-
> > arch/riscv/kernel/sys_riscv.c | 146 +++++++++++++++++++++++++-
> > 8 files changed, 234 insertions(+), 2 deletions(-)
> > create mode 100644 Documentation/riscv/hwprobe.rst
> > create mode 100644 arch/riscv/include/asm/hwprobe.h
> > create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > new file mode 100644
> > index 000000000000..97771090e972
> > --- /dev/null
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -0,0 +1,37 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +RISC-V Hardware Probing Interface
> > +---------------------------------
> > +
> > +The RISC-V hardware probing interface is based around a single syscall, which
> > +is defined in <asm/hwprobe.h>::
> > +
> > + struct riscv_hwprobe {
> > + __s64 key;
> > + __u64 value;
> > + };
> > +
> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
> > + size_t cpu_count, cpu_set_t *cpus,
> > + unsigned long flags);
> > +
> > +The arguments are split into three groups: an array of key-value pairs, a CPU
> > +set, and some flags. The key-value pairs are supplied with a count. Userspace
> > +must prepopulate the key field for each element, and the kernel will fill in the
> > +value if the key is recognized. If a key is unknown to the kernel, its key
> > +field will be cleared to -1, and its value set to 0. The CPU set is defined by
> > +CPU_SET(3), the indicated features will be supported on all CPUs in the set.
> > +Usermode can supply NULL for cpus and 0 for cpu_count as a shortcut for all
> > +online CPUs. There are currently no flags, this value must be zero for future
> > +compatibility.
> > +
> > +On success 0 is returned, on failure a negative error code is returned.
> > +
> > +The following keys are defined:
> > +
> > +* :RISCV_HWPROBE_KEY_MVENDORID:: Contains the value of :mvendorid:, as per the
> > + ISA specifications.
>
> "per the ISA specifications" sounds like dangerous wording to me! ;)

I can replace "per the ISA specifications" with "as defined by the
RISC-V privileged architecture specification" to try and make that
more crisp.

>
> > +* :RISCV_HWPROBE_KEY_MARCHID:: Contains the value of :marchid:, as per the ISA
> > + specifications.
> > +* :RISCV_HWPROBE_KEY_MIMPLID:: Contains the value of :mimplid:, as per the ISA
> > + specifications.
>
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > new file mode 100644
> > index 000000000000..591802047460
> > --- /dev/null
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -0,0 +1,25 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +/*
> > + * Copyright 2022 Rivos, Inc
> > + */
> > +
> > +#ifndef _UAPI_ASM_HWPROBE_H
> > +#define _UAPI_ASM_HWPROBE_H
> > +
> > +#include <linux/types.h>
> > +
> > +/*
> > + * Interface for probing hardware capabilities from userspace, see
> > + * Documentation/riscv/hwprobe.rst for more information.
> > + */
> > +struct riscv_hwprobe {
> > + __s64 key;
> > + __u64 value;
> > +};
> > +
> > +#define RISCV_HWPROBE_KEY_MVENDORID 0
> > +#define RISCV_HWPROBE_KEY_MARCHID 1
> > +#define RISCV_HWPROBE_KEY_MIMPID 2
> > +/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
>
> Can't wait for that to get forgotten!

I know. I could add an if (pair->key > RISCV_HWPROBE_MAX_KEY) goto
unrecognized_key, with a label at the default switch case, which would
effectively be a runtime guard against it. I opted not to as it's
aesthetically harsh, but anyone can holler if they want it.

>
> > diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
> > index 73d7cdd2ec49..37d47302322a 100644
> > --- a/arch/riscv/include/uapi/asm/unistd.h
> > +++ b/arch/riscv/include/uapi/asm/unistd.h
> > @@ -43,3 +43,11 @@
> > #define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
> > #endif
> > __SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
> > +
> > +/*
> > + * Allows userspace to probe
>
> That comment looks chopped off midway through.

Whoops yes I

>
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index 5d3f2fbeb33c..868a12384f5a 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -6,8 +6,11 @@
> > */
> >
> > #include <linux/syscalls.h>
> > -#include <asm/unistd.h>
> > #include <asm/cacheflush.h>
> > +#include <asm/cpufeature.h>
> > +#include <asm/hwprobe.h>
> > +#include <asm/uaccess.h>
> > +#include <asm/unistd.h>
> > #include <asm-generic/mman-common.h>
> >
> > static long riscv_sys_mmap(unsigned long addr, unsigned long len,
> > @@ -69,3 +72,144 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end,
> >
> > return 0;
> > }
> > +
> > +/*
> > + * The hwprobe interface, for allowing userspace to probe to see which features
> > + * are supported by the hardware. See Documentation/riscv/hwprobe.rst for more
> > + * details.
> > + */
> > +static int set_hwprobe(struct riscv_hwprobe __user *pair, u64 val)
> > +{
> > + long ret;
> > +
> > + ret = put_user(val, &pair->value);
> > + if (ret < 0)
> > + return ret;
> > +
> > + return 0;
> > +}
> > +
> > +static long hwprobe_mid(struct riscv_hwprobe __user *pair, size_t key,
>
> ngl, it took me far too long to figure out that "mid" was not some
> shortening of "middle". I don't have any suggestions though, other than
> using cpu_id to match your variable and I think it becomes clearer by the
> end of the series anyway when the access alignment stuff appears.

Agreed. I'll rename this to hwprobe_arch_id().

>
> > + cpumask_t *cpus)
> > +{
> > + long cpu, id;
> > + bool first, valid;
> > +
> > + first = true;
> > + valid = false;
>
> Just make that
> boot first = true, valid = false;
> no?
>
> > + for_each_cpu(cpu, cpus) {
> > + struct riscv_cpuinfo *ci = per_cpu_ptr(&riscv_cpuinfo, cpu);
> > + long cpu_id;
> > +
> > + switch (key) {
> > + case RISCV_HWPROBE_KEY_MVENDORID:
> > + cpu_id = ci->mvendorid;
>
> Are you intentionally avoiding using riscv_cached_mfooid()?

This might have been done with the expectation that there would be
more members in that struct to query, so grabbing the pointer in a
higher scope made sense. It didn't evolve that way, so you're right I
should just use those functions.

>
> > + break;
> > + case RISCV_HWPROBE_KEY_MIMPID:
> > + cpu_id = ci->mimpid;
> > + break;
> > + case RISCV_HWPROBE_KEY_MARCHID:
> > + cpu_id = ci->marchid;
> > + break;
> > + }
> > +
> > + if (first) {
> > + id = cpu_id;
> > + valid = true;
>
> So, we only end up in this function if there are CPUs in the set.
> Does that mean we can assume validity by default and just define valid =
> true from the get go?

The valid bool was meant to remember if we had come across a value
that was different from the others. But it can be eliminated with a
different code flow. For the next spin I removed valid altogether and
just break out of the loop if we find a value that's different, since
we already know what the return value will be and don't need to keep
iterating.

>
> > + }
> > +
> > + if (id != cpu_id)
> > + valid = false;
> > + }
> > +
> > + /*
> > + * put_user() returns 0 on success, so use 1 to indicate it wasn't
> > + * called and we should skip having incremented the output.
> > + */
> > + if (!valid)
> > + return 1;
>
> I'm not sure why you're returning 1 here. If the id is deemed to be
> invalid, why aren't we treating it as a "real" error.
> TL;DR I don't understand the comment explaining this.

You're right, this is a mistake, leftover from v1 when the return
value signified the number of entries populated. For v3 I changed the
return type of this function to void, and just stick -1 in value if
the requested key isn't consistent across the given cpu set. That way
the syscall can still populate and return all the other values you
requested while simultaneously indicating "I can't give you a single
answer for that cpuset" for these vendor/arch/impl id elements.

>
> > +
> > + return set_hwprobe(pair, id);
> > +}
> > +
> > +static
> > +long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
>
> The files not wrapped at 80 anyway, so why not put this on the same line
> as `static`?

It actually mostly is wrapped at 80, except for another mistake I
made. But I can break up the line between the parameters which I think
is closer to what people are used to seeing.

>
> > + long cpu_count, unsigned long __user *cpus_user,
> > + unsigned long flags)
> > +{
> > + size_t out;
> > + s64 key;
> > + long ret;
> > + struct cpumask cpus;
> > +
> > + /* Check the reserved flags. */
> > + if (flags != 0)
> > + return -EINVAL;
> > +
> > + /*
> > + * The only supported values must be the same on all CPUs. Allow
>
> "The only supported values" doesn't really make much sense. Is that
> intended to be read as meaning that we only report the features
> supported by all CPUs in the set?

Yeah let me change that to:
/*
* The interface supports taking in a CPU mask, and returns values that
* are consistent across that mask. Allow userspace to specify NULL and
* 0 as a shortcut to all online CPUs.
*/


>
> > + * userspace to specify NULL and 0 as a shortcut to all online CPUs.
> > + */
> > + cpumask_clear(&cpus);
> > + if ((cpu_count == 0) && (cpus_user == NULL)) {
>
> Is this not just `if (!cpu_count && !cpus_user)`?
>
> > + cpumask_copy(&cpus, cpu_online_mask);
> > + } else {
> > + if (cpu_count > cpumask_size())
> > + cpu_count = cpumask_size();
>
> nit: newline here please, helps my poor auld eyes out on the days
> they're not doing too well!
>
> Cheers,
> Conor.
>
> > + ret = copy_from_user(&cpus, cpus_user, cpu_count);
> > + if (!ret)
> > + return -EFAULT;
> > +
> > + /*
> > + * Userspace must provide at least one online CPU, without that there's
> > + * no way to define what is supported.
> > + */
> > + cpumask_and(&cpus, &cpus, cpu_online_mask);
> > + if (cpumask_empty(&cpus))
> > + return -EINVAL;
> > + }
> > +
> > + for (out = 0; out < pair_count; out++, pairs++) {
> > + long ret;
> > +
> > + if (get_user(key, &pairs->key))
> > + return -EFAULT;
> > +
> > + switch (key) {
> > + case RISCV_HWPROBE_KEY_MVENDORID:
> > + case RISCV_HWPROBE_KEY_MARCHID:
> > + case RISCV_HWPROBE_KEY_MIMPID:
> > + ret = hwprobe_mid(pairs, key, &cpus);
> > + break;
> > +
> > + /*
> > + * For forward compatibility, unknown keys don't fail the whole
> > + * call, but get their element key set to -1 and value set to 0
> > + * indicating they're unrecognized.
> > + */
> > + default:
> > + ret = put_user(-1, &pairs->key);
> > + if (ret < 0)
> > + return ret;
> > +
> > + ret = set_hwprobe(pairs, 0);
> > + if (ret)
> > + return ret;
> > +
> > + break;
> > + }
> > +
> > + if (ret < 0)
> > + return ret;
> > + }
> > +
> > + return 0;
> > +
> > +}
> > +
> > +SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count,
> > + uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags)
> > +{
> > + return do_riscv_hwprobe((void __user *)pairs, pair_count, cpu_count,
> > + (void __user *)cpus, flags);
> > +}
> > --
> > 2.25.1
> >

2023-02-15 21:00:58

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

On Tue, Feb 14, 2023 at 1:26 PM Conor Dooley <[email protected]> wrote:
>
> On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> > From: Palmer Dabbelt <[email protected]>
> >
> > This key allows device trees to specify the performance of misaligned
> > accesses to main memory regions from each CPU in the system.
> >
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
> > ---
> >
> > (no changes since v1)
> >
> > Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > index c6720764e765..2c09bd6f2927 100644
> > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > @@ -85,6 +85,21 @@ properties:
> > $ref: "/schemas/types.yaml#/definitions/string"
> > pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
> >
> > + riscv,misaligned-access-performance:
> > + description:
> > + Identifies the performance of misaligned memory accesses to main memory
> > + regions. There are three flavors of unaligned access performance: "emulated"
>
> Is the performance: emulated the source of the dt_binding_check issues?
> And the fix is as simple as:
> - description:
> + description: |
> ?

Yep, I can pass cleanly with that change. Thanks!

2023-02-15 21:11:01

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

heh, this came in right as I went to check out by branch with this on it
and look at the rest of the series.

On Wed, Feb 15, 2023 at 12:49:35PM -0800, Evan Green wrote:
> On Tue, Feb 14, 2023 at 3:51 PM Conor Dooley <[email protected]> wrote:
> > On Mon, Feb 06, 2023 at 12:14:51PM -0800, Evan Green wrote:

> > > +On success 0 is returned, on failure a negative error code is returned.
> > > +
> > > +The following keys are defined:
> > > +
> > > +* :RISCV_HWPROBE_KEY_MVENDORID:: Contains the value of :mvendorid:, as per the
> > > + ISA specifications.
> >
> > "per the ISA specifications" sounds like dangerous wording to me! ;)
>
> I can replace "per the ISA specifications" with "as defined by the
> RISC-V privileged architecture specification" to try and make that
> more crisp.

Meh was a comment about not trusting the ISA specs, not an attempt to
be a pedant!

> > > +#define RISCV_HWPROBE_KEY_MVENDORID 0
> > > +#define RISCV_HWPROBE_KEY_MARCHID 1
> > > +#define RISCV_HWPROBE_KEY_MIMPID 2
> > > +/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> >
> > Can't wait for that to get forgotten!
>
> I know. I could add an if (pair->key > RISCV_HWPROBE_MAX_KEY) goto
> unrecognized_key, with a label at the default switch case, which would
> effectively be a runtime guard against it. I opted not to as it's
> aesthetically harsh, but anyone can holler if they want it.

The other question to ask is, do we need RISCV_HWPROBE_MAX_KEY?
What's it for?

> > > diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
> > > index 73d7cdd2ec49..37d47302322a 100644
> > > --- a/arch/riscv/include/uapi/asm/unistd.h
> > > +++ b/arch/riscv/include/uapi/asm/unistd.h
> > > @@ -43,3 +43,11 @@
> > > #define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
> > > #endif
> > > __SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
> > > +
> > > +/*
> > > + * Allows userspace to probe
> >
> > That comment looks chopped off midway through.
>
> Whoops yes I

If you could flesh it


Attachments:
(No filename) (2.00 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-15 21:14:59

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Wed, Feb 15, 2023 at 1:57 AM Arnd Bergmann <[email protected]> wrote:
>
> On Mon, Feb 6, 2023, at 21:14, Evan Green wrote:
> > We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > system call that quite does this, so let's just provide an arch-specific
> > one to probe for hardware capabilities. This currently just provides
> > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > the future.
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
>
> I'm not sure I understand the problem with
> AT_HWCAP. While the bits in AT_HWCAP and AT_HWCAP2
> are limited, I don't see us running out of new
> AT_* words to use for additional bits. Presumably
> the kernel would already have to know about the
> name of each supported HW feature and could assign
> a unique bit number to them.

Palmer can probably speak to this with more authority, but my
understanding about the motivation for an approach like this goes
something like:
* With the nature of RISC-V, we expect a lot of these types of bits
and bobs, many more than we've seen with the likes of x86 and ARM.
* We also expect in some cases these values to be inconsistent across CPUs.
* While we could copy all that data into the aux vector every time,
it starts to look like a lot of data, not all programs care about all
of it, and a lot of it is static, making all the copying wasteful.
* Another option that would solve most of this would be to point to a
vDSO data area from the aux vector. This solves the copy complaints,
but makes that vDSO data ABI, and requires it all to be known up
front.
* So, a syscall with a vDSO function in front of it seemed like a
good combination of speed and flexibility.

You're certainly right that HWCAPn would work for what we're exposing
today, so the question probably comes down to our relative predictions
of how this data will grow.

-Evan

2023-02-15 21:25:31

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA

On Mon, Feb 06, 2023 at 12:14:52PM -0800, Evan Green wrote:
> From: Palmer Dabbelt <[email protected]>
>
> We have an implicit set of base behaviors that userspace depends on,
> which are mostly defined in various ISA specifications.
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> (no changes since v1)
>
> Documentation/riscv/hwprobe.rst | 16 ++++++++++++++++
> arch/riscv/include/asm/hwprobe.h | 2 +-
> arch/riscv/include/uapi/asm/hwprobe.h | 6 +++++-
> arch/riscv/kernel/sys_riscv.c | 23 +++++++++++++++++++++++
> 4 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 97771090e972..ce186967861f 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -35,3 +35,19 @@ The following keys are defined:
> specifications.
> * :RISCV_HWPROBE_KEY_MIMPLID:: Contains the value of :mimplid:, as per the ISA
> specifications.
> +* :RISCV_HWPROBE_KEY_BASE_BEHAVIOR:: A bitmask containing the base user-visible
> + behavior that this kernel supports. The following base user ABIs are defined:
> + * :RISCV_HWPROBE_BASE_BEHAVIOR_IMA:: Support for rv32ima or rv64ima, as
> + defined by version 2.2 of the user ISA and version 1.10 of the privileged
> + ISA, with the following known exceptions (more exceptions may be added,
> + but only if it can be demonstrated that the user ABI is not broken):
> + * The :fence.i: instruction cannot be directly executed by userspace
> + programs (it may still be executed in userspace via a
> + kernel-controlled mechanism such as the vDSO).

I don't really do the whole rst thing at all, are we able to have
newlines between list items? If we can, I think one would go nicely here.

> +* :RISCV_HWPROBE_KEY_IMA_EXT_0:: A bitmask containing the extensions that are
> + compatible with the :RISCV_HWPROBE_BASE_BEHAVIOR_IMA: base system behavior.

Why do we specifically care if they're compatible with IMA?
What's the "fear" here?

> + * :RISCV_HWPROBE_IMA_FD:: The F and D extensions are supported, as defined

Also, is this IMA and FD thing a kinda commitment to only supporting
hardware that has IMA* or IMAFD*
I know that's what we do now, but only under the hood?
As per usual, I'm probably missing something. What is it?

> + by commit cd20cee ("FMIN/FMAX now implement minimumNumber/maximumNumber,
> + not minNum/maxNum") of the RISC-V ISA manual.
> + * :RISCV_HWPROBE_IMA_C:: The C extension is supported, as defined by
> + version 2.2 of the RISC-V ISA manual.

See, this seems to be how we have to treat specs, list the exact
versions! I don't even have to look to know that this was in the v1 ;)


Attachments:
(No filename) (2.76 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-15 21:57:27

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] RISC-V: hwprobe: Support probing of misaligned access performance

On Mon, Feb 06, 2023 at 12:14:54PM -0800, Evan Green wrote:
> This allows userspace to select various routines to use based on the
> performance of misaligned access on the target hardware.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
>
> ---
>
> Changes in v2:
> - Fixed logic error in if(of_property_read_string...) that caused crash
> - Include cpufeature.h in cpufeature.h to avoid undeclared variable
> warning.
> - Added a _MASK define
> - Fix random checkpatch complaints
>
> Documentation/riscv/hwprobe.rst | 13 +++++++++++
> arch/riscv/include/asm/cpufeature.h | 2 ++
> arch/riscv/include/asm/hwprobe.h | 2 +-
> arch/riscv/include/asm/smp.h | 9 ++++++++
> arch/riscv/include/uapi/asm/hwprobe.h | 6 ++++++
> arch/riscv/kernel/cpufeature.c | 31 +++++++++++++++++++++++++--
> arch/riscv/kernel/sys_riscv.c | 23 ++++++++++++++++++++
> 7 files changed, 83 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index ce186967861f..0dc75e83e127 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -51,3 +51,16 @@ The following keys are defined:
> not minNum/maxNum") of the RISC-V ISA manual.
> * :RISCV_HWPROBE_IMA_C:: The C extension is supported, as defined by
> version 2.2 of the RISC-V ISA manual.
> +* :RISCV_HWPROBE_KEY_PERF_0:: A bitmask that contains performance information

This doesn't match what's defined?

> + about the selected set of processors.
> + * :RISCV_HWPROBE_MISALIGNED_UNKNOWN:: The performance of misaligned
> + accesses is unknown.
> + * :RISCV_HWPROBE_MISALIGNED_EMULATED:: Misaligned accesses are emulated via
> + software, either in or below the kernel. These accesses are always
> + extremely slow.
> + * :RISCV_HWPROBE_MISALIGNED_SLOW:: Misaligned accesses are supported in
> + hardware, but are slower than the cooresponding aligned accesses
> + sequences.
> + * :RISCV_HWPROBE_MISALIGNED_FAST:: Misaligned accesses are supported in
> + hardware and are faster than the cooresponding aligned accesses
> + sequences.

> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
> index 3831b638ecab..6c1759091e44 100644
> --- a/arch/riscv/include/asm/smp.h
> +++ b/arch/riscv/include/asm/smp.h
> @@ -26,6 +26,15 @@ struct riscv_ipi_ops {
> */
> extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
> #define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu]
> +static inline long hartid_to_cpuid_map(unsigned long hartid)
> +{
> + long i;
> +
> + for (i = 0; i < NR_CPUS; ++i)

I'm never (or not yet?) sure about these things.
Should this be for_each_possible_cpu()?

> + if (cpuid_to_hartid_map(i) == hartid)
> + return i;
> + return -1;
> +}
>
> /* print IPI stats */
> void show_ipi_stats(struct seq_file *p, int prec);
> diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> index ce39d6e74103..5d55e2da2b1f 100644
> --- a/arch/riscv/include/uapi/asm/hwprobe.h
> +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> @@ -25,5 +25,11 @@ struct riscv_hwprobe {
> #define RISCV_HWPROBE_KEY_IMA_EXT_0 4
> #define RISCV_HWPROBE_IMA_FD (1 << 0)
> #define RISCV_HWPROBE_IMA_C (1 << 1)
> +#define RISCV_HWPROBE_KEY_CPUPERF_0 5
> +#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_MASK (3 << 0)

Why is it UNKNOWN rather than UNSUPPORTED?
I thought I saw Palmer saying that there is no requirement to support
misaligned accesses any more.
Plenty of old DTs are going to lack this property so would be UNKNOWN,
and I *assume* that the user of the syscall is gonna conflate the two,
but the rationale interests me.

> /* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
> #endif
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 93e45560af30..12af6f7a2f53 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -14,8 +14,10 @@
> #include <linux/of.h>
> #include <asm/alternative.h>
> #include <asm/cacheflush.h>
> +#include <asm/cpufeature.h>
> #include <asm/errata_list.h>
> #include <asm/hwcap.h>
> +#include <asm/hwprobe.h>
> #include <asm/patch.h>
> #include <asm/pgtable.h>
> #include <asm/processor.h>
> @@ -32,6 +34,9 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
> DEFINE_STATIC_KEY_ARRAY_FALSE(riscv_isa_ext_keys, RISCV_ISA_EXT_KEY_MAX);
> EXPORT_SYMBOL(riscv_isa_ext_keys);
>
> +/* Performance information */
> +DEFINE_PER_CPU(long, misaligned_access_speed);
> +
> /**
> * riscv_isa_extension_base() - Get base extension word
> *
> @@ -89,11 +94,11 @@ static bool riscv_isa_extension_check(int id)
> void __init riscv_fill_hwcap(void)
> {
> struct device_node *node;
> - const char *isa;
> + const char *isa, *misaligned;
> char print_str[NUM_ALPHA_EXTS + 1];
> int i, j, rc;
> unsigned long isa2hwcap[26] = {0};
> - unsigned long hartid;
> + unsigned long hartid, cpu;
>
> isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
> isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
> @@ -246,6 +251,28 @@ void __init riscv_fill_hwcap(void)
> bitmap_copy(riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
> else
> bitmap_and(riscv_isa, riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
> +
> + /*
> + * Check for the performance of misaligned accesses.
> + */
> + cpu = hartid_to_cpuid_map(hartid);
> + if (cpu < 0)
> + continue;
> +
> + if (!of_property_read_string(node, "riscv,misaligned-access-performance",
> + &misaligned)) {
> + if (strcmp(misaligned, "emulated") == 0)
> + per_cpu(misaligned_access_speed, cpu) =
> + RISCV_HWPROBE_MISALIGNED_EMULATED;
> +
> + if (strcmp(misaligned, "slow") == 0)
> + per_cpu(misaligned_access_speed, cpu) =
> + RISCV_HWPROBE_MISALIGNED_SLOW;
> +
> + if (strcmp(misaligned, "fast") == 0)
> + per_cpu(misaligned_access_speed, cpu) =
> + RISCV_HWPROBE_MISALIGNED_FAST;
> + }
> }
>
> /* We don't support systems with F but without D, so mask those out
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 74e0d72c877d..73d937c54f4e 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -133,6 +133,25 @@ static long hwprobe_mid(struct riscv_hwprobe __user *pair, size_t key,
> return set_hwprobe(pair, id);
> }
>
> +static long hwprobe_misaligned(cpumask_t *cpus)
> +{
> + long cpu, perf = -1;
> +
> + for_each_cpu(cpu, cpus) {
> + long this_perf = per_cpu(misaligned_access_speed, cpu);
> +
> + if (perf == -1)
> + perf = this_perf;
> +
> + if (perf != this_perf)
> + perf = RISCV_HWPROBE_MISALIGNED_UNKNOWN;

Is there any reason to continue in the loop if this condition is met?

> + }
> +
> + if (perf == -1)
> + return RISCV_HWPROBE_MISALIGNED_UNKNOWN;
> + return perf;

heh, nitpicking the maintainer's use of whitespace... newline before
return please :)

Cheers,
Conor.

> +}
> +
> static
> long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
> long cpu_count, unsigned long __user *cpus_user,
> @@ -205,6 +224,10 @@ long do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
> }
> break;
>
> + case RISCV_HWPROBE_KEY_CPUPERF_0:
> + ret = set_hwprobe(pairs, hwprobe_misaligned(&cpus));
> + break;
> +
> /*
> * For forward compatibility, unknown keys don't fail the whole
> * call, but get their element key set to -1 and value set to 0
> --
> 2.25.1
>


Attachments:
(No filename) (7.68 kB)
signature.asc (228.00 B)
Download all attachments

2023-02-15 22:09:23

by Conor Dooley

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA

On Mon, Feb 06, 2023 at 12:14:52PM -0800, Evan Green wrote:

> + case RISCV_HWPROBE_KEY_IMA_EXT_0:
> + {
> + u64 val = 0;
> +
> + if (has_fpu())
> + val |= RISCV_HWPROBE_IMA_FD;

The indent caught my eye here for a sec so my attention was drawn back
here. Would you mind adding a line of whitespace between these checks?
Cheers,
Conor.

> + if (elf_hwcap & RISCV_ISA_EXT_c)
> + val |= RISCV_HWPROBE_IMA_C;
> + ret = set_hwprobe(pairs, val);
> + }
> + break;
> +


Attachments:
(No filename) (490.00 B)
signature.asc (228.00 B)
Download all attachments

2023-02-15 22:43:49

by Jessica Clarke

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On 15 Feb 2023, at 21:14, Evan Green <[email protected]> wrote:
>
> On Wed, Feb 15, 2023 at 1:57 AM Arnd Bergmann <[email protected]> wrote:
>>
>> On Mon, Feb 6, 2023, at 21:14, Evan Green wrote:
>>> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
>>> system call that quite does this, so let's just provide an arch-specific
>>> one to probe for hardware capabilities. This currently just provides
>>> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
>>> the future.
>>>
>>> Co-developed-by: Palmer Dabbelt <[email protected]>
>>> Signed-off-by: Palmer Dabbelt <[email protected]>
>>> Signed-off-by: Evan Green <[email protected]>
>>
>> I'm not sure I understand the problem with
>> AT_HWCAP. While the bits in AT_HWCAP and AT_HWCAP2
>> are limited, I don't see us running out of new
>> AT_* words to use for additional bits. Presumably
>> the kernel would already have to know about the
>> name of each supported HW feature and could assign
>> a unique bit number to them.
>
> Palmer can probably speak to this with more authority, but my
> understanding about the motivation for an approach like this goes
> something like:
> * With the nature of RISC-V, we expect a lot of these types of bits
> and bobs, many more than we've seen with the likes of x86 and ARM.

We’re already at (I think) 51 standard user-level extensions that LLVM
knows about.

> * We also expect in some cases these values to be inconsistent across CPUs.

That’s also true of some Arm SoCs.

> * While we could copy all that data into the aux vector every time,
> it starts to look like a lot of data, not all programs care about all
> of it, and a lot of it is static, making all the copying wasteful.

Bitvectors are pretty cheap, this is negligible.

> * Another option that would solve most of this would be to point to a
> vDSO data area from the aux vector. This solves the copy complaints,
> but makes that vDSO data ABI, and requires it all to be known up
> front.

That doesn't seem like a huge deal, other than my usual point of
needing a standardised portable cross-platform API for this, so that
shouldn’t be “the” generic interface programmed against by applications.

> * So, a syscall with a vDSO function in front of it seemed like a
> good combination of speed and flexibility.
>
> You're certainly right that HWCAPn would work for what we're exposing
> today, so the question probably comes down to our relative predictions
> of how this data will grow.

The other big problem is vendor extensions.

Jess

> -Evan
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv


2023-02-16 13:30:09

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Wed, Feb 15, 2023, at 23:43, Jessica Clarke wrote:
> On 15 Feb 2023, at 21:14, Evan Green <[email protected]> wrote:
>> On Wed, Feb 15, 2023 at 1:57 AM Arnd Bergmann <[email protected]> wrote:
>> Palmer can probably speak to this with more authority, but my
>> understanding about the motivation for an approach like this goes
>> something like:
>> * With the nature of RISC-V, we expect a lot of these types of bits
>> and bobs, many more than we've seen with the likes of x86 and ARM.
>
> We’re already at (I think) 51 standard user-level extensions that LLVM
> knows about.

Do you have an estimate of how many of these require kernel support
beyond identifying the extensions?

>> * We also expect in some cases these values to be inconsistent across CPUs.
>
> That’s also true of some Arm SoCs.

Right, but it's also something that we should not encourage, or
need to make easy to use. On arm64, the kernel support for having
asymmetric aarch32 mode was kept to an absolute minimum, and an
application is expected to get the information from /proc/cpuinfo
before pinning down a task to the correct subset of all CPUs.

>> * So, a syscall with a vDSO function in front of it seemed like a
>> good combination of speed and flexibility.
>>
>> You're certainly right that HWCAPn would work for what we're exposing
>> today, so the question probably comes down to our relative predictions
>> of how this data will grow.
>
> The other big problem is vendor extensions.

My biggest concern is how this would be synchronized between the
interfaces that are available to users. What we have on other architectures
is a set of string identifiers in /proc/cpuinfo and a bitmask in HWCAP.
Ideally these are added in pairs so the information available to shell
scripts in human readers is the same that is available in the auxvec
data.

Adding a third interface with the same information or a superset
requires more work in ensuring that each extension is available
in exactly the right places. Ideally I think there should be only
one table of possible CPU features so nobody has to make the
decision about which ones are important enough to add to one
interface or another.

Arnd

2023-02-16 23:19:33

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] RISC-V: Add a syscall for HW probing

On Thu, Feb 16, 2023 at 5:30 AM Arnd Bergmann <[email protected]> wrote:
>
> On Wed, Feb 15, 2023, at 23:43, Jessica Clarke wrote:
> > On 15 Feb 2023, at 21:14, Evan Green <[email protected]> wrote:
> >> On Wed, Feb 15, 2023 at 1:57 AM Arnd Bergmann <[email protected]> wrote:
> >> Palmer can probably speak to this with more authority, but my
> >> understanding about the motivation for an approach like this goes
> >> something like:
> >> * With the nature of RISC-V, we expect a lot of these types of bits
> >> and bobs, many more than we've seen with the likes of x86 and ARM.
> >
> > We’re already at (I think) 51 standard user-level extensions that LLVM
> > knows about.
>
> Do you have an estimate of how many of these require kernel support
> beyond identifying the extensions?
>
> >> * We also expect in some cases these values to be inconsistent across CPUs.
> >
> > That’s also true of some Arm SoCs.
>
> Right, but it's also something that we should not encourage, or
> need to make easy to use. On arm64, the kernel support for having
> asymmetric aarch32 mode was kept to an absolute minimum, and an
> application is expected to get the information from /proc/cpuinfo
> before pinning down a task to the correct subset of all CPUs.
>
> >> * So, a syscall with a vDSO function in front of it seemed like a
> >> good combination of speed and flexibility.
> >>
> >> You're certainly right that HWCAPn would work for what we're exposing
> >> today, so the question probably comes down to our relative predictions
> >> of how this data will grow.
> >
> > The other big problem is vendor extensions.

Since the key range can grow without accruing a process startup time
penalty, this proposal handles vendor extensions fairly well, no?
(Contrasting at least against hwcap bits, which once allocated have to
be copied into every new process forever).

>
> My biggest concern is how this would be synchronized between the
> interfaces that are available to users. What we have on other architectures
> is a set of string identifiers in /proc/cpuinfo and a bitmask in HWCAP.
> Ideally these are added in pairs so the information available to shell
> scripts in human readers is the same that is available in the auxvec
> data.
>
> Adding a third interface with the same information or a superset
> requires more work in ensuring that each extension is available
> in exactly the right places. Ideally I think there should be only
> one table of possible CPU features so nobody has to make the
> decision about which ones are important enough to add to one
> interface or another.

That makes sense. In this case, the proposal is that RISC-V would use
this mechanism and generally abandon hwcap. So my hope is there should
still only be about 2 spots to maintain.
-Evan

2023-02-18 00:16:18

by Evan Green

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] RISC-V: hwprobe: Support probing of misaligned access performance

On Wed, Feb 15, 2023 at 1:57 PM Conor Dooley <[email protected]> wrote:
>
> On Mon, Feb 06, 2023 at 12:14:54PM -0800, Evan Green wrote:
> > This allows userspace to select various routines to use based on the
> > performance of misaligned access on the target hardware.
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
> >
> > ---
> >
> > Changes in v2:
> > - Fixed logic error in if(of_property_read_string...) that caused crash
> > - Include cpufeature.h in cpufeature.h to avoid undeclared variable
> > warning.
> > - Added a _MASK define
> > - Fix random checkpatch complaints
> >
> > Documentation/riscv/hwprobe.rst | 13 +++++++++++
> > arch/riscv/include/asm/cpufeature.h | 2 ++
> > arch/riscv/include/asm/hwprobe.h | 2 +-
> > arch/riscv/include/asm/smp.h | 9 ++++++++
> > arch/riscv/include/uapi/asm/hwprobe.h | 6 ++++++
> > arch/riscv/kernel/cpufeature.c | 31 +++++++++++++++++++++++++--
> > arch/riscv/kernel/sys_riscv.c | 23 ++++++++++++++++++++
> > 7 files changed, 83 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index ce186967861f..0dc75e83e127 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -51,3 +51,16 @@ The following keys are defined:
> > not minNum/maxNum") of the RISC-V ISA manual.
> > * :RISCV_HWPROBE_IMA_C:: The C extension is supported, as defined by
> > version 2.2 of the RISC-V ISA manual.
> > +* :RISCV_HWPROBE_KEY_PERF_0:: A bitmask that contains performance information
>
> This doesn't match what's defined?
>
> > + about the selected set of processors.
> > + * :RISCV_HWPROBE_MISALIGNED_UNKNOWN:: The performance of misaligned
> > + accesses is unknown.
> > + * :RISCV_HWPROBE_MISALIGNED_EMULATED:: Misaligned accesses are emulated via
> > + software, either in or below the kernel. These accesses are always
> > + extremely slow.
> > + * :RISCV_HWPROBE_MISALIGNED_SLOW:: Misaligned accesses are supported in
> > + hardware, but are slower than the cooresponding aligned accesses
> > + sequences.
> > + * :RISCV_HWPROBE_MISALIGNED_FAST:: Misaligned accesses are supported in
> > + hardware and are faster than the cooresponding aligned accesses
> > + sequences.
>
> > diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
> > index 3831b638ecab..6c1759091e44 100644
> > --- a/arch/riscv/include/asm/smp.h
> > +++ b/arch/riscv/include/asm/smp.h
> > @@ -26,6 +26,15 @@ struct riscv_ipi_ops {
> > */
> > extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
> > #define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu]
> > +static inline long hartid_to_cpuid_map(unsigned long hartid)
> > +{
> > + long i;
> > +
> > + for (i = 0; i < NR_CPUS; ++i)
>
> I'm never (or not yet?) sure about these things.
> Should this be for_each_possible_cpu()?

Me neither. I believe it's the same, as for_each_possible_cpu()
iterates over a CPU mask of all 1s, and the size of struct cpumask is
set by NR_CPUS. Some architectures appear to have an
init_cpu_possible() function to further restrict the set, though riscv
does not. It's probably better to use for_each_possible_cpu() though
in case a call to init_cpu_possible() ever does get added.

>
> > + if (cpuid_to_hartid_map(i) == hartid)
> > + return i;
> > + return -1;
> > +}
> >
> > /* print IPI stats */
> > void show_ipi_stats(struct seq_file *p, int prec);
> > diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
> > index ce39d6e74103..5d55e2da2b1f 100644
> > --- a/arch/riscv/include/uapi/asm/hwprobe.h
> > +++ b/arch/riscv/include/uapi/asm/hwprobe.h
> > @@ -25,5 +25,11 @@ struct riscv_hwprobe {
> > #define RISCV_HWPROBE_KEY_IMA_EXT_0 4
> > #define RISCV_HWPROBE_IMA_FD (1 << 0)
> > #define RISCV_HWPROBE_IMA_C (1 << 1)
> > +#define RISCV_HWPROBE_KEY_CPUPERF_0 5
> > +#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
> > +#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
> > +#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
> > +#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
> > +#define RISCV_HWPROBE_MISALIGNED_MASK (3 << 0)
>
> Why is it UNKNOWN rather than UNSUPPORTED?
> I thought I saw Palmer saying that there is no requirement to support
> misaligned accesses any more.
> Plenty of old DTs are going to lack this property so would be UNKNOWN,
> and I *assume* that the user of the syscall is gonna conflate the two,
> but the rationale interests me.

Palmer had mentioned on the DT bindings patch that historically it was
required but emulated. So because old binaries assumed it was there,
the default values for DTs without this needs to imply "supported, but
no idea how fast it is".

But you bring up an interesting point: should the bindings and these
defines have a value that indicates no support at all for unaligned
accesses? We could always add the value to the bindings later, but
maybe we should leave space in this field now.

-Evan

2023-02-28 14:56:52

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] dt-bindings: Add RISC-V misaligned access performance

On Thu, Feb 09, 2023 at 08:51:22AM -0800, Palmer Dabbelt wrote:
> On Wed, 08 Feb 2023 04:45:10 PST (-0800), [email protected] wrote:
> > From: Rob Herring
> > > Sent: 07 February 2023 17:06
> > >
> > > On Mon, Feb 06, 2023 at 12:14:53PM -0800, Evan Green wrote:
> > > > From: Palmer Dabbelt <[email protected]>
> > > >
> > > > This key allows device trees to specify the performance of misaligned
> > > > accesses to main memory regions from each CPU in the system.
> > > >
> > > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Evan Green <[email protected]>
> > > > ---
> > > >
> > > > (no changes since v1)
> > > >
> > > > Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> > > > 1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml
> > > b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > > > index c6720764e765..2c09bd6f2927 100644
> > > > --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> > > > +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> > > > @@ -85,6 +85,21 @@ properties:
> > > > $ref: "/schemas/types.yaml#/definitions/string"
> > > > pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
> > > >
> > > > + riscv,misaligned-access-performance:
> > > > + description:
> > > > + Identifies the performance of misaligned memory accesses to main memory
> > > > + regions. There are three flavors of unaligned access performance: "emulated"
> > > > + means that misaligned accesses are emulated via software and thus
> > > > + extremely slow, "slow" means that misaligned accesses are supported by
> > > > + hardware but still slower that aligned accesses sequences, and "fast"
> > > > + means that misaligned accesses are as fast or faster than the
> > > > + cooresponding aligned accesses sequences.
> > > > + $ref: "/schemas/types.yaml#/definitions/string"
> > > > + enum:
> > > > + - emulated
> > > > + - slow
> > > > + - fast
> > >
> > > I don't think this belongs in DT. (I'm not sure about a userspace
> > > interface either.)
>
> [Kind of answered below.]
>
> > > Can't this be tested and determined at runtime? Do misaligned accesses
> > > and compare the performance. We already do this for things like memcpy
> > > or crypto implementation selection.
>
> We've had a history of broken firmware emulation of misaligned accesses
> wreaking havoc. We don't run into concrete bugs there because we avoid
> misaligned accesses as much as possible in the kernel, but I'd be worried
> that we'd trigger a lot of these when probing for misaligned accesses.

Then how do you distinguish between emulated and working vs. emulated
and broken? Sounds like the kernel running things would motivate fixing
firmware. :) If not, then broken platforms can disable the check with a
kernel command line flag.

>
> > There is also an long discussion about misaligned accesses
> > for loooongarch.
> >
> > Basically if you want to run a common kernel (and userspace)
> > you have to default to compiling everything with -mno-stict-align
> > so that the compiler generates byte accesses for anything
> > marked 'packed' (etc).
> >
> > Run-time tests can optimise some hot-spots.
> >
> > In any case 'slow' is probably pointless - unless the accesses
> > take more than 1 or 2 extra cycles.
>
> [Also below.]
>
> > Oh, and you really never, ever want to emulate them.
>
> Unfortunately we're kind of stuck with this one: the specs used to require
> that misaligned accesses were supported and thus there's a bunch of
> firmwares that emulate them (and various misaligned accesses spread around,
> though they're kind of a mess). The specs no longer require this support,
> but just dropping it from firmware will break binaries.
>
> There's been some vague plans to dig out of this, but it'd require some sort
> of firmware interface additions in order to turn off the emulation and
> that's going to take a while. As it stands we've got a bunch of users that
> just want to know when they can emit misaligned accesses.
>
> > Technically misaligned reads on (some) x86-64 cpu are slower
> > than aligned ones, but the difference is marginal.
> > I've measured two 64bit misaligned reads every clock.
> > But it is consistently slower by much less than one clock
> > per cache line.
>
> The "fast" case is explicitly written to catch that flavor of
> implementation.
>
> The "slow" one is a bit vaguer, but the general idea is to catch
> implementations that end up with some sort of pipeline flush on misaligned
> accesses. We've got a lot of very small in-order processors in RISC-V land,
> and while I haven't gotten around to benchmarking them all my guess is that
> the spec requirement for support ended up with some simple implementations.

If userspace wants to get into microarchitecture level optimizations, it
should just look at the CPU model. IOW, use the CPU compatible to infer
things rather than continuously adding properties in an adhoc manor
trying to parameterize everything.

Rob