There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1]. I'm about to send a v2
of that series out that incorporates the vDSO function.
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a riscv64 qemu
instance. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 114us
- access("/sys/kernel/cpu_byteorder", R_OK): 69us
- riscv_hwprobe() vDSO and syscall: 13us
- riscv_hwprobe() vDSO with no syscall: 0.07us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 23ms. So doing 4
open/read/close operations is a delta of about 2%, versus a single vDSO
call is a delta of 0.0003%.
This being qemu rather than real hardware, the numbers
themselves are somewhat inaccurate, though the relative orders of
magnitude are probably good enough.
[1] https://public-inbox.org/libc-alpha/[email protected]/T/#t
[2] https://pastebin.com/x84NEKaS
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Added | to description: to make dt-checker happy.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Palmer Dabbelt (1):
dt-bindings: Add RISC-V misaligned access performance
.../devicetree/bindings/riscv/cpus.yaml | 15 ++
Documentation/riscv/hwprobe.rst | 74 ++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/smp.h | 11 +
arch/riscv/include/asm/syscall.h | 3 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/uapi/asm/hwprobe.h | 36 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/cpu.c | 11 +-
arch/riscv/kernel/cpufeature.c | 31 ++-
arch/riscv/kernel/sys_riscv.c | 222 +++++++++++++++++-
arch/riscv/kernel/vdso/Makefile | 2 +
arch/riscv/kernel/vdso/hwprobe.c | 47 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 89 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
23 files changed, 692 insertions(+), 10 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
In preparation for tracking and exposing microarchitectural details to
userspace (like whether or not unaligned accesses are fast), move the
riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
need to be used by more than just cpu.c.
Signed-off-by: Evan Green <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
---
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
arch/riscv/include/asm/cpufeature.h | 21 +++++++++++++++++++++
arch/riscv/kernel/cpu.c | 8 ++------
2 files changed, 23 insertions(+), 6 deletions(-)
create mode 100644 arch/riscv/include/asm/cpufeature.h
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
new file mode 100644
index 000000000000..66ebaae449c8
--- /dev/null
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022-2023 Rivos, Inc
+ */
+
+#ifndef _ASM_CPUFEATURE_H
+#define _ASM_CPUFEATURE_H
+
+/*
+ * These are probed via a device_initcall(), via either the SBI or directly
+ * from the corresponding CSRs.
+ */
+struct riscv_cpuinfo {
+ unsigned long mvendorid;
+ unsigned long marchid;
+ unsigned long mimpid;
+};
+
+DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+
+#endif
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 1b9a5a66e55a..684e5419d37d 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -7,6 +7,7 @@
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/of.h>
+#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/hwcap.h>
#include <asm/sbi.h>
@@ -70,12 +71,7 @@ int riscv_of_parent_hartid(struct device_node *node, unsigned long *hartid)
return -1;
}
-struct riscv_cpuinfo {
- unsigned long mvendorid;
- unsigned long marchid;
- unsigned long mimpid;
-};
-static DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+DEFINE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
unsigned long riscv_cached_mvendorid(unsigned int cpu_id)
{
--
2.25.1
We don't have enough space for these all in ELF_HWCAP{,2} and there's no
system call that quite does this, so let's just provide an arch-specific
one to probe for hardware capabilities. This currently just provides
m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
the future.
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
Changes in v2:
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
---
Documentation/riscv/hwprobe.rst | 39 ++++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/include/asm/hwprobe.h | 13 +++
arch/riscv/include/asm/syscall.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 25 +++++
arch/riscv/include/uapi/asm/unistd.h | 9 ++
arch/riscv/kernel/cpu.c | 3 +-
arch/riscv/kernel/sys_riscv.c | 134 +++++++++++++++++++++++++-
8 files changed, 225 insertions(+), 2 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
new file mode 100644
index 000000000000..88b015a2026e
--- /dev/null
+++ b/Documentation/riscv/hwprobe.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+RISC-V Hardware Probing Interface
+---------------------------------
+
+The RISC-V hardware probing interface is based around a single syscall, which
+is defined in <asm/hwprobe.h>::
+
+ struct riscv_hwprobe {
+ __s64 key;
+ __u64 value;
+ };
+
+ long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
+ size_t cpu_count, cpu_set_t *cpus,
+ unsigned long flags);
+
+The arguments are split into three groups: an array of key-value pairs, a CPU
+set, and some flags. The key-value pairs are supplied with a count. Userspace
+must prepopulate the key field for each element, and the kernel will fill in the
+value if the key is recognized. If a key is unknown to the kernel, its key field
+will be cleared to -1, and its value set to 0. The CPU set is defined by
+CPU_SET(3). For value-like keys (eg. vendor/arch/impl), the returned value will
+be only be valid if all CPUs in the given set have the same value. Otherwise -1
+will be returned. For boolean-like keys, the value returned will be a logical
+AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
+0 for cpu_count as a shortcut for all online CPUs. There are currently no flags,
+this value must be zero for future compatibility.
+
+On success 0 is returned, on failure a negative error code is returned.
+
+The following keys are defined:
+
+* :c:macro:`RISCV_HWPROBE_KEY_MVENDORID`: Contains the value of ``mvendorid``,
+ as defined by the RISC-V privileged architecture specification.
+* :c:macro:`RISCV_HWPROBE_KEY_MARCHID`: Contains the value of ``marchid``, as
+ defined by the RISC-V privileged architecture specification.
+* :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as
+ defined by the RISC-V privileged architecture specification.
diff --git a/Documentation/riscv/index.rst b/Documentation/riscv/index.rst
index 2e5b18fbb145..175a91db0200 100644
--- a/Documentation/riscv/index.rst
+++ b/Documentation/riscv/index.rst
@@ -7,6 +7,7 @@ RISC-V architecture
boot-image-header
vm-layout
+ hwprobe
patch-acceptance
uabi
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
new file mode 100644
index 000000000000..08d1c3bdd78a
--- /dev/null
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2022 Rivos, Inc
+ */
+
+#ifndef _ASM_HWPROBE_H
+#define _ASM_HWPROBE_H
+
+#include <uapi/asm/hwprobe.h>
+
+#define RISCV_HWPROBE_MAX_KEY 2
+
+#endif
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 384a63b86420..78a6302ef711 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -75,4 +75,7 @@ static inline int syscall_get_arch(struct task_struct *task)
}
asmlinkage long sys_riscv_flush_icache(uintptr_t, uintptr_t, uintptr_t);
+
+asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t, uintptr_t,
+ uintptr_t, uintptr_t);
#endif /* _ASM_RISCV_SYSCALL_H */
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
new file mode 100644
index 000000000000..591802047460
--- /dev/null
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2022 Rivos, Inc
+ */
+
+#ifndef _UAPI_ASM_HWPROBE_H
+#define _UAPI_ASM_HWPROBE_H
+
+#include <linux/types.h>
+
+/*
+ * Interface for probing hardware capabilities from userspace, see
+ * Documentation/riscv/hwprobe.rst for more information.
+ */
+struct riscv_hwprobe {
+ __s64 key;
+ __u64 value;
+};
+
+#define RISCV_HWPROBE_KEY_MVENDORID 0
+#define RISCV_HWPROBE_KEY_MARCHID 1
+#define RISCV_HWPROBE_KEY_MIMPID 2
+/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
+
+#endif
diff --git a/arch/riscv/include/uapi/asm/unistd.h b/arch/riscv/include/uapi/asm/unistd.h
index 73d7cdd2ec49..950ab3fd4409 100644
--- a/arch/riscv/include/uapi/asm/unistd.h
+++ b/arch/riscv/include/uapi/asm/unistd.h
@@ -43,3 +43,12 @@
#define __NR_riscv_flush_icache (__NR_arch_specific_syscall + 15)
#endif
__SYSCALL(__NR_riscv_flush_icache, sys_riscv_flush_icache)
+
+/*
+ * Allows userspace to query the kernel for CPU architecture and
+ * microarchitecture details across a given set of CPUs.
+ */
+#ifndef __NR_riscv_hwprobe
+#define __NR_riscv_hwprobe (__NR_arch_specific_syscall + 14)
+#endif
+__SYSCALL(__NR_riscv_hwprobe, sys_riscv_hwprobe)
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 684e5419d37d..d0fb3567cc3d 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -4,15 +4,16 @@
*/
#include <linux/cpu.h>
+#include <linux/cpuhotplug.h>
#include <linux/init.h>
#include <linux/seq_file.h>
#include <linux/of.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/hwcap.h>
+#include <asm/pgtable.h>
#include <asm/sbi.h>
#include <asm/smp.h>
-#include <asm/pgtable.h>
/*
* Returns the hart ID of the given device tree node, or -ENODEV if the node
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 5d3f2fbeb33c..02c2f1f7417e 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -6,8 +6,11 @@
*/
#include <linux/syscalls.h>
-#include <asm/unistd.h>
#include <asm/cacheflush.h>
+#include <asm/hwprobe.h>
+#include <asm/sbi.h>
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
#include <asm-generic/mman-common.h>
static long riscv_sys_mmap(unsigned long addr, unsigned long len,
@@ -69,3 +72,132 @@ SYSCALL_DEFINE3(riscv_flush_icache, uintptr_t, start, uintptr_t, end,
return 0;
}
+
+/*
+ * The hwprobe interface, for allowing userspace to probe to see which features
+ * are supported by the hardware. See Documentation/riscv/hwprobe.rst for more
+ * details.
+ */
+static void hwprobe_arch_id(struct riscv_hwprobe *pair,
+ const struct cpumask *cpus)
+{
+ u64 cpu, id = -1ULL;
+ bool first = true;
+
+ for_each_cpu(cpu, cpus) {
+ u64 cpu_id;
+
+ switch (pair->key) {
+ case RISCV_HWPROBE_KEY_MVENDORID:
+ cpu_id = riscv_cached_mvendorid(cpu);
+ break;
+ case RISCV_HWPROBE_KEY_MIMPID:
+ cpu_id = riscv_cached_mimpid(cpu);
+ break;
+ case RISCV_HWPROBE_KEY_MARCHID:
+ cpu_id = riscv_cached_marchid(cpu);
+ break;
+ }
+
+ if (first)
+ id = cpu_id;
+
+ /*
+ * If there's a mismatch for the given set, return -1 in the
+ * value.
+ */
+ if (id != cpu_id) {
+ id = -1ULL;
+ break;
+ }
+ }
+
+ pair->value = id;
+}
+
+static void hwprobe_one_pair(struct riscv_hwprobe *pair,
+ const struct cpumask *cpus)
+{
+ switch (pair->key) {
+ case RISCV_HWPROBE_KEY_MVENDORID:
+ case RISCV_HWPROBE_KEY_MARCHID:
+ case RISCV_HWPROBE_KEY_MIMPID:
+ hwprobe_arch_id(pair, cpus);
+ break;
+
+ /*
+ * For forward compatibility, unknown keys don't fail the whole
+ * call, but get their element key set to -1 and value set to 0
+ * indicating they're unrecognized.
+ */
+ default:
+ pair->key = -1;
+ pair->value = 0;
+ break;
+ }
+}
+
+static
+int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
+ long cpu_count, unsigned long __user *cpus_user,
+ unsigned long flags)
+{
+ size_t out;
+ int ret;
+ cpumask_t cpus;
+
+ /* Check the reserved flags. */
+ if (flags != 0)
+ return -EINVAL;
+
+ /*
+ * The interface supports taking in a CPU mask, and returns values that
+ * are consistent across that mask. Allow userspace to specify NULL and
+ * 0 as a shortcut to all online CPUs.
+ */
+ cpumask_clear(&cpus);
+ if (!cpu_count && !cpus_user) {
+ cpumask_copy(&cpus, cpu_online_mask);
+ } else {
+ if (cpu_count > cpumask_size())
+ cpu_count = cpumask_size();
+
+ ret = copy_from_user(&cpus, cpus_user, cpu_count);
+ if (!ret)
+ return -EFAULT;
+
+ /*
+ * Userspace must provide at least one online CPU, without that
+ * there's no way to define what is supported.
+ */
+ cpumask_and(&cpus, &cpus, cpu_online_mask);
+ if (cpumask_empty(&cpus))
+ return -EINVAL;
+ }
+
+ for (out = 0; out < pair_count; out++, pairs++) {
+ struct riscv_hwprobe pair;
+
+ if (get_user(pair.key, &pairs->key))
+ return -EFAULT;
+
+ pair.value = 0;
+ hwprobe_one_pair(&pair, &cpus);
+ ret = put_user(pair.key, &pairs->key);
+ if (ret == 0)
+ ret = put_user(pair.value, &pairs->value);
+
+ if (ret)
+ return -EFAULT;
+ }
+
+ return 0;
+
+}
+
+SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count,
+ uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags)
+{
+ return do_riscv_hwprobe((void __user *)pairs, pair_count, cpu_count,
+ (void __user *)cpus, flags);
+}
--
2.25.1
We have an implicit set of base behaviors that userspace depends on,
which are mostly defined in various ISA specifications.
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
Documentation/riscv/hwprobe.rst | 21 +++++++++++++++++++++
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/uapi/asm/hwprobe.h | 5 +++++
arch/riscv/kernel/sys_riscv.c | 20 ++++++++++++++++++++
4 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 88b015a2026e..9f2da414fbf8 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -37,3 +37,24 @@ The following keys are defined:
defined by the RISC-V privileged architecture specification.
* :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as
defined by the RISC-V privileged architecture specification.
+* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base
+ user-visible behavior that this kernel supports. The following base user ABIs
+ are defined:
+
+ * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or
+ rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the
+ privileged ISA, with the following known exceptions (more exceptions may be
+ added, but only if it can be demonstrated that the user ABI is not broken):
+
+ * The :fence.i: instruction cannot be directly executed by userspace
+ programs (it may still be executed in userspace via a
+ kernel-controlled mechanism such as the vDSO).
+* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions
+ that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`:
+ base system behavior.
+
+ * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as
+ defined by commit cd20cee ("FMIN/FMAX now implement
+ minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.
+ * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
+ by version 2.2 of the RISC-V ISA manual.
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 08d1c3bdd78a..7e52f1e1fe10 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -8,6 +8,6 @@
#include <uapi/asm/hwprobe.h>
-#define RISCV_HWPROBE_MAX_KEY 2
+#define RISCV_HWPROBE_MAX_KEY 4
#endif
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index 591802047460..fc5665411782 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -20,6 +20,11 @@ struct riscv_hwprobe {
#define RISCV_HWPROBE_KEY_MVENDORID 0
#define RISCV_HWPROBE_KEY_MARCHID 1
#define RISCV_HWPROBE_KEY_MIMPID 2
+#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
+#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
+#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
+#define RISCV_HWPROBE_IMA_FD (1 << 0)
+#define RISCV_HWPROBE_IMA_C (1 << 1)
/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
#endif
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index 02c2f1f7417e..f2b224550923 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -9,6 +9,7 @@
#include <asm/cacheflush.h>
#include <asm/hwprobe.h>
#include <asm/sbi.h>
+#include <asm/switch_to.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm-generic/mman-common.h>
@@ -124,6 +125,25 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
case RISCV_HWPROBE_KEY_MIMPID:
hwprobe_arch_id(pair, cpus);
break;
+ /*
+ * The kernel already assumes that the base single-letter ISA
+ * extensions are supported on all harts, and only supports the
+ * IMA base, so just cheat a bit here and tell that to
+ * userspace.
+ */
+ case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
+ pair->value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA;
+ break;
+
+ case RISCV_HWPROBE_KEY_IMA_EXT_0:
+ pair->value = 0;
+ if (has_fpu())
+ pair->value |= RISCV_HWPROBE_IMA_FD;
+
+ if (elf_hwcap & RISCV_ISA_EXT_c)
+ pair->value |= RISCV_HWPROBE_IMA_C;
+
+ break;
/*
* For forward compatibility, unknown keys don't fail the whole
--
2.25.1
This allows userspace to select various routines to use based on the
performance of misaligned access on the target hardware.
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
Changes in v2:
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
Documentation/riscv/hwprobe.rst | 14 ++++++++++++
arch/riscv/include/asm/cpufeature.h | 2 ++
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/asm/smp.h | 11 ++++++++++
arch/riscv/include/uapi/asm/hwprobe.h | 6 ++++++
arch/riscv/kernel/cpufeature.c | 31 +++++++++++++++++++++++++--
arch/riscv/kernel/sys_riscv.c | 27 +++++++++++++++++++++++
7 files changed, 90 insertions(+), 3 deletions(-)
diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
index 9f2da414fbf8..ac6ded9eb0da 100644
--- a/Documentation/riscv/hwprobe.rst
+++ b/Documentation/riscv/hwprobe.rst
@@ -58,3 +58,17 @@ The following keys are defined:
minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.
* :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
by version 2.2 of the RISC-V ISA manual.
+* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
+ information about the selected set of processors.
+
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
+ accesses is unknown.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned accesses are
+ emulated via software, either in or below the kernel. These accesses are
+ always extremely slow.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
+ in hardware, but are slower than the cooresponding aligned accesses
+ sequences.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
+ in hardware and are faster than the cooresponding aligned accesses
+ sequences.
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 66ebaae449c8..808d5403f2ac 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -18,4 +18,6 @@ struct riscv_cpuinfo {
DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+DECLARE_PER_CPU(long, misaligned_access_speed);
+
#endif
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7e52f1e1fe10..4e45e33015bc 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -8,6 +8,6 @@
#include <uapi/asm/hwprobe.h>
-#define RISCV_HWPROBE_MAX_KEY 4
+#define RISCV_HWPROBE_MAX_KEY 5
#endif
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 3831b638ecab..4f77a64194e5 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -26,6 +26,17 @@ struct riscv_ipi_ops {
*/
extern unsigned long __cpuid_to_hartid_map[NR_CPUS];
#define cpuid_to_hartid_map(cpu) __cpuid_to_hartid_map[cpu]
+static inline long hartid_to_cpuid_map(unsigned long hartid)
+{
+ long i;
+
+ for_each_possible_cpu(i) {
+ if (cpuid_to_hartid_map(i) == hartid)
+ return i;
+ }
+
+ return -1;
+}
/* print IPI stats */
void show_ipi_stats(struct seq_file *p, int prec);
diff --git a/arch/riscv/include/uapi/asm/hwprobe.h b/arch/riscv/include/uapi/asm/hwprobe.h
index fc5665411782..93fde30fc4a8 100644
--- a/arch/riscv/include/uapi/asm/hwprobe.h
+++ b/arch/riscv/include/uapi/asm/hwprobe.h
@@ -25,6 +25,12 @@ struct riscv_hwprobe {
#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
#define RISCV_HWPROBE_IMA_FD (1 << 0)
#define RISCV_HWPROBE_IMA_C (1 << 1)
+#define RISCV_HWPROBE_KEY_CPUPERF_0 5
+#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
+#define RISCV_HWPROBE_MISALIGNED_EMULATED (1 << 0)
+#define RISCV_HWPROBE_MISALIGNED_SLOW (2 << 0)
+#define RISCV_HWPROBE_MISALIGNED_FAST (3 << 0)
+#define RISCV_HWPROBE_MISALIGNED_MASK (7 << 0)
/* Increase RISCV_HWPROBE_MAX_KEY when adding items. */
#endif
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 93e45560af30..12af6f7a2f53 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -14,8 +14,10 @@
#include <linux/of.h>
#include <asm/alternative.h>
#include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
#include <asm/errata_list.h>
#include <asm/hwcap.h>
+#include <asm/hwprobe.h>
#include <asm/patch.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
@@ -32,6 +34,9 @@ static DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX) __read_mostly;
DEFINE_STATIC_KEY_ARRAY_FALSE(riscv_isa_ext_keys, RISCV_ISA_EXT_KEY_MAX);
EXPORT_SYMBOL(riscv_isa_ext_keys);
+/* Performance information */
+DEFINE_PER_CPU(long, misaligned_access_speed);
+
/**
* riscv_isa_extension_base() - Get base extension word
*
@@ -89,11 +94,11 @@ static bool riscv_isa_extension_check(int id)
void __init riscv_fill_hwcap(void)
{
struct device_node *node;
- const char *isa;
+ const char *isa, *misaligned;
char print_str[NUM_ALPHA_EXTS + 1];
int i, j, rc;
unsigned long isa2hwcap[26] = {0};
- unsigned long hartid;
+ unsigned long hartid, cpu;
isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
@@ -246,6 +251,28 @@ void __init riscv_fill_hwcap(void)
bitmap_copy(riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
else
bitmap_and(riscv_isa, riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
+
+ /*
+ * Check for the performance of misaligned accesses.
+ */
+ cpu = hartid_to_cpuid_map(hartid);
+ if (cpu < 0)
+ continue;
+
+ if (!of_property_read_string(node, "riscv,misaligned-access-performance",
+ &misaligned)) {
+ if (strcmp(misaligned, "emulated") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_EMULATED;
+
+ if (strcmp(misaligned, "slow") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_SLOW;
+
+ if (strcmp(misaligned, "fast") == 0)
+ per_cpu(misaligned_access_speed, cpu) =
+ RISCV_HWPROBE_MISALIGNED_FAST;
+ }
}
/* We don't support systems with F but without D, so mask those out
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index f2b224550923..afc1b96d5c1a 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -7,6 +7,7 @@
#include <linux/syscalls.h>
#include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
#include <asm/sbi.h>
#include <asm/switch_to.h>
@@ -116,6 +117,28 @@ static void hwprobe_arch_id(struct riscv_hwprobe *pair,
pair->value = id;
}
+static int hwprobe_misaligned(const struct cpumask *cpus)
+{
+ int cpu, perf = -1;
+
+ for_each_cpu(cpu, cpus) {
+ int this_perf = per_cpu(misaligned_access_speed, cpu);
+
+ if (perf == -1)
+ perf = this_perf;
+
+ if (perf != this_perf) {
+ perf = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
+ break;
+ }
+ }
+
+ if (perf == -1)
+ return RISCV_HWPROBE_MISALIGNED_UNKNOWN;
+
+ return perf;
+}
+
static void hwprobe_one_pair(struct riscv_hwprobe *pair,
const struct cpumask *cpus)
{
@@ -145,6 +168,10 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
break;
+ case RISCV_HWPROBE_KEY_CPUPERF_0:
+ pair->value = hwprobe_misaligned(cpus);
+ break;
+
/*
* For forward compatibility, unknown keys don't fail the whole
* call, but get their element key set to -1 and value set to 0
--
2.25.1
From: Palmer Dabbelt <[email protected]>
This key allows device trees to specify the performance of misaligned
accesses to main memory regions from each CPU in the system.
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Added | to description: to make dt-checker happy.
Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
index c6720764e765..f79e9e5c5ee9 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -85,6 +85,21 @@ properties:
$ref: "/schemas/types.yaml#/definitions/string"
pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
+ riscv,misaligned-access-performance:
+ description: |
+ Identifies the performance of misaligned memory accesses to main memory
+ regions. There are three flavors of unaligned access performance: "emulated"
+ means that misaligned accesses are emulated via software and thus
+ extremely slow, "slow" means that misaligned accesses are supported by
+ hardware but still slower that aligned accesses sequences, and "fast"
+ means that misaligned accesses are as fast or faster than the
+ cooresponding aligned accesses sequences.
+ $ref: "/schemas/types.yaml#/definitions/string"
+ enum:
+ - emulated
+ - slow
+ - fast
+
# RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
timebase-frequency: false
--
2.25.1
This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well. The build
stuff looks pretty straight-forward, but there's also a tiny C library
to avoid coupling this to any userspace implementation.
Co-developed-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
Signed-off-by: Evan Green <[email protected]>
---
(no changes since v2)
Changes in v2:
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 ++++++++++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +++
.../testing/selftests/riscv/hwprobe/hwprobe.c | 89 +++++++++++++++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +++
5 files changed, 170 insertions(+)
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 41b649452560..a599ef726310 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -62,6 +62,7 @@ TARGETS += pstore
TARGETS += ptrace
TARGETS += openat2
TARGETS += resctrl
+TARGETS += riscv
TARGETS += rlimits
TARGETS += rseq
TARGETS += rtc
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
new file mode 100644
index 000000000000..32a72902d045
--- /dev/null
+++ b/tools/testing/selftests/riscv/Makefile
@@ -0,0 +1,58 @@
+# SPDX-License-Identifier: GPL-2.0
+# Originally tools/testing/arm64/Makefile
+
+# When ARCH not overridden for crosscompiling, lookup machine
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+
+ifneq (,$(filter $(ARCH),riscv))
+RISCV_SUBTARGETS ?= hwprobe
+else
+RISCV_SUBTARGETS :=
+endif
+
+CFLAGS := -Wall -O2 -g
+
+# A proper top_srcdir is needed by KSFT(lib.mk)
+top_srcdir = $(realpath ../../../../)
+
+# Additional include paths needed by kselftest.h and local headers
+CFLAGS += -I$(top_srcdir)/tools/testing/selftests/
+
+CFLAGS += $(KHDR_INCLUDES)
+
+export CFLAGS
+export top_srcdir
+
+all:
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ mkdir -p $$BUILD_TARGET; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+install: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+run_tests: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+# Avoid any output on non riscv on emit_tests
+emit_tests: all
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+clean:
+ @for DIR in $(RISCV_SUBTARGETS); do \
+ BUILD_TARGET=$(OUTPUT)/$$DIR; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
+ done
+
+.PHONY: all clean install run_tests emit_tests
diff --git a/tools/testing/selftests/riscv/hwprobe/Makefile b/tools/testing/selftests/riscv/hwprobe/Makefile
new file mode 100644
index 000000000000..ebdbb3c22e54
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2021 ARM Limited
+# Originally tools/testing/arm64/abi/Makefile
+
+TEST_GEN_PROGS := hwprobe
+
+include ../../lib.mk
+
+$(OUTPUT)/hwprobe: hwprobe.c sys_hwprobe.S
+ $(CC) -o$@ $(CFLAGS) $(LDFLAGS) $^
diff --git a/tools/testing/selftests/riscv/hwprobe/hwprobe.c b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
new file mode 100644
index 000000000000..ddfb61de2938
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/hwprobe.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/hwprobe.h>
+
+/*
+ * Rather than relying on having a new enough libc to define this, just do it
+ * ourselves. This way we don't need to be coupled to a new-enough libc to
+ * contain the call.
+ */
+long riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
+ long cpu_count, unsigned long *cpus, unsigned long flags);
+
+int main(int argc, char **argv)
+{
+ struct riscv_hwprobe pairs[8];
+ unsigned long cpus;
+ long out;
+
+ /* Fake the CPU_SET ops. */
+ cpus = -1;
+
+ /*
+ * Just run a basic test: pass enough pairs to get up to the base
+ * behavior, and then check to make sure it's sane.
+ */
+ for (long i = 0; i < 8; i++)
+ pairs[i].key = i;
+ out = riscv_hwprobe(pairs, 8, 1, &cpus, 0);
+ if (out != 0)
+ return -1;
+ for (long i = 0; i < 4; ++i) {
+ /* Fail if the kernel claims not to recognize a base key. */
+ if ((i < 4) && (pairs[i].key != i))
+ return -2;
+
+ if (pairs[i].key != RISCV_HWPROBE_KEY_BASE_BEHAVIOR)
+ continue;
+
+ if (pairs[i].value & RISCV_HWPROBE_BASE_BEHAVIOR_IMA)
+ continue;
+
+ return -3;
+ }
+
+ /*
+ * This should also work with a NULL CPU set, but should not work
+ * with an improperly supplied CPU set.
+ */
+ out = riscv_hwprobe(pairs, 8, 0, 0, 0);
+ if (out != 0)
+ return -4;
+
+ out = riscv_hwprobe(pairs, 8, 0, &cpus, 0);
+ if (out == 0)
+ return -5;
+
+ out = riscv_hwprobe(pairs, 8, 1, 0, 0);
+ if (out == 0)
+ return -6;
+
+ /*
+ * Check that keys work by providing one that we know exists, and
+ * checking to make sure the resultig pair is what we asked for.
+ */
+ pairs[0].key = RISCV_HWPROBE_KEY_BASE_BEHAVIOR;
+ out = riscv_hwprobe(pairs, 1, 1, &cpus, 0);
+ if (out != 0)
+ return -7;
+ if (pairs[0].key != RISCV_HWPROBE_KEY_BASE_BEHAVIOR)
+ return -8;
+
+ /*
+ * Check that an unknown key gets overwritten with -1,
+ * but doesn't block elements after it.
+ */
+ pairs[0].key = 0x5555;
+ pairs[1].key = 1;
+ pairs[1].value = 0xAAAA;
+ out = riscv_hwprobe(pairs, 2, 0, 0, 0);
+ if (out != 0)
+ return -9;
+
+ if (pairs[0].key != -1)
+ return -10;
+
+ if ((pairs[1].key != 1) || (pairs[1].value == 0xAAAA))
+ return -11;
+
+ return 0;
+}
diff --git a/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S b/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
new file mode 100644
index 000000000000..ed8d28863b27
--- /dev/null
+++ b/tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2022 Rivos, Inc */
+
+.text
+.global riscv_hwprobe
+riscv_hwprobe:
+ # Put __NR_riscv_hwprobe in the syscall number register, then just shim
+ # back the kernel's return. This doesn't do any sort of errno
+ # handling, the caller can deal with it.
+ li a7, 258
+ ecall
+ ret
--
2.25.1
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the
riscv_hwprobe syscall and answer common queries. We stash a copy of
static answers for the "all CPUs" case in the vDSO data page. This data
is private to the vDSO, so we can decide later to change what's stored
there or under what conditions we defer to the syscall. Currently all
data can be discovered at boot, so the vDSO function answers all queries
when the cpumask is set to the "all CPUs" hint.
There's also a boolean in the data that lets the vDSO function know that
all CPUs are the same. In that case, the vDSO will also answer queries
for arbitrary CPU masks in addition to the "all CPUs" hint.
Signed-off-by: Evan Green <[email protected]>
---
Changes in v3:
- Introduced vDSO function
One aspect of this that's less than perfect is that there are two copies
of the arch_vdso_data in the data page. This stems from the fact that
vdso_data is arrayed by CS_BASES. Since we're very far away from filling
up the page, and this is not ABI, I left it for now. If things get full
we can move this data out to its own page, or refactor vdso_data
tree-wide to allow for non-arrayed data.
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/vdso/data.h | 17 ++++++++++
arch/riscv/kernel/sys_riscv.c | 41 ++++++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 2 ++
arch/riscv/kernel/vdso/hwprobe.c | 47 ++++++++++++++++++++++++++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 +++++++++
arch/riscv/kernel/vdso/vdso.lds.S | 1 +
7 files changed, 124 insertions(+)
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..218e75975a15 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -33,6 +33,7 @@ config RISCV
select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UBSAN_SANITIZE_ALL
+ select ARCH_HAS_VDSO_DATA
select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
select ARCH_STACKWALK
diff --git a/arch/riscv/include/asm/vdso/data.h b/arch/riscv/include/asm/vdso/data.h
new file mode 100644
index 000000000000..dc2f76f58b76
--- /dev/null
+++ b/arch/riscv/include/asm/vdso/data.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __RISCV_ASM_VDSO_DATA_H
+#define __RISCV_ASM_VDSO_DATA_H
+
+#include <linux/types.h>
+#include <vdso/datapage.h>
+#include <asm/hwprobe.h>
+
+struct arch_vdso_data {
+ /* Stash static answers to the hwprobe queries when all CPUs are selected. */
+ __u64 all_cpu_hwprobe_values[RISCV_HWPROBE_MAX_KEY + 1];
+
+ /* Boolean indicating all CPUs have the same static hwprobe values. */
+ __u8 homogeneous_cpus;
+};
+
+#endif /* __RISCV_ASM_VDSO_DATA_H */
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index afc1b96d5c1a..113b3d97c8cc 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -14,6 +14,7 @@
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm-generic/mman-common.h>
+#include <vdso/vsyscall.h>
static long riscv_sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
@@ -242,6 +243,46 @@ int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
}
+static int __init init_hwprobe_vdso_data(void)
+{
+ struct vdso_data *vd = __arch_get_k_vdso_data();
+ struct arch_vdso_data *avd = &vd->arch_data;
+ u64 id_bitsmash = 0;
+ struct riscv_hwprobe pair;
+ int key;
+
+ /*
+ * Initialize vDSO data with the answers for the "all CPUs" case, to
+ * save a syscall in the common case.
+ */
+ for (key = 0; key <= RISCV_HWPROBE_MAX_KEY; key++) {
+ pair.key = key;
+ hwprobe_one_pair(&pair, cpu_online_mask);
+
+ WARN_ON_ONCE(pair.key < 0);
+
+ avd->all_cpu_hwprobe_values[key] = pair.value;
+ /*
+ * Smash together the vendor, arch, and impl IDs to see if
+ * they're all 0 or any negative.
+ */
+ if (key <= RISCV_HWPROBE_KEY_MIMPID)
+ id_bitsmash |= pair.value;
+ }
+
+ /*
+ * If the arch, vendor, and implementation ID are all the same across
+ * all harts, then assume all CPUs are the same, and allow the vDSO to
+ * answer queries for arbitrary masks. However if all values are 0 (not
+ * populated) or any value returns -1 (varies across CPUs), then the
+ * vDSO should defer to the kernel for exotic cpu masks.
+ */
+ avd->homogeneous_cpus = (id_bitsmash > 0);
+ return 0;
+}
+
+arch_initcall_sync(init_hwprobe_vdso_data);
+
SYSCALL_DEFINE5(riscv_hwprobe, uintptr_t, pairs, uintptr_t, pair_count,
uintptr_t, cpu_count, uintptr_t, cpus, uintptr_t, flags)
{
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 06e6b27f3bcc..107fd2162cdf 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -12,6 +12,8 @@ vdso-syms += vgettimeofday
endif
vdso-syms += getcpu
vdso-syms += flush_icache
+vdso-syms += hwprobe
+vdso-syms += sys_hwprobe
# Files to link into the vdso
obj-vdso = $(patsubst %, %.o, $(vdso-syms)) note.o
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
new file mode 100644
index 000000000000..1792d6c30c97
--- /dev/null
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2023 Rivos, Inc
+ */
+
+#include <linux/types.h>
+#include <vdso/datapage.h>
+#include <vdso/helpers.h>
+
+extern int riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
+ long cpu_count, unsigned long *cpus,
+ unsigned long flags);
+
+int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
+ long cpu_count, unsigned long *cpus,
+ unsigned long flags)
+{
+ const struct vdso_data *vd = __arch_get_vdso_data();
+ const struct arch_vdso_data *avd = &vd->arch_data;
+ bool all_cpus = !cpu_count && !cpus;
+ struct riscv_hwprobe *p = pairs;
+ struct riscv_hwprobe *end = pairs + pair_count;
+
+ /*
+ * Defer to the syscall for exotic requests. The vdso has answers
+ * stashed away only for the "all cpus" case. If all CPUs are
+ * homogeneous, then this function can handle requests for arbitrary
+ * masks.
+ */
+ if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
+ return riscv_hwprobe(pairs, pair_count, cpu_count, cpus, flags);
+
+ /* This is something we can handle, fill out the pairs. */
+ while (p < end) {
+ if (p->key <= RISCV_HWPROBE_MAX_KEY) {
+ p->value = avd->all_cpu_hwprobe_values[p->key];
+
+ } else {
+ p->key = -1;
+ p->value = 0;
+ }
+
+ p++;
+ }
+
+ return 0;
+}
diff --git a/arch/riscv/kernel/vdso/sys_hwprobe.S b/arch/riscv/kernel/vdso/sys_hwprobe.S
new file mode 100644
index 000000000000..2511c8ba8f86
--- /dev/null
+++ b/arch/riscv/kernel/vdso/sys_hwprobe.S
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2022 Rivos, Inc */
+
+#include <linux/linkage.h>
+#include <asm/unistd.h>
+
+.text
+ENTRY(riscv_hwprobe)
+ .cfi_startproc
+ li a7, __NR_riscv_hwprobe
+ ecall
+ ret
+
+ .cfi_endproc
+ENDPROC(riscv_hwprobe)
diff --git a/arch/riscv/kernel/vdso/vdso.lds.S b/arch/riscv/kernel/vdso/vdso.lds.S
index 150b1a572e61..dfe63fb71041 100644
--- a/arch/riscv/kernel/vdso/vdso.lds.S
+++ b/arch/riscv/kernel/vdso/vdso.lds.S
@@ -75,6 +75,7 @@ VERSION
#endif
__vdso_getcpu;
__vdso_flush_icache;
+ __vdso_riscv_hwprobe;
local: *;
};
}
--
2.25.1
Hi Evan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on shuah-kselftest/next]
[also build test WARNING on shuah-kselftest/fixes robh/for-next soc/for-next linus/master v6.2]
[cannot apply to next-20230221]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230222-031210
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link: https://lore.kernel.org/r/20230221190858.3159617-8-evan%40rivosinc.com
patch subject: [PATCH v3 7/7] RISC-V: Add hwprobe vDSO function and data
config: riscv-allyesconfig (https://download.01.org/0day-ci/archive/20230222/[email protected]/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/3f47f3f920ffcb37a461d04a1ee0e245451c869e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230222-031210
git checkout 3f47f3f920ffcb37a461d04a1ee0e245451c869e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv prepare
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
>> arch/riscv/kernel/vdso/hwprobe.c:14:5: warning: no previous prototype for '__vdso_riscv_hwprobe' [-Wmissing-prototypes]
14 | int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
| ^~~~~~~~~~~~~~~~~~~~
--
scripts/genksyms/parse.y: warning: 9 shift/reduce conflicts [-Wconflicts-sr]
scripts/genksyms/parse.y: warning: 5 reduce/reduce conflicts [-Wconflicts-rr]
scripts/genksyms/parse.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
>> arch/riscv/kernel/vdso/hwprobe.c:14:5: warning: no previous prototype for '__vdso_riscv_hwprobe' [-Wmissing-prototypes]
14 | int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
| ^~~~~~~~~~~~~~~~~~~~
vim +/__vdso_riscv_hwprobe +14 arch/riscv/kernel/vdso/hwprobe.c
9
10 extern int riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
11 long cpu_count, unsigned long *cpus,
12 unsigned long flags);
13
> 14 int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Tue, Feb 21, 2023 at 11:08:57AM -0800, Evan Green wrote:
> RISC-V, so I've added some infrastructure for those as well. The build
> stuff looks pretty straight-forward, but there's also a tiny C library
> to avoid coupling this to any userspace implementation.
It looks like this has been updated to just use the normal system
libc so the changelog isn't accurate any more?
Hi Evan,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on shuah-kselftest/next]
[also build test ERROR on shuah-kselftest/fixes robh/for-next soc/for-next linus/master v6.2]
[cannot apply to next-20230222]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230222-031210
base: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
patch link: https://lore.kernel.org/r/20230221190858.3159617-8-evan%40rivosinc.com
patch subject: [PATCH v3 7/7] RISC-V: Add hwprobe vDSO function and data
config: riscv-rv32_defconfig (https://download.01.org/0day-ci/archive/20230222/[email protected]/config)
compiler: riscv32-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/3f47f3f920ffcb37a461d04a1ee0e245451c869e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Evan-Green/RISC-V-Move-struct-riscv_cpuinfo-to-new-header/20230222-031210
git checkout 3f47f3f920ffcb37a461d04a1ee0e245451c869e
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv prepare
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
In file included from include/vdso/datapage.h:137,
from arch/riscv/kernel/vdso/hwprobe.c:7:
arch/riscv/include/asm/vdso/gettimeofday.h: In function 'gettimeofday_fallback':
>> arch/riscv/include/asm/vdso/gettimeofday.h:21:38: error: '__NR_gettimeofday' undeclared (first use in this function)
21 | register long nr asm("a7") = __NR_gettimeofday;
| ^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/vdso/gettimeofday.h:21:38: note: each undeclared identifier is reported only once for each function it appears in
arch/riscv/include/asm/vdso/gettimeofday.h: In function 'clock_gettime_fallback':
>> arch/riscv/include/asm/vdso/gettimeofday.h:37:38: error: '__NR_clock_gettime' undeclared (first use in this function)
37 | register long nr asm("a7") = __NR_clock_gettime;
| ^~~~~~~~~~~~~~~~~~
arch/riscv/include/asm/vdso/gettimeofday.h: In function 'clock_getres_fallback':
>> arch/riscv/include/asm/vdso/gettimeofday.h:53:38: error: '__NR_clock_getres' undeclared (first use in this function)
53 | register long nr asm("a7") = __NR_clock_getres;
| ^~~~~~~~~~~~~~~~~
arch/riscv/kernel/vdso/hwprobe.c: At top level:
arch/riscv/kernel/vdso/hwprobe.c:14:5: warning: no previous prototype for '__vdso_riscv_hwprobe' [-Wmissing-prototypes]
14 | int __vdso_riscv_hwprobe(struct riscv_hwprobe *pairs, long pair_count,
| ^~~~~~~~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:252: arch/riscv/kernel/vdso/hwprobe.o] Error 1
make[2]: Target 'include/generated/vdso-offsets.h' not remade because of errors.
make[1]: *** [arch/riscv/Makefile:126: vdso_prepare] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:242: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.
vim +/__NR_gettimeofday +21 arch/riscv/include/asm/vdso/gettimeofday.h
ad5d1122b82fbd6 Vincent Chen 2020-06-09 13
ad5d1122b82fbd6 Vincent Chen 2020-06-09 14 static __always_inline
ad5d1122b82fbd6 Vincent Chen 2020-06-09 15 int gettimeofday_fallback(struct __kernel_old_timeval *_tv,
ad5d1122b82fbd6 Vincent Chen 2020-06-09 16 struct timezone *_tz)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 17 {
ad5d1122b82fbd6 Vincent Chen 2020-06-09 18 register struct __kernel_old_timeval *tv asm("a0") = _tv;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 19 register struct timezone *tz asm("a1") = _tz;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 20 register long ret asm("a0");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 @21 register long nr asm("a7") = __NR_gettimeofday;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 22
ad5d1122b82fbd6 Vincent Chen 2020-06-09 23 asm volatile ("ecall\n"
ad5d1122b82fbd6 Vincent Chen 2020-06-09 24 : "=r" (ret)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 25 : "r"(tv), "r"(tz), "r"(nr)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 26 : "memory");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 27
ad5d1122b82fbd6 Vincent Chen 2020-06-09 28 return ret;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 29 }
ad5d1122b82fbd6 Vincent Chen 2020-06-09 30
ad5d1122b82fbd6 Vincent Chen 2020-06-09 31 static __always_inline
ad5d1122b82fbd6 Vincent Chen 2020-06-09 32 long clock_gettime_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 33 {
ad5d1122b82fbd6 Vincent Chen 2020-06-09 34 register clockid_t clkid asm("a0") = _clkid;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 35 register struct __kernel_timespec *ts asm("a1") = _ts;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 36 register long ret asm("a0");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 @37 register long nr asm("a7") = __NR_clock_gettime;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 38
ad5d1122b82fbd6 Vincent Chen 2020-06-09 39 asm volatile ("ecall\n"
ad5d1122b82fbd6 Vincent Chen 2020-06-09 40 : "=r" (ret)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 41 : "r"(clkid), "r"(ts), "r"(nr)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 42 : "memory");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 43
ad5d1122b82fbd6 Vincent Chen 2020-06-09 44 return ret;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 45 }
ad5d1122b82fbd6 Vincent Chen 2020-06-09 46
ad5d1122b82fbd6 Vincent Chen 2020-06-09 47 static __always_inline
ad5d1122b82fbd6 Vincent Chen 2020-06-09 48 int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 49 {
ad5d1122b82fbd6 Vincent Chen 2020-06-09 50 register clockid_t clkid asm("a0") = _clkid;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 51 register struct __kernel_timespec *ts asm("a1") = _ts;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 52 register long ret asm("a0");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 @53 register long nr asm("a7") = __NR_clock_getres;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 54
ad5d1122b82fbd6 Vincent Chen 2020-06-09 55 asm volatile ("ecall\n"
ad5d1122b82fbd6 Vincent Chen 2020-06-09 56 : "=r" (ret)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 57 : "r"(clkid), "r"(ts), "r"(nr)
ad5d1122b82fbd6 Vincent Chen 2020-06-09 58 : "memory");
ad5d1122b82fbd6 Vincent Chen 2020-06-09 59
ad5d1122b82fbd6 Vincent Chen 2020-06-09 60 return ret;
ad5d1122b82fbd6 Vincent Chen 2020-06-09 61 }
ad5d1122b82fbd6 Vincent Chen 2020-06-09 62
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
On Tue, 2023-02-21 at 11:08 -0800, Evan Green wrote:
> This allows userspace to select various routines to use based on the
> performance of misaligned access on the target hardware.
[]
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
[]
> @@ -89,11 +94,11 @@ static bool riscv_isa_extension_check(int id)
> void __init riscv_fill_hwcap(void)
> {
> struct device_node *node;
> - const char *isa;
> + const char *isa, *misaligned;
> char print_str[NUM_ALPHA_EXTS + 1];
> int i, j, rc;
> unsigned long isa2hwcap[26] = {0};
> - unsigned long hartid;
> + unsigned long hartid, cpu;
>
> isa2hwcap['i' - 'a'] = COMPAT_HWCAP_ISA_I;
> isa2hwcap['m' - 'a'] = COMPAT_HWCAP_ISA_M;
> @@ -246,6 +251,28 @@ void __init riscv_fill_hwcap(void)
> bitmap_copy(riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
> else
> bitmap_and(riscv_isa, riscv_isa, this_isa, RISCV_ISA_EXT_MAX);
> +
> + /*
> + * Check for the performance of misaligned accesses.
> + */
> + cpu = hartid_to_cpuid_map(hartid);
> + if (cpu < 0)
> + continue;
unsigned long can't be less than 0
Likely cpu should be long not unsigned long
It seems cpu is rather randomly either int or long.
Perhaps standardizing on int would be better.
On Tue, Feb 21, 2023, at 20:08, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
I'm still skeptical about the need for a custom syscall interface here.
I had not looked at the interface so far, but there are a few things
that stick out:
> +RISC-V Hardware Probing Interface
> +---------------------------------
> +
> +The RISC-V hardware probing interface is based around a single
> syscall, which
> +is defined in <asm/hwprobe.h>::
> +
> + struct riscv_hwprobe {
> + __s64 key;
> + __u64 value;
> + };
The way this is defined, the kernel will always have to know
about the specific set of features, it can't just forward
unknown features to user space after probing them from an
architectured hardware interface or from DT.
If 'key' is just an enumerated value with a small number of
possible values, I don't see anything wrong with using elf
aux data. I understand it's hard to know how many keys
might be needed in the long run, from the way you define
the key/value pairs here, I would expect it to have a lot
of the same limitations that the aux data has, except for
a few bytes to be copied.
> + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
> pair_count,
> + size_t cpu_count, cpu_set_t *cpus,
> + unsigned long flags);
The cpu set argument worries me more: there should never be a
need to optimize for broken hardware that has an asymmetric set
of features. Just let the kernel figure out the minimum set
of features that works across all CPUs and report that like we
do with HWCAP. If there is a SoC that is so broken that it has
important features on a subset of cores that some user might
actually want to rely on, then have them go through the slow
sysfs interface for probing the CPUs indidually, but don't make
the broken case easier at the expense of normal users that
run on working hardware.
> +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t,
> uintptr_t,
> + uintptr_t, uintptr_t);
Why 'uintptr_t' rather than the correct type?
Arnd
On Tue, Feb 21, 2023 at 11:08:53AM -0800, Evan Green wrote:
> We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> system call that quite does this, so let's just provide an arch-specific
> one to probe for hardware capabilities. This currently just provides
> m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> the future.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> +static
> +int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs, long pair_count,
> + long cpu_count, unsigned long __user *cpus_user,
> + unsigned long flags)
I almost feel bad commenting this, but this is now the only function
split like this w/ the static on its own line.
With the same caveat about ignorance about glibc's desires & lingering
doubt as to whether this interface is the right way to go, this looks
good to me now.
I'm reluctant to give an R-b for the latter reason, but assuming the
Higher Power deem this approach acceptable:
Reviewed-by: Conor Dooley <[email protected]>
Cheers,
Conor.
Hey Evan,
On Tue, Feb 21, 2023 at 11:08:54AM -0800, Evan Green wrote:
> We have an implicit set of base behaviors that userspace depends on,
> which are mostly defined in various ISA specifications.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> Changes in v3:
> - Refactored base ISA behavior probe to allow kernel probing as well,
> in prep for vDSO data initialization.
> - Fixed doc warnings in IMA text list, use :c:macro:.
>
> Documentation/riscv/hwprobe.rst | 21 +++++++++++++++++++++
> arch/riscv/include/asm/hwprobe.h | 2 +-
> arch/riscv/include/uapi/asm/hwprobe.h | 5 +++++
> arch/riscv/kernel/sys_riscv.c | 20 ++++++++++++++++++++
> 4 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 88b015a2026e..9f2da414fbf8 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -37,3 +37,24 @@ The following keys are defined:
> defined by the RISC-V privileged architecture specification.
> * :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as
> defined by the RISC-V privileged architecture specification.
> +* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base
> + user-visible behavior that this kernel supports. The following base user ABIs
> + are defined:
> +
> + * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or
> + rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the
> + privileged ISA, with the following known exceptions (more exceptions may be
> + added, but only if it can be demonstrated that the user ABI is not broken):
> +
> + * The :fence.i: instruction cannot be directly executed by userspace
> + programs (it may still be executed in userspace via a
> + kernel-controlled mechanism such as the vDSO).
> +* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions
> + that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`:
> + base system behavior.
> +
> + * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as
> + defined by commit cd20cee ("FMIN/FMAX now implement
> + minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.
> + * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
> + by version 2.2 of the RISC-V ISA manual.
I think I asked for some newlines, but this all seems kinda random now
as to whether there is a blank line between list items or not.
> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> index 02c2f1f7417e..f2b224550923 100644
> --- a/arch/riscv/kernel/sys_riscv.c
> +++ b/arch/riscv/kernel/sys_riscv.c
> @@ -9,6 +9,7 @@
> #include <asm/cacheflush.h>
> #include <asm/hwprobe.h>
> #include <asm/sbi.h>
> +#include <asm/switch_to.h>
> #include <asm/uaccess.h>
> #include <asm/unistd.h>
> #include <asm-generic/mman-common.h>
> @@ -124,6 +125,25 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> case RISCV_HWPROBE_KEY_MIMPID:
> hwprobe_arch_id(pair, cpus);
> break;
> + /*
> + * The kernel already assumes that the base single-letter ISA
> + * extensions are supported on all harts, and only supports the
> + * IMA base, so just cheat a bit here and tell that to
> + * userspace.
> + */
> + case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> + pair->value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA;
> + break;
> +
> + case RISCV_HWPROBE_KEY_IMA_EXT_0:
> + pair->value = 0;
> + if (has_fpu())
> + pair->value |= RISCV_HWPROBE_IMA_FD;
> +
> + if (elf_hwcap & RISCV_ISA_EXT_c)
> + pair->value |= RISCV_HWPROBE_IMA_C;
> +
> + break;
>
> /*
> * For forward compatibility, unknown keys don't fail the whole
This looks a lot nicer after the refactor, sans the {}.
With a consistent approach taken to newlines:
Reviewed-by: Conor Dooley <[email protected]>
Cheers,
Conor.
Hey Evan,
On Tue, Feb 21, 2023 at 11:08:55AM -0800, Evan Green wrote:
> From: Palmer Dabbelt <[email protected]>
>
> This key allows device trees to specify the performance of misaligned
> accesses to main memory regions from each CPU in the system.
Could you fold some of Palmer's explanation for why this must be in the
devicetree? Think this is where he explained it:
https://lore.kernel.org/all/mhng-8736b349-e27a-4372-81ca-3a25d2ec1e94@palmer-ri-x1c9/
>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
> ---
>
> Changes in v3:
> - Added | to description: to make dt-checker happy.
>
> Documentation/devicetree/bindings/riscv/cpus.yaml | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml b/Documentation/devicetree/bindings/riscv/cpus.yaml
> index c6720764e765..f79e9e5c5ee9 100644
> --- a/Documentation/devicetree/bindings/riscv/cpus.yaml
> +++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
> @@ -85,6 +85,21 @@ properties:
> $ref: "/schemas/types.yaml#/definitions/string"
> pattern: ^rv(?:64|32)imaf?d?q?c?b?v?k?h?(?:_[hsxz](?:[a-z])+)*$
>
> + riscv,misaligned-access-performance:
> + description: |
> + Identifies the performance of misaligned memory accesses to main memory
> + regions. There are three flavors of unaligned access performance: "emulated"
> + means that misaligned accesses are emulated via software and thus
> + extremely slow, "slow" means that misaligned accesses are supported by
> + hardware but still slower that aligned accesses sequences, and "fast"
s/that/than/
> + means that misaligned accesses are as fast or faster than the
> + cooresponding aligned accesses sequences.
s/cooresponding/corresponding/
Thanks,
Conor.
> + $ref: "/schemas/types.yaml#/definitions/string"
> + enum:
> + - emulated
> + - slow
> + - fast
> +
> # RISC-V requires 'timebase-frequency' in /cpus, so disallow it here
> timebase-frequency: false
>
> --
> 2.25.1
>
Hey Evan,
On Tue, Feb 21, 2023 at 11:08:56AM -0800, Evan Green wrote:
> This allows userspace to select various routines to use based on the
> performance of misaligned access on the target hardware.
>
> Co-developed-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Palmer Dabbelt <[email protected]>
> Signed-off-by: Evan Green <[email protected]>
>
> ---
>
> Changes in v3:
> - Have hwprobe_misaligned return int instead of long.
> - Constify cpumask pointer in hwprobe_misaligned()
> - Fix warnings in _PERF_O list documentation, use :c:macro:.
> - Move include cpufeature.h to misaligned patch.
> - Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
> - Use for_each_possible_cpu() instead of NR_CPUS (Conor)
> - Break early in misaligned access iteration (Conor)
> - Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
> value (Conor)
I'm not quite sure why we don't just go ahead and plumb this in already?
Whether the specs allow this or not, someone is going to end up doing
it (and it sounds like the specs now do allow it).
Is it wise to plug the hole in the syscall now, rather than leaving the
gap?
Otherwise, this looks fine, modulo Joe's comment about types.
Cheers,
Conor.
On Mon, Feb 27, 2023 at 10:57:55PM +0000, Conor Dooley wrote:
> Hey Evan,
>
> On Tue, Feb 21, 2023 at 11:08:55AM -0800, Evan Green wrote:
> > From: Palmer Dabbelt <[email protected]>
> >
> > This key allows device trees to specify the performance of misaligned
> > accesses to main memory regions from each CPU in the system.
>
> Could you fold some of Palmer's explanation for why this must be in the
> devicetree? Think this is where he explained it:
> https://lore.kernel.org/all/mhng-8736b349-e27a-4372-81ca-3a25d2ec1e94@palmer-ri-x1c9/
I still don't think this belongs in DT and replied on the above thread.
Rob
On Mon, Feb 27, 2023 at 2:47 PM Conor Dooley <[email protected]> wrote:
>
> Hey Evan,
>
> On Tue, Feb 21, 2023 at 11:08:54AM -0800, Evan Green wrote:
> > We have an implicit set of base behaviors that userspace depends on,
> > which are mostly defined in various ISA specifications.
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
> > ---
> >
> > Changes in v3:
> > - Refactored base ISA behavior probe to allow kernel probing as well,
> > in prep for vDSO data initialization.
> > - Fixed doc warnings in IMA text list, use :c:macro:.
> >
> > Documentation/riscv/hwprobe.rst | 21 +++++++++++++++++++++
> > arch/riscv/include/asm/hwprobe.h | 2 +-
> > arch/riscv/include/uapi/asm/hwprobe.h | 5 +++++
> > arch/riscv/kernel/sys_riscv.c | 20 ++++++++++++++++++++
> > 4 files changed, 47 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> > index 88b015a2026e..9f2da414fbf8 100644
> > --- a/Documentation/riscv/hwprobe.rst
> > +++ b/Documentation/riscv/hwprobe.rst
> > @@ -37,3 +37,24 @@ The following keys are defined:
> > defined by the RISC-V privileged architecture specification.
> > * :c:macro:`RISCV_HWPROBE_KEY_MIMPLID`: Contains the value of ``mimplid``, as
> > defined by the RISC-V privileged architecture specification.
> > +* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base
> > + user-visible behavior that this kernel supports. The following base user ABIs
> > + are defined:
> > +
> > + * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or
> > + rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the
> > + privileged ISA, with the following known exceptions (more exceptions may be
> > + added, but only if it can be demonstrated that the user ABI is not broken):
> > +
> > + * The :fence.i: instruction cannot be directly executed by userspace
> > + programs (it may still be executed in userspace via a
> > + kernel-controlled mechanism such as the vDSO).
> > +* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions
> > + that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`:
> > + base system behavior.
> > +
> > + * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as
> > + defined by commit cd20cee ("FMIN/FMAX now implement
> > + minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.
> > + * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
> > + by version 2.2 of the RISC-V ISA manual.
>
> I think I asked for some newlines, but this all seems kinda random now
> as to whether there is a blank line between list items or not.
Yeah, this was the minimum number of newlines needed to make it
actually render correctly. The relevant rules which I've now learned
are that newlines are required before a list beginning (including a
sub-list), but optional between items. I'll fix this up to add
newlines between elements as well, then it will look pretty I think.
>
> > diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> > index 02c2f1f7417e..f2b224550923 100644
> > --- a/arch/riscv/kernel/sys_riscv.c
> > +++ b/arch/riscv/kernel/sys_riscv.c
> > @@ -9,6 +9,7 @@
> > #include <asm/cacheflush.h>
> > #include <asm/hwprobe.h>
> > #include <asm/sbi.h>
> > +#include <asm/switch_to.h>
> > #include <asm/uaccess.h>
> > #include <asm/unistd.h>
> > #include <asm-generic/mman-common.h>
> > @@ -124,6 +125,25 @@ static void hwprobe_one_pair(struct riscv_hwprobe *pair,
> > case RISCV_HWPROBE_KEY_MIMPID:
> > hwprobe_arch_id(pair, cpus);
> > break;
> > + /*
> > + * The kernel already assumes that the base single-letter ISA
> > + * extensions are supported on all harts, and only supports the
> > + * IMA base, so just cheat a bit here and tell that to
> > + * userspace.
> > + */
> > + case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> > + pair->value = RISCV_HWPROBE_BASE_BEHAVIOR_IMA;
> > + break;
> > +
> > + case RISCV_HWPROBE_KEY_IMA_EXT_0:
> > + pair->value = 0;
> > + if (has_fpu())
> > + pair->value |= RISCV_HWPROBE_IMA_FD;
> > +
> > + if (elf_hwcap & RISCV_ISA_EXT_c)
> > + pair->value |= RISCV_HWPROBE_IMA_C;
> > +
> > + break;
> >
> > /*
> > * For forward compatibility, unknown keys don't fail the whole
>
> This looks a lot nicer after the refactor, sans the {}.
> With a consistent approach taken to newlines:
> Reviewed-by: Conor Dooley <[email protected]>
Thanks Conor!
-Evan
On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <[email protected]> wrote:
>
> On Tue, Feb 21, 2023, at 20:08, Evan Green wrote:
> > We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > system call that quite does this, so let's just provide an arch-specific
> > one to probe for hardware capabilities. This currently just provides
> > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > the future.
> >
> > Co-developed-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Palmer Dabbelt <[email protected]>
> > Signed-off-by: Evan Green <[email protected]>
>
> I'm still skeptical about the need for a custom syscall interface here.
> I had not looked at the interface so far, but there are a few things
> that stick out:
>
> > +RISC-V Hardware Probing Interface
> > +---------------------------------
> > +
> > +The RISC-V hardware probing interface is based around a single
> > syscall, which
> > +is defined in <asm/hwprobe.h>::
> > +
> > + struct riscv_hwprobe {
> > + __s64 key;
> > + __u64 value;
> > + };
>
> The way this is defined, the kernel will always have to know
> about the specific set of features, it can't just forward
> unknown features to user space after probing them from an
> architectured hardware interface or from DT.
You're correct that this interface wasn't intended to have usermode
come in with augmented data or additional key/value pairs. This was
purely meant to provide access to the kernel's repository of
architectural and microarchitectural details. If usermode wants to
provide extra info in this same form, maybe they could wrap this
interface.
>
> If 'key' is just an enumerated value with a small number of
> possible values, I don't see anything wrong with using elf
> aux data. I understand it's hard to know how many keys
> might be needed in the long run, from the way you define
> the key/value pairs here, I would expect it to have a lot
> of the same limitations that the aux data has, except for
> a few bytes to be copied.
Correct, this makes allocating bits out of here cheaper by not
requiring that we actively copy them into every new process forever.
You're right that the aux vector would work as well, but the thinking
behind this series was that an interface like this might be better for
an architecture as extensible as risc-v.
>
> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
> > pair_count,
> > + size_t cpu_count, cpu_set_t *cpus,
> > + unsigned long flags);
>
> The cpu set argument worries me more: there should never be a
> need to optimize for broken hardware that has an asymmetric set
> of features. Just let the kernel figure out the minimum set
> of features that works across all CPUs and report that like we
> do with HWCAP. If there is a SoC that is so broken that it has
> important features on a subset of cores that some user might
> actually want to rely on, then have them go through the slow
> sysfs interface for probing the CPUs indidually, but don't make
> the broken case easier at the expense of normal users that
> run on working hardware.
I'm not so sure. While I agree with you for major classes of features
(eg one CPU has floating point support but another does not), I expect
these bits to contain more subtle details as well, which might vary
across asymmetric implementations without breaking ABI compatibility
per-se. Maybe some vendor has implemented exotic video decoding
acceleration instructions that only work on the big core. Or maybe the
big cores support v3.1 of some extension (where certain things run
faster), but the little cores only have v3.0, where it's a little
slower. Certain apps would likely want to know these things so they
can allocate their work optimally across cores.
>
> > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t,
> > uintptr_t,
> > + uintptr_t, uintptr_t);
>
> Why 'uintptr_t' rather than the correct type?
Fixed.
-Evan
Am Donnerstag, 30. März 2023, 20:30:29 CEST schrieb Evan Green:
> On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <[email protected]> wrote:
> >
> > On Tue, Feb 21, 2023, at 20:08, Evan Green wrote:
> > > We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > > system call that quite does this, so let's just provide an arch-specific
> > > one to probe for hardware capabilities. This currently just provides
> > > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > > the future.
> > >
> > > Co-developed-by: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > Signed-off-by: Evan Green <[email protected]>
> >
> > I'm still skeptical about the need for a custom syscall interface here.
> > I had not looked at the interface so far, but there are a few things
> > that stick out:
> >
> > > +RISC-V Hardware Probing Interface
> > > +---------------------------------
> > > +
> > > +The RISC-V hardware probing interface is based around a single
> > > syscall, which
> > > +is defined in <asm/hwprobe.h>::
> > > +
> > > + struct riscv_hwprobe {
> > > + __s64 key;
> > > + __u64 value;
> > > + };
> >
> > The way this is defined, the kernel will always have to know
> > about the specific set of features, it can't just forward
> > unknown features to user space after probing them from an
> > architectured hardware interface or from DT.
>
> You're correct that this interface wasn't intended to have usermode
> come in with augmented data or additional key/value pairs. This was
> purely meant to provide access to the kernel's repository of
> architectural and microarchitectural details. If usermode wants to
> provide extra info in this same form, maybe they could wrap this
> interface.
>
> > If 'key' is just an enumerated value with a small number of
> > possible values, I don't see anything wrong with using elf
> > aux data. I understand it's hard to know how many keys
> > might be needed in the long run, from the way you define
> > the key/value pairs here, I would expect it to have a lot
> > of the same limitations that the aux data has, except for
> > a few bytes to be copied.
>
> Correct, this makes allocating bits out of here cheaper by not
> requiring that we actively copy them into every new process forever.
> You're right that the aux vector would work as well, but the thinking
> behind this series was that an interface like this might be better for
> an architecture as extensible as risc-v.
What would be the ramifications of defining some sort of vdso-like
data-structure and just putting the address into AT_HWCAP2 ?
(similar to what vdso does) - that could then even be re-usable
with other OS kernels.
And would also save declaring numerous new AT_* keys.
Because there are already nearly 130 standard extensions and vendors
are allowed to defines their own as well, and we will probably also want
to tell userspace about them.
Heiko
> > > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
> > > pair_count,
> > > + size_t cpu_count, cpu_set_t *cpus,
> > > + unsigned long flags);
> >
> > The cpu set argument worries me more: there should never be a
> > need to optimize for broken hardware that has an asymmetric set
> > of features. Just let the kernel figure out the minimum set
> > of features that works across all CPUs and report that like we
> > do with HWCAP. If there is a SoC that is so broken that it has
> > important features on a subset of cores that some user might
> > actually want to rely on, then have them go through the slow
> > sysfs interface for probing the CPUs indidually, but don't make
> > the broken case easier at the expense of normal users that
> > run on working hardware.
>
> I'm not so sure. While I agree with you for major classes of features
> (eg one CPU has floating point support but another does not), I expect
> these bits to contain more subtle details as well, which might vary
> across asymmetric implementations without breaking ABI compatibility
> per-se. Maybe some vendor has implemented exotic video decoding
> acceleration instructions that only work on the big core. Or maybe the
> big cores support v3.1 of some extension (where certain things run
> faster), but the little cores only have v3.0, where it's a little
> slower. Certain apps would likely want to know these things so they
> can allocate their work optimally across cores.
>
> >
> > > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t,
> > > uintptr_t,
> > > + uintptr_t, uintptr_t);
> >
> > Why 'uintptr_t' rather than the correct type?
>
> Fixed.
> -Evan
>
On Thu, Mar 30, 2023 at 1:20 PM Heiko Stübner <[email protected]> wrote:
>
> Am Donnerstag, 30. März 2023, 20:30:29 CEST schrieb Evan Green:
> > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <[email protected]> wrote:
> > >
> > > On Tue, Feb 21, 2023, at 20:08, Evan Green wrote:
> > > > We don't have enough space for these all in ELF_HWCAP{,2} and there's no
> > > > system call that quite does this, so let's just provide an arch-specific
> > > > one to probe for hardware capabilities. This currently just provides
> > > > m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
> > > > the future.
> > > >
> > > > Co-developed-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Palmer Dabbelt <[email protected]>
> > > > Signed-off-by: Evan Green <[email protected]>
> > >
> > > I'm still skeptical about the need for a custom syscall interface here.
> > > I had not looked at the interface so far, but there are a few things
> > > that stick out:
> > >
> > > > +RISC-V Hardware Probing Interface
> > > > +---------------------------------
> > > > +
> > > > +The RISC-V hardware probing interface is based around a single
> > > > syscall, which
> > > > +is defined in <asm/hwprobe.h>::
> > > > +
> > > > + struct riscv_hwprobe {
> > > > + __s64 key;
> > > > + __u64 value;
> > > > + };
> > >
> > > The way this is defined, the kernel will always have to know
> > > about the specific set of features, it can't just forward
> > > unknown features to user space after probing them from an
> > > architectured hardware interface or from DT.
> >
> > You're correct that this interface wasn't intended to have usermode
> > come in with augmented data or additional key/value pairs. This was
> > purely meant to provide access to the kernel's repository of
> > architectural and microarchitectural details. If usermode wants to
> > provide extra info in this same form, maybe they could wrap this
> > interface.
> >
> > > If 'key' is just an enumerated value with a small number of
> > > possible values, I don't see anything wrong with using elf
> > > aux data. I understand it's hard to know how many keys
> > > might be needed in the long run, from the way you define
> > > the key/value pairs here, I would expect it to have a lot
> > > of the same limitations that the aux data has, except for
> > > a few bytes to be copied.
> >
> > Correct, this makes allocating bits out of here cheaper by not
> > requiring that we actively copy them into every new process forever.
> > You're right that the aux vector would work as well, but the thinking
> > behind this series was that an interface like this might be better for
> > an architecture as extensible as risc-v.
>
> What would be the ramifications of defining some sort of vdso-like
> data-structure and just putting the address into AT_HWCAP2 ?
> (similar to what vdso does) - that could then even be re-usable
> with other OS kernels.
>
> And would also save declaring numerous new AT_* keys.
>
>
> Because there are already nearly 130 standard extensions and vendors
> are allowed to defines their own as well, and we will probably also want
> to tell userspace about them.
Yeah I mulled that approach over a bit originally as well. The
downside is the vdso data then becomes part of the ABI. So you can
never change the layout of that vdso data, and you lose the ability to
change what gets cached in the vdso versus what bounces up to the
syscall. To poach a scenario from a glibc discussion underway, if for
instance cpu hotplug comes along and you need to invalidate some
portion of your cached data, that's easy when there's a function in
front of it, but difficult if apps are crawling the data themselves.
130 extensions is certainly a lot, and illustrates how auxvec may get
out of hand quickly. One nice thing about this mechanism (though other
approaches share this trait) is that it's agnostic of where the data
comes from. In other words, it doesn't require that data come from the
DT, or alternative.c, etc, as long as the kernel can access it and
plunk it in a key/value store.
-Evan
>
>
> Heiko
>
>
> > > > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
> > > > pair_count,
> > > > + size_t cpu_count, cpu_set_t *cpus,
> > > > + unsigned long flags);
> > >
> > > The cpu set argument worries me more: there should never be a
> > > need to optimize for broken hardware that has an asymmetric set
> > > of features. Just let the kernel figure out the minimum set
> > > of features that works across all CPUs and report that like we
> > > do with HWCAP. If there is a SoC that is so broken that it has
> > > important features on a subset of cores that some user might
> > > actually want to rely on, then have them go through the slow
> > > sysfs interface for probing the CPUs indidually, but don't make
> > > the broken case easier at the expense of normal users that
> > > run on working hardware.
> >
> > I'm not so sure. While I agree with you for major classes of features
> > (eg one CPU has floating point support but another does not), I expect
> > these bits to contain more subtle details as well, which might vary
> > across asymmetric implementations without breaking ABI compatibility
> > per-se. Maybe some vendor has implemented exotic video decoding
> > acceleration instructions that only work on the big core. Or maybe the
> > big cores support v3.1 of some extension (where certain things run
> > faster), but the little cores only have v3.0, where it's a little
> > slower. Certain apps would likely want to know these things so they
> > can allocate their work optimally across cores.
> >
> > >
> > > > +asmlinkage long sys_riscv_hwprobe(uintptr_t, uintptr_t, uintptr_t,
> > > > uintptr_t,
> > > > + uintptr_t, uintptr_t);
> > >
> > > Why 'uintptr_t' rather than the correct type?
> >
> > Fixed.
> > -Evan
> >
>
>
>
>
On Thu, Mar 30, 2023, at 20:30, Evan Green wrote:
> On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <[email protected]> wrote:
>> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
>> > pair_count,
>> > + size_t cpu_count, cpu_set_t *cpus,
>> > + unsigned long flags);
>>
>> The cpu set argument worries me more: there should never be a
>> need to optimize for broken hardware that has an asymmetric set
>> of features. Just let the kernel figure out the minimum set
>> of features that works across all CPUs and report that like we
>> do with HWCAP. If there is a SoC that is so broken that it has
>> important features on a subset of cores that some user might
>> actually want to rely on, then have them go through the slow
>> sysfs interface for probing the CPUs indidually, but don't make
>> the broken case easier at the expense of normal users that
>> run on working hardware.
>
> I'm not so sure. While I agree with you for major classes of features
> (eg one CPU has floating point support but another does not), I expect
> these bits to contain more subtle details as well, which might vary
> across asymmetric implementations without breaking ABI compatibility
> per-se. Maybe some vendor has implemented exotic video decoding
> acceleration instructions that only work on the big core. Or maybe the
> big cores support v3.1 of some extension (where certain things run
> faster), but the little cores only have v3.0, where it's a little
> slower. Certain apps would likely want to know these things so they
> can allocate their work optimally across cores.
Do you have a specific feature in mind where hardware would be
intentionally designed this way? I still can't come up with a
scenario where this would actually work in practice, as having
asymmetric features is incompatible with so many other things
we normally do.
- In a virtual machine, the VCPU tents to get scheduled arbitrarily
to physical CPUs, so setting affinity in a guest won't actually
guarantee that the feature is still there.
- Using a CPU feature from library code is practically impossible
if it requires special CPU affinity, as the application may
already be started on specific CPUs for another reason, and
having a library call sched_setaffinity will conflict with those.
- Even in the simplest case of having a standalone application
without any shared libraries try to pick a sensible CPU to
run on is hard to do in a generic way, as it would need to
weigh availabilty of features on certain cores against the
number of cores with or without the feature and their current
and expected system load.
As long as there isn't a specific requirement, I think it's better
to not actually encourage hardware vendors to implement designs
like that, or at least not designing an interface to make getting
this information a few microseconds faster that what already
exists.
Arnd
Hi Arnd,
On Fri, Mar 31, 2023 at 6:21 AM Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Mar 30, 2023, at 20:30, Evan Green wrote:
> > On Thu, Feb 23, 2023 at 2:06 AM Arnd Bergmann <[email protected]> wrote:
> >> > + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t
> >> > pair_count,
> >> > + size_t cpu_count, cpu_set_t *cpus,
> >> > + unsigned long flags);
> >>
> >> The cpu set argument worries me more: there should never be a
> >> need to optimize for broken hardware that has an asymmetric set
> >> of features. Just let the kernel figure out the minimum set
> >> of features that works across all CPUs and report that like we
> >> do with HWCAP. If there is a SoC that is so broken that it has
> >> important features on a subset of cores that some user might
> >> actually want to rely on, then have them go through the slow
> >> sysfs interface for probing the CPUs indidually, but don't make
> >> the broken case easier at the expense of normal users that
> >> run on working hardware.
> >
> > I'm not so sure. While I agree with you for major classes of features
> > (eg one CPU has floating point support but another does not), I expect
> > these bits to contain more subtle details as well, which might vary
> > across asymmetric implementations without breaking ABI compatibility
> > per-se. Maybe some vendor has implemented exotic video decoding
> > acceleration instructions that only work on the big core. Or maybe the
> > big cores support v3.1 of some extension (where certain things run
> > faster), but the little cores only have v3.0, where it's a little
> > slower. Certain apps would likely want to know these things so they
> > can allocate their work optimally across cores.
>
> Do you have a specific feature in mind where hardware would be
> intentionally designed this way? I still can't come up with a
> scenario where this would actually work in practice, as having
> asymmetric features is incompatible with so many other things
> we normally do.
Isn't this already a reality, with the ARM folks building systems
where only some cores can run arm32 code, and the rest are aarch64
only?
https://lwn.net/Articles/838339/
The way I see it, it's going to be close to impossible to _not_ design
hardware this way. As long as die area and energy efficiency are a
constraint, big.LITTLE and other asymmetric architectures seem
inevitable. Given this interface is sort of a software cpuid
instruction, pretending there won't be differences between cores feels
like an incomplete API.
>
> - In a virtual machine, the VCPU tents to get scheduled arbitrarily
> to physical CPUs, so setting affinity in a guest won't actually
> guarantee that the feature is still there.
Sure, but I wouldn't expect the guest to be able to observe physical
properties from a transient host CPU (I'd consider that a bug). I'd
expect either the VCPU is pinned to a physical CPU with no chance of
migration, in which case you might reasonably plumb down some physical
properties to the guest, or the guest sees a watered down generic CPU
containing the least common denominator of features.
>
> - Using a CPU feature from library code is practically impossible
> if it requires special CPU affinity, as the application may
> already be started on specific CPUs for another reason, and
> having a library call sched_setaffinity will conflict with those.
>
> - Even in the simplest case of having a standalone application
> without any shared libraries try to pick a sensible CPU to
> run on is hard to do in a generic way, as it would need to
> weigh availabilty of features on certain cores against the
> number of cores with or without the feature and their current
> and expected system load.
In ChromeOS at least, we struggled a lot with chrome getting scheduled
on the big vs little cores on some ARM platforms, as we'd get things
like dropped video frames if we stayed stuck on the wrong cores. I
think we managed to solve that with uclamp, but again that
acknowledges that there are differences between cores, which I think
justifies the cpuset parameter to a cpu feature interface like this.
>
> As long as there isn't a specific requirement, I think it's better
> to not actually encourage hardware vendors to implement designs
> like that, or at least not designing an interface to make getting
> this information a few microseconds faster that what already
> exists.
As this is ABI, there is a bit of "crystal ball gazing" I'm doing,
though I hope I've backed it up with enough to show it's not a
completely wild parameter.
-Evan