This patchset clarifies some unclear things about hwprobe's misaligned
performance. Including:
- hwprobe misaligned performance is only applied to scalar from patch [1]
- The defined keys of RISCV_HWPROBE_MISALIGNED_* are values not bitmasks
I cherry-picked [1] rather than write dependency because the original patch
was submitted with a line wrapped to 80 characters. We can't directly apply
that patch using `git am.`
[1] https://lore.kernel.org/linux-riscv/CAJgzZorn5anPH8dVPqvjVWmLKqTi5bkLDR=FH-ZAcdXFnNe8Eg@mail.gmail.com/
Yangyu Chen (1):
docs: riscv: hwprobe: Clarify misaligned keys are values not bitmasks
enh (1):
docs: riscv: Clarify risc-v hwprobe RISCV_HWPROBE_MISALIGNED_* docs.
Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
1 file changed, 18 insertions(+), 13 deletions(-)
--
2.43.0
The original documentation says hwprobe keys are bitmasks, but actually,
they are values. This patch clarifies this to avoid confusion.
Signed-off-by: Yangyu Chen <[email protected]>
---
Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
index 239be63f5089..4abfa3f9fe44 100644
--- a/Documentation/arch/riscv/hwprobe.rst
+++ b/Documentation/arch/riscv/hwprobe.rst
@@ -188,25 +188,30 @@ The following keys are defined:
manual starting from commit 95cf1f9 ("Add changes requested by Ved
during signoff")
-* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
+* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
information about the selected set of processors.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
- scalar accesses is unknown.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
+ access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
- emulated via software, either in or below the kernel. These accesses are
- always extremely slow.
+ The following values (not bitmasks) in this field are defined:
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
- slower than equivalent byte accesses. Misaligned accesses may be supported
- directly in hardware, or trapped and emulated by software.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
+ scalar accesses is unknown.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
- faster than equivalent byte accesses.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
+ emulated via software, either in or below the kernel. These accesses are
+ always extremely slow.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
- are not supported at all and will generate a misaligned address fault.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
+ slower than equivalent byte accesses. Misaligned accesses may be supported
+ directly in hardware, or trapped and emulated by software.
+
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
+ faster than equivalent byte accesses.
+
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
+ are not supported at all and will generate a misaligned address fault.
* :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
represents the size of the Zicboz block in bytes.
--
2.43.0
From: enh <[email protected]>
These only tell you about scalar accesses, not vector accesses.
Signed-off-by: Elliott Hughes <[email protected]>
Reviewed-by: Charlie Jenkins <[email protected]>
Signed-off-by: Yangyu Chen <[email protected]>
---
Documentation/arch/riscv/hwprobe.rst | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
index b2bcc9eed9aa..239be63f5089 100644
--- a/Documentation/arch/riscv/hwprobe.rst
+++ b/Documentation/arch/riscv/hwprobe.rst
@@ -192,21 +192,21 @@ The following keys are defined:
information about the selected set of processors.
* :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
- accesses is unknown.
+ scalar accesses is unknown.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned accesses are
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
emulated via software, either in or below the kernel. These accesses are
always extremely slow.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are slower
- than equivalent byte accesses. Misaligned accesses may be supported
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
+ slower than equivalent byte accesses. Misaligned accesses may be supported
directly in hardware, or trapped and emulated by software.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are faster
- than equivalent byte accesses.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
+ faster than equivalent byte accesses.
- * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned accesses are
- not supported at all and will generate a misaligned address fault.
+ * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
+ are not supported at all and will generate a misaligned address fault.
* :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
represents the size of the Zicboz block in bytes.
--
2.43.0
On Sat, May 18, 2024 at 9:00 AM Yangyu Chen <[email protected]> wrote:
>
> The original documentation says hwprobe keys are bitmasks, but actually,
> they are values. This patch clarifies this to avoid confusion.
>
> Signed-off-by: Yangyu Chen <[email protected]>
Hm, we also have this problem in the code, since
hwprobe_key_is_bitmask() returns true for KEY_CPUPERF_0. This results
in wrong information being returned for queries using the WHICH_CPU
flag. If usermode asked for the set of CPUs that was specifically SLOW
or EMULATED, the returned cpuset would also include cpus that were
FAST. I believe all other queries are okay.
The one-liner fix is to just not return true for that key in
hwprobe_key_is_bitmask(). But that's technically user-visible: if some
software relied on the buggy behavior of FAST cpus being swept up in
the query for SLOW or EMULATED cpus, this change would expose that.
The grownups-eat-their-vegetables thing to do would be to define a new
key that returns this same value, but doesn't return true in
hwprobe_key_is_bitmask(). What do people think?
-Evan
> ---
> Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
> 1 file changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> index 239be63f5089..4abfa3f9fe44 100644
> --- a/Documentation/arch/riscv/hwprobe.rst
> +++ b/Documentation/arch/riscv/hwprobe.rst
> @@ -188,25 +188,30 @@ The following keys are defined:
> manual starting from commit 95cf1f9 ("Add changes requested by Ved
> during signoff")
>
> -* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
> +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
> information about the selected set of processors.
>
> - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> - scalar accesses is unknown.
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
> + access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
>
> - * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> - emulated via software, either in or below the kernel. These accesses are
> - always extremely slow.
> + The following values (not bitmasks) in this field are defined:
>
> - * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> - slower than equivalent byte accesses. Misaligned accesses may be supported
> - directly in hardware, or trapped and emulated by software.
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> + scalar accesses is unknown.
>
> - * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> - faster than equivalent byte accesses.
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> + emulated via software, either in or below the kernel. These accesses are
> + always extremely slow.
>
> - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> - are not supported at all and will generate a misaligned address fault.
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> + slower than equivalent byte accesses. Misaligned accesses may be supported
> + directly in hardware, or trapped and emulated by software.
> +
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> + faster than equivalent byte accesses.
> +
> + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> + are not supported at all and will generate a misaligned address fault.
>
> * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> represents the size of the Zicboz block in bytes.
> --
> 2.43.0
>
On Tue, May 21, 2024 at 11:36:06AM -0700, Evan Green wrote:
> On Sat, May 18, 2024 at 9:00 AM Yangyu Chen <[email protected]> wrote:
> >
> > The original documentation says hwprobe keys are bitmasks, but actually,
> > they are values. This patch clarifies this to avoid confusion.
> >
> > Signed-off-by: Yangyu Chen <[email protected]>
>
> Hm, we also have this problem in the code, since
> hwprobe_key_is_bitmask() returns true for KEY_CPUPERF_0. This results
> in wrong information being returned for queries using the WHICH_CPU
> flag. If usermode asked for the set of CPUs that was specifically SLOW
> or EMULATED, the returned cpuset would also include cpus that were
> FAST. I believe all other queries are okay.
>
> The one-liner fix is to just not return true for that key in
> hwprobe_key_is_bitmask(). But that's technically user-visible: if some
> software relied on the buggy behavior of FAST cpus being swept up in
> the query for SLOW or EMULATED cpus, this change would expose that.
> The grownups-eat-their-vegetables thing to do would be to define a new
> key that returns this same value, but doesn't return true in
> hwprobe_key_is_bitmask(). What do people think?
I agree. A new key seems like the best option, keeping the old key for
backward compatibility. However, perhaps instead of the new key
returning the same value but not returning true in
hwprobe_key_is_bitmask() it could instead be a correct bitmask so people
would be able to have the feature of or-ing together the keys.
- Charlie
>
> -Evan
>
> > ---
> > Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
> > 1 file changed, 18 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> > index 239be63f5089..4abfa3f9fe44 100644
> > --- a/Documentation/arch/riscv/hwprobe.rst
> > +++ b/Documentation/arch/riscv/hwprobe.rst
> > @@ -188,25 +188,30 @@ The following keys are defined:
> > manual starting from commit 95cf1f9 ("Add changes requested by Ved
> > during signoff")
> >
> > -* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
> > +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
> > information about the selected set of processors.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > - scalar accesses is unknown.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
> > + access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > - emulated via software, either in or below the kernel. These accesses are
> > - always extremely slow.
> > + The following values (not bitmasks) in this field are defined:
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > - slower than equivalent byte accesses. Misaligned accesses may be supported
> > - directly in hardware, or trapped and emulated by software.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > + scalar accesses is unknown.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > - faster than equivalent byte accesses.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > + emulated via software, either in or below the kernel. These accesses are
> > + always extremely slow.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > - are not supported at all and will generate a misaligned address fault.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > + slower than equivalent byte accesses. Misaligned accesses may be supported
> > + directly in hardware, or trapped and emulated by software.
> > +
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > + faster than equivalent byte accesses.
> > +
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > + are not supported at all and will generate a misaligned address fault.
> >
> > * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> > represents the size of the Zicboz block in bytes.
> > --
> > 2.43.0
> >
On Tue, May 21, 2024 at 11:36:06AM GMT, Evan Green wrote:
> On Sat, May 18, 2024 at 9:00 AM Yangyu Chen <[email protected]> wrote:
> >
> > The original documentation says hwprobe keys are bitmasks, but actually,
> > they are values. This patch clarifies this to avoid confusion.
> >
> > Signed-off-by: Yangyu Chen <[email protected]>
>
> Hm, we also have this problem in the code, since
> hwprobe_key_is_bitmask() returns true for KEY_CPUPERF_0. This results
> in wrong information being returned for queries using the WHICH_CPU
> flag. If usermode asked for the set of CPUs that was specifically SLOW
> or EMULATED, the returned cpuset would also include cpus that were
> FAST. I believe all other queries are okay.
>
> The one-liner fix is to just not return true for that key in
> hwprobe_key_is_bitmask(). But that's technically user-visible: if some
> software relied on the buggy behavior of FAST cpus being swept up in
> the query for SLOW or EMULATED cpus, this change would expose that.
> The grownups-eat-their-vegetables thing to do would be to define a new
> key that returns this same value, but doesn't return true in
> hwprobe_key_is_bitmask(). What do people think?
Even though I actually enjoy eating vegetables, I think it's unlikely
that we need to be so cautious for this. I feel like kernel updates
provide a bit of freedom to change results of hardware query syscalls,
even when run on the same hardware. Particularly the EMULATED query,
which I guess could change with a firmware update. And, even the SLOW
query could change if the probing was modified directly or indirectly.
IOW, applications that use the which-cpus syscall shouldn't freak out
if they don't get the same cpuset after a kernel update, which means
we can drop the FAST cpus from the result.
Thanks,
drew
>
> -Evan
>
> > ---
> > Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
> > 1 file changed, 18 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> > index 239be63f5089..4abfa3f9fe44 100644
> > --- a/Documentation/arch/riscv/hwprobe.rst
> > +++ b/Documentation/arch/riscv/hwprobe.rst
> > @@ -188,25 +188,30 @@ The following keys are defined:
> > manual starting from commit 95cf1f9 ("Add changes requested by Ved
> > during signoff")
> >
> > -* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
> > +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
> > information about the selected set of processors.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > - scalar accesses is unknown.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
> > + access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > - emulated via software, either in or below the kernel. These accesses are
> > - always extremely slow.
> > + The following values (not bitmasks) in this field are defined:
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > - slower than equivalent byte accesses. Misaligned accesses may be supported
> > - directly in hardware, or trapped and emulated by software.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > + scalar accesses is unknown.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > - faster than equivalent byte accesses.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > + emulated via software, either in or below the kernel. These accesses are
> > + always extremely slow.
> >
> > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > - are not supported at all and will generate a misaligned address fault.
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > + slower than equivalent byte accesses. Misaligned accesses may be supported
> > + directly in hardware, or trapped and emulated by software.
> > +
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > + faster than equivalent byte accesses.
> > +
> > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > + are not supported at all and will generate a misaligned address fault.
> >
> > * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> > represents the size of the Zicboz block in bytes.
> > --
> > 2.43.0
> >
On Wed, May 22, 2024 at 09:26:08AM +0200, Andrew Jones wrote:
> On Tue, May 21, 2024 at 11:36:06AM GMT, Evan Green wrote:
> > On Sat, May 18, 2024 at 9:00 AM Yangyu Chen <[email protected]> wrote:
> > >
> > > The original documentation says hwprobe keys are bitmasks, but actually,
> > > they are values. This patch clarifies this to avoid confusion.
> > >
> > > Signed-off-by: Yangyu Chen <[email protected]>
> >
> > Hm, we also have this problem in the code, since
> > hwprobe_key_is_bitmask() returns true for KEY_CPUPERF_0. This results
> > in wrong information being returned for queries using the WHICH_CPU
> > flag. If usermode asked for the set of CPUs that was specifically SLOW
> > or EMULATED, the returned cpuset would also include cpus that were
> > FAST. I believe all other queries are okay.
> >
> > The one-liner fix is to just not return true for that key in
> > hwprobe_key_is_bitmask(). But that's technically user-visible: if some
> > software relied on the buggy behavior of FAST cpus being swept up in
> > the query for SLOW or EMULATED cpus, this change would expose that.
> > The grownups-eat-their-vegetables thing to do would be to define a new
> > key that returns this same value, but doesn't return true in
> > hwprobe_key_is_bitmask(). What do people think?
>
> Even though I actually enjoy eating vegetables, I think it's unlikely
> that we need to be so cautious for this. I feel like kernel updates
> provide a bit of freedom to change results of hardware query syscalls,
> even when run on the same hardware. Particularly the EMULATED query,
> which I guess could change with a firmware update. And, even the SLOW
> query could change if the probing was modified directly or indirectly.
> IOW, applications that use the which-cpus syscall shouldn't freak out
> if they don't get the same cpuset after a kernel update, which means
> we can drop the FAST cpus from the result.
Dropping FAST does not really address the problem here. These keys are
being treated as a bitmap when they are in fact just values. Because
they are currently treated as bitmaps, it is semantically acceptable to
"or" two of the keys together and pass them into hwprobe. Dropping FAST
from the input of EMULATED | SLOW will make it appear to work, but it is
not reasonable to try to support or'ing of values that are not bitmaps. If a
key that maps to a value of 5 is ever introduced, then the input of
UNSUPPORTED | EMULATED will also return cpus that map to the new key.
If hwprobe recieves a value of 3 for CPUPERF_0 and or'ing is supported,
that could either be interpreted that the user is asking for cpus that are
EMULATED or SLOW or FAST, or just FAST, or just EMULATED or SLOW. The
current implementation defines that only FAST will be returned because
or'ing values like this is nonsense.
Allowing an input where keys are or'd together should not be supported
in CPUPERF_0 since it is what is causing this issue. However, the
original intent seems to be to allow an or'ing of keys. The verbosity of
using CPUPERF_0 will be greatly increased without the ability to or keys
together. If an application wants to know if misaligned accesses are
supported, it currently needs 3 separate hwprobe calls, one for each
EMULATED, SLOW, and FAST. Changing CPUPERF_0 to be a true bitmap would
require changing the values of each of the existing keys which is not an
acceptable API change.
In my previous email I proposed a new key, which could be something like
RISCV_HWPROBE_KEY_CPUPERF_1, that is actually a bitmap. That way a query
of EMULATED | SLOW | FAST would be possible in a single hwprobe call,
which seems to be the original intent of CPUPERF_0.
- Charlie
>
> Thanks,
> drew
>
> >
> > -Evan
> >
> > > ---
> > > Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
> > > 1 file changed, 18 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> > > index 239be63f5089..4abfa3f9fe44 100644
> > > --- a/Documentation/arch/riscv/hwprobe.rst
> > > +++ b/Documentation/arch/riscv/hwprobe.rst
> > > @@ -188,25 +188,30 @@ The following keys are defined:
> > > manual starting from commit 95cf1f9 ("Add changes requested by Ved
> > > during signoff")
> > >
> > > -* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
> > > +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
> > > information about the selected set of processors.
> > >
> > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > > - scalar accesses is unknown.
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
> > > + access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
> > >
> > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > > - emulated via software, either in or below the kernel. These accesses are
> > > - always extremely slow.
> > > + The following values (not bitmasks) in this field are defined:
> > >
> > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > > - slower than equivalent byte accesses. Misaligned accesses may be supported
> > > - directly in hardware, or trapped and emulated by software.
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > > + scalar accesses is unknown.
> > >
> > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > > - faster than equivalent byte accesses.
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > > + emulated via software, either in or below the kernel. These accesses are
> > > + always extremely slow.
> > >
> > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > > - are not supported at all and will generate a misaligned address fault.
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > > + slower than equivalent byte accesses. Misaligned accesses may be supported
> > > + directly in hardware, or trapped and emulated by software.
> > > +
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > > + faster than equivalent byte accesses.
> > > +
> > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > > + are not supported at all and will generate a misaligned address fault.
> > >
> > > * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> > > represents the size of the Zicboz block in bytes.
> > > --
> > > 2.43.0
> > >
On Thu, May 23, 2024 at 2:17 PM Charlie Jenkins <[email protected]> wrote:
>
> On Wed, May 22, 2024 at 09:26:08AM +0200, Andrew Jones wrote:
> > On Tue, May 21, 2024 at 11:36:06AM GMT, Evan Green wrote:
> > > On Sat, May 18, 2024 at 9:00 AM Yangyu Chen <[email protected]> wrote:
> > > >
> > > > The original documentation says hwprobe keys are bitmasks, but actually,
> > > > they are values. This patch clarifies this to avoid confusion.
> > > >
> > > > Signed-off-by: Yangyu Chen <[email protected]>
> > >
> > > Hm, we also have this problem in the code, since
> > > hwprobe_key_is_bitmask() returns true for KEY_CPUPERF_0. This results
> > > in wrong information being returned for queries using the WHICH_CPU
> > > flag. If usermode asked for the set of CPUs that was specifically SLOW
> > > or EMULATED, the returned cpuset would also include cpus that were
> > > FAST. I believe all other queries are okay.
> > >
> > > The one-liner fix is to just not return true for that key in
> > > hwprobe_key_is_bitmask(). But that's technically user-visible: if some
> > > software relied on the buggy behavior of FAST cpus being swept up in
> > > the query for SLOW or EMULATED cpus, this change would expose that.
> > > The grownups-eat-their-vegetables thing to do would be to define a new
> > > key that returns this same value, but doesn't return true in
> > > hwprobe_key_is_bitmask(). What do people think?
> >
> > Even though I actually enjoy eating vegetables, I think it's unlikely
> > that we need to be so cautious for this. I feel like kernel updates
> > provide a bit of freedom to change results of hardware query syscalls,
> > even when run on the same hardware. Particularly the EMULATED query,
> > which I guess could change with a firmware update. And, even the SLOW
> > query could change if the probing was modified directly or indirectly.
> > IOW, applications that use the which-cpus syscall shouldn't freak out
> > if they don't get the same cpuset after a kernel update, which means
> > we can drop the FAST cpus from the result.
>
> Dropping FAST does not really address the problem here. These keys are
> being treated as a bitmap when they are in fact just values. Because
> they are currently treated as bitmaps, it is semantically acceptable to
> "or" two of the keys together and pass them into hwprobe. Dropping FAST
> from the input of EMULATED | SLOW will make it appear to work, but it is
> not reasonable to try to support or'ing of values that are not bitmaps. If a
> key that maps to a value of 5 is ever introduced, then the input of
> UNSUPPORTED | EMULATED will also return cpus that map to the new key.
> If hwprobe recieves a value of 3 for CPUPERF_0 and or'ing is supported,
> that could either be interpreted that the user is asking for cpus that are
> EMULATED or SLOW or FAST, or just FAST, or just EMULATED or SLOW. The
> current implementation defines that only FAST will be returned because
> or'ing values like this is nonsense.
You're right that if usermode had taken the bitmask aspect to heart,
they might try to ask for cpus that were SLOW | EMULATED, and expect
to get the set of both. None of the other combinations make sense, so
I think this is the only one worth worrying about. That case is super
broken today as they'd get SLOW | EMULATED | FAST (basically all CPUs
always). We have no way to distinguish whether they actually passed
SLOW | EMULATED or FAST. I'm tempted to not worry about this case and
assume anyone who passed 3 meant FAST, but you're right it's there.
>
> Allowing an input where keys are or'd together should not be supported
> in CPUPERF_0 since it is what is causing this issue. However, the
> original intent seems to be to allow an or'ing of keys. The verbosity of
> using CPUPERF_0 will be greatly increased without the ability to or keys
> together. If an application wants to know if misaligned accesses are
> supported, it currently needs 3 separate hwprobe calls, one for each
> EMULATED, SLOW, and FAST. Changing CPUPERF_0 to be a true bitmap would
> require changing the values of each of the existing keys which is not an
> acceptable API change.
Specifying CPUPERF_0 as a bitmask was a mistake (mine) in the
documentation that got translated to code when the WHICH_CPUS flag was
introduced and hwprobe_key_is_bitmask() showed up. It was always
supposed to be an enum, with EMULATED being a (significant) step worse
than SLOW.
My expectation is that most software will just ask "which cpus are
FAST" and nothing else. It's not easy to write software that somehow
takes advantage of the difference between EMULATED and SLOW, though it
is nice to know for reporting/telemetry purposes.
>
> In my previous email I proposed a new key, which could be something like
> RISCV_HWPROBE_KEY_CPUPERF_1, that is actually a bitmap. That way a query
> of EMULATED | SLOW | FAST would be possible in a single hwprobe call,
> which seems to be the original intent of CPUPERF_0.
Agreed, it makes sense to me to have HWPROBE_MISALIGNED_* be the only
value in CPUPERF_0, and add a new CPUPERF_1 key for true bitmap
features. I currently don't have any bits to stuff into the newly
proposed key, but when they do I agree it makes sense to put them in a
new key.
-Evan
>
> - Charlie
>
>
> >
> > Thanks,
> > drew
> >
> > >
> > > -Evan
> > >
> > > > ---
> > > > Documentation/arch/riscv/hwprobe.rst | 31 ++++++++++++++++------------
> > > > 1 file changed, 18 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst
> > > > index 239be63f5089..4abfa3f9fe44 100644
> > > > --- a/Documentation/arch/riscv/hwprobe.rst
> > > > +++ b/Documentation/arch/riscv/hwprobe.rst
> > > > @@ -188,25 +188,30 @@ The following keys are defined:
> > > > manual starting from commit 95cf1f9 ("Add changes requested by Ved
> > > > during signoff")
> > > >
> > > > -* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
> > > > +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A value that contains performance
> > > > information about the selected set of processors.
> > > >
> > > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > > > - scalar accesses is unknown.
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_MASK`: The bitmask of the misaligned
> > > > + access performance field in the value of key `RISCV_HWPROBE_KEY_CPUPERF_0`.
> > > >
> > > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > > > - emulated via software, either in or below the kernel. These accesses are
> > > > - always extremely slow.
> > > > + The following values (not bitmasks) in this field are defined:
> > > >
> > > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > > > - slower than equivalent byte accesses. Misaligned accesses may be supported
> > > > - directly in hardware, or trapped and emulated by software.
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
> > > > + scalar accesses is unknown.
> > > >
> > > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > > > - faster than equivalent byte accesses.
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned scalar accesses are
> > > > + emulated via software, either in or below the kernel. These accesses are
> > > > + always extremely slow.
> > > >
> > > > - * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > > > - are not supported at all and will generate a misaligned address fault.
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned scalar accesses are
> > > > + slower than equivalent byte accesses. Misaligned accesses may be supported
> > > > + directly in hardware, or trapped and emulated by software.
> > > > +
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned scalar accesses are
> > > > + faster than equivalent byte accesses.
> > > > +
> > > > + * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned scalar accesses
> > > > + are not supported at all and will generate a misaligned address fault.
> > > >
> > > > * :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which
> > > > represents the size of the Zicboz block in bytes.
> > > > --
> > > > 2.43.0
> > > >