2023-02-24 10:04:12

by Heiko Carstens

Subject: [PATCH 0/2] s390: don't use 128-bit cmpxchg for READ_ONCE() purposes

Introduce and use an s390 specific READ_ONCE_ALIGNED_128() macro in order
to get rid of the odd 128-bit cmpxchg READ_ONCE() usage in cpum_sf, which
was introduced with commit 82d3edb50a11 ("s390/cpum_sf: add READ_ONCE()
semantics to compare and swap loops").
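
For reference, the pattern being replaced looks roughly like this (a
minimal before/after sketch based on the hunks in patch 2):

  /* old: (mis)use a 128-bit cmpxchg just to read the current value */
  prev.val = __cdsg(&te->header.val, 0, 0);

  /* new: a plain block concurrent (atomic) 128-bit read */
  prev.val = READ_ONCE_ALIGNED_128(te->header.val);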

Heiko Carstens (2):
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg

arch/s390/include/asm/rwonce.h | 31 +++++++++++++++++++++++++++++++
arch/s390/kernel/perf_cpum_sf.c | 9 +++------
2 files changed, 34 insertions(+), 6 deletions(-)
create mode 100644 arch/s390/include/asm/rwonce.h

--
2.37.2



2023-02-24 10:04:15

by Heiko Carstens

Subject: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro

Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
fast block concurrent (atomic) 128-bit accesses.

The lpq instruction requires 128-bit alignment. This is also the
reason why the compiler doesn't emit this instruction if __READ_ONCE() is
used for 128-bit accesses.

Signed-off-by: Heiko Carstens <[email protected]>
---
arch/s390/include/asm/rwonce.h | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
create mode 100644 arch/s390/include/asm/rwonce.h

diff --git a/arch/s390/include/asm/rwonce.h b/arch/s390/include/asm/rwonce.h
new file mode 100644
index 000000000000..91fc24520e82
--- /dev/null
+++ b/arch/s390/include/asm/rwonce.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_S390_RWONCE_H
+#define __ASM_S390_RWONCE_H
+
+#include <linux/compiler_types.h>
+
+/*
+ * Use READ_ONCE_ALIGNED_128() for 128-bit block concurrent (atomic) read
+ * accesses. Note that x must be 128-bit aligned, otherwise a specification
+ * exception is generated.
+ */
+#define READ_ONCE_ALIGNED_128(x) \
+({ \
+ union { \
+ typeof(x) __x; \
+ __uint128_t val; \
+ } __u; \
+ \
+ BUILD_BUG_ON(sizeof(x) != 16); \
+ asm volatile( \
+ " lpq %[val],%[_x]\n" \
+ : [val] "=d" (__u.val) \
+ : [_x] "QS" (x) \
+ : "memory"); \
+ __u.__x; \
+})
+
+#include <asm-generic/rwonce.h>
+
+#endif /* __ASM_S390_RWONCE_H */
--
2.37.2


2023-02-24 10:04:18

by Heiko Carstens

Subject: [PATCH 2/2] s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg

Use READ_ONCE_ALIGNED_128() to read the previous value ahead of a
128-bit cmpxchg loop, instead of (mis)using a 128-bit cmpxchg operation to
do the same.

This makes the code more readable and faster.
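
For context, each affected loop ends up with roughly the following shape
(a sketch only: the field updates on new between the read and the cmpxchg
are elided, and prev, old and new are assumed to be instances of the same
header union as te->header):

  prev.val = READ_ONCE_ALIGNED_128(te->header.val);
  do {
          old.val = prev.val;
          new.val = prev.val;
          /* ... update fields of new ... */
          prev.val = __cdsg(&te->header.val, old.val, new.val);
  } while (prev.val != old.val);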

Signed-off-by: Heiko Carstens <[email protected]>
---
arch/s390/kernel/perf_cpum_sf.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 79904a839fb9..e7b867e2f73f 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1355,8 +1355,7 @@ static void hw_perf_event_update(struct perf_event *event, int flush_all)
num_sdb++;

/* Reset trailer (using compare-double-and-swap) */
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
@@ -1558,8 +1557,7 @@ static bool aux_set_alert(struct aux_buffer *aux, unsigned long alert_index,
struct hws_trailer_entry *te;

te = aux_sdb_trailer(aux, alert_index);
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
@@ -1637,8 +1635,7 @@ static bool aux_reset_buffer(struct aux_buffer *aux, unsigned long range,
idx_old = idx = aux->empty_mark + 1;
for (i = 0; i < range_scan; i++, idx++) {
te = aux_sdb_trailer(aux, idx);
- /* READ_ONCE() 16 byte header */
- prev.val = __cdsg(&te->header.val, 0, 0);
+ prev.val = READ_ONCE_ALIGNED_128(te->header.val);
do {
old.val = prev.val;
new.val = prev.val;
--
2.37.2


2023-02-25 16:51:06

by Peter Zijlstra

Subject: Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro

On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> fast block concurrent (atomic) 128-bit accesses.
>
> The lpq instruction requires 128-bit alignment. This is also the
> reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> used for 128-bit accesses.

Does your u128 not have natural alignment? Does it help if you force
align the u128 type?


2023-02-26 20:57:01

by Heiko Carstens

Subject: Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro

On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > fast block concurrent (atomic) 128-bit accesses.
> >
> > The lpq instruction requires 128-bit alignment. This is also the
> > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > used for 128-bit accesses.
>
> Does your u128 not have natural alignment? Does it help if you force
> align the u128 type?

s390 seems to be the only architecture that has 64-bit alignment for
__uint128_t. But making it explicitly naturally aligned doesn't help.
I guess that's because the lpq instruction requires an even-odd register
pair to read into, while the lmg instruction the compiler emits instead can
use any register pair; lmg, however, doesn't come with atomic semantics.
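
Roughly what "making it explicitly naturally aligned" means here, as a
sketch rather than the exact code that was tried:

  /* force natural (16-byte) alignment of the 128-bit value */
  __uint128_t val __attribute__((aligned(16)));

  /*
   * Per the observation above, a plain or volatile 128-bit load like
   * this is still compiled to lmg (which is not atomic) rather than lpq:
   */
  __uint128_t v = *(volatile __uint128_t *)&val;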

2023-02-27 11:51:30

by Peter Zijlstra

Subject: Re: [PATCH 1/2] s390/rwonce: add READ_ONCE_ALIGNED_128() macro

On Sun, Feb 26, 2023 at 09:56:44PM +0100, Heiko Carstens wrote:
> On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > > fast block concurrent (atomic) 128-bit accesses.
> > >
> > > The lpq instruction requires 128-bit alignment. This is also the
> > > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > > used for 128-bit accesses.
> >
> > Does your u128 not have natural alignment? Does it help if you force
> > align the u128 type?
>
> s390 seems to be the only architecture which has a 64 bit alignment for
> __uint128_t. But making it explicitly naturally aligned doesn't help.
> I guess that's because the lpq instruction requires an even-odd register
> pair where it reads to, while the now used lmg instruction can use any
> register pair; but lmg doesn't come with atomic semantics.

One thing you could do is talk with your compiler folks about allowing
lpq for volatile loads. That won't help you now and you'll still need
these patches, but changing the toolchains makes sense to me.
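
In other words, a volatile 128-bit load with known 16-byte alignment, such
as the illustrative helper below, could then be lowered to lpq by the
compiler itself, making the inline-assembly macro unnecessary (a sketch of
the idea, not current compiler behaviour; the helper name is made up):

  static inline __uint128_t read128(const volatile __uint128_t *p)
  {
          /*
           * Today this compiles to lmg on s390 (not atomic); the
           * suggestion is for toolchains to emit lpq here once 16-byte
           * alignment can be relied upon.
           */
          return *p;
  }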