2022-02-07 16:28:10

by Mark Rutland

Subject: [PATCH v2] atomics: fix atomic64_{read_acquire,set_release} fallbacks

Arnd reports that on 32-bit architectures, the fallbacks for
atomic64_read_acquire() and atomic64_set_release() are broken as they
use smp_load_acquire() and smp_store_release() respectively, which do
not work on types larger than the native word size.
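
For reference, the check that fires is roughly the following (paraphrasing
the definitions in include/linux/compiler_types.h; details may differ
between kernel versions). On 32-bit, the s64 counter of atomic64_t fails
the __native_word() test:

| /* A type is "native word sized" if it matches one of the basic C types. */
| #define __native_word(t) \
| 	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
| 	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
|
| /* Both __smp_load_acquire() and __smp_store_release() assert this on *p. */
| #define compiletime_assert_atomic_type(t) \
| 	compiletime_assert(__native_word(t), \
| 		"Need native word sized stores/loads for atomicity.")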

Since those contain compiletime_assert_atomic_type(), any attempt to use
those fallbacks results in a build-time error, e.g. with the following
added to arch/arm/kernel/setup.c:

| void test_atomic64(atomic64_t *v)
| {
| 	atomic64_set_release(v, 5);
| 	atomic64_read_acquire(v);
| }

The compiler will complain as follows:

| In file included from <command-line>:
| In function 'arch_atomic64_set_release',
| inlined from 'test_atomic64' at ./include/linux/atomic/atomic-instrumented.h:669:2:
| ././include/linux/compiler_types.h:346:38: error: call to '__compiletime_assert_9' declared with attribute error: Need native word sized stores/loads for atomicity.
| 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| | ^
| ././include/linux/compiler_types.h:327:4: note: in definition of macro '__compiletime_assert'
| 327 | prefix ## suffix(); \
| | ^~~~~~
| ././include/linux/compiler_types.h:346:2: note: in expansion of macro '_compiletime_assert'
| 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| | ^~~~~~~~~~~~~~~~~~~
| ././include/linux/compiler_types.h:349:2: note: in expansion of macro 'compiletime_assert'
| 349 | compiletime_assert(__native_word(t), \
| | ^~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:133:2: note: in expansion of macro 'compiletime_assert_atomic_type'
| 133 | compiletime_assert_atomic_type(*p); \
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ./include/asm-generic/barrier.h:164:55: note: in expansion of macro '__smp_store_release'
| 164 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0)
| | ^~~~~~~~~~~~~~~~~~~
| ./include/linux/atomic/atomic-arch-fallback.h:1270:2: note: in expansion of macro 'smp_store_release'
| 1270 | smp_store_release(&(v)->counter, i);
| | ^~~~~~~~~~~~~~~~~
| make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/setup.o] Error 1
| make[1]: *** [scripts/Makefile.build:550: arch/arm/kernel] Error 2
| make: *** [Makefile:1831: arch/arm] Error 2

Fix this by only using smp_load_acquire() and smp_store_release() for
native atomic types, and otherwise falling back to the regular barriers
necessary for acquire/release semantics, as we do in the more generic
acquire and release fallbacks.
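
For comparison, the more generic acquire fallbacks in the same header have
roughly this shape (paraphrased here using fetch_add as the example):

| static __always_inline int
| arch_atomic_fetch_add_acquire(int i, atomic_t *v)
| {
| 	int ret = arch_atomic_fetch_add_relaxed(i, v);
| 	__atomic_acquire_fence();
| 	return ret;
| }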

For the example above, this works as expected on 32-bit, e.g. for arm
multi_v7_defconfig:

| <test_atomic64>:
|        push    {r4, r5}
|        dmb     ish
|        pldw    [r0]
|        mov     r2, #5
|        mov     r3, #0
|        ldrexd  r4, r5, [r0]
|        strexd  r4, r2, r3, [r0]
|        teq     r4, #0
|        bne     484 <test_atomic64+0x14>
|        ldrexd  r2, r3, [r0]
|        dmb     ish
|        pop     {r4, r5}
|        bx      lr

... and also on 64-bit, e.g. for arm64 defconfig:

| <test_atomic64>:
|        bti     c
|        paciasp
|        mov     x1, #0x5
|        stlr    x1, [x0]
|        ldar    x0, [x0]
|        autiasp
|        ret

Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Will Deacon <[email protected]>
---
include/linux/atomic/atomic-arch-fallback.h | 38 ++++++++++++++++++---
scripts/atomic/fallbacks/read_acquire | 11 +++++-
scripts/atomic/fallbacks/set_release | 7 +++-
3 files changed, 49 insertions(+), 7 deletions(-)

Since v1 [1]:
* Fix templates to use arch_${atomic}_{read,write}()
* Update 32-bit sample codegen
* Correct typo in commit message

[1] https://lore.kernel.org/lkml/[email protected]

diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h
index a3dba31df01e..6db58d180866 100644
--- a/include/linux/atomic/atomic-arch-fallback.h
+++ b/include/linux/atomic/atomic-arch-fallback.h
@@ -151,7 +151,16 @@
static __always_inline int
arch_atomic_read_acquire(const atomic_t *v)
{
- return smp_load_acquire(&(v)->counter);
+ int ret;
+
+ if (__native_word(atomic_t)) {
+ ret = smp_load_acquire(&(v)->counter);
+ } else {
+ ret = arch_atomic_read(v);
+ __atomic_acquire_fence();
+ }
+
+ return ret;
}
#define arch_atomic_read_acquire arch_atomic_read_acquire
#endif
@@ -160,7 +169,12 @@ arch_atomic_read_acquire(const atomic_t *v)
static __always_inline void
arch_atomic_set_release(atomic_t *v, int i)
{
- smp_store_release(&(v)->counter, i);
+ if (__native_word(atomic_t)) {
+ smp_store_release(&(v)->counter, i);
+ } else {
+ __atomic_release_fence();
+ arch_atomic_set(v, i);
+ }
}
#define arch_atomic_set_release arch_atomic_set_release
#endif
@@ -1258,7 +1272,16 @@ arch_atomic_dec_if_positive(atomic_t *v)
static __always_inline s64
arch_atomic64_read_acquire(const atomic64_t *v)
{
- return smp_load_acquire(&(v)->counter);
+ s64 ret;
+
+ if (__native_word(atomic64_t)) {
+ ret = smp_load_acquire(&(v)->counter);
+ } else {
+ ret = arch_atomic64_read(v);
+ __atomic_acquire_fence();
+ }
+
+ return ret;
}
#define arch_atomic64_read_acquire arch_atomic64_read_acquire
#endif
@@ -1267,7 +1290,12 @@ arch_atomic64_read_acquire(const atomic64_t *v)
static __always_inline void
arch_atomic64_set_release(atomic64_t *v, s64 i)
{
- smp_store_release(&(v)->counter, i);
+ if (__native_word(atomic64_t)) {
+ smp_store_release(&(v)->counter, i);
+ } else {
+ __atomic_release_fence();
+ arch_atomic64_set(v, i);
+ }
}
#define arch_atomic64_set_release arch_atomic64_set_release
#endif
@@ -2358,4 +2386,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v)
#endif

#endif /* _LINUX_ATOMIC_FALLBACK_H */
-// cca554917d7ea73d5e3e7397dd70c484cad9b2c4
+// 8e2cc06bc0d2c0967d2f8424762bd48555ee40ae
diff --git a/scripts/atomic/fallbacks/read_acquire b/scripts/atomic/fallbacks/read_acquire
index 803ba7561076..a0ea1d26e6b2 100755
--- a/scripts/atomic/fallbacks/read_acquire
+++ b/scripts/atomic/fallbacks/read_acquire
@@ -2,6 +2,15 @@ cat <<EOF
static __always_inline ${ret}
arch_${atomic}_read_acquire(const ${atomic}_t *v)
{
- return smp_load_acquire(&(v)->counter);
+ ${int} ret;
+
+ if (__native_word(${atomic}_t)) {
+ ret = smp_load_acquire(&(v)->counter);
+ } else {
+ ret = arch_${atomic}_read(v);
+ __atomic_acquire_fence();
+ }
+
+ return ret;
}
EOF
diff --git a/scripts/atomic/fallbacks/set_release b/scripts/atomic/fallbacks/set_release
index 86ede759f24e..05cdb7f42477 100755
--- a/scripts/atomic/fallbacks/set_release
+++ b/scripts/atomic/fallbacks/set_release
@@ -2,6 +2,11 @@ cat <<EOF
static __always_inline void
arch_${atomic}_set_release(${atomic}_t *v, ${int} i)
{
- smp_store_release(&(v)->counter, i);
+ if (__native_word(${atomic}_t)) {
+ smp_store_release(&(v)->counter, i);
+ } else {
+ __atomic_release_fence();
+ arch_${atomic}_set(v, i);
+ }
}
EOF
--
2.30.2



2022-11-04 23:19:58

by Thorsten Glaser

Subject: Re: [PATCH v2] atomics: fix atomic64_{read_acquire,set_release} fallbacks

On Thu, 3 Feb 2022, Mark Rutland wrote:

> - smp_store_release(&(v)->counter, i);
> + if (__native_word(atomic_t)) {
> + smp_store_release(&(v)->counter, i);
> + } else {
> + __atomic_release_fence();
> + arch_atomic_set(v, i);
> + }

Shouldn’t this also update Documentation/atomic_t.txt which
currently states:

| The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
| implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
| smp_store_release() respectively. Therefore, if you find yourself only using
| the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
| and are doing it wrong.

With this, direct use of atomic64_set_release() and atomic64_read_acquire()
is (IIUC) not “doing it wrong” any more?

bye,
//mirabilos

2022-11-07 11:46:13

by Mark Rutland

Subject: Re: [PATCH v2] atomics: fix atomic64_{read_acquire,set_release} fallbacks

On Sat, Nov 05, 2022 at 12:05:30AM +0100, Thorsten Glaser wrote:
> On Thu, 3 Feb 2022, Mark Rutland wrote:
>
> > - smp_store_release(&(v)->counter, i);
> > + if (__native_word(atomic_t)) {
> > + smp_store_release(&(v)->counter, i);
> > + } else {
> > + __atomic_release_fence();
> > + arch_atomic_set(v, i);
> > + }
>
> Shouldn’t this also update Documentation/atomic_t.txt which
> currently states:
>
> | The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
> | implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> | smp_store_release() respectively. Therefore, if you find yourself only using
> | the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
> | and are doing it wrong.
>
> With this, direct use of atomic64_set_release() and atomic64_read_acquire()
> is (IIUC) not “doing it wrong” any more?

Direct use was never "wrong" so long as you were using something more than
just atomic64_set_release() and atomic64_read_acquire(). That said, I suspect
we don't want to see those abused as a way to get a 64-bit
smp_store_release() or smp_load_acquire(), since those won't necessarily do
the right thing w.r.t. a plain READ_ONCE() and so on.
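
(As a made-up illustration of the kind of abuse I mean, mixing the atomic64
accessors with a plain access to the same memory; on 32-bit the plain load
is just two 32-bit loads, or bypasses the lock used by the generic atomic64
implementation, and can observe a torn value:)

	atomic64_t stamp;

	void writer(void)
	{
		atomic64_set_release(&stamp, 0x1122334455667788ULL);
	}

	u64 reader(void)
	{
		/* NOT equivalent to a 64-bit smp_load_acquire() */
		return READ_ONCE(stamp.counter);
	}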

So I think this is still correct as-is.

Do you have a particular case in mind that you care about?

Thanks,
Mark.

2022-11-07 12:42:02

by Thorsten Glaser

Subject: Re: [PATCH v2] atomics: fix atomic64_{read_acquire,set_release} fallbacks

On Mon, 7 Nov 2022, Mark Rutland wrote:

> Do you have a particular case in mind that you care about?

Definitely:

I have a qdisc which does rate limiting, and I need to update the
rate 100+ times a second; “tc qdisc change” introduces too much
delay (I *think*; I’m implementing this right now and we’ll see
whether that is indeed the problem).

The qdisc already uses relayfs over debugfs anyway to communicate
status info *to* userspace, so I’m adding another debugfs file and
will update the rate from its write fop.

So, after *much* reading and asking around, atomic64_set_release()
in the write fop, combined with atomic64_read_acquire() in the qdisc
hot path (which runs in softirq context and thus ought to be lockless)
seems to be “the answer”.

Basically, my requirement was to change a 64-bit value atomically while
keeping at least the read side lockless. (I’d be happy enough even if
the new value were picked up with _some_ delay, say within 1 µs, as
long as readers only ever see either the old value or the new value,
and never anything else. Giving up ILP32 support for this is
acceptable, even if it may limit other uses later; this code already
throws tons of u64 nanosecond quantities around, so it would have been
painful on ILP32 anyway, and I can tell the users to run it on LP64
only. It’s for a specific use case anyway.)
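
(Roughly, the shape I have in mind; the names here are invented, so treat
it as a sketch only:)

	struct myqdisc_priv {
		atomic64_t rate_bps;	/* the 64-bit rate being updated */
		/* ... */
	};

	/* debugfs write fop: publish the new rate */
	static void myqdisc_set_rate(struct myqdisc_priv *q, u64 rate)
	{
		atomic64_set_release(&q->rate_bps, rate);
	}

	/* qdisc hot path (softirq): pick up the latest published rate */
	static u64 myqdisc_get_rate(struct myqdisc_priv *q)
	{
		return atomic64_read_acquire(&q->rate_bps);
	}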

bye,
//mirabilos