2022-06-17 17:08:25

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

While I was working on converting some structure fields from a fixed
type to a bitmap, I started observing code size increase not only in
places where the code works with the converted structure fields, but
also where the converted vars were on the stack. That said, the
following code:

DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
unsigned long bar = BIT(BAR_BIT);
unsigned long baz = 0;

__set_bit(FOO_BIT, foo);
baz |= BIT(BAZ_BIT);

BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));

triggers the first assertion on x86_64, which means that the
compiler is unable to evaluate it to a compile-time initializer
when the architecture-specific bitop is used even if it's obvious.
I found that this is due to that many architecture-specific
non-atomic bitop implementations use inline asm or other hacks which
are faster or more robust when working with "real" variables (i.e.
fields from the structures etc.), but the compilers have no clue how
to optimize them out when called on compile-time constants.

So, in order to let the compiler optimize out such cases, expand the
test_bit() and __*_bit() definitions with a compile-time condition
check, so that they will pick the generic C non-atomic bitop
implementations when all of the arguments passed are compile-time
constants, which means that the result will be a compile-time
constant as well and the compiler will produce more efficient and
simple code in 100% cases (no changes when there's at least one
non-compile-time-constant argument).
The condition itself:

if (
__builtin_constant_p(nr) && /* <- bit position is constant */
__builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
always either NULL or not */
addr && /* <- bitmap addr is not NULL */
__builtin_constant_p(*addr) /* <- compiler knows the value of
the target bitmap */
)
/* then pick the generic C variant
else
/* old code path, arch-specific

I also tried __is_constexpr() as suggested by Andy, but it was
always returning 0 ('not a constant') for the 2,3 and 4th
conditions.

The savings are architecture, compiler and compiler flags dependent,
for example, on x86_64 -O2:

GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)

and ARM64 (courtesy of Mark[0]):

GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

And the following:

DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
__be16 flags;

__set_bit(IP_TUNNEL_CSUM_BIT, flags);

tun_flags = cpu_to_be16(*flags & U16_MAX);

if (test_bit(IP_TUNNEL_VTI_BIT, flags))
tun_flags |= VTI_ISVTI;

BUILD_BUG_ON(!__builtin_constant_p(tun_flags));

doesn't blow up anymore (which is being checked now at build time),
so that we can now e.g. use fixed bitmaps in compile-time assertions
etc.

The series has been in intel-next for a while with no reported issues.

From v2[1]:
* collect several Reviewed-bys (Andy, Yury);
* add a comment to generic_test_bit() that it is atomic-safe and
must always stay like that (the first version of this series
errorneously tried to change this) (Andy, Marco);
* unify the way how architectures define platform-specific bitops,
both supporting instrumentation and not: now they define only
'arch_' versions and asm-generic includes take care of the rest;
* micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
(Andy);
* add compile-time tests to lib/test_bitmap to make sure everything
works as expected on any setup (Yury).

From v1[2]:
* change 'gen_' prefixes to '_generic' to disambiguate from
'generated' etc. (Mark);
* define a separate 'const_' set to use in the optimization to keep
the generic test_bit() atomic-safe (Marco);
* unify arch_{test,__*}_bit() as well and include them in the type
check;
* add more relevant and up-to-date bloat-o-meter results, including
ARM64 (me, Mark);
* pick a couple '*-by' tags (Mark, Yury).

Also available on my open GH[3].

[0] https://lore.kernel.org/all/Yp4GQFQYD32Rs9Cw@FVFF77S0Q05N
[1] https://lore.kernel.org/linux-arch/[email protected]
[2] https://lore.kernel.org/all/[email protected]
[3] https://github.com/alobakin/linux/commits/bitops

Alexander Lobakin (7):
ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
bitops: always define asm-generic non-atomic bitops
bitops: unify non-atomic bitops prototypes across architectures
bitops: define const_*() versions of the non-atomics
bitops: wrap non-atomic bitops with a transparent macro
bitops: let optimize out non-atomic bitops on compile-time constants
lib: test_bitmap: add compile-time optimization/evaluations assertions

arch/alpha/include/asm/bitops.h | 32 ++--
arch/hexagon/include/asm/bitops.h | 24 ++-
arch/ia64/include/asm/bitops.h | 42 ++---
arch/ia64/include/asm/processor.h | 2 +-
arch/m68k/include/asm/bitops.h | 49 ++++--
arch/sh/include/asm/bitops-op32.h | 34 ++--
arch/sparc/include/asm/bitops_32.h | 18 +-
arch/sparc/lib/atomic32.c | 12 +-
arch/x86/include/asm/bitops.h | 22 +--
.../asm-generic/bitops/generic-non-atomic.h | 161 ++++++++++++++++++
.../bitops/instrumented-non-atomic.h | 35 ++--
include/asm-generic/bitops/non-atomic.h | 121 +------------
.../bitops/non-instrumented-non-atomic.h | 16 ++
include/linux/bitops.h | 50 ++++++
lib/test_bitmap.c | 45 +++++
tools/include/asm-generic/bitops/non-atomic.h | 34 ++--
tools/include/linux/bitops.h | 16 ++
17 files changed, 476 insertions(+), 237 deletions(-)
create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h

--
2.36.1


2022-06-17 17:09:05

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures

Currently, there is a mess with the prototypes of the non-atomic
bitops across the different architectures:

ret bool, int, unsigned long
nr int, long, unsigned int, unsigned long
addr volatile unsigned long *, volatile void *

Thankfully, it doesn't provoke any bugs, but can sometimes make
the compiler angry when it's not handy at all.
Adjust all the prototypes to the following standard:

ret bool retval can be only 0 or 1
nr unsigned long native; signed makes no sense
addr volatile unsigned long * bitmaps are arrays of ulongs

Next, some architectures don't define 'arch_' versions as they don't
support instrumentation, others do. To make sure there is always the
same set of callables present and to ease any potential future
changes, make them all follow the rule:
* architecture-specific files define only 'arch_' versions;
* non-prefixed versions can be defined only in asm-generic files;
and place the non-prefixed definitions into a new file in
asm-generic to be included by non-instrumented architectures.

Finally, add some static assertions in order to prevent people from
making a mess in this room again.
I also used the %__always_inline attribute consistently, so that
they always get resolved to the actual operations.

Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
---
arch/alpha/include/asm/bitops.h | 32 ++++++------
arch/hexagon/include/asm/bitops.h | 24 +++++----
arch/ia64/include/asm/bitops.h | 42 ++++++++--------
arch/m68k/include/asm/bitops.h | 49 +++++++++++++------
arch/sh/include/asm/bitops-op32.h | 34 ++++++++-----
arch/x86/include/asm/bitops.h | 22 +++++----
.../asm-generic/bitops/generic-non-atomic.h | 24 ++++-----
.../bitops/instrumented-non-atomic.h | 21 +++++---
include/asm-generic/bitops/non-atomic.h | 13 +----
.../bitops/non-instrumented-non-atomic.h | 16 ++++++
include/linux/bitops.h | 17 +++++++
tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
12 files changed, 198 insertions(+), 120 deletions(-)
create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h

diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
index e1d8483a45f2..492c7713ddae 100644
--- a/arch/alpha/include/asm/bitops.h
+++ b/arch/alpha/include/asm/bitops.h
@@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
/*
* WARNING: non atomic version.
*/
-static inline void
-__set_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
{
int *m = ((int *) addr) + (nr >> 5);

@@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
/*
* WARNING: non atomic version.
*/
-static __inline__ void
-__clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
int *m = ((int *) addr) + (nr >> 5);

@@ -94,7 +94,7 @@ static inline void
__clear_bit_unlock(unsigned long nr, volatile void * addr)
{
smp_mb();
- __clear_bit(nr, addr);
+ arch___clear_bit(nr, addr);
}

static inline void
@@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
/*
* WARNING: non atomic version.
*/
-static __inline__ void
-__change_bit(unsigned long nr, volatile void * addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
int *m = ((int *) addr) + (nr >> 5);

@@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
/*
* WARNING: non atomic version.
*/
-static inline int
-__test_and_set_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = 1 << (nr & 0x1f);
int *m = ((int *) addr) + (nr >> 5);
@@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
/*
* WARNING: non atomic version.
*/
-static inline int
-__test_and_clear_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = 1 << (nr & 0x1f);
int *m = ((int *) addr) + (nr >> 5);
@@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
/*
* WARNING: non atomic version.
*/
-static __inline__ int
-__test_and_change_bit(unsigned long nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = 1 << (nr & 0x1f);
int *m = ((int *) addr) + (nr >> 5);
@@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
return (old & mask) != 0;
}

-static inline int
-test_bit(int nr, const volatile void * addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
}
@@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
return __ffs(tmp) + ofs;
}

+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
#include <asm-generic/bitops/le.h>

#include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
index 75d6ba3643b8..da500471ac73 100644
--- a/arch/hexagon/include/asm/bitops.h
+++ b/arch/hexagon/include/asm/bitops.h
@@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
* be atomic, particularly for things like slab_lock and slab_unlock.
*
*/
-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
test_and_clear_bit(nr, addr);
}

-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
{
test_and_set_bit(nr, addr);
}

-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
test_and_change_bit(nr, addr);
}

/* Apparently, at least some of these are allowed to be non-atomic */
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
return test_and_clear_bit(nr, addr);
}

-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
return test_and_set_bit(nr, addr);
}

-static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
return test_and_change_bit(nr, addr);
}

-static inline int __test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
int retval;

@@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
return retval;
}

-#define test_bit(nr, addr) __test_bit(nr, addr)
-
/*
* ffz - find first zero in word.
* @word: The word to search
@@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
}

#include <asm-generic/bitops/lock.h>
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>

#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
index 577be93c0818..9f62af7fd7c4 100644
--- a/arch/ia64/include/asm/bitops.h
+++ b/arch/ia64/include/asm/bitops.h
@@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
}

/**
- * __set_bit - Set a bit in memory
+ * arch___set_bit - Set a bit in memory
* @nr: the bit to set
* @addr: the address to start counting from
*
@@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static __inline__ void
-__set_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
{
*((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
}
@@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
}

/**
- * __clear_bit - Clears a bit in memory (non-atomic version)
+ * arch___clear_bit - Clears a bit in memory (non-atomic version)
* @nr: the bit to clear
* @addr: the address to start counting from
*
@@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static __inline__ void
-__clear_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
*((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
}
@@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
}

/**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
* @nr: the bit to toggle
* @addr: the address to start counting from
*
@@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static __inline__ void
-__change_bit (int nr, volatile void *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
*((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
}
@@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
#define test_and_set_bit_lock test_and_set_bit

/**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
*
@@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static __inline__ int
-__test_and_set_bit (int nr, volatile void *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
__u32 *p = (__u32 *) addr + (nr >> 5);
__u32 m = 1 << (nr & 31);
@@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
}

/**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
* @nr: Bit to clear
* @addr: Address to count from
*
@@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static __inline__ int
-__test_and_clear_bit(int nr, volatile void * addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
__u32 *p = (__u32 *) addr + (nr >> 5);
__u32 m = 1 << (nr & 31);
@@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
}

/**
- * __test_and_change_bit - Change a bit and return its old value
+ * arch___test_and_change_bit - Change a bit and return its old value
* @nr: Bit to change
* @addr: Address to count from
*
* This operation is non-atomic and can be reordered.
*/
-static __inline__ int
-__test_and_change_bit (int nr, void *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
__u32 old, bit = (1 << (nr & 31));
__u32 *m = (__u32 *) addr + (nr >> 5);
@@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
return (old & bit) != 0;
}

-static __inline__ int
-test_bit (int nr, const volatile void *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
}
@@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)

#ifdef __KERNEL__

+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
#include <asm-generic/bitops/le.h>

#include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 51283db53667..71495faf2a90 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
bfset_mem_set_bit(nr, vaddr))
#endif

-#define __set_bit(nr, vaddr) set_bit(nr, vaddr)
-
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+ set_bit(nr, addr);
+}

static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
{
@@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
bfclr_mem_clear_bit(nr, vaddr))
#endif

-#define __clear_bit(nr, vaddr) clear_bit(nr, vaddr)
-
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+ clear_bit(nr, addr);
+}

static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
{
@@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
bfchg_mem_change_bit(nr, vaddr))
#endif

-#define __change_bit(nr, vaddr) change_bit(nr, vaddr)
-
-
-static inline int test_bit(int nr, const volatile unsigned long *vaddr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
- return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
+ change_bit(nr, addr);
}

+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+ return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
+}

static inline int bset_reg_test_and_set_bit(int nr,
volatile unsigned long *vaddr)
@@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
bfset_mem_test_and_set_bit(nr, vaddr))
#endif

-#define __test_and_set_bit(nr, vaddr) test_and_set_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
+{
+ return test_and_set_bit(nr, addr);
+}

static inline int bclr_reg_test_and_clear_bit(int nr,
volatile unsigned long *vaddr)
@@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
bfclr_mem_test_and_clear_bit(nr, vaddr))
#endif

-#define __test_and_clear_bit(nr, vaddr) test_and_clear_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
+{
+ return test_and_clear_bit(nr, addr);
+}

static inline int bchg_reg_test_and_change_bit(int nr,
volatile unsigned long *vaddr)
@@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
bfchg_mem_test_and_change_bit(nr, vaddr))
#endif

-#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
-
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
+{
+ return test_and_change_bit(nr, addr);
+}

/*
* The true 68020 and more advanced processors support the "bfffo"
@@ -522,6 +540,7 @@ static inline int __fls(int x)
#define clear_bit_unlock clear_bit
#define __clear_bit_unlock clear_bit_unlock

+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
#include <asm-generic/bitops/ext2-atomic.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
index cfe5465acce7..565a85d8b7fb 100644
--- a/arch/sh/include/asm/bitops-op32.h
+++ b/arch/sh/include/asm/bitops-op32.h
@@ -2,6 +2,8 @@
#ifndef __ASM_SH_BITOPS_OP32_H
#define __ASM_SH_BITOPS_OP32_H

+#include <linux/bits.h>
+
/*
* The bit modifying instructions on SH-2A are only capable of working
* with a 3-bit immediate, which signifies the shift position for the bit
@@ -16,7 +18,8 @@
#define BYTE_OFFSET(nr) ((nr) % BITS_PER_BYTE)
#endif

-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
{
if (__builtin_constant_p(nr)) {
__asm__ __volatile__ (
@@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
}
}

-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
if (__builtin_constant_p(nr)) {
__asm__ __volatile__ (
@@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
}

/**
- * __change_bit - Toggle a bit in memory
+ * arch___change_bit - Toggle a bit in memory
* @nr: the bit to change
* @addr: the address to start counting from
*
@@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
if (__builtin_constant_p(nr)) {
__asm__ __volatile__ (
@@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
}

/**
- * __test_and_set_bit - Set a bit and return its old value
+ * arch___test_and_set_bit - Set a bit and return its old value
* @nr: Bit to set
* @addr: Address to count from
*
@@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
}

/**
- * __test_and_clear_bit - Clear a bit and return its old value
+ * arch___test_and_clear_bit - Clear a bit and return its old value
* @nr: Bit to clear
* @addr: Address to count from
*
@@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
}

/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
- volatile unsigned long *addr)
+static __always_inline bool
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
}

/**
- * test_bit - Determine whether a bit is set
+ * arch_test_bit - Determine whether a bit is set
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}

+#include <asm-generic/bitops/non-instrumented-non-atomic.h>
+
#endif /* __ASM_SH_BITOPS_OP32_H */
diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index a288ecd230ab..973c6bd17f98 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
}

static __always_inline void
-arch___set_bit(long nr, volatile unsigned long *addr)
+arch___set_bit(unsigned long nr, volatile unsigned long *addr)
{
asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
@@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
}

static __always_inline void
-arch___clear_bit(long nr, volatile unsigned long *addr)
+arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
@@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
}

static __always_inline void
-arch___change_bit(long nr, volatile unsigned long *addr)
+arch___change_bit(unsigned long nr, volatile unsigned long *addr)
{
asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
}
@@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
}

static __always_inline bool
-arch___test_and_set_bit(long nr, volatile unsigned long *addr)
+arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
bool oldbit;

@@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
* this without also updating arch/x86/kernel/kvm.c
*/
static __always_inline bool
-arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
+arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
bool oldbit;

@@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
}

static __always_inline bool
-arch___test_and_change_bit(long nr, volatile unsigned long *addr)
+arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
bool oldbit;

@@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
return oldbit;
}

-#define arch_test_bit(nr, addr) \
- (__builtin_constant_p((nr)) \
- ? constant_test_bit((nr), (addr)) \
- : variable_test_bit((nr), (addr)))
+static __always_inline bool
+arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+ return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
+ variable_test_bit(nr, addr);
+}

/**
* __ffs - find first set bit in word
diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index 7226488810e5..b85b8a2ac239 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -24,7 +24,7 @@
* may be that only one operation succeeds.
*/
static __always_inline void
-generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+generic___set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
}

static __always_inline void
-generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static __always_inline
-void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline void
+generic___change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static __always_inline int
-generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static __always_inline int
-generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
}

/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+static __always_inline bool
+generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static __always_inline int
-generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+static __always_inline bool
+generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
{
/*
* Unlike the bitops with the '__' prefix above, this one *is* atomic,
diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
index 7ab1ecc37782..b019f77ef21c 100644
--- a/include/asm-generic/bitops/instrumented-non-atomic.h
+++ b/include/asm-generic/bitops/instrumented-non-atomic.h
@@ -22,7 +22,8 @@
* region of memory concurrently, the effect may be that only one operation
* succeeds.
*/
-static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
arch___set_bit(nr, addr);
@@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
* region of memory concurrently, the effect may be that only one operation
* succeeds.
*/
-static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
arch___clear_bit(nr, addr);
@@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
* region of memory concurrently, the effect may be that only one operation
* succeeds.
*/
-static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
{
instrument_write(addr + BIT_WORD(nr), sizeof(long));
arch___change_bit(nr, addr);
@@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
* This operation is non-atomic. If two instances of this operation race, one
* can appear to succeed but actually fail.
*/
-static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
__instrument_read_write_bitop(nr, addr);
return arch___test_and_set_bit(nr, addr);
@@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
* This operation is non-atomic. If two instances of this operation race, one
* can appear to succeed but actually fail.
*/
-static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
__instrument_read_write_bitop(nr, addr);
return arch___test_and_clear_bit(nr, addr);
@@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
* This operation is non-atomic. If two instances of this operation race, one
* can appear to succeed but actually fail.
*/
-static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
__instrument_read_write_bitop(nr, addr);
return arch___test_and_change_bit(nr, addr);
@@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
{
instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
return arch_test_bit(nr, addr);
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 23d3abc1e10d..5c37ced343ae 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -5,24 +5,15 @@
#include <asm-generic/bitops/generic-non-atomic.h>

#define arch___set_bit generic___set_bit
-#define __set_bit arch___set_bit
-
#define arch___clear_bit generic___clear_bit
-#define __clear_bit arch___clear_bit
-
#define arch___change_bit generic___change_bit
-#define __change_bit arch___change_bit

#define arch___test_and_set_bit generic___test_and_set_bit
-#define __test_and_set_bit arch___test_and_set_bit
-
#define arch___test_and_clear_bit generic___test_and_clear_bit
-#define __test_and_clear_bit arch___test_and_clear_bit
-
#define arch___test_and_change_bit generic___test_and_change_bit
-#define __test_and_change_bit arch___test_and_change_bit

#define arch_test_bit generic_test_bit
-#define test_bit arch_test_bit
+
+#include <asm-generic/bitops/non-instrumented-non-atomic.h>

#endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
new file mode 100644
index 000000000000..e0fd7bf72a56
--- /dev/null
+++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
+
+#define __set_bit arch___set_bit
+#define __clear_bit arch___clear_bit
+#define __change_bit arch___change_bit
+
+#define __test_and_set_bit arch___test_and_set_bit
+#define __test_and_clear_bit arch___test_and_clear_bit
+#define __test_and_change_bit arch___test_and_change_bit
+
+#define test_bit arch_test_bit
+
+#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 7aaed501f768..87087454a288 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
extern unsigned int __sw_hweight32(unsigned int w);
extern unsigned long __sw_hweight64(__u64 w);

+#include <asm-generic/bitops/generic-non-atomic.h>
+
/*
* Include this here because some architectures need generic_ffs/fls in
* scope
*/
#include <asm/bitops.h>

+/* Check that the bitops prototypes are sane */
+#define __check_bitop_pr(name) \
+ static_assert(__same_type(arch_##name, generic_##name) && \
+ __same_type(name, generic_##name))
+
+__check_bitop_pr(__set_bit);
+__check_bitop_pr(__clear_bit);
+__check_bitop_pr(__change_bit);
+__check_bitop_pr(__test_and_set_bit);
+__check_bitop_pr(__test_and_clear_bit);
+__check_bitop_pr(__test_and_change_bit);
+__check_bitop_pr(test_bit);
+
+#undef __check_bitop_pr
+
static inline int get_bitmask_order(unsigned int count)
{
int order;
diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
index 7e10c4b50c5d..e5e78e42e57b 100644
--- a/tools/include/asm-generic/bitops/non-atomic.h
+++ b/tools/include/asm-generic/bitops/non-atomic.h
@@ -2,7 +2,7 @@
#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_

-#include <asm/types.h>
+#include <linux/bits.h>

/**
* __set_bit - Set a bit in memory
@@ -13,7 +13,8 @@
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
*p |= mask;
}

-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __always_inline void
+__change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
* If two examples of this operation race, one can appear to succeed
* but actually fail. You must protect multiple accesses with a lock.
*/
-static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
+static __always_inline bool
+__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
}

/* WARNING: non atomic and it can be reordered! */
-static inline int __test_and_change_bit(int nr,
- volatile unsigned long *addr)
+static __always_inline bool
+__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
@@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
* @nr: bit number to test
* @addr: Address to start counting from
*/
-static inline int test_bit(int nr, const volatile unsigned long *addr)
+static __always_inline bool
+test_bit(unsigned long nr, const volatile unsigned long *addr)
{
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}
--
2.36.1

2022-06-17 17:13:07

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics

Define const_*() variants of the non-atomic bitops to be used when
the input arguments are compile-time constants, so that the compiler
will be always able to resolve those to compile-time constants as
well. Those are mostly direct aliases for generic_*() with one
exception for const_test_bit(): the original one is declared
atomic-safe and thus doesn't discard the `volatile` qualifier, so
in order to let optimize code, define it separately disregarding
the qualifier.
Add them to the compile-time type checks as well just in case.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
.../asm-generic/bitops/generic-non-atomic.h | 31 +++++++++++++++++++
include/linux/bitops.h | 1 +
2 files changed, 32 insertions(+)

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
index b85b8a2ac239..3d5ebd24652b 100644
--- a/include/asm-generic/bitops/generic-non-atomic.h
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
}

+/*
+ * const_*() definitions provide good compile-time optimizations when
+ * the passed arguments can be resolved at compile time.
+ */
+#define const___set_bit generic___set_bit
+#define const___clear_bit generic___clear_bit
+#define const___change_bit generic___change_bit
+#define const___test_and_set_bit generic___test_and_set_bit
+#define const___test_and_clear_bit generic___test_and_clear_bit
+#define const___test_and_change_bit generic___test_and_change_bit
+
+/**
+ * const_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ *
+ * A version of generic_test_bit() which discards the `volatile` qualifier to
+ * allow a compiler to optimize code harder. Non-atomic and to be called only
+ * for testing compile-time constants, e.g. by the corresponding macros, not
+ * directly from "regular" code.
+ */
+static __always_inline bool
+const_test_bit(unsigned long nr, const volatile unsigned long *addr)
+{
+ const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long val = *p;
+
+ return !!(val & mask);
+}
+
#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 87087454a288..d393297287d5 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
/* Check that the bitops prototypes are sane */
#define __check_bitop_pr(name) \
static_assert(__same_type(arch_##name, generic_##name) && \
+ __same_type(const_##name, generic_##name) && \
__same_type(name, generic_##name))

__check_bitop_pr(__set_bit);
--
2.36.1

2022-06-17 17:23:42

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH v3 1/7] ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()

test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: [email protected] # 2.6.16+
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
---
arch/ia64/include/asm/processor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
{
unsigned int reg = vector / 64;
unsigned int bit = vector % 64;
- u64 irr;
+ unsigned long irr;

switch (reg) {
case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
--
2.36.1

2022-06-17 17:26:32

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops

Move generic non-atomic bitops from the asm-generic header which
gets included only when there are no architecture-specific
alternatives, to a separate independent file to make them always
available.
Almost no actual code changes, only one comment added to
generic_test_bit() saying that it's an atomic operation itself
and thus `volatile` must always stay there with no cast-aways.

Suggested-by: Andy Shevchenko <[email protected]> # comment
Suggested-by: Marco Elver <[email protected]> # reference to kernel-doc
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
---
.../asm-generic/bitops/generic-non-atomic.h | 130 ++++++++++++++++++
include/asm-generic/bitops/non-atomic.h | 110 ++-------------
2 files changed, 138 insertions(+), 102 deletions(-)
create mode 100644 include/asm-generic/bitops/generic-non-atomic.h

diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
new file mode 100644
index 000000000000..7226488810e5
--- /dev/null
+++ b/include/asm-generic/bitops/generic-non-atomic.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
+
+#include <linux/bits.h>
+
+#ifndef _LINUX_BITOPS_H
+#error only <linux/bitops.h> can be included directly
+#endif
+
+/*
+ * Generic definitions for bit operations, should not be used in regular code
+ * directly.
+ */
+
+/**
+ * generic___set_bit - Set a bit in memory
+ * @nr: the bit to set
+ * @addr: the address to start counting from
+ *
+ * Unlike set_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline void
+generic___set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p |= mask;
+}
+
+static __always_inline void
+generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p &= ~mask;
+}
+
+/**
+ * generic___change_bit - Toggle a bit in memory
+ * @nr: the bit to change
+ * @addr: the address to start counting from
+ *
+ * Unlike change_bit(), this function is non-atomic and may be reordered.
+ * If it's called on the same region of memory simultaneously, the effect
+ * may be that only one operation succeeds.
+ */
+static __always_inline
+void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+
+ *p ^= mask;
+}
+
+/**
+ * generic___test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail. You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old | mask;
+ return (old & mask) != 0;
+}
+
+/**
+ * generic___test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail. You must protect multiple accesses with a lock.
+ */
+static __always_inline int
+generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old & ~mask;
+ return (old & mask) != 0;
+}
+
+/* WARNING: non atomic and it can be reordered! */
+static __always_inline int
+generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
+{
+ unsigned long mask = BIT_MASK(nr);
+ unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
+ unsigned long old = *p;
+
+ *p = old ^ mask;
+ return (old & mask) != 0;
+}
+
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static __always_inline int
+generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
+{
+ /*
+ * Unlike the bitops with the '__' prefix above, this one *is* atomic,
+ * so `volatile` must always stay here with no cast-aways. See
+ * `Documentation/atomic_bitops.txt` for the details.
+ */
+ return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
+}
+
+#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
index 078cc68be2f1..23d3abc1e10d 100644
--- a/include/asm-generic/bitops/non-atomic.h
+++ b/include/asm-generic/bitops/non-atomic.h
@@ -2,121 +2,27 @@
#ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_

-#include <asm/types.h>
+#include <asm-generic/bitops/generic-non-atomic.h>

-/**
- * arch___set_bit - Set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * Unlike set_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline void
-arch___set_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p |= mask;
-}
+#define arch___set_bit generic___set_bit
#define __set_bit arch___set_bit

-static __always_inline void
-arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p &= ~mask;
-}
+#define arch___clear_bit generic___clear_bit
#define __clear_bit arch___clear_bit

-/**
- * arch___change_bit - Toggle a bit in memory
- * @nr: the bit to change
- * @addr: the address to start counting from
- *
- * Unlike change_bit(), this function is non-atomic and may be reordered.
- * If it's called on the same region of memory simultaneously, the effect
- * may be that only one operation succeeds.
- */
-static __always_inline
-void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-
- *p ^= mask;
-}
+#define arch___change_bit generic___change_bit
#define __change_bit arch___change_bit

-/**
- * arch___test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old | mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_set_bit generic___test_and_set_bit
#define __test_and_set_bit arch___test_and_set_bit

-/**
- * arch___test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is non-atomic and can be reordered.
- * If two examples of this operation race, one can appear to succeed
- * but actually fail. You must protect multiple accesses with a lock.
- */
-static __always_inline int
-arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old & ~mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_clear_bit generic___test_and_clear_bit
#define __test_and_clear_bit arch___test_and_clear_bit

-/* WARNING: non atomic and it can be reordered! */
-static __always_inline int
-arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
-{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old = *p;
-
- *p = old ^ mask;
- return (old & mask) != 0;
-}
+#define arch___test_and_change_bit generic___test_and_change_bit
#define __test_and_change_bit arch___test_and_change_bit

-/**
- * arch_test_bit - Determine whether a bit is set
- * @nr: bit number to test
- * @addr: Address to start counting from
- */
-static __always_inline int
-arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
-{
- return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
-}
+#define arch_test_bit generic_test_bit
#define test_bit arch_test_bit

#endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
--
2.36.1

2022-06-20 09:54:17

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] bitops: always define asm-generic non-atomic bitops

On Fri, 17 Jun 2022 at 19:19, Alexander Lobakin
<[email protected]> wrote:
>
> Move generic non-atomic bitops from the asm-generic header which
> gets included only when there are no architecture-specific
> alternatives, to a separate independent file to make them always
> available.
> Almost no actual code changes, only one comment added to
> generic_test_bit() saying that it's an atomic operation itself
> and thus `volatile` must always stay there with no cast-aways.
>
> Suggested-by: Andy Shevchenko <[email protected]> # comment
> Suggested-by: Marco Elver <[email protected]> # reference to kernel-doc
> Signed-off-by: Alexander Lobakin <[email protected]>
> Reviewed-by: Andy Shevchenko <[email protected]>

Reviewed-by: Marco Elver <[email protected]>

> ---
> .../asm-generic/bitops/generic-non-atomic.h | 130 ++++++++++++++++++
> include/asm-generic/bitops/non-atomic.h | 110 ++-------------
> 2 files changed, 138 insertions(+), 102 deletions(-)
> create mode 100644 include/asm-generic/bitops/generic-non-atomic.h
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> new file mode 100644
> index 000000000000..7226488810e5
> --- /dev/null
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H
> +
> +#include <linux/bits.h>
> +
> +#ifndef _LINUX_BITOPS_H
> +#error only <linux/bitops.h> can be included directly
> +#endif
> +
> +/*
> + * Generic definitions for bit operations, should not be used in regular code
> + * directly.
> + */
> +
> +/**
> + * generic___set_bit - Set a bit in memory
> + * @nr: the bit to set
> + * @addr: the address to start counting from
> + *
> + * Unlike set_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline void
> +generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> + *p |= mask;
> +}
> +
> +static __always_inline void
> +generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> + *p &= ~mask;
> +}
> +
> +/**
> + * generic___change_bit - Toggle a bit in memory
> + * @nr: the bit to change
> + * @addr: the address to start counting from
> + *
> + * Unlike change_bit(), this function is non-atomic and may be reordered.
> + * If it's called on the same region of memory simultaneously, the effect
> + * may be that only one operation succeeds.
> + */
> +static __always_inline
> +void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> +
> + *p ^= mask;
> +}
> +
> +/**
> + * generic___test_and_set_bit - Set a bit and return its old value
> + * @nr: Bit to set
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail. You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> + unsigned long old = *p;
> +
> + *p = old | mask;
> + return (old & mask) != 0;
> +}
> +
> +/**
> + * generic___test_and_clear_bit - Clear a bit and return its old value
> + * @nr: Bit to clear
> + * @addr: Address to count from
> + *
> + * This operation is non-atomic and can be reordered.
> + * If two examples of this operation race, one can appear to succeed
> + * but actually fail. You must protect multiple accesses with a lock.
> + */
> +static __always_inline int
> +generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> + unsigned long old = *p;
> +
> + *p = old & ~mask;
> + return (old & mask) != 0;
> +}
> +
> +/* WARNING: non atomic and it can be reordered! */
> +static __always_inline int
> +generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +{
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> + unsigned long old = *p;
> +
> + *p = old ^ mask;
> + return (old & mask) != 0;
> +}
> +
> +/**
> + * generic_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + */
> +static __always_inline int
> +generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +{
> + /*
> + * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> + * so `volatile` must always stay here with no cast-aways. See
> + * `Documentation/atomic_bitops.txt` for the details.
> + */
> + return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> +}
> +
> +#endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 078cc68be2f1..23d3abc1e10d 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -2,121 +2,27 @@
> #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
> #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>
> -#include <asm/types.h>
> +#include <asm-generic/bitops/generic-non-atomic.h>
>
> -/**
> - * arch___set_bit - Set a bit in memory
> - * @nr: the bit to set
> - * @addr: the address to start counting from
> - *
> - * Unlike set_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline void
> -arch___set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> - *p |= mask;
> -}
> +#define arch___set_bit generic___set_bit
> #define __set_bit arch___set_bit
>
> -static __always_inline void
> -arch___clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> - *p &= ~mask;
> -}
> +#define arch___clear_bit generic___clear_bit
> #define __clear_bit arch___clear_bit
>
> -/**
> - * arch___change_bit - Toggle a bit in memory
> - * @nr: the bit to change
> - * @addr: the address to start counting from
> - *
> - * Unlike change_bit(), this function is non-atomic and may be reordered.
> - * If it's called on the same region of memory simultaneously, the effect
> - * may be that only one operation succeeds.
> - */
> -static __always_inline
> -void arch___change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> -
> - *p ^= mask;
> -}
> +#define arch___change_bit generic___change_bit
> #define __change_bit arch___change_bit
>
> -/**
> - * arch___test_and_set_bit - Set a bit and return its old value
> - * @nr: Bit to set
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail. You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> - unsigned long old = *p;
> -
> - *p = old | mask;
> - return (old & mask) != 0;
> -}
> +#define arch___test_and_set_bit generic___test_and_set_bit
> #define __test_and_set_bit arch___test_and_set_bit
>
> -/**
> - * arch___test_and_clear_bit - Clear a bit and return its old value
> - * @nr: Bit to clear
> - * @addr: Address to count from
> - *
> - * This operation is non-atomic and can be reordered.
> - * If two examples of this operation race, one can appear to succeed
> - * but actually fail. You must protect multiple accesses with a lock.
> - */
> -static __always_inline int
> -arch___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> - unsigned long old = *p;
> -
> - *p = old & ~mask;
> - return (old & mask) != 0;
> -}
> +#define arch___test_and_clear_bit generic___test_and_clear_bit
> #define __test_and_clear_bit arch___test_and_clear_bit
>
> -/* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -arch___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> -{
> - unsigned long mask = BIT_MASK(nr);
> - unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> - unsigned long old = *p;
> -
> - *p = old ^ mask;
> - return (old & mask) != 0;
> -}
> +#define arch___test_and_change_bit generic___test_and_change_bit
> #define __test_and_change_bit arch___test_and_change_bit
>
> -/**
> - * arch_test_bit - Determine whether a bit is set
> - * @nr: bit number to test
> - * @addr: Address to start counting from
> - */
> -static __always_inline int
> -arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> -{
> - return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> -}
> +#define arch_test_bit generic_test_bit
> #define test_bit arch_test_bit
>
> #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> --
> 2.36.1
>

2022-06-20 10:08:25

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures

On Fri, Jun 17, 2022 at 04:40:27PM +0200, Alexander Lobakin wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
>
> ret bool, int, unsigned long
> nr int, long, unsigned int, unsigned long
> addr volatile unsigned long *, volatile void *
>
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
>
> ret bool retval can be only 0 or 1
> nr unsigned long native; signed makes no sense
> addr volatile unsigned long * bitmaps are arrays of ulongs
>
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
> * architecture-specific files define only 'arch_' versions;
> * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
>
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
>
> Suggested-by: Andy Shevchenko <[email protected]>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Acked-by: Mark Rutland <[email protected]>
> Reviewed-by: Yury Norov <[email protected]>

Reviewed-by: Andy Shevchenko <[email protected]>
Thanks for doing this!

> ---
> arch/alpha/include/asm/bitops.h | 32 ++++++------
> arch/hexagon/include/asm/bitops.h | 24 +++++----
> arch/ia64/include/asm/bitops.h | 42 ++++++++--------
> arch/m68k/include/asm/bitops.h | 49 +++++++++++++------
> arch/sh/include/asm/bitops-op32.h | 34 ++++++++-----
> arch/x86/include/asm/bitops.h | 22 +++++----
> .../asm-generic/bitops/generic-non-atomic.h | 24 ++++-----
> .../bitops/instrumented-non-atomic.h | 21 +++++---
> include/asm-generic/bitops/non-atomic.h | 13 +----
> .../bitops/non-instrumented-non-atomic.h | 16 ++++++
> include/linux/bitops.h | 17 +++++++
> tools/include/asm-generic/bitops/non-atomic.h | 24 +++++----
> 12 files changed, 198 insertions(+), 120 deletions(-)
> create mode 100644 include/asm-generic/bitops/non-instrumented-non-atomic.h
>
> diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h
> index e1d8483a45f2..492c7713ddae 100644
> --- a/arch/alpha/include/asm/bitops.h
> +++ b/arch/alpha/include/asm/bitops.h
> @@ -46,8 +46,8 @@ set_bit(unsigned long nr, volatile void * addr)
> /*
> * WARNING: non atomic version.
> */
> -static inline void
> -__set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> int *m = ((int *) addr) + (nr >> 5);
>
> @@ -82,8 +82,8 @@ clear_bit_unlock(unsigned long nr, volatile void * addr)
> /*
> * WARNING: non atomic version.
> */
> -static __inline__ void
> -__clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> int *m = ((int *) addr) + (nr >> 5);
>
> @@ -94,7 +94,7 @@ static inline void
> __clear_bit_unlock(unsigned long nr, volatile void * addr)
> {
> smp_mb();
> - __clear_bit(nr, addr);
> + arch___clear_bit(nr, addr);
> }
>
> static inline void
> @@ -118,8 +118,8 @@ change_bit(unsigned long nr, volatile void * addr)
> /*
> * WARNING: non atomic version.
> */
> -static __inline__ void
> -__change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> int *m = ((int *) addr) + (nr >> 5);
>
> @@ -186,8 +186,8 @@ test_and_set_bit_lock(unsigned long nr, volatile void *addr)
> /*
> * WARNING: non atomic version.
> */
> -static inline int
> -__test_and_set_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = 1 << (nr & 0x1f);
> int *m = ((int *) addr) + (nr >> 5);
> @@ -230,8 +230,8 @@ test_and_clear_bit(unsigned long nr, volatile void * addr)
> /*
> * WARNING: non atomic version.
> */
> -static inline int
> -__test_and_clear_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = 1 << (nr & 0x1f);
> int *m = ((int *) addr) + (nr >> 5);
> @@ -272,8 +272,8 @@ test_and_change_bit(unsigned long nr, volatile void * addr)
> /*
> * WARNING: non atomic version.
> */
> -static __inline__ int
> -__test_and_change_bit(unsigned long nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = 1 << (nr & 0x1f);
> int *m = ((int *) addr) + (nr >> 5);
> @@ -283,8 +283,8 @@ __test_and_change_bit(unsigned long nr, volatile void * addr)
> return (old & mask) != 0;
> }
>
> -static inline int
> -test_bit(int nr, const volatile void * addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> return (1UL & (((const int *) addr)[nr >> 5] >> (nr & 31))) != 0UL;
> }
> @@ -450,6 +450,8 @@ sched_find_first_bit(const unsigned long b[2])
> return __ffs(tmp) + ofs;
> }
>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
> #include <asm-generic/bitops/le.h>
>
> #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h
> index 75d6ba3643b8..da500471ac73 100644
> --- a/arch/hexagon/include/asm/bitops.h
> +++ b/arch/hexagon/include/asm/bitops.h
> @@ -127,38 +127,45 @@ static inline void change_bit(int nr, volatile void *addr)
> * be atomic, particularly for things like slab_lock and slab_unlock.
> *
> */
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> test_and_clear_bit(nr, addr);
> }
>
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> test_and_set_bit(nr, addr);
> }
>
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> test_and_change_bit(nr, addr);
> }
>
> /* Apparently, at least some of these are allowed to be non-atomic */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> return test_and_clear_bit(nr, addr);
> }
>
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> return test_and_set_bit(nr, addr);
> }
>
> -static inline int __test_and_change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> return test_and_change_bit(nr, addr);
> }
>
> -static inline int __test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> int retval;
>
> @@ -172,8 +179,6 @@ static inline int __test_bit(int nr, const volatile unsigned long *addr)
> return retval;
> }
>
> -#define test_bit(nr, addr) __test_bit(nr, addr)
> -
> /*
> * ffz - find first zero in word.
> * @word: The word to search
> @@ -271,6 +276,7 @@ static inline unsigned long __fls(unsigned long word)
> }
>
> #include <asm-generic/bitops/lock.h>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>
> #include <asm-generic/bitops/fls64.h>
> #include <asm-generic/bitops/sched.h>
> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h
> index 577be93c0818..9f62af7fd7c4 100644
> --- a/arch/ia64/include/asm/bitops.h
> +++ b/arch/ia64/include/asm/bitops.h
> @@ -53,7 +53,7 @@ set_bit (int nr, volatile void *addr)
> }
>
> /**
> - * __set_bit - Set a bit in memory
> + * arch___set_bit - Set a bit in memory
> * @nr: the bit to set
> * @addr: the address to start counting from
> *
> @@ -61,8 +61,8 @@ set_bit (int nr, volatile void *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static __inline__ void
> -__set_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> *((__u32 *) addr + (nr >> 5)) |= (1 << (nr & 31));
> }
> @@ -135,7 +135,7 @@ __clear_bit_unlock(int nr, void *addr)
> }
>
> /**
> - * __clear_bit - Clears a bit in memory (non-atomic version)
> + * arch___clear_bit - Clears a bit in memory (non-atomic version)
> * @nr: the bit to clear
> * @addr: the address to start counting from
> *
> @@ -143,8 +143,8 @@ __clear_bit_unlock(int nr, void *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static __inline__ void
> -__clear_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> *((__u32 *) addr + (nr >> 5)) &= ~(1 << (nr & 31));
> }
> @@ -175,7 +175,7 @@ change_bit (int nr, volatile void *addr)
> }
>
> /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
> * @nr: the bit to toggle
> * @addr: the address to start counting from
> *
> @@ -183,8 +183,8 @@ change_bit (int nr, volatile void *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static __inline__ void
> -__change_bit (int nr, volatile void *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> *((__u32 *) addr + (nr >> 5)) ^= (1 << (nr & 31));
> }
> @@ -224,7 +224,7 @@ test_and_set_bit (int nr, volatile void *addr)
> #define test_and_set_bit_lock test_and_set_bit
>
> /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
> * @nr: Bit to set
> * @addr: Address to count from
> *
> @@ -232,8 +232,8 @@ test_and_set_bit (int nr, volatile void *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static __inline__ int
> -__test_and_set_bit (int nr, volatile void *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __u32 *p = (__u32 *) addr + (nr >> 5);
> __u32 m = 1 << (nr & 31);
> @@ -269,7 +269,7 @@ test_and_clear_bit (int nr, volatile void *addr)
> }
>
> /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
> * @nr: Bit to clear
> * @addr: Address to count from
> *
> @@ -277,8 +277,8 @@ test_and_clear_bit (int nr, volatile void *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static __inline__ int
> -__test_and_clear_bit(int nr, volatile void * addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __u32 *p = (__u32 *) addr + (nr >> 5);
> __u32 m = 1 << (nr & 31);
> @@ -314,14 +314,14 @@ test_and_change_bit (int nr, volatile void *addr)
> }
>
> /**
> - * __test_and_change_bit - Change a bit and return its old value
> + * arch___test_and_change_bit - Change a bit and return its old value
> * @nr: Bit to change
> * @addr: Address to count from
> *
> * This operation is non-atomic and can be reordered.
> */
> -static __inline__ int
> -__test_and_change_bit (int nr, void *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __u32 old, bit = (1 << (nr & 31));
> __u32 *m = (__u32 *) addr + (nr >> 5);
> @@ -331,8 +331,8 @@ __test_and_change_bit (int nr, void *addr)
> return (old & bit) != 0;
> }
>
> -static __inline__ int
> -test_bit (int nr, const volatile void *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> return 1 & (((const volatile __u32 *) addr)[nr >> 5] >> (nr & 31));
> }
> @@ -443,6 +443,8 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
>
> #ifdef __KERNEL__
>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
> #include <asm-generic/bitops/le.h>
>
> #include <asm-generic/bitops/ext2-atomic-setbit.h>
> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
> index 51283db53667..71495faf2a90 100644
> --- a/arch/m68k/include/asm/bitops.h
> +++ b/arch/m68k/include/asm/bitops.h
> @@ -65,8 +65,11 @@ static inline void bfset_mem_set_bit(int nr, volatile unsigned long *vaddr)
> bfset_mem_set_bit(nr, vaddr))
> #endif
>
> -#define __set_bit(nr, vaddr) set_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> + set_bit(nr, addr);
> +}
>
> static inline void bclr_reg_clear_bit(int nr, volatile unsigned long *vaddr)
> {
> @@ -105,8 +108,11 @@ static inline void bfclr_mem_clear_bit(int nr, volatile unsigned long *vaddr)
> bfclr_mem_clear_bit(nr, vaddr))
> #endif
>
> -#define __clear_bit(nr, vaddr) clear_bit(nr, vaddr)
> -
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> + clear_bit(nr, addr);
> +}
>
> static inline void bchg_reg_change_bit(int nr, volatile unsigned long *vaddr)
> {
> @@ -145,14 +151,17 @@ static inline void bfchg_mem_change_bit(int nr, volatile unsigned long *vaddr)
> bfchg_mem_change_bit(nr, vaddr))
> #endif
>
> -#define __change_bit(nr, vaddr) change_bit(nr, vaddr)
> -
> -
> -static inline int test_bit(int nr, const volatile unsigned long *vaddr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> - return (vaddr[nr >> 5] & (1UL << (nr & 31))) != 0;
> + change_bit(nr, addr);
> }
>
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> + return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
> +}
>
> static inline int bset_reg_test_and_set_bit(int nr,
> volatile unsigned long *vaddr)
> @@ -201,8 +210,11 @@ static inline int bfset_mem_test_and_set_bit(int nr,
> bfset_mem_test_and_set_bit(nr, vaddr))
> #endif
>
> -#define __test_and_set_bit(nr, vaddr) test_and_set_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> + return test_and_set_bit(nr, addr);
> +}
>
> static inline int bclr_reg_test_and_clear_bit(int nr,
> volatile unsigned long *vaddr)
> @@ -251,8 +263,11 @@ static inline int bfclr_mem_test_and_clear_bit(int nr,
> bfclr_mem_test_and_clear_bit(nr, vaddr))
> #endif
>
> -#define __test_and_clear_bit(nr, vaddr) test_and_clear_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> + return test_and_clear_bit(nr, addr);
> +}
>
> static inline int bchg_reg_test_and_change_bit(int nr,
> volatile unsigned long *vaddr)
> @@ -301,8 +316,11 @@ static inline int bfchg_mem_test_and_change_bit(int nr,
> bfchg_mem_test_and_change_bit(nr, vaddr))
> #endif
>
> -#define __test_and_change_bit(nr, vaddr) test_and_change_bit(nr, vaddr)
> -
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> +{
> + return test_and_change_bit(nr, addr);
> +}
>
> /*
> * The true 68020 and more advanced processors support the "bfffo"
> @@ -522,6 +540,7 @@ static inline int __fls(int x)
> #define clear_bit_unlock clear_bit
> #define __clear_bit_unlock clear_bit_unlock
>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> #include <asm-generic/bitops/ext2-atomic.h>
> #include <asm-generic/bitops/fls64.h>
> #include <asm-generic/bitops/sched.h>
> diff --git a/arch/sh/include/asm/bitops-op32.h b/arch/sh/include/asm/bitops-op32.h
> index cfe5465acce7..565a85d8b7fb 100644
> --- a/arch/sh/include/asm/bitops-op32.h
> +++ b/arch/sh/include/asm/bitops-op32.h
> @@ -2,6 +2,8 @@
> #ifndef __ASM_SH_BITOPS_OP32_H
> #define __ASM_SH_BITOPS_OP32_H
>
> +#include <linux/bits.h>
> +
> /*
> * The bit modifying instructions on SH-2A are only capable of working
> * with a 3-bit immediate, which signifies the shift position for the bit
> @@ -16,7 +18,8 @@
> #define BYTE_OFFSET(nr) ((nr) % BITS_PER_BYTE)
> #endif
>
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> if (__builtin_constant_p(nr)) {
> __asm__ __volatile__ (
> @@ -33,7 +36,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
> }
> }
>
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> if (__builtin_constant_p(nr)) {
> __asm__ __volatile__ (
> @@ -52,7 +56,7 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
> }
>
> /**
> - * __change_bit - Toggle a bit in memory
> + * arch___change_bit - Toggle a bit in memory
> * @nr: the bit to change
> * @addr: the address to start counting from
> *
> @@ -60,7 +64,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> if (__builtin_constant_p(nr)) {
> __asm__ __volatile__ (
> @@ -79,7 +84,7 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
> }
>
> /**
> - * __test_and_set_bit - Set a bit and return its old value
> + * arch___test_and_set_bit - Set a bit and return its old value
> * @nr: Bit to set
> * @addr: Address to count from
> *
> @@ -87,7 +92,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -98,7 +104,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> }
>
> /**
> - * __test_and_clear_bit - Clear a bit and return its old value
> + * arch___test_and_clear_bit - Clear a bit and return its old value
> * @nr: Bit to clear
> * @addr: Address to count from
> *
> @@ -106,7 +112,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -117,8 +124,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> }
>
> /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> - volatile unsigned long *addr)
> +static __always_inline bool
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -129,13 +136,16 @@ static inline int __test_and_change_bit(int nr,
> }
>
> /**
> - * test_bit - Determine whether a bit is set
> + * arch_test_bit - Determine whether a bit is set
> * @nr: bit number to test
> * @addr: Address to start counting from
> */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
>
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
> +
> #endif /* __ASM_SH_BITOPS_OP32_H */
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index a288ecd230ab..973c6bd17f98 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -63,7 +63,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
> }
>
> static __always_inline void
> -arch___set_bit(long nr, volatile unsigned long *addr)
> +arch___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
> }
> @@ -89,7 +89,7 @@ arch_clear_bit_unlock(long nr, volatile unsigned long *addr)
> }
>
> static __always_inline void
> -arch___clear_bit(long nr, volatile unsigned long *addr)
> +arch___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
> }
> @@ -114,7 +114,7 @@ arch___clear_bit_unlock(long nr, volatile unsigned long *addr)
> }
>
> static __always_inline void
> -arch___change_bit(long nr, volatile unsigned long *addr)
> +arch___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory");
> }
> @@ -145,7 +145,7 @@ arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr)
> }
>
> static __always_inline bool
> -arch___test_and_set_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> bool oldbit;
>
> @@ -171,7 +171,7 @@ arch_test_and_clear_bit(long nr, volatile unsigned long *addr)
> * this without also updating arch/x86/kernel/kvm.c
> */
> static __always_inline bool
> -arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> bool oldbit;
>
> @@ -183,7 +183,7 @@ arch___test_and_clear_bit(long nr, volatile unsigned long *addr)
> }
>
> static __always_inline bool
> -arch___test_and_change_bit(long nr, volatile unsigned long *addr)
> +arch___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> bool oldbit;
>
> @@ -219,10 +219,12 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l
> return oldbit;
> }
>
> -#define arch_test_bit(nr, addr) \
> - (__builtin_constant_p((nr)) \
> - ? constant_test_bit((nr), (addr)) \
> - : variable_test_bit((nr), (addr)))
> +static __always_inline bool
> +arch_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> + return __builtin_constant_p(nr) ? constant_test_bit(nr, addr) :
> + variable_test_bit(nr, addr);
> +}
>
> /**
> * __ffs - find first set bit in word
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index 7226488810e5..b85b8a2ac239 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -24,7 +24,7 @@
> * may be that only one operation succeeds.
> */
> static __always_inline void
> -generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -33,7 +33,7 @@ generic___set_bit(unsigned int nr, volatile unsigned long *addr)
> }
>
> static __always_inline void
> -generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> +generic___clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -50,8 +50,8 @@ generic___clear_bit(unsigned int nr, volatile unsigned long *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static __always_inline
> -void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline void
> +generic___change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -68,8 +68,8 @@ void generic___change_bit(unsigned int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static __always_inline int
> -generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -88,8 +88,8 @@ generic___test_and_set_bit(unsigned int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static __always_inline int
> -generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -100,8 +100,8 @@ generic___test_and_clear_bit(unsigned int nr, volatile unsigned long *addr)
> }
>
> /* WARNING: non atomic and it can be reordered! */
> -static __always_inline int
> -generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +generic___test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -116,8 +116,8 @@ generic___test_and_change_bit(unsigned int nr, volatile unsigned long *addr)
> * @nr: bit number to test
> * @addr: Address to start counting from
> */
> -static __always_inline int
> -generic_test_bit(unsigned int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> /*
> * Unlike the bitops with the '__' prefix above, this one *is* atomic,
> diff --git a/include/asm-generic/bitops/instrumented-non-atomic.h b/include/asm-generic/bitops/instrumented-non-atomic.h
> index 7ab1ecc37782..b019f77ef21c 100644
> --- a/include/asm-generic/bitops/instrumented-non-atomic.h
> +++ b/include/asm-generic/bitops/instrumented-non-atomic.h
> @@ -22,7 +22,8 @@
> * region of memory concurrently, the effect may be that only one operation
> * succeeds.
> */
> -static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> instrument_write(addr + BIT_WORD(nr), sizeof(long));
> arch___set_bit(nr, addr);
> @@ -37,7 +38,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
> * region of memory concurrently, the effect may be that only one operation
> * succeeds.
> */
> -static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> instrument_write(addr + BIT_WORD(nr), sizeof(long));
> arch___clear_bit(nr, addr);
> @@ -52,7 +54,8 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
> * region of memory concurrently, the effect may be that only one operation
> * succeeds.
> */
> -static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> instrument_write(addr + BIT_WORD(nr), sizeof(long));
> arch___change_bit(nr, addr);
> @@ -90,7 +93,8 @@ static __always_inline void __instrument_read_write_bitop(long nr, volatile unsi
> * This operation is non-atomic. If two instances of this operation race, one
> * can appear to succeed but actually fail.
> */
> -static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __instrument_read_write_bitop(nr, addr);
> return arch___test_and_set_bit(nr, addr);
> @@ -104,7 +108,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
> * This operation is non-atomic. If two instances of this operation race, one
> * can appear to succeed but actually fail.
> */
> -static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __instrument_read_write_bitop(nr, addr);
> return arch___test_and_clear_bit(nr, addr);
> @@ -118,7 +123,8 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long
> * This operation is non-atomic. If two instances of this operation race, one
> * can appear to succeed but actually fail.
> */
> -static __always_inline bool __test_and_change_bit(long nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> __instrument_read_write_bitop(nr, addr);
> return arch___test_and_change_bit(nr, addr);
> @@ -129,7 +135,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
> * @nr: bit number to test
> * @addr: Address to start counting from
> */
> -static __always_inline bool test_bit(long nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> instrument_atomic_read(addr + BIT_WORD(nr), sizeof(long));
> return arch_test_bit(nr, addr);
> diff --git a/include/asm-generic/bitops/non-atomic.h b/include/asm-generic/bitops/non-atomic.h
> index 23d3abc1e10d..5c37ced343ae 100644
> --- a/include/asm-generic/bitops/non-atomic.h
> +++ b/include/asm-generic/bitops/non-atomic.h
> @@ -5,24 +5,15 @@
> #include <asm-generic/bitops/generic-non-atomic.h>
>
> #define arch___set_bit generic___set_bit
> -#define __set_bit arch___set_bit
> -
> #define arch___clear_bit generic___clear_bit
> -#define __clear_bit arch___clear_bit
> -
> #define arch___change_bit generic___change_bit
> -#define __change_bit arch___change_bit
>
> #define arch___test_and_set_bit generic___test_and_set_bit
> -#define __test_and_set_bit arch___test_and_set_bit
> -
> #define arch___test_and_clear_bit generic___test_and_clear_bit
> -#define __test_and_clear_bit arch___test_and_clear_bit
> -
> #define arch___test_and_change_bit generic___test_and_change_bit
> -#define __test_and_change_bit arch___test_and_change_bit
>
> #define arch_test_bit generic_test_bit
> -#define test_bit arch_test_bit
> +
> +#include <asm-generic/bitops/non-instrumented-non-atomic.h>
>
> #endif /* _ASM_GENERIC_BITOPS_NON_ATOMIC_H_ */
> diff --git a/include/asm-generic/bitops/non-instrumented-non-atomic.h b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> new file mode 100644
> index 000000000000..e0fd7bf72a56
> --- /dev/null
> +++ b/include/asm-generic/bitops/non-instrumented-non-atomic.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +#define __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H
> +
> +#define __set_bit arch___set_bit
> +#define __clear_bit arch___clear_bit
> +#define __change_bit arch___change_bit
> +
> +#define __test_and_set_bit arch___test_and_set_bit
> +#define __test_and_clear_bit arch___test_and_clear_bit
> +#define __test_and_change_bit arch___test_and_change_bit
> +
> +#define test_bit arch_test_bit
> +
> +#endif /* __ASM_GENERIC_BITOPS_NON_INSTRUMENTED_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 7aaed501f768..87087454a288 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w);
> extern unsigned int __sw_hweight32(unsigned int w);
> extern unsigned long __sw_hweight64(__u64 w);
>
> +#include <asm-generic/bitops/generic-non-atomic.h>
> +
> /*
> * Include this here because some architectures need generic_ffs/fls in
> * scope
> */
> #include <asm/bitops.h>
>
> +/* Check that the bitops prototypes are sane */
> +#define __check_bitop_pr(name) \
> + static_assert(__same_type(arch_##name, generic_##name) && \
> + __same_type(name, generic_##name))
> +
> +__check_bitop_pr(__set_bit);
> +__check_bitop_pr(__clear_bit);
> +__check_bitop_pr(__change_bit);
> +__check_bitop_pr(__test_and_set_bit);
> +__check_bitop_pr(__test_and_clear_bit);
> +__check_bitop_pr(__test_and_change_bit);
> +__check_bitop_pr(test_bit);
> +
> +#undef __check_bitop_pr
> +
> static inline int get_bitmask_order(unsigned int count)
> {
> int order;
> diff --git a/tools/include/asm-generic/bitops/non-atomic.h b/tools/include/asm-generic/bitops/non-atomic.h
> index 7e10c4b50c5d..e5e78e42e57b 100644
> --- a/tools/include/asm-generic/bitops/non-atomic.h
> +++ b/tools/include/asm-generic/bitops/non-atomic.h
> @@ -2,7 +2,7 @@
> #ifndef _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
> #define _ASM_GENERIC_BITOPS_NON_ATOMIC_H_
>
> -#include <asm/types.h>
> +#include <linux/bits.h>
>
> /**
> * __set_bit - Set a bit in memory
> @@ -13,7 +13,8 @@
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static inline void __set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -21,7 +22,8 @@ static inline void __set_bit(int nr, volatile unsigned long *addr)
> *p |= mask;
> }
>
> -static inline void __clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -38,7 +40,8 @@ static inline void __clear_bit(int nr, volatile unsigned long *addr)
> * If it's called on the same region of memory simultaneously, the effect
> * may be that only one operation succeeds.
> */
> -static inline void __change_bit(int nr, volatile unsigned long *addr)
> +static __always_inline void
> +__change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -55,7 +58,8 @@ static inline void __change_bit(int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_set_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -74,7 +78,8 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> * If two examples of this operation race, one can appear to succeed
> * but actually fail. You must protect multiple accesses with a lock.
> */
> -static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_clear_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -85,8 +90,8 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> }
>
> /* WARNING: non atomic and it can be reordered! */
> -static inline int __test_and_change_bit(int nr,
> - volatile unsigned long *addr)
> +static __always_inline bool
> +__test_and_change_bit(unsigned long nr, volatile unsigned long *addr)
> {
> unsigned long mask = BIT_MASK(nr);
> unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
> @@ -101,7 +106,8 @@ static inline int __test_and_change_bit(int nr,
> * @nr: bit number to test
> * @addr: Address to start counting from
> */
> -static inline int test_bit(int nr, const volatile unsigned long *addr)
> +static __always_inline bool
> +test_bit(unsigned long nr, const volatile unsigned long *addr)
> {
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
> --
> 2.36.1
>

--
With Best Regards,
Andy Shevchenko


2022-06-20 10:19:30

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics

On Fri, 17 Jun 2022 at 19:21, Alexander Lobakin
<[email protected]> wrote:
>
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will be always able to resolve those to compile-time constants as
> well. Those are mostly direct aliases for generic_*() with one
> exception for const_test_bit(): the original one is declared
> atomic-safe and thus doesn't discard the `volatile` qualifier, so
> in order to let optimize code, define it separately disregarding
> the qualifier.
> Add them to the compile-time type checks as well just in case.
>
> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Lobakin <[email protected]>

Reviewed-by: Marco Elver <[email protected]>

> ---
> .../asm-generic/bitops/generic-non-atomic.h | 31 +++++++++++++++++++
> include/linux/bitops.h | 1 +
> 2 files changed, 32 insertions(+)
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index b85b8a2ac239..3d5ebd24652b 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
>
> +/*
> + * const_*() definitions provide good compile-time optimizations when
> + * the passed arguments can be resolved at compile time.
> + */
> +#define const___set_bit generic___set_bit
> +#define const___clear_bit generic___clear_bit
> +#define const___change_bit generic___change_bit
> +#define const___test_and_set_bit generic___test_and_set_bit
> +#define const___test_and_clear_bit generic___test_and_clear_bit
> +#define const___test_and_change_bit generic___test_and_change_bit
> +
> +/**
> + * const_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + *
> + * A version of generic_test_bit() which discards the `volatile` qualifier to
> + * allow a compiler to optimize code harder. Non-atomic and to be called only
> + * for testing compile-time constants, e.g. by the corresponding macros, not
> + * directly from "regular" code.
> + */
> +static __always_inline bool
> +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> + const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long val = *p;
> +
> + return !!(val & mask);
> +}
> +
> #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 87087454a288..d393297287d5 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
> /* Check that the bitops prototypes are sane */
> #define __check_bitop_pr(name) \
> static_assert(__same_type(arch_##name, generic_##name) && \
> + __same_type(const_##name, generic_##name) && \
> __same_type(name, generic_##name))
>
> __check_bitop_pr(__set_bit);
> --
> 2.36.1
>

2022-06-20 10:34:37

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] bitops: define const_*() versions of the non-atomics

On Fri, Jun 17, 2022 at 04:40:28PM +0200, Alexander Lobakin wrote:
> Define const_*() variants of the non-atomic bitops to be used when
> the input arguments are compile-time constants, so that the compiler
> will be always able to resolve those to compile-time constants as
> well. Those are mostly direct aliases for generic_*() with one
> exception for const_test_bit(): the original one is declared
> atomic-safe and thus doesn't discard the `volatile` qualifier, so
> in order to let optimize code, define it separately disregarding
> the qualifier.
> Add them to the compile-time type checks as well just in case.

Reviewed-by: Andy Shevchenko <[email protected]>

> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> .../asm-generic/bitops/generic-non-atomic.h | 31 +++++++++++++++++++
> include/linux/bitops.h | 1 +
> 2 files changed, 32 insertions(+)
>
> diff --git a/include/asm-generic/bitops/generic-non-atomic.h b/include/asm-generic/bitops/generic-non-atomic.h
> index b85b8a2ac239..3d5ebd24652b 100644
> --- a/include/asm-generic/bitops/generic-non-atomic.h
> +++ b/include/asm-generic/bitops/generic-non-atomic.h
> @@ -127,4 +127,35 @@ generic_test_bit(unsigned long nr, const volatile unsigned long *addr)
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
>
> +/*
> + * const_*() definitions provide good compile-time optimizations when
> + * the passed arguments can be resolved at compile time.
> + */
> +#define const___set_bit generic___set_bit
> +#define const___clear_bit generic___clear_bit
> +#define const___change_bit generic___change_bit
> +#define const___test_and_set_bit generic___test_and_set_bit
> +#define const___test_and_clear_bit generic___test_and_clear_bit
> +#define const___test_and_change_bit generic___test_and_change_bit
> +
> +/**
> + * const_test_bit - Determine whether a bit is set
> + * @nr: bit number to test
> + * @addr: Address to start counting from
> + *
> + * A version of generic_test_bit() which discards the `volatile` qualifier to
> + * allow a compiler to optimize code harder. Non-atomic and to be called only
> + * for testing compile-time constants, e.g. by the corresponding macros, not
> + * directly from "regular" code.
> + */
> +static __always_inline bool
> +const_test_bit(unsigned long nr, const volatile unsigned long *addr)
> +{
> + const unsigned long *p = (const unsigned long *)addr + BIT_WORD(nr);
> + unsigned long mask = BIT_MASK(nr);
> + unsigned long val = *p;
> +
> + return !!(val & mask);
> +}
> +
> #endif /* __ASM_GENERIC_BITOPS_GENERIC_NON_ATOMIC_H */
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 87087454a288..d393297287d5 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w);
> /* Check that the bitops prototypes are sane */
> #define __check_bitop_pr(name) \
> static_assert(__same_type(arch_##name, generic_##name) && \
> + __same_type(const_##name, generic_##name) && \
> __same_type(name, generic_##name))
>
> __check_bitop_pr(__set_bit);
> --
> 2.36.1
>

--
With Best Regards,
Andy Shevchenko


2022-06-20 12:26:12

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

Hi Olek,

On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin
<[email protected]> wrote:
> While I was working on converting some structure fields from a fixed
> type to a bitmap, I started observing code size increase not only in
> places where the code works with the converted structure fields, but
> also where the converted vars were on the stack. That said, the
> following code:
>
> DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> unsigned long bar = BIT(BAR_BIT);
> unsigned long baz = 0;
>
> __set_bit(FOO_BIT, foo);
> baz |= BIT(BAZ_BIT);
>
> BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
>
> triggers the first assertion on x86_64, which means that the
> compiler is unable to evaluate it to a compile-time initializer
> when the architecture-specific bitop is used even if it's obvious.
> I found that this is due to that many architecture-specific
> non-atomic bitop implementations use inline asm or other hacks which
> are faster or more robust when working with "real" variables (i.e.
> fields from the structures etc.), but the compilers have no clue how
> to optimize them out when called on compile-time constants.
>
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).
> The condition itself:
>
> if (
> __builtin_constant_p(nr) && /* <- bit position is constant */
> __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> always either NULL or not */
> addr && /* <- bitmap addr is not NULL */
> __builtin_constant_p(*addr) /* <- compiler knows the value of
> the target bitmap */
> )
> /* then pick the generic C variant
> else
> /* old code path, arch-specific
>
> I also tried __is_constexpr() as suggested by Andy, but it was
> always returning 0 ('not a constant') for the 2,3 and 4th
> conditions.
>
> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
>
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
>
> and ARM64 (courtesy of Mark[0]):
>
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
>
> And the following:
>
> DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> __be16 flags;
>
> __set_bit(IP_TUNNEL_CSUM_BIT, flags);
>
> tun_flags = cpu_to_be16(*flags & U16_MAX);
>
> if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> tun_flags |= VTI_ISVTI;
>
> BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
>
> doesn't blow up anymore (which is being checked now at build time),
> so that we can now e.g. use fixed bitmaps in compile-time assertions
> etc.
>
> The series has been in intel-next for a while with no reported issues.
>
> From v2[1]:
> * collect several Reviewed-bys (Andy, Yury);
> * add a comment to generic_test_bit() that it is atomic-safe and
> must always stay like that (the first version of this series
> errorneously tried to change this) (Andy, Marco);
> * unify the way how architectures define platform-specific bitops,
> both supporting instrumentation and not: now they define only
> 'arch_' versions and asm-generic includes take care of the rest;
> * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
> (Andy);
> * add compile-time tests to lib/test_bitmap to make sure everything
> works as expected on any setup (Yury).

Thanks for the update!

Still seeing
add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136)

on m68k atari_defconfig (i.e.CONFIG_CC_OPTIMIZE_FOR_SIZE=y)
with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2022-06-20 14:46:45

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

From: Geert Uytterhoeven <[email protected]>
Date: Mon, 20 Jun 2022 14:07:27 +0200

> Hi Olek,

Hey!

>
> On Fri, Jun 17, 2022 at 6:51 PM Alexander Lobakin
> <[email protected]> wrote:
> > While I was working on converting some structure fields from a fixed
> > type to a bitmap, I started observing code size increase not only in
> > places where the code works with the converted structure fields, but
> > also where the converted vars were on the stack. That said, the
> > following code:
> >
> > DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > unsigned long bar = BIT(BAR_BIT);
> > unsigned long baz = 0;
> >
> > __set_bit(FOO_BIT, foo);
> > baz |= BIT(BAZ_BIT);
> >
> > BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo));
> > BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> >
> > triggers the first assertion on x86_64, which means that the
> > compiler is unable to evaluate it to a compile-time initializer
> > when the architecture-specific bitop is used even if it's obvious.
> > I found that this is due to that many architecture-specific
> > non-atomic bitop implementations use inline asm or other hacks which
> > are faster or more robust when working with "real" variables (i.e.
> > fields from the structures etc.), but the compilers have no clue how
> > to optimize them out when called on compile-time constants.
> >
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
> > The condition itself:
> >
> > if (
> > __builtin_constant_p(nr) && /* <- bit position is constant */
> > __builtin_constant_p(!!addr) && /* <- compiler knows bitmap addr is
> > always either NULL or not */
> > addr && /* <- bitmap addr is not NULL */
> > __builtin_constant_p(*addr) /* <- compiler knows the value of
> > the target bitmap */
> > )
> > /* then pick the generic C variant
> > else
> > /* old code path, arch-specific
> >
> > I also tried __is_constexpr() as suggested by Andy, but it was
> > always returning 0 ('not a constant') for the 2,3 and 4th
> > conditions.
> >
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> >
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> >
> > and ARM64 (courtesy of Mark[0]):
> >
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> >
> > And the following:
> >
> > DECLARE_BITMAP(flags, __IP_TUNNEL_FLAG_NUM) = { };
> > __be16 flags;
> >
> > __set_bit(IP_TUNNEL_CSUM_BIT, flags);
> >
> > tun_flags = cpu_to_be16(*flags & U16_MAX);
> >
> > if (test_bit(IP_TUNNEL_VTI_BIT, flags))
> > tun_flags |= VTI_ISVTI;
> >
> > BUILD_BUG_ON(!__builtin_constant_p(tun_flags));
> >
> > doesn't blow up anymore (which is being checked now at build time),
> > so that we can now e.g. use fixed bitmaps in compile-time assertions
> > etc.
> >
> > The series has been in intel-next for a while with no reported issues.
> >
> > From v2[1]:
> > * collect several Reviewed-bys (Andy, Yury);
> > * add a comment to generic_test_bit() that it is atomic-safe and
> > must always stay like that (the first version of this series
> > errorneously tried to change this) (Andy, Marco);
> > * unify the way how architectures define platform-specific bitops,
> > both supporting instrumentation and not: now they define only
> > 'arch_' versions and asm-generic includes take care of the rest;
> > * micro-optimize the diffstat of 0004/0007 (__check_bitop_pr())
> > (Andy);
> > * add compile-time tests to lib/test_bitmap to make sure everything
> > works as expected on any setup (Yury).
>
> Thanks for the update!
>
> Still seeing
> add/remove: 49/13 grow/shrink: 280/137 up/down: 6464/-3328 (3136)

Meh. What about -O2 (OPTIMIZE_FOR_PERFORMANCE)? I have a thought to
make it depend on the config option above, but that would make code
behave differently, so it's not safe.
Are those 3 Kb critical for m68k machines? I'm asking because for
some embedded systems they are :)
Another thing, this could happen due to inlining rebalance. E.g.
the compiler could inline or uninline some functions due to
resolving bit{maps,ops} to compile-time constants. I was seeing
such in the past several times. Also, IIRC you already sent some
bloat-o-meter results here, and that was the case.
On the other hand, if lib/test_bitmap.o builds successfully (I
assume it does), the series works as expected.

>
> on m68k atari_defconfig (i.e.CONFIG_CC_OPTIMIZE_FOR_SIZE=y)
> with gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04).
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds

Thanks,
Olek

2022-06-20 14:48:13

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t

From: kernel test robot <[email protected]>
Date: Sun, 19 Jun 2022 17:20:05 +0800

Also, could someone please help me with this? I don't get what went
wrong with sparse, it's not even some new code, just moving old
stuff.

> tree: https://github.com/alobakin/linux bitops
> head: 9bd39b17ce49d350eed93a031e0da6389067013e
> commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
> config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/[email protected]/config)
> compiler: mips64el-linux-gcc (GCC) 11.3.0
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # apt-get install sparse
> # sparse version: v0.6.4-30-g92122700-dirty
> # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
> git remote add alobakin https://github.com/alobakin/linux
> git fetch --no-tags alobakin bitops
> git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <[email protected]>
>
>
> sparse warnings: (new ones prefixed by >>)
> command-line: note: in included file:
> builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
> builtin:0:0: sparse: this was the original definition
> builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
> builtin:0:0: sparse: this was the original definition
> builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
> builtin:0:0: sparse: this was the original definition
> builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
> builtin:0:0: sparse: this was the original definition
> block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
> include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
> include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
> include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
> include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
> include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
> >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
>
> vim +222 block/elevator.c
>
> 9817064b68fef7 Jens Axboe 2006-07-28 217
> 70b3ea056f3074 Jens Axboe 2016-12-07 218 void elv_rqhash_add(struct request_queue *q, struct request *rq)
> 9817064b68fef7 Jens Axboe 2006-07-28 219 {
> b374d18a4bfce7 Jens Axboe 2008-10-31 220 struct elevator_queue *e = q->elevator;
> 9817064b68fef7 Jens Axboe 2006-07-28 221
> 9817064b68fef7 Jens Axboe 2006-07-28 @222 BUG_ON(ELV_ON_HASH(rq));
> 242d98f077ac0a Sasha Levin 2012-12-17 223 hash_add(e->hash, &rq->hash, rq_hash_key(rq));
> e806402130c9c4 Christoph Hellwig 2016-10-20 224 rq->rq_flags |= RQF_HASHED;
> 9817064b68fef7 Jens Axboe 2006-07-28 225 }
> bd166ef183c263 Jens Axboe 2017-01-17 226 EXPORT_SYMBOL_GPL(elv_rqhash_add);
> 9817064b68fef7 Jens Axboe 2006-07-28 227
>
> :::::: The code at line 222 was first introduced by commit
> :::::: 9817064b68fef7e4580c6df1ea597e106b9ff88b [PATCH] elevator: move the backmerging logic into the elevator core
>
> :::::: TO: Jens Axboe <[email protected]>
> :::::: CC: Jens Axboe <[email protected]>
>
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

Thanks,
Olek

2022-06-20 15:34:09

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

From: Mark Rutland <[email protected]>
Date: Mon, 20 Jun 2022 15:19:42 +0100

> On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > So, in order to let the compiler optimize out such cases, expand the
> > test_bit() and __*_bit() definitions with a compile-time condition
> > check, so that they will pick the generic C non-atomic bitop
> > implementations when all of the arguments passed are compile-time
> > constants, which means that the result will be a compile-time
> > constant as well and the compiler will produce more efficient and
> > simple code in 100% cases (no changes when there's at least one
> > non-compile-time-constant argument).
>
> > The savings are architecture, compiler and compiler flags dependent,
> > for example, on x86_64 -O2:
> >
> > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> >
> > and ARM64 (courtesy of Mark[0]):
> >
> > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
>
> Hmm... with *this version* of the series, I'm not getting results nearly as
> good as that when building defconfig atop v5.19-rc3:
>
> GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
>
> LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
>
> ... and that now seems to be regressing codegen with recent versions of GCC as
> much as it improves it LLVM.
>
> I'm not sure if we've improved some other code and removed the benefit between
> v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> look as compelling as it did.

Mostly likely it's due to that in v1 I mistakingly removed
`volatile` from gen[eric]_test_bit(), so there was an impact for
non-constant cases as well.
+5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
it work? Probably the same reason as for m68k, more constant
optimization -> more aggressive inlining or inlining rebalance ->
larger code. OTOH I've no idea why sometimes compiler decides to
uninline really tiny functions where due to this patch series some
bitops have been converted to constants, like it goes on m68k.

>
> Overall that's mostly hidden in the Image size, due to 64K alignment and
> padding requirements:
>
> Toolchain Before After Difference
>
> GCC 8.5.0 36178432 36178432 0
> GCC 9.3.0 36112896 36112896 0
> GCC 10.3.0 36442624 36377088 -65536
> GCC 11.1.0 36311552 36377088 +65536
> GCC 11.3.0 36311552 36311552 0
> GCC 12.1.0 36377088 36377088 0
>
> LLVM 12.0.1 31418880 31418880 0
> LLVM 13.0.1 31418880 31418880 0
> LLVM 14.0.0 31218176 31218176 0
>
> ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> change overall (due to 64KiB alignment restrictions for portions of the kernel
> Image).
>
> Thanks,
> Mark.

Thanks,
Olek

2022-06-20 15:47:27

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> So, in order to let the compiler optimize out such cases, expand the
> test_bit() and __*_bit() definitions with a compile-time condition
> check, so that they will pick the generic C non-atomic bitop
> implementations when all of the arguments passed are compile-time
> constants, which means that the result will be a compile-time
> constant as well and the compiler will produce more efficient and
> simple code in 100% cases (no changes when there's at least one
> non-compile-time-constant argument).

> The savings are architecture, compiler and compiler flags dependent,
> for example, on x86_64 -O2:
>
> GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
>
> and ARM64 (courtesy of Mark[0]):
>
> GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)

Hmm... with *this version* of the series, I'm not getting results nearly as
good as that when building defconfig atop v5.19-rc3:

GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)

LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)

... and that now seems to be regressing codegen with recent versions of GCC as
much as it improves it LLVM.

I'm not sure if we've improved some other code and removed the benefit between
v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
look as compelling as it did.

Overall that's mostly hidden in the Image size, due to 64K alignment and
padding requirements:

Toolchain Before After Difference

GCC 8.5.0 36178432 36178432 0
GCC 9.3.0 36112896 36112896 0
GCC 10.3.0 36442624 36377088 -65536
GCC 11.1.0 36311552 36377088 +65536
GCC 11.3.0 36311552 36311552 0
GCC 12.1.0 36377088 36377088 0

LLVM 12.0.1 31418880 31418880 0
LLVM 13.0.1 31418880 31418880 0
LLVM 14.0.0 31218176 31218176 0

... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
change overall (due to 64KiB alignment restrictions for portions of the kernel
Image).

Thanks,
Mark.

2022-06-20 15:50:24

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t

On Mon, Jun 20, 2022 at 4:48 PM Alexander Lobakin
<[email protected]> wrote:
>
> From: kernel test robot <[email protected]>
> Date: Sun, 19 Jun 2022 17:20:05 +0800
>
> Also, could someone please help me with this? I don't get what went
> wrong with sparse, it's not even some new code, just moving old
> stuff.
>
> > tree: https://github.com/alobakin/linux bitops
> > head: 9bd39b17ce49d350eed93a031e0da6389067013e
> > commit: 521611f961a7dda92eefa26e1afd3914c06af64e [3/7] bitops: unify non-atomic bitops prototypes across architectures
> > config: mips-randconfig-s031-20220619 (https://download.01.org/0day-ci/archive/20220619/[email protected]/config)
> > compiler: mips64el-linux-gcc (GCC) 11.3.0
> > reproduce:
> > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # apt-get install sparse
> > # sparse version: v0.6.4-30-g92122700-dirty
> > # https://github.com/alobakin/linux/commit/521611f961a7dda92eefa26e1afd3914c06af64e
> > git remote add alobakin https://github.com/alobakin/linux
> > git fetch --no-tags alobakin bitops
> > git checkout 521611f961a7dda92eefa26e1afd3914c06af64e
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=mips SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag where applicable
> > Reported-by: kernel test robot <[email protected]>
> >
> >
> > sparse warnings: (new ones prefixed by >>)
> > command-line: note: in included file:
> > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQUIRE redefined
> > builtin:0:0: sparse: this was the original definition
> > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_SEQ_CST redefined
> > builtin:0:0: sparse: this was the original definition
> > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_ACQ_REL redefined
> > builtin:0:0: sparse: this was the original definition
> > builtin:1:9: sparse: sparse: preprocessor token __ATOMIC_RELEASE redefined
> > builtin:0:0: sparse: this was the original definition
> > block/elevator.c: note: in included file (through include/linux/bitops.h, include/linux/kernel.h):
> > include/asm-generic/bitops/generic-non-atomic.h:29:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:30:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:32:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:32:16: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:27:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:38:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:39:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:41:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:41:16: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:36:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:56:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:57:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:59:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:59:15: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:54:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:74:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:75:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:76:9: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:78:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:78:14: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:78:20: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:79:17: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:79:23: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:79:9: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:72:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:94:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:95:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:96:9: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:98:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:98:14: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:98:21: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:99:17: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:99:23: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:99:9: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:92:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:106:9: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:107:9: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:108:9: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:110:10: sparse: sparse: unreplaced symbol 'p'
> > include/asm-generic/bitops/generic-non-atomic.h:110:14: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:110:20: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:111:17: sparse: sparse: unreplaced symbol 'old'
> > include/asm-generic/bitops/generic-non-atomic.h:111:23: sparse: sparse: unreplaced symbol 'mask'
> > include/asm-generic/bitops/generic-non-atomic.h:111:9: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:104:1: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:127:9: sparse: sparse: unreplaced symbol 'return'
> > include/asm-generic/bitops/generic-non-atomic.h:120:1: sparse: sparse: unreplaced symbol 'return'
> > >> block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t
> >
> > vim +222 block/elevator.c
> >
> > 9817064b68fef7 Jens Axboe 2006-07-28 217
> > 70b3ea056f3074 Jens Axboe 2016-12-07 218 void elv_rqhash_add(struct request_queue *q, struct request *rq)
> > 9817064b68fef7 Jens Axboe 2006-07-28 219 {
> > b374d18a4bfce7 Jens Axboe 2008-10-31 220 struct elevator_queue *e = q->elevator;
> > 9817064b68fef7 Jens Axboe 2006-07-28 221
> > 9817064b68fef7 Jens Axboe 2006-07-28 @222 BUG_ON(ELV_ON_HASH(rq));
> > 242d98f077ac0a Sasha Levin 2012-12-17 223 hash_add(e->hash, &rq->hash, rq_hash_key(rq));
> > e806402130c9c4 Christoph Hellwig 2016-10-20 224 rq->rq_flags |= RQF_HASHED;
> > 9817064b68fef7 Jens Axboe 2006-07-28 225 }
> > bd166ef183c263 Jens Axboe 2017-01-17 226 EXPORT_SYMBOL_GPL(elv_rqhash_add);
> > 9817064b68fef7 Jens Axboe 2006-07-28 227

It looks like a false positive for _your_ case, but if you want to fix
here is the background.

The sparse has an ability to control custom types that should never
set bits outside of the limited range. For this the special annotation
is given, i.e. __bitwise. Since the culprit type is defined that way
it means the pure integer (signed or unsigned) that comes with pure
definition can't be used in a safe way. To solve this each of such
definitions should be converted to the very same type (req_flags_t).
See serial core where some UART flags are defined in a similar way and
how code copes with that.

--
With Best Regards,
Andy Shevchenko

2022-06-20 19:27:40

by Luc Van Oostenryck

[permalink] [raw]
Subject: Re: [alobakin:bitops 3/7] block/elevator.c:222:9: sparse: sparse: cast from restricted req_flags_t

On Mon, Jun 20, 2022 at 03:51:46PM +0200, Alexander Lobakin wrote:
> From: kernel test robot <[email protected]>
> Date: Sun, 19 Jun 2022 17:20:05 +0800
>
> Also, could someone please help me with this? I don't get what went
> wrong with sparse, it's not even some new code, just moving old
> stuff.

Hi,

The first sparse's warnings (sparse: preprocessor token __ATOMIC_*) are already
fixed (and most probably, the bots have already taken the fix).
I'm working on the second ones (sparse: unreplaced symbol).

-- Luc (Sparse's maintainer)

2022-06-21 06:06:34

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> From: Mark Rutland <[email protected]>
> Date: Mon, 20 Jun 2022 15:19:42 +0100
>
> > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> >
> > > The savings are architecture, compiler and compiler flags dependent,
> > > for example, on x86_64 -O2:
> > >
> > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > >
> > > and ARM64 (courtesy of Mark[0]):
> > >
> > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> >
> > Hmm... with *this version* of the series, I'm not getting results nearly as
> > good as that when building defconfig atop v5.19-rc3:
> >
> > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> >
> > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> >
> > ... and that now seems to be regressing codegen with recent versions of GCC as
> > much as it improves it LLVM.
> >
> > I'm not sure if we've improved some other code and removed the benefit between
> > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> > look as compelling as it did.
>
> Mostly likely it's due to that in v1 I mistakingly removed
> `volatile` from gen[eric]_test_bit(), so there was an impact for
> non-constant cases as well.
> +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> it work?

I didn't have it enabled, but I tried that just nw with GCC 12.1.0 and it
builds cleanly, and test_bitmap_const_eval() gets optimized away entirely. If i
remove the `static` from that, GCC generates:

| <test_bitmap_const_eval>:
| paciasp
| autiasp
| ret

... which is a trivial stub.

> Probably the same reason as for m68k, more constant
> optimization -> more aggressive inlining or inlining rebalance ->
> larger code. OTOH I've no idea why sometimes compiler decides to
> uninline really tiny functions where due to this patch series some
> bitops have been converted to constants, like it goes on m68k.

Hmm.... it'd be interesting to take a look at a few architectures and see what
the common case is.

Thanks,
Mark.

2022-06-21 18:04:22

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> From: Mark Rutland <[email protected]>
> Date: Mon, 20 Jun 2022 15:19:42 +0100
>
> > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well and the compiler will produce more efficient and
> > > simple code in 100% cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> >
> > > The savings are architecture, compiler and compiler flags dependent,
> > > for example, on x86_64 -O2:
> > >
> > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > >
> > > and ARM64 (courtesy of Mark[0]):
> > >
> > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> >
> > Hmm... with *this version* of the series, I'm not getting results nearly as
> > good as that when building defconfig atop v5.19-rc3:
> >
> > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> >
> > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> >
> > ... and that now seems to be regressing codegen with recent versions of GCC as
> > much as it improves it LLVM.
> >
> > I'm not sure if we've improved some other code and removed the benefit between
> > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> > look as compelling as it did.
>
> Mostly likely it's due to that in v1 I mistakingly removed
> `volatile` from gen[eric]_test_bit(), so there was an impact for
> non-constant cases as well.
> +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> it work? Probably the same reason as for m68k, more constant
> optimization -> more aggressive inlining or inlining rebalance ->
> larger code. OTOH I've no idea why sometimes compiler decides to
> uninline really tiny functions where due to this patch series some
> bitops have been converted to constants, like it goes on m68k.
>
> >
> > Overall that's mostly hidden in the Image size, due to 64K alignment and
> > padding requirements:
> >
> > Toolchain Before After Difference
> >
> > GCC 8.5.0 36178432 36178432 0
> > GCC 9.3.0 36112896 36112896 0
> > GCC 10.3.0 36442624 36377088 -65536
> > GCC 11.1.0 36311552 36377088 +65536
> > GCC 11.3.0 36311552 36311552 0
> > GCC 12.1.0 36377088 36377088 0
> >
> > LLVM 12.0.1 31418880 31418880 0
> > LLVM 13.0.1 31418880 31418880 0
> > LLVM 14.0.0 31218176 31218176 0
> >
> > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> > change overall (due to 64KiB alignment restrictions for portions of the kernel
> > Image).

I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's:
add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392)

Which is not great in terms of layout size. But I don't think we should
focus too much on those numbers. The goal of the series is not to shrink
the image; the true goal is to provide more information to the compiler
in a hope that it will make a better decision regarding optimizations.

Looking at results provided by Mark, both GCC and LLVM have a tendency
to inline and use other techniques that increase the image more
aggressively in newer releases, comparing to old ones. From this
perspective, unless we find some terribly wrong behavior, I'm OK with
+5K for the Image, because I trust my compiler and believe it spent
those 5K wisely.

For the reasons said above, I think we shouldn't disable const
bitops for -Os build.

I think this series has total positive impact because it adds a lot
of information for compiler with just a few lines of code.

If no objections, I think it's good to try it in -next. Alexander,
would you like me to fix gen/generic typo in comment and take it in
bitmap-for-next, or you'd prefer to send v4?

Thanks,
Yury

2022-06-21 18:58:23

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] bitops: let optimize out non-atomic bitops on compile-time constants

From: Yury Norov <[email protected]>
Date: Tue, 21 Jun 2022 10:39:44 -0700

> On Mon, Jun 20, 2022 at 05:08:55PM +0200, Alexander Lobakin wrote:
> > From: Mark Rutland <[email protected]>
> > Date: Mon, 20 Jun 2022 15:19:42 +0100
> >
> > > On Fri, Jun 17, 2022 at 04:40:24PM +0200, Alexander Lobakin wrote:
> > > > So, in order to let the compiler optimize out such cases, expand the
> > > > test_bit() and __*_bit() definitions with a compile-time condition
> > > > check, so that they will pick the generic C non-atomic bitop
> > > > implementations when all of the arguments passed are compile-time
> > > > constants, which means that the result will be a compile-time
> > > > constant as well and the compiler will produce more efficient and
> > > > simple code in 100% cases (no changes when there's at least one
> > > > non-compile-time-constant argument).
> > >
> > > > The savings are architecture, compiler and compiler flags dependent,
> > > > for example, on x86_64 -O2:
> > > >
> > > > GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235)
> > > > LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> > > > LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287)
> > > >
> > > > and ARM64 (courtesy of Mark[0]):
> > > >
> > > > GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240)
> > > > LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764)
> > >
> > > Hmm... with *this version* of the series, I'm not getting results nearly as
> > > good as that when building defconfig atop v5.19-rc3:
> > >
> > > GCC 8.5.0: add/remove: 83/49 grow/shrink: 973/1147 up/down: 32020/-47824 (-15804)
> > > GCC 9.3.0: add/remove: 68/51 grow/shrink: 1167/592 up/down: 30720/-31352 (-632)
> > > GCC 10.3.0: add/remove: 84/37 grow/shrink: 1711/1003 up/down: 45392/-41844 (3548)
> > > GCC 11.1.0: add/remove: 88/31 grow/shrink: 1635/963 up/down: 51540/-46096 (5444)
> > > GCC 11.3.0: add/remove: 89/32 grow/shrink: 1629/966 up/down: 51456/-46056 (5400)
> > > GCC 12.1.0: add/remove: 84/31 grow/shrink: 1540/829 up/down: 48772/-43164 (5608)
> > >
> > > LLVM 12.0.1: add/remove: 118/58 grow/shrink: 437/381 up/down: 45312/-65668 (-20356)
> > > LLVM 13.0.1: add/remove: 35/19 grow/shrink: 416/243 up/down: 14408/-22200 (-7792)
> > > LLVM 14.0.0: add/remove: 42/16 grow/shrink: 415/234 up/down: 15296/-21008 (-5712)
> > >
> > > ... and that now seems to be regressing codegen with recent versions of GCC as
> > > much as it improves it LLVM.
> > >
> > > I'm not sure if we've improved some other code and removed the benefit between
> > > v5.19-rc1 and v5.19-rc3, or whether something else it at play, but this doesn't
> > > look as compelling as it did.
> >
> > Mostly likely it's due to that in v1 I mistakingly removed
> > `volatile` from gen[eric]_test_bit(), so there was an impact for
> > non-constant cases as well.
> > +5 Kb sounds bad tho. Do you have CONFIG_TEST_BITMAP enabled, does
> > it work? Probably the same reason as for m68k, more constant
> > optimization -> more aggressive inlining or inlining rebalance ->
> > larger code. OTOH I've no idea why sometimes compiler decides to
> > uninline really tiny functions where due to this patch series some
> > bitops have been converted to constants, like it goes on m68k.
> >
> > >
> > > Overall that's mostly hidden in the Image size, due to 64K alignment and
> > > padding requirements:
> > >
> > > Toolchain Before After Difference
> > >
> > > GCC 8.5.0 36178432 36178432 0
> > > GCC 9.3.0 36112896 36112896 0
> > > GCC 10.3.0 36442624 36377088 -65536
> > > GCC 11.1.0 36311552 36377088 +65536
> > > GCC 11.3.0 36311552 36311552 0
> > > GCC 12.1.0 36377088 36377088 0
> > >
> > > LLVM 12.0.1 31418880 31418880 0
> > > LLVM 13.0.1 31418880 31418880 0
> > > LLVM 14.0.0 31218176 31218176 0
> > >
> > > ... so aside from the blip around GCC 10.3.0 and 11.1.0, there's not a massive
> > > change overall (due to 64KiB alignment restrictions for portions of the kernel
> > > Image).
>
> I gave it a try on v5.19-rc3 for arm64 with my default GCC 11.2, and it's:
> add/remove: 89/33 grow/shrink: 1629/966 up/down: 51456/-46064 (5392)
>
> Which is not great in terms of layout size. But I don't think we should
> focus too much on those numbers. The goal of the series is not to shrink
> the image; the true goal is to provide more information to the compiler
> in a hope that it will make a better decision regarding optimizations.
>
> Looking at results provided by Mark, both GCC and LLVM have a tendency
> to inline and use other techniques that increase the image more
> aggressively in newer releases, comparing to old ones. From this
> perspective, unless we find some terribly wrong behavior, I'm OK with
> +5K for the Image, because I trust my compiler and believe it spent
> those 5K wisely.
>
> For the reasons said above, I think we shouldn't disable const
> bitops for -Os build.
>
> I think this series has total positive impact because it adds a lot
> of information for compiler with just a few lines of code.

Right, that was the primary intention. But then I got some text size
decreases and thought this applies to any setup :)

>
> If no objections, I think it's good to try it in -next. Alexander,
> would you like me to fix gen/generic typo in comment and take it in
> bitmap-for-next, or you'd prefer to send v4?

I'm sending v4 in a couple minutes, lkp reported that on ARC GCC
never expands mem*() builtins to plain assignments, which sucks,
but failed my compile-time tests, so I adjusted code a bit. Hope
that change will be okay for everyone, so that you could pick it.

>
> Thanks,
> Yury

Thanks,
Olek

2022-07-06 10:20:15

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v3 3/7] bitops: unify non-atomic bitops prototypes across architectures

On Fri, Jun 17, 2022 at 7:09 PM Alexander Lobakin
<[email protected]> wrote:
> Currently, there is a mess with the prototypes of the non-atomic
> bitops across the different architectures:
>
> ret bool, int, unsigned long
> nr int, long, unsigned int, unsigned long
> addr volatile unsigned long *, volatile void *
>
> Thankfully, it doesn't provoke any bugs, but can sometimes make
> the compiler angry when it's not handy at all.
> Adjust all the prototypes to the following standard:
>
> ret bool retval can be only 0 or 1
> nr unsigned long native; signed makes no sense
> addr volatile unsigned long * bitmaps are arrays of ulongs
>
> Next, some architectures don't define 'arch_' versions as they don't
> support instrumentation, others do. To make sure there is always the
> same set of callables present and to ease any potential future
> changes, make them all follow the rule:
> * architecture-specific files define only 'arch_' versions;
> * non-prefixed versions can be defined only in asm-generic files;
> and place the non-prefixed definitions into a new file in
> asm-generic to be included by non-instrumented architectures.
>
> Finally, add some static assertions in order to prevent people from
> making a mess in this room again.
> I also used the %__always_inline attribute consistently, so that
> they always get resolved to the actual operations.
>
> Suggested-by: Andy Shevchenko <[email protected]>
> Signed-off-by: Alexander Lobakin <[email protected]>
> Acked-by: Mark Rutland <[email protected]>
> Reviewed-by: Yury Norov <[email protected]>

> arch/m68k/include/asm/bitops.h | 49 +++++++++++++------

Reviewed-by: Geert Uytterhoeven <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds