2018-05-24 11:00:57

by Will Deacon

Subject: [PATCH 0/9] Rewrite asm-generic/bitops/{atomic,lock}.h and use on arm64

Hi all,

This patch series has previously been posted in RFC form here:

RFCv1: https://www.spinics.net/lists/arm-kernel/msg634719.html
RFCv2: https://www.spinics.net/lists/arm-kernel/msg636875.html

Changes since RFCv2 include:

* Rebased onto v4.17-rc4, which allowed me to drop some patches from
  the series that were merged in 4.16.

* Moved bits.h to be linux/bits.h instead of asm-generic/bits.h.

Thanks,

Will

--->8

Will Deacon (9):
h8300: Don't include linux/kernel.h in asm/atomic.h
m68k: Don't use asm-generic/bitops/lock.h
asm-generic: Move some macros from linux/bitops.h to a new bits.h file
openrisc: Don't pull in all of linux/bitops.h in asm/cmpxchg.h
sh: Don't pull in all of linux/bitops.h in asm/cmpxchg-xchg.h
asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*
asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*
arm64: Replace our atomic/lock bitop implementations with asm-generic
arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>

arch/arm64/include/asm/bitops.h | 21 +---
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 ---------------
arch/h8300/include/asm/atomic.h | 4 +-
arch/m68k/include/asm/bitops.h | 6 +-
arch/openrisc/include/asm/cmpxchg.h | 3 +-
arch/sh/include/asm/cmpxchg-xchg.h | 3 +-
include/asm-generic/bitops/atomic.h | 188 +++++++-----------------------------
include/asm-generic/bitops/lock.h | 68 ++++++++++---
include/linux/bitops.h | 22 +----
include/linux/bits.h | 26 +++++
11 files changed, 131 insertions(+), 288 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S
create mode 100644 include/linux/bits.h

--
2.1.4



2018-05-24 11:00:34

by Will Deacon

Subject: [PATCH 1/9] h8300: Don't include linux/kernel.h in asm/atomic.h

linux/kernel.h isn't needed by asm/atomic.h and will result in circular
dependencies when the asm-generic atomic bitops are built around the
atomic_long_t interface.

Remove the broad include and replace it with linux/compiler.h for
READ_ONCE etc and asm/irqflags.h for arch_local_irq_save etc.

Cc: Yoshinori Sato <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/h8300/include/asm/atomic.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
index 941e7554e886..b174dec099bf 100644
--- a/arch/h8300/include/asm/atomic.h
+++ b/arch/h8300/include/asm/atomic.h
@@ -2,8 +2,10 @@
#ifndef __ARCH_H8300_ATOMIC__
#define __ARCH_H8300_ATOMIC__

+#include <linux/compiler.h>
#include <linux/types.h>
#include <asm/cmpxchg.h>
+#include <asm/irqflags.h>

/*
* Atomic operations that C can't guarantee us. Useful for
@@ -15,8 +17,6 @@
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))

-#include <linux/kernel.h>
-
#define ATOMIC_OP_RETURN(op, c_op) \
static inline int atomic_##op##_return(int i, atomic_t *v) \
{ \
--
2.1.4


2018-05-24 11:01:22

by Will Deacon

Subject: [PATCH 3/9] asm-generic: Move some macros from linux/bitops.h to a new bits.h file

In preparation for implementing the asm-generic atomic bitops in terms
of atomic_long_*, we need to prevent asm/atomic.h implementations from
pulling in linux/bitops.h. A common reason for this include is for the
BITS_PER_BYTE definition, so move this and some other BIT and masking
macros into a new header file, linux/bits.h
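
(A rough illustration, not taken from the patch: a header that only needs the
bit/mask helpers can now pull in the lightweight header directly. The MY_CTRL_*
names below are made up for the example.)

/* Hypothetical consumer: needs BIT()/GENMASK() but not the full bitops API. */
#include <linux/bits.h>

#define MY_CTRL_ENABLE		BIT(0)		/* bit 0      -> 0x001 */
#define MY_CTRL_MODE_MASK	GENMASK(3, 1)	/* bits 3..1  -> 0x00e */
#define MY_CTRL_IRQ_MASK	GENMASK(11, 4)	/* bits 11..4 -> 0xff0 */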

Signed-off-by: Will Deacon <[email protected]>
---
include/linux/bitops.h | 22 +---------------------
include/linux/bits.h | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 21 deletions(-)
create mode 100644 include/linux/bits.h

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 4cac4e1a72ff..af419012d77d 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -2,29 +2,9 @@
#ifndef _LINUX_BITOPS_H
#define _LINUX_BITOPS_H
#include <asm/types.h>
+#include <linux/bits.h>

-#ifdef __KERNEL__
-#define BIT(nr) (1UL << (nr))
-#define BIT_ULL(nr) (1ULL << (nr))
-#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
-#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
-#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
-#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
-#define BITS_PER_BYTE 8
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
-#endif
-
-/*
- * Create a contiguous bitmask starting at bit position @l and ending at
- * position @h. For example
- * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
- */
-#define GENMASK(h, l) \
- (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
-
-#define GENMASK_ULL(h, l) \
- (((~0ULL) - (1ULL << (l)) + 1) & \
- (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))

extern unsigned int __sw_hweight8(unsigned int w);
extern unsigned int __sw_hweight16(unsigned int w);
diff --git a/include/linux/bits.h b/include/linux/bits.h
new file mode 100644
index 000000000000..2b7b532c1d51
--- /dev/null
+++ b/include/linux/bits.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_BITS_H
+#define __LINUX_BITS_H
+#include <asm/bitsperlong.h>
+
+#define BIT(nr) (1UL << (nr))
+#define BIT_ULL(nr) (1ULL << (nr))
+#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
+#define BIT_WORD(nr) ((nr) / BITS_PER_LONG)
+#define BIT_ULL_MASK(nr) (1ULL << ((nr) % BITS_PER_LONG_LONG))
+#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG)
+#define BITS_PER_BYTE 8
+
+/*
+ * Create a contiguous bitmask starting at bit position @l and ending at
+ * position @h. For example
+ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000.
+ */
+#define GENMASK(h, l) \
+ (((~0UL) - (1UL << (l)) + 1) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define GENMASK_ULL(h, l) \
+ (((~0ULL) - (1ULL << (l)) + 1) & \
+ (~0ULL >> (BITS_PER_LONG_LONG - 1 - (h))))
+
+#endif /* __LINUX_BITS_H */
--
2.1.4


2018-05-24 11:01:27

by Will Deacon

Subject: [PATCH 4/9] openrisc: Don't pull in all of linux/bitops.h in asm/cmpxchg.h

The openrisc implementation of asm/cmpxchg.h pulls in linux/bitops.h
so that it can refer to BITS_PER_BYTE. It also transitively relies on
this pulling in linux/compiler.h for READ_ONCE.

Replace the #include with linux/bits.h and linux/compiler.h

Signed-off-by: Will Deacon <[email protected]>
---
arch/openrisc/include/asm/cmpxchg.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/openrisc/include/asm/cmpxchg.h b/arch/openrisc/include/asm/cmpxchg.h
index d29f7db53906..f9cd43a39d72 100644
--- a/arch/openrisc/include/asm/cmpxchg.h
+++ b/arch/openrisc/include/asm/cmpxchg.h
@@ -16,8 +16,9 @@
#ifndef __ASM_OPENRISC_CMPXCHG_H
#define __ASM_OPENRISC_CMPXCHG_H

+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <linux/types.h>
-#include <linux/bitops.h>

#define __HAVE_ARCH_CMPXCHG 1

--
2.1.4


2018-05-24 11:02:32

by Will Deacon

Subject: [PATCH 2/9] m68k: Don't use asm-generic/bitops/lock.h

asm-generic/bitops/lock.h is shortly going to be built on top of the
atomic_long_* API, which introduces a nasty circular dependency for
m68k where linux/atomic.h pulls in linux/bitops.h via:

linux/atomic.h
asm/atomic.h
linux/irqflags.h
asm/irqflags.h
linux/preempt.h
asm/preempt.h
asm-generic/preempt.h
linux/thread_info.h
asm/thread_info.h
asm/page.h
asm-generic/getorder.h
linux/log2.h
linux/bitops.h

Since m68k isn't SMP and doesn't support ACQUIRE/RELEASE barriers, we
can just define the lock bitops in terms of the atomic bitops in the
asm/bitops.h header.

Acked-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/m68k/include/asm/bitops.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h
index 93b47b1f6fb4..18193419f97d 100644
--- a/arch/m68k/include/asm/bitops.h
+++ b/arch/m68k/include/asm/bitops.h
@@ -515,12 +515,16 @@ static inline int __fls(int x)

#endif

+/* Simple test-and-set bit locks */
+#define test_and_set_bit_lock test_and_set_bit
+#define clear_bit_unlock clear_bit
+#define __clear_bit_unlock clear_bit_unlock
+
#include <asm-generic/bitops/ext2-atomic.h>
#include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>
#endif /* __KERNEL__ */

#endif /* _M68K_BITOPS_H */
--
2.1.4


2018-05-24 11:03:57

by Will Deacon

Subject: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

The atomic bitops can actually be implemented pretty efficiently using
the atomic_fetch_* ops, rather than explicit use of spinlocks.
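
(To make the BIT_WORD()/BIT_MASK() pattern used throughout the new code
concrete, here is an illustrative sketch, not part of the patch, assuming
BITS_PER_LONG == 64:)

#include <linux/bits.h>
#include <linux/build_bug.h>

static inline void bit_decomposition_example(void)
{
	/* set_bit(70, addr) targets addr[1] with the mask 1UL << 6 ... */
	BUILD_BUG_ON(BIT_WORD(70) != 1);		/* 70 / 64          */
	BUILD_BUG_ON(BIT_MASK(70) != (1UL << 6));	/* 1UL << (70 % 64) */
}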

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/bitops/atomic.h | 188 +++++++-----------------------------
1 file changed, 33 insertions(+), 155 deletions(-)

diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index 04deffaf5f7d..bca92586c2f6 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -2,189 +2,67 @@
#ifndef _ASM_GENERIC_BITOPS_ATOMIC_H_
#define _ASM_GENERIC_BITOPS_ATOMIC_H_

-#include <asm/types.h>
-#include <linux/irqflags.h>
-
-#ifdef CONFIG_SMP
-#include <asm/spinlock.h>
-#include <asm/cache.h> /* we use L1_CACHE_BYTES */
-
-/* Use an array of spinlocks for our atomic_ts.
- * Hash function to index into a different SPINLOCK.
- * Since "a" is usually an address, use one spinlock per cacheline.
- */
-# define ATOMIC_HASH_SIZE 4
-# define ATOMIC_HASH(a) (&(__atomic_hash[ (((unsigned long) a)/L1_CACHE_BYTES) & (ATOMIC_HASH_SIZE-1) ]))
-
-extern arch_spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] __lock_aligned;
-
-/* Can't use raw_spin_lock_irq because of #include problems, so
- * this is the substitute */
-#define _atomic_spin_lock_irqsave(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- local_irq_save(f); \
- arch_spin_lock(s); \
-} while(0)
-
-#define _atomic_spin_unlock_irqrestore(l,f) do { \
- arch_spinlock_t *s = ATOMIC_HASH(l); \
- arch_spin_unlock(s); \
- local_irq_restore(f); \
-} while(0)
-
-
-#else
-# define _atomic_spin_lock_irqsave(l,f) do { local_irq_save(f); } while (0)
-# define _atomic_spin_unlock_irqrestore(l,f) do { local_irq_restore(f); } while (0)
-#endif
+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>

/*
- * NMI events can occur at any time, including when interrupts have been
- * disabled by *_irqsave(). So you can get NMI events occurring while a
- * *_bit function is holding a spin lock. If the NMI handler also wants
- * to do bit manipulation (and they do) then you can get a deadlock
- * between the original caller of *_bit() and the NMI handler.
- *
- * by Keith Owens
+ * Implementation of atomic bitops using atomic-fetch ops.
+ * See Documentation/atomic_bitops.txt for details.
*/

-/**
- * set_bit - Atomically set a bit in memory
- * @nr: the bit to set
- * @addr: the address to start counting from
- *
- * This function is atomic and may not be reordered. See __set_bit()
- * if you do not require the atomic guarantees.
- *
- * Note: there are no guarantees that this function will not be reordered
- * on non x86 architectures, so if you are writing portable code,
- * make sure not to rely on its reordering guarantees.
- *
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void set_bit(int nr, volatile unsigned long *addr)
+static inline void set_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p |= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}

-/**
- * clear_bit - Clears a bit in memory
- * @nr: Bit to clear
- * @addr: Address to start counting from
- *
- * clear_bit() is atomic and may not be reordered. However, it does
- * not contain a memory barrier, so if it is used for locking purposes,
- * you should call smp_mb__before_atomic() and/or smp_mb__after_atomic()
- * in order to ensure changes are visible on other processors.
- */
-static inline void clear_bit(int nr, volatile unsigned long *addr)
+static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p &= ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}

-/**
- * change_bit - Toggle a bit in memory
- * @nr: Bit to change
- * @addr: Address to start counting from
- *
- * change_bit() is atomic and may not be reordered. It may be
- * reordered on other architectures than x86.
- * Note that @nr may be almost arbitrarily large; this function is not
- * restricted to acting on a single-word quantity.
- */
-static inline void change_bit(int nr, volatile unsigned long *addr)
+static inline void change_bit(unsigned int nr, volatile unsigned long *p)
{
- unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- *p ^= mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
}

-/**
- * test_and_set_bit - Set a bit and return its old value
- * @nr: Bit to set
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It may be reordered on other architectures than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;

- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old | mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;

- return (old & mask) != 0;
+ old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}

-/**
- * test_and_clear_bit - Clear a bit and return its old value
- * @nr: Bit to clear
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It can be reorderdered on other architectures other than x86.
- * It also implies a memory barrier.
- */
-static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;

- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old & ~mask;
- _atomic_spin_unlock_irqrestore(p, flags);
+ p += BIT_WORD(nr);
+ if (!(READ_ONCE(*p) & mask))
+ return 0;

- return (old & mask) != 0;
+ old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}

-/**
- * test_and_change_bit - Change a bit and return its old value
- * @nr: Bit to change
- * @addr: Address to count from
- *
- * This operation is atomic and cannot be reordered.
- * It also implies a memory barrier.
- */
-static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
+static inline int test_and_change_bit(unsigned int nr, volatile unsigned long *p)
{
+ long old;
unsigned long mask = BIT_MASK(nr);
- unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
- unsigned long old;
- unsigned long flags;
-
- _atomic_spin_lock_irqsave(p, flags);
- old = *p;
- *p = old ^ mask;
- _atomic_spin_unlock_irqrestore(p, flags);

- return (old & mask) != 0;
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_xor(mask, (atomic_long_t *)p);
+ return !!(old & mask);
}

#endif /* _ASM_GENERIC_BITOPS_ATOMIC_H */
--
2.1.4


2018-05-24 11:04:01

by Will Deacon

Subject: [PATCH 5/9] sh: Don't pull in all of linux/bitops.h in asm/cmpxchg-xchg.h

The sh implementation of asm/cmpxchg-xchg.h pulls in linux/bitops.h
so that it can refer to BITS_PER_BYTE. It also transitively relies on
this pulling in linux/compiler.h for READ_ONCE.

Replace the #include with linux/bits.h and linux/compiler.h

Signed-off-by: Will Deacon <[email protected]>
---
arch/sh/include/asm/cmpxchg-xchg.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/sh/include/asm/cmpxchg-xchg.h b/arch/sh/include/asm/cmpxchg-xchg.h
index 1e881f5db659..593a9704782b 100644
--- a/arch/sh/include/asm/cmpxchg-xchg.h
+++ b/arch/sh/include/asm/cmpxchg-xchg.h
@@ -8,7 +8,8 @@
* This work is licensed under the terms of the GNU GPL, version 2. See the
* file "COPYING" in the main directory of this archive for more details.
*/
-#include <linux/bitops.h>
+#include <linux/bits.h>
+#include <linux/compiler.h>
#include <asm/byteorder.h>

/*
--
2.1.4


2018-05-24 11:04:10

by Will Deacon

Subject: [PATCH 9/9] arm64: bitops: Include <asm-generic/bitops/ext2-atomic-setbit.h>

asm-generic/bitops/ext2-atomic-setbit.h provides the ext2 atomic bitop
definitions, so we don't need to define our own.
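
(For reference, the generic header essentially consists of the same two
definitions being removed below; paraphrased here rather than quoted exactly:)

/* <asm-generic/bitops/ext2-atomic-setbit.h>, roughly: */
#define ext2_set_bit_atomic(lock, nr, addr)	test_and_set_bit_le(nr, addr)
#define ext2_clear_bit_atomic(lock, nr, addr)	test_and_clear_bit_le(nr, addr)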

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/bitops.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 13501460be6b..10d536b1af74 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -38,11 +38,6 @@
#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>
-
-/*
- * Ext2 is defined to use little-endian byte ordering.
- */
-#define ext2_set_bit_atomic(lock, nr, p) test_and_set_bit_le(nr, p)
-#define ext2_clear_bit_atomic(lock, nr, p) test_and_clear_bit_le(nr, p)
+#include <asm-generic/bitops/ext2-atomic-setbit.h>

#endif /* __ASM_BITOPS_H */
--
2.1.4


2018-05-24 11:04:15

by Will Deacon

Subject: [PATCH 7/9] asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*

The lock bitops can be implemented more efficiently using the atomic_fetch_*
ops, which provide finer-grained control over the memory ordering semantics
than the bitops.
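
(As an illustration of the ordering these helpers are meant to provide -- a
sketch, not part of the patch -- a simple bit lock acquires on
test_and_set_bit_lock() and releases on clear_bit_unlock():)

#include <linux/bitops.h>

/* Hypothetical example: bit 0 of *word used as a bit lock. */
static inline void example_bit_lock(unsigned long *word)
{
	while (test_and_set_bit_lock(0, word))	/* acquire ordering on success */
		;				/* spin while the bit is held  */
}

static inline void example_bit_unlock(unsigned long *word)
{
	clear_bit_unlock(0, word);	/* release: publishes prior stores */
}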

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
include/asm-generic/bitops/lock.h | 68 ++++++++++++++++++++++++++++++++-------
1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 67ab280ad134..3ae021368f48 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -2,6 +2,10 @@
#ifndef _ASM_GENERIC_BITOPS_LOCK_H_
#define _ASM_GENERIC_BITOPS_LOCK_H_

+#include <linux/atomic.h>
+#include <linux/compiler.h>
+#include <asm/barrier.h>
+
/**
* test_and_set_bit_lock - Set a bit and return its old value, for lock
* @nr: Bit to set
@@ -11,7 +15,20 @@
* the returned value is 0.
* It can be used to implement bit locks.
*/
-#define test_and_set_bit_lock(nr, addr) test_and_set_bit(nr, addr)
+static inline int test_and_set_bit_lock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ if (READ_ONCE(*p) & mask)
+ return 1;
+
+ old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
+ return !!(old & mask);
+}
+

/**
* clear_bit_unlock - Clear a bit in memory, for unlock
@@ -20,11 +37,11 @@
*
* This operation is atomic and provides release barrier semantics.
*/
-#define clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
+{
+ p += BIT_WORD(nr);
+ atomic_long_fetch_andnot_release(BIT_MASK(nr), (atomic_long_t *)p);
+}

/**
* __clear_bit_unlock - Clear a bit in memory, for unlock
@@ -37,11 +54,38 @@ do { \
*
* See for example x86's implementation.
*/
-#define __clear_bit_unlock(nr, addr) \
-do { \
- smp_mb__before_atomic(); \
- clear_bit(nr, addr); \
-} while (0)
+static inline void __clear_bit_unlock(unsigned int nr,
+ volatile unsigned long *p)
+{
+ unsigned long old;

-#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
+ p += BIT_WORD(nr);
+ old = READ_ONCE(*p);
+ old &= ~BIT_MASK(nr);
+ atomic_long_set_release((atomic_long_t *)p, old);
+}
+
+/**
+ * clear_bit_unlock_is_negative_byte - Clear a bit in memory and test if bottom
+ * byte is negative, for unlock.
+ * @nr: the bit to clear
+ * @addr: the address to start counting from
+ *
+ * This is a bit of a one-trick-pony for the filemap code, which clears
+ * PG_locked and tests PG_waiters,
+ */
+#ifndef clear_bit_unlock_is_negative_byte
+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+ volatile unsigned long *p)
+{
+ long old;
+ unsigned long mask = BIT_MASK(nr);
+
+ p += BIT_WORD(nr);
+ old = atomic_long_fetch_andnot_release(mask, (atomic_long_t *)p);
+ return !!(old & BIT(7));
+}
+#define clear_bit_unlock_is_negative_byte clear_bit_unlock_is_negative_byte
+#endif

+#endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
--
2.1.4


2018-05-24 11:04:31

by Will Deacon

Subject: [PATCH 8/9] arm64: Replace our atomic/lock bitop implementations with asm-generic

The asm-generic/bitops/{atomic,lock}.h implementations are built around
the atomic-fetch ops, which we implement efficiently for both LSE and
LL/SC systems. Use that instead of our hand-rolled, out-of-line bitops.S.

Signed-off-by: Will Deacon <[email protected]>
---
arch/arm64/include/asm/bitops.h | 14 ++------
arch/arm64/lib/Makefile | 2 +-
arch/arm64/lib/bitops.S | 76 -----------------------------------------
3 files changed, 3 insertions(+), 89 deletions(-)
delete mode 100644 arch/arm64/lib/bitops.S

diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h
index 9c19594ce7cb..13501460be6b 100644
--- a/arch/arm64/include/asm/bitops.h
+++ b/arch/arm64/include/asm/bitops.h
@@ -17,22 +17,11 @@
#define __ASM_BITOPS_H

#include <linux/compiler.h>
-#include <asm/barrier.h>

#ifndef _LINUX_BITOPS_H
#error only <linux/bitops.h> can be included directly
#endif

-/*
- * Little endian assembly atomic bitops.
- */
-extern void set_bit(int nr, volatile unsigned long *p);
-extern void clear_bit(int nr, volatile unsigned long *p);
-extern void change_bit(int nr, volatile unsigned long *p);
-extern int test_and_set_bit(int nr, volatile unsigned long *p);
-extern int test_and_clear_bit(int nr, volatile unsigned long *p);
-extern int test_and_change_bit(int nr, volatile unsigned long *p);
-
#include <asm-generic/bitops/builtin-__ffs.h>
#include <asm-generic/bitops/builtin-ffs.h>
#include <asm-generic/bitops/builtin-__fls.h>
@@ -44,8 +33,9 @@ extern int test_and_change_bit(int nr, volatile unsigned long *p);

#include <asm-generic/bitops/sched.h>
#include <asm-generic/bitops/hweight.h>
-#include <asm-generic/bitops/lock.h>

+#include <asm-generic/bitops/atomic.h>
+#include <asm-generic/bitops/lock.h>
#include <asm-generic/bitops/non-atomic.h>
#include <asm-generic/bitops/le.h>

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 137710f4dac3..68755fd70dcf 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
+lib-y := clear_user.o delay.o copy_from_user.o \
copy_to_user.o copy_in_user.o copy_page.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
deleted file mode 100644
index 43ac736baa5b..000000000000
--- a/arch/arm64/lib/bitops.S
+++ /dev/null
@@ -1,76 +0,0 @@
-/*
- * Based on arch/arm/lib/bitops.h
- *
- * Copyright (C) 2013 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program. If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/linkage.h>
-#include <asm/assembler.h>
-#include <asm/lse.h>
-
-/*
- * x0: bits 5:0 bit offset
- * bits 31:6 word offset
- * x1: address
- */
- .macro bitop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x3, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
-alt_lse " \llsc x2, x2, x3", "nop"
-alt_lse " stxr w0, x2, [x1]", "nop"
-alt_lse " cbnz w0, 1b", "nop"
-
- ret
-ENDPROC(\name )
- .endm
-
- .macro testop, name, llsc, lse
-ENTRY( \name )
- and w3, w0, #63 // Get bit offset
- eor w0, w0, w3 // Clear low bits
- mov x2, #1
- add x1, x1, x0, lsr #3 // Get word offset
-alt_lse " prfm pstl1strm, [x1]", "nop"
- lsl x4, x2, x3 // Create mask
-
-alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
- lsr x0, x2, x3
-alt_lse " \llsc x2, x2, x4", "nop"
-alt_lse " stlxr w5, x2, [x1]", "nop"
-alt_lse " cbnz w5, 1b", "nop"
-alt_lse " dmb ish", "nop"
-
- and x0, x0, #1
- ret
-ENDPROC(\name )
- .endm
-
-/*
- * Atomic bit operations.
- */
- bitop change_bit, eor, steor
- bitop clear_bit, bic, stclr
- bitop set_bit, orr, stset
-
- testop test_and_change_bit, eor, ldeoral
- testop test_and_clear_bit, bic, ldclral
- testop test_and_set_bit, orr, ldsetal
--
2.1.4


2018-05-24 22:36:31

by Peter Zijlstra

Subject: Re: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

On Thu, May 24, 2018 at 11:59:43AM +0100, Will Deacon wrote:
> +static inline void set_bit(unsigned int nr, volatile unsigned long *p)
> {
> + p += BIT_WORD(nr);
> + atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> }
>
> +static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
> {
> + p += BIT_WORD(nr);
> + atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> }
>
> +static inline void change_bit(unsigned int nr, volatile unsigned long *p)
> {
> + p += BIT_WORD(nr);
> + atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> }

Why use the fetch variants here?

2018-05-24 22:37:32

by Will Deacon

Subject: Re: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

On Thu, May 24, 2018 at 02:44:10PM +0200, Peter Zijlstra wrote:
> On Thu, May 24, 2018 at 11:59:43AM +0100, Will Deacon wrote:
> > +static inline void set_bit(unsigned int nr, volatile unsigned long *p)
> > {
> > + p += BIT_WORD(nr);
> > + atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > }
> >
> > +static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
> > {
> > + p += BIT_WORD(nr);
> > + atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > }
> >
> > +static inline void change_bit(unsigned int nr, volatile unsigned long *p)
> > {
> > + p += BIT_WORD(nr);
> > + atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > }
>
> Why use the fetch variants here?

I noticed the same thing just now; I'll drop that and just use the
non-value-returning variants. It's a shame that I can't do the same for
the lock.h unlock code, but we don't have non-returning release variants.
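
(For illustration, the non-value-returning form would presumably end up
looking something like this sketch:)

static inline void set_bit(unsigned int nr, volatile unsigned long *p)
{
	p += BIT_WORD(nr);
	atomic_long_or(BIT_MASK(nr), (atomic_long_t *)p);	/* no return value, relaxed */
}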

Will

2018-05-24 23:02:07

by Mark Rutland

Subject: Re: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

On Thu, May 24, 2018 at 01:47:39PM +0100, Will Deacon wrote:
> On Thu, May 24, 2018 at 02:44:10PM +0200, Peter Zijlstra wrote:
> > On Thu, May 24, 2018 at 11:59:43AM +0100, Will Deacon wrote:
> > > +static inline void set_bit(unsigned int nr, volatile unsigned long *p)
> > > {
> > > + p += BIT_WORD(nr);
> > > + atomic_long_fetch_or_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > > }
> > >
> > > +static inline void clear_bit(unsigned int nr, volatile unsigned long *p)
> > > {
> > > + p += BIT_WORD(nr);
> > > + atomic_long_fetch_andnot_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > > }
> > >
> > > +static inline void change_bit(unsigned int nr, volatile unsigned long *p)
> > > {
> > > + p += BIT_WORD(nr);
> > > + atomic_long_fetch_xor_relaxed(BIT_MASK(nr), (atomic_long_t *)p);
> > > }
> >
> > Why use the fetch variants here?
>
> I noticed the same thing just now; I'll drop that and just use the
> non-value-returning variants. It's shame that I can't do the same for
> the lock.h unlock code, but we don't have non-returning release variants.

As an aside, If I complete the autogeneration stuff, it'll be possible
to generate those. I split out the necessary barriers in [1], but I
still have a lot of other preparatory cleanup to do.

IIUC, the void-returning atomic ops are relaxed, so trying to unify that
with the usual rule that no suffix means fence will slow things down
unless we want to do a treewide substitution to fixup for that.
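
(For context, the "no suffix means fence" rule above is what the generic
fallbacks in <linux/atomic.h> implement for value-returning ops when an
architecture only provides the _relaxed form; roughly, paraphrased from
memory:)

#define __atomic_op_fence(op, args...)					\
({									\
	typeof(op##_relaxed(args)) __ret;				\
	smp_mb__before_atomic();					\
	__ret = op##_relaxed(args);					\
	smp_mb__after_atomic();						\
	__ret;								\
})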

Thanks,
Mark.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=atomics/api-unification&id=c6b9ff2627d06776e427a7f1a7f83caeff3db536

2018-05-25 02:45:45

by Andrea Parri

Subject: Re: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

Hi Mark,

> As an aside, If I complete the autogeneration stuff, it'll be possible
> to generate those. I split out the necessary barriers in [1], but I
> still have a lot of other preparatory cleanup to do.

I do grasp the rationale behind that naming:

__atomic_mb_{before,after}_{acquire,release,fence}()

and yet I remain puzzled by it:

For example, can you imagine (using):

__atomic_mb_before_acquire() ?

(as your __atomic_mb_after_acquire() is whispering to me "acquire-fences"...)

Another example:

the "atomic" in that "smp_mb__{before,after}_atomic" is so "suggestive"!

(think of x86...), but it's not explicit in the proposed names.

I don't have other names to suggest at the moment... ;/ (aka just saying)

Andrea


>
> IIUC, the void-returning atomic ops are relaxed, so trying to unify that
> with the usual rule that no suffix means fence will slow things down
> unless we want to do a treewide substitution to fixup for that.
>
> Thanks,
> Mark.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=atomics/api-unification&id=c6b9ff2627d06776e427a7f1a7f83caeff3db536

2018-05-25 02:47:15

by Peter Zijlstra

Subject: Re: [PATCH 6/9] asm-generic/bitops/atomic.h: Rewrite using atomic_fetch_*

On Fri, May 25, 2018 at 12:06:10AM +0200, Andrea Parri wrote:
> Hi Mark,
>
> > As an aside, If I complete the autogeneration stuff, it'll be possible
> > to generate those. I split out the necessary barriers in [1], but I
> > still have a lot of other preparatory cleanup to do.
>
> I do grasp the rationale behind that naming:
>
> __atomic_mb_{before,after}_{acquire,release,fence}()
>
> and yet I remain puzzled by it:
>
> For example, can you imagine (using):
>
> __atomic_mb_before_acquire() ?
>
> (as your __atomic_mb_after_acquire() is whispering me "acquire-fences"...)

Yes, I really do think he means acquire-fence. It is however something I
have vague memories of not being liked much because it is the memop
itself that carries the ordering.

That said, this is only an implementation detail and not a public
interface, so maybe we can get away with it.

2018-07-22 13:59:17

by Yoshinori Sato

Subject: Re: [PATCH 1/9] h8300: Don't include linux/kernel.h in asm/atomic.h

On Thu, 24 May 2018 19:59:38 +0900,
Will Deacon wrote:
>
> linux/kernel.h isn't needed by asm/atomic.h and will result in circular
> dependencies when the asm-generic atomic bitops are built around the
> atomic_long_t interface.
>
> Remove the broad include and replace it with linux/compiler.h for
> READ_ONCE etc and asm/irqflags.h for arch_local_irq_save etc.
>
> Cc: Yoshinori Sato <[email protected]>
> Signed-off-by: Will Deacon <[email protected]>
> ---
> arch/h8300/include/asm/atomic.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/h8300/include/asm/atomic.h b/arch/h8300/include/asm/atomic.h
> index 941e7554e886..b174dec099bf 100644
> --- a/arch/h8300/include/asm/atomic.h
> +++ b/arch/h8300/include/asm/atomic.h
> @@ -2,8 +2,10 @@
> #ifndef __ARCH_H8300_ATOMIC__
> #define __ARCH_H8300_ATOMIC__
>
> +#include <linux/compiler.h>
> #include <linux/types.h>
> #include <asm/cmpxchg.h>
> +#include <asm/irqflags.h>
>
> /*
> * Atomic operations that C can't guarantee us. Useful for
> @@ -15,8 +17,6 @@
> #define atomic_read(v) READ_ONCE((v)->counter)
> #define atomic_set(v, i) WRITE_ONCE(((v)->counter), (i))
>
> -#include <linux/kernel.h>
> -
> #define ATOMIC_OP_RETURN(op, c_op) \
> static inline int atomic_##op##_return(int i, atomic_t *v) \
> { \
> --
> 2.1.4
>

Sorry for the late reply.
Applied to h8300-next.
Thanks.

--
Yosinori Sato