2022-07-13 07:23:13

by Nicholas Piggin

Subject: [PATCH v2 00/12] locking/qspinlock: simplify code generation

This series accounts for comments from the last post. Thanks for the feedback.

Thanks,
Nick

v2:
- Remove BUILD_BUG_ON contradiction
- Add Longman's suggestion for the pending bit in the !paravirt case
- Make queued_spin_lock_mcs_queue __always_inline
- Drop cmpxchg_acquire->trylock patch that changed generated
code slightly (and had wild indenting).
- Fix dropped acquire memory ordering in pv_wait_node cleanup
(and renamed it to pv_wait_node_acquire); see the ordering sketch below.
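
As a rough illustration of the acquire-ordering point in the last item
above (a hypothetical sketch, not the actual pv_wait_node_acquire code
from patch 11), the queued waiter's final wait on node->locked needs
acquire semantics so that accesses in the critical section cannot be
reordered before the MCS handoff. Using the struct qnode introduced in
patch 02:

	/* Hypothetical sketch only; the real helper is in patch 11. */
	static __always_inline void wait_node_locked_acquire(struct qnode *node)
	{
		/*
		 * A relaxed spin on node->locked would allow loads and
		 * stores in the critical section to be hoisted above the
		 * handoff; smp_cond_load_acquire() spins until the value
		 * is non-zero and provides the required acquire ordering.
		 */
		smp_cond_load_acquire(&node->locked, VAL);
	}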


Nicholas Piggin (12):
locking/qspinlock: remove pv_node abstraction
locking/qspinlock: inline mcs_spinlock functions into qspinlock
locking/qspinlock: split common mcs queueing code into its own
function
locking/qspinlock: move pv lock word helpers into qspinlock.c
locking/qspinlock: be less clever with the preprocessor
locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c
locking/qspinlock: remove arch qspinlock_paravirt.h includes
locking/qspinlock: stop renaming queued_spin_lock_slowpath to
native_queued_spin_lock_slowpath
locking/qspinlock: rename __pv_init_lock_hash to pv_spinlocks_init
locking/qspinlock: paravirt use simple trylock in case idx overflows
locking/qspinlock: separate pv_wait_node from the non-paravirt path
locking/qspinlock: simplify pv_wait_head_or_lock calling scheme

arch/alpha/include/asm/Kbuild | 1 -
arch/arc/include/asm/Kbuild | 1 -
arch/arm/include/asm/mcs_spinlock.h | 24 -
arch/arm64/include/asm/Kbuild | 1 -
arch/hexagon/include/asm/Kbuild | 1 -
arch/ia64/include/asm/Kbuild | 1 -
arch/m68k/include/asm/Kbuild | 1 -
arch/microblaze/include/asm/Kbuild | 1 -
arch/mips/include/asm/Kbuild | 1 -
arch/nios2/include/asm/Kbuild | 1 -
arch/parisc/include/asm/Kbuild | 1 -
arch/powerpc/include/asm/Kbuild | 1 -
arch/powerpc/include/asm/qspinlock.h | 45 +-
arch/powerpc/include/asm/qspinlock_paravirt.h | 7 -
arch/powerpc/include/asm/spinlock.h | 2 +-
arch/s390/include/asm/Kbuild | 1 -
arch/sh/include/asm/Kbuild | 1 -
arch/sparc/include/asm/Kbuild | 1 -
arch/um/include/asm/Kbuild | 1 -
arch/x86/hyperv/hv_spinlock.c | 2 +-
arch/x86/include/asm/Kbuild | 1 -
arch/x86/include/asm/qspinlock.h | 19 +-
arch/x86/include/asm/qspinlock_paravirt.h | 72 --
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/paravirt-spinlocks.c | 71 ++
arch/x86/kernel/paravirt.c | 2 +-
arch/x86/xen/spinlock.c | 2 +-
arch/xtensa/include/asm/Kbuild | 1 -
include/asm-generic/mcs_spinlock.h | 13 -
include/asm-generic/qspinlock.h | 6 +
kernel/locking/mcs_spinlock.h | 121 ---
kernel/locking/qspinlock.c | 835 ++++++++++++++----
kernel/locking/qspinlock_paravirt.h | 562 ------------
33 files changed, 764 insertions(+), 1038 deletions(-)
delete mode 100644 arch/arm/include/asm/mcs_spinlock.h
delete mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
delete mode 100644 arch/x86/include/asm/qspinlock_paravirt.h
delete mode 100644 include/asm-generic/mcs_spinlock.h
delete mode 100644 kernel/locking/mcs_spinlock.h
delete mode 100644 kernel/locking/qspinlock_paravirt.h

--
2.35.1


2022-07-13 07:24:04

by Nicholas Piggin

Subject: [PATCH v2 07/12] locking/qspinlock: remove arch qspinlock_paravirt.h includes

Signed-off-by: Nicholas Piggin <[email protected]>
---
arch/powerpc/include/asm/qspinlock_paravirt.h | 7 --
arch/x86/include/asm/qspinlock.h | 4 ++
arch/x86/include/asm/qspinlock_paravirt.h | 72 -------------------
arch/x86/kernel/paravirt-spinlocks.c | 71 ++++++++++++++++++
kernel/locking/qspinlock.c | 11 +--
5 files changed, 76 insertions(+), 89 deletions(-)
delete mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
delete mode 100644 arch/x86/include/asm/qspinlock_paravirt.h

diff --git a/arch/powerpc/include/asm/qspinlock_paravirt.h b/arch/powerpc/include/asm/qspinlock_paravirt.h
deleted file mode 100644
index 6b60e7736a47..000000000000
--- a/arch/powerpc/include/asm/qspinlock_paravirt.h
+++ /dev/null
@@ -1,7 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-#ifndef _ASM_POWERPC_QSPINLOCK_PARAVIRT_H
-#define _ASM_POWERPC_QSPINLOCK_PARAVIRT_H
-
-EXPORT_SYMBOL(__pv_queued_spin_unlock);
-
-#endif /* _ASM_POWERPC_QSPINLOCK_PARAVIRT_H */
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index d87451df480b..7f914fe7bc30 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -34,6 +34,10 @@ extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
extern bool nopvspin;

+#ifdef CONFIG_64BIT
+#define __pv_queued_spin_unlock __pv_queued_spin_unlock
+#endif
+
#define queued_spin_unlock queued_spin_unlock
/**
* queued_spin_unlock - release a queued spinlock
diff --git a/arch/x86/include/asm/qspinlock_paravirt.h b/arch/x86/include/asm/qspinlock_paravirt.h
deleted file mode 100644
index 892fd8c3a6f7..000000000000
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ /dev/null
@@ -1,72 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_QSPINLOCK_PARAVIRT_H
-#define __ASM_QSPINLOCK_PARAVIRT_H
-
-#include <asm/ibt.h>
-
-/*
- * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
- * registers. For i386, however, only 1 32-bit register needs to be saved
- * and restored. So an optimized version of __pv_queued_spin_unlock() is
- * hand-coded for 64-bit, but it isn't worthwhile to do it for 32-bit.
- */
-#ifdef CONFIG_64BIT
-
-PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
-#define __pv_queued_spin_unlock __pv_queued_spin_unlock
-#define PV_UNLOCK "__raw_callee_save___pv_queued_spin_unlock"
-#define PV_UNLOCK_SLOWPATH "__raw_callee_save___pv_queued_spin_unlock_slowpath"
-
-/*
- * Optimized assembly version of __raw_callee_save___pv_queued_spin_unlock
- * which combines the registers saving trunk and the body of the following
- * C code:
- *
- * void __pv_queued_spin_unlock(struct qspinlock *lock)
- * {
- * u8 lockval = cmpxchg(&lock->locked, _Q_LOCKED_VAL, 0);
- *
- * if (likely(lockval == _Q_LOCKED_VAL))
- * return;
- * pv_queued_spin_unlock_slowpath(lock, lockval);
- * }
- *
- * For x86-64,
- * rdi = lock (first argument)
- * rsi = lockval (second argument)
- * rdx = internal variable (set to 0)
- */
-asm (".pushsection .text;"
- ".globl " PV_UNLOCK ";"
- ".type " PV_UNLOCK ", @function;"
- ".align 4,0x90;"
- PV_UNLOCK ": "
- ASM_ENDBR
- FRAME_BEGIN
- "push %rdx;"
- "mov $0x1,%eax;"
- "xor %edx,%edx;"
- LOCK_PREFIX "cmpxchg %dl,(%rdi);"
- "cmp $0x1,%al;"
- "jne .slowpath;"
- "pop %rdx;"
- FRAME_END
- ASM_RET
- ".slowpath: "
- "push %rsi;"
- "movzbl %al,%esi;"
- "call " PV_UNLOCK_SLOWPATH ";"
- "pop %rsi;"
- "pop %rdx;"
- FRAME_END
- ASM_RET
- ".size " PV_UNLOCK ", .-" PV_UNLOCK ";"
- ".popsection");
-
-#else /* CONFIG_64BIT */
-
-extern void __pv_queued_spin_unlock(struct qspinlock *lock);
-PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock);
-
-#endif /* CONFIG_64BIT */
-#endif
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 9e1ea99ad9df..c6a107dfe20d 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -7,6 +7,7 @@
#include <linux/export.h>
#include <linux/jump_label.h>

+#include <asm/ibt.h>
#include <asm/paravirt.h>

__visible void __native_queued_spin_unlock(struct qspinlock *lock)
@@ -15,6 +16,76 @@ __visible void __native_queued_spin_unlock(struct qspinlock *lock)
}
PV_CALLEE_SAVE_REGS_THUNK(__native_queued_spin_unlock);

+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+/*
+ * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
+ * registers. For i386, however, only 1 32-bit register needs to be saved
+ * and restored. So an optimized version of __pv_queued_spin_unlock() is
+ * hand-coded for 64-bit, but it isn't worthwhile to do it for 32-bit.
+ */
+#ifdef CONFIG_64BIT
+
+__visible void
+__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked);
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
+#define PV_UNLOCK "__raw_callee_save___pv_queued_spin_unlock"
+#define PV_UNLOCK_SLOWPATH "__raw_callee_save___pv_queued_spin_unlock_slowpath"
+
+/*
+ * Optimized assembly version of __raw_callee_save___pv_queued_spin_unlock
+ * which combines the registers saving trunk and the body of the following
+ * C code:
+ *
+ * void __pv_queued_spin_unlock(struct qspinlock *lock)
+ * {
+ * u8 lockval = cmpxchg(&lock->locked, _Q_LOCKED_VAL, 0);
+ *
+ * if (likely(lockval == _Q_LOCKED_VAL))
+ * return;
+ * pv_queued_spin_unlock_slowpath(lock, lockval);
+ * }
+ *
+ * For x86-64,
+ * rdi = lock (first argument)
+ * rsi = lockval (second argument)
+ * rdx = internal variable (set to 0)
+ */
+asm (".pushsection .text;"
+ ".globl " PV_UNLOCK ";"
+ ".type " PV_UNLOCK ", @function;"
+ ".align 4,0x90;"
+ PV_UNLOCK ": "
+ ASM_ENDBR
+ FRAME_BEGIN
+ "push %rdx;"
+ "mov $0x1,%eax;"
+ "xor %edx,%edx;"
+ LOCK_PREFIX "cmpxchg %dl,(%rdi);"
+ "cmp $0x1,%al;"
+ "jne .slowpath;"
+ "pop %rdx;"
+ FRAME_END
+ ASM_RET
+ ".slowpath: "
+ "push %rsi;"
+ "movzbl %al,%esi;"
+ "call " PV_UNLOCK_SLOWPATH ";"
+ "pop %rsi;"
+ "pop %rdx;"
+ FRAME_END
+ ASM_RET
+ ".size " PV_UNLOCK ", .-" PV_UNLOCK ";"
+ ".popsection");
+
+#else /* CONFIG_64BIT */
+
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock);
+
+#endif /* CONFIG_64BIT */
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
bool pv_is_native_spin_unlock(void)
{
return pv_ops.lock.queued_spin_unlock.func ==
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 3b3663d15402..c4f223a03345 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -745,16 +745,6 @@ __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
pv_kick(node->cpu);
}

-/*
- * Include the architecture specific callee-save thunk of the
- * __pv_queued_spin_unlock(). This thunk is put together with
- * __pv_queued_spin_unlock() to make the callee-save thunk and the real unlock
- * function close to each other sharing consecutive instruction cachelines.
- * Alternatively, architecture specific version of __pv_queued_spin_unlock()
- * can be defined.
- */
-#include <asm/qspinlock_paravirt.h>
-
#ifndef __pv_queued_spin_unlock
__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
{
@@ -771,6 +761,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)

__pv_queued_spin_unlock_slowpath(lock, locked);
}
+EXPORT_SYMBOL(__pv_queued_spin_unlock);
#endif

#else /* CONFIG_PARAVIRT_SPINLOCKS */
--
2.35.1

2022-07-13 07:24:49

by Nicholas Piggin

Subject: [PATCH v2 06/12] locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c

There isn't much reason to keep these separate.

Signed-off-by: Nicholas Piggin <[email protected]>
---
kernel/locking/qspinlock.c | 488 ++++++++++++++++++++++++++-
kernel/locking/qspinlock_paravirt.h | 490 ----------------------------
2 files changed, 487 insertions(+), 491 deletions(-)
delete mode 100644 kernel/locking/qspinlock_paravirt.h

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 037bd5440cfd..3b3663d15402 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -16,6 +16,7 @@
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/hardirq.h>
+#include <linux/memblock.h>
#include <linux/mutex.h>
#include <linux/prefetch.h>
#include <asm/byteorder.h>
@@ -286,7 +287,492 @@ static __always_inline void set_locked(struct qspinlock *lock)
}

#ifdef CONFIG_PARAVIRT_SPINLOCKS
-#include "qspinlock_paravirt.h"
+/*
+ * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
+ * of spinning them.
+ *
+ * This relies on the architecture to provide two paravirt hypercalls:
+ *
+ * pv_wait(u8 *ptr, u8 val) -- suspends the vcpu if *ptr == val
+ * pv_kick(cpu) -- wakes a suspended vcpu
+ *
+ * Using these we implement __pv_queued_spin_lock_slowpath() and
+ * __pv_queued_spin_unlock() to replace native_queued_spin_lock_slowpath() and
+ * native_queued_spin_unlock().
+ */
+
+#define _Q_SLOW_VAL (3U << _Q_LOCKED_OFFSET)
+
+/*
+ * Queue Node Adaptive Spinning
+ *
+ * A queue node vCPU will stop spinning if the vCPU in the previous node is
+ * not running. The one lock stealing attempt allowed at slowpath entry
+ * mitigates the slight slowdown for non-overcommitted guest with this
+ * aggressive wait-early mechanism.
+ *
+ * The status of the previous node will be checked at fixed interval
+ * controlled by PV_PREV_CHECK_MASK. This is to ensure that we won't
+ * pound on the cacheline of the previous node too heavily.
+ */
+#define PV_PREV_CHECK_MASK 0xff
+
+/*
+ * Queue node uses: vcpu_running & vcpu_halted.
+ * Queue head uses: vcpu_running & vcpu_hashed.
+ */
+enum vcpu_state {
+ vcpu_running = 0,
+ vcpu_halted, /* Used only in pv_wait_node */
+ vcpu_hashed, /* = pv_hash'ed + vcpu_halted */
+};
+
+/*
+ * Hybrid PV queued/unfair lock
+ *
+ * This function is called once when a lock waiter enters the PV slowpath
+ * before being queued.
+ *
+ * The pending bit is set by the queue head vCPU of the MCS wait queue in
+ * pv_wait_head_or_lock() to signal that it is ready to spin on the lock.
+ * When that bit becomes visible to the incoming waiters, no lock stealing
+ * is allowed. The function will return immediately to make the waiters
+ * enter the MCS wait queue. So lock starvation shouldn't happen as long
+ * as the queued mode vCPUs are actively running to set the pending bit
+ * and hence disabling lock stealing.
+ *
+ * When the pending bit isn't set, the lock waiters will stay in the unfair
+ * mode spinning on the lock unless the MCS wait queue is empty. In this
+ * case, the lock waiters will enter the queued mode slowpath trying to
+ * become the queue head and set the pending bit.
+ *
+ * This hybrid PV queued/unfair lock combines the best attributes of a
+ * queued lock (no lock starvation) and an unfair lock (good performance
+ * on not heavily contended locks).
+ */
+static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
+{
+ /*
+ * Stay in unfair lock mode as long as queued mode waiters are
+ * present in the MCS wait queue but the pending bit isn't set.
+ */
+ for (;;) {
+ int val = atomic_read(&lock->val);
+
+ if (!(val & _Q_LOCKED_PENDING_MASK) &&
+ (cmpxchg_acquire(&lock->locked, 0, _Q_LOCKED_VAL) == 0)) {
+ lockevent_inc(pv_lock_stealing);
+ return true;
+ }
+ if (!(val & _Q_TAIL_MASK) || (val & _Q_PENDING_MASK))
+ break;
+
+ cpu_relax();
+ }
+
+ return false;
+}
+
+/*
+ * Lock and MCS node addresses hash table for fast lookup
+ *
+ * Hashing is done on a per-cacheline basis to minimize the need to access
+ * more than one cacheline.
+ *
+ * Dynamically allocate a hash table big enough to hold at least 4X the
+ * number of possible cpus in the system. Allocation is done on page
+ * granularity. So the minimum number of hash buckets should be at least
+ * 256 (64-bit) or 512 (32-bit) to fully utilize a 4k page.
+ *
+ * Since we should not be holding locks from NMI context (very rare indeed) the
+ * max load factor is 0.75, which is around the point where open addressing
+ * breaks down.
+ *
+ */
+struct pv_hash_entry {
+ struct qspinlock *lock;
+ struct qnode *node;
+};
+
+#define PV_HE_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_entry))
+#define PV_HE_MIN (PAGE_SIZE / sizeof(struct pv_hash_entry))
+
+static struct pv_hash_entry *pv_lock_hash;
+static unsigned int pv_lock_hash_bits __read_mostly;
+
+/*
+ * Allocate memory for the PV qspinlock hash buckets
+ *
+ * This function should be called from the paravirt spinlock initialization
+ * routine.
+ */
+void __init __pv_init_lock_hash(void)
+{
+ int pv_hash_size = ALIGN(4 * num_possible_cpus(), PV_HE_PER_LINE);
+
+ if (pv_hash_size < PV_HE_MIN)
+ pv_hash_size = PV_HE_MIN;
+
+ /*
+ * Allocate space from bootmem which should be page-size aligned
+ * and hence cacheline aligned.
+ */
+ pv_lock_hash = alloc_large_system_hash("PV qspinlock",
+ sizeof(struct pv_hash_entry),
+ pv_hash_size, 0,
+ HASH_EARLY | HASH_ZERO,
+ &pv_lock_hash_bits, NULL,
+ pv_hash_size, pv_hash_size);
+}
+
+#define for_each_hash_entry(he, offset, hash) \
+ for (hash &= ~(PV_HE_PER_LINE - 1), he = &pv_lock_hash[hash], offset = 0; \
+ offset < (1 << pv_lock_hash_bits); \
+ offset++, he = &pv_lock_hash[(hash + offset) & ((1 << pv_lock_hash_bits) - 1)])
+
+static struct qspinlock **pv_hash(struct qspinlock *lock, struct qnode *node)
+{
+ unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
+ struct pv_hash_entry *he;
+ int hopcnt = 0;
+
+ for_each_hash_entry(he, offset, hash) {
+ hopcnt++;
+ if (!cmpxchg(&he->lock, NULL, lock)) {
+ WRITE_ONCE(he->node, node);
+ lockevent_pv_hop(hopcnt);
+ return &he->lock;
+ }
+ }
+ /*
+ * Hard assume there is a free entry for us.
+ *
+ * This is guaranteed by ensuring every blocked lock only ever consumes
+ * a single entry, and since we only have 4 nesting levels per CPU
+ * and allocated 4*nr_possible_cpus(), this must be so.
+ *
+ * The single entry is guaranteed by having the lock owner unhash
+ * before it releases.
+ */
+ BUG();
+}
+
+static struct qnode *pv_unhash(struct qspinlock *lock)
+{
+ unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
+ struct pv_hash_entry *he;
+ struct qnode *node;
+
+ for_each_hash_entry(he, offset, hash) {
+ if (READ_ONCE(he->lock) == lock) {
+ node = READ_ONCE(he->node);
+ WRITE_ONCE(he->lock, NULL);
+ return node;
+ }
+ }
+ /*
+ * Hard assume we'll find an entry.
+ *
+ * This guarantees a limited lookup time and is itself guaranteed by
+ * having the lock owner do the unhash -- IFF the unlock sees the
+ * SLOW flag, there MUST be a hash entry.
+ */
+ BUG();
+}
+
+/*
+ * Return true if when it is time to check the previous node which is not
+ * in a running state.
+ */
+static inline bool
+pv_wait_early(struct qnode *prev, int loop)
+{
+ if ((loop & PV_PREV_CHECK_MASK) != 0)
+ return false;
+
+ return READ_ONCE(prev->state) != vcpu_running;
+}
+
+/*
+ * Initialize the PV part of the qnode.
+ */
+static void pv_init_node(struct qnode *node)
+{
+ node->cpu = smp_processor_id();
+ node->state = vcpu_running;
+}
+
+/*
+ * Wait for node->locked to become true, halt the vcpu after a short spin.
+ * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
+ * behalf.
+ */
+static void pv_wait_node(struct qnode *node, struct qnode *prev)
+{
+ int loop;
+ bool wait_early;
+
+ for (;;) {
+ for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
+ if (READ_ONCE(node->locked))
+ return;
+ if (pv_wait_early(prev, loop)) {
+ wait_early = true;
+ break;
+ }
+ cpu_relax();
+ }
+
+ /*
+ * Order node->state vs node->locked thusly:
+ *
+ * [S] node->state = vcpu_halted [S] next->locked = 1
+ * MB MB
+ * [L] node->locked [RmW] node->state = vcpu_hashed
+ *
+ * Matches the cmpxchg() from pv_kick_node().
+ */
+ smp_store_mb(node->state, vcpu_halted);
+
+ if (!READ_ONCE(node->locked)) {
+ lockevent_inc(pv_wait_node);
+ lockevent_cond_inc(pv_wait_early, wait_early);
+ pv_wait(&node->state, vcpu_halted);
+ }
+
+ /*
+ * If pv_kick_node() changed us to vcpu_hashed, retain that
+ * value so that pv_wait_head_or_lock() knows to not also try
+ * to hash this lock.
+ */
+ cmpxchg(&node->state, vcpu_halted, vcpu_running);
+
+ /*
+ * If the locked flag is still not set after wakeup, it is a
+ * spurious wakeup and the vCPU should wait again. However,
+ * there is a pretty high overhead for CPU halting and kicking.
+ * So it is better to spin for a while in the hope that the
+ * MCS lock will be released soon.
+ */
+ lockevent_cond_inc(pv_spurious_wakeup,
+ !READ_ONCE(node->locked));
+ }
+
+ /*
+ * By now our node->locked should be 1 and our caller will not actually
+ * spin-wait for it. We do however rely on our caller to do a
+ * load-acquire for us.
+ */
+}
+
+/*
+ * Called after setting next->locked = 1 when we're the lock owner.
+ *
+ * Instead of waking the waiters stuck in pv_wait_node() advance their state
+ * such that they're waiting in pv_wait_head_or_lock(), this avoids a
+ * wake/sleep cycle.
+ */
+static void pv_kick_node(struct qspinlock *lock, struct qnode *node)
+{
+ /*
+ * If the vCPU is indeed halted, advance its state to match that of
+ * pv_wait_node(). If OTOH this fails, the vCPU was running and will
+ * observe its next->locked value and advance itself.
+ *
+ * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
+ *
+ * The write to next->locked in arch_mcs_spin_unlock_contended()
+ * must be ordered before the read of node->state in the cmpxchg()
+ * below for the code to work correctly. To guarantee full ordering
+ * irrespective of the success or failure of the cmpxchg(),
+ * a relaxed version with explicit barrier is used. The control
+ * dependency will order the reading of node->state before any
+ * subsequent writes.
+ */
+ smp_mb__before_atomic();
+ if (cmpxchg_relaxed(&node->state, vcpu_halted, vcpu_hashed)
+ != vcpu_halted)
+ return;
+
+ /*
+ * Put the lock into the hash table and set the _Q_SLOW_VAL.
+ *
+ * As this is the same vCPU that will check the _Q_SLOW_VAL value and
+ * the hash table later on at unlock time, no atomic instruction is
+ * needed.
+ */
+ WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
+ (void)pv_hash(lock, node);
+}
+
+/*
+ * Wait for l->locked to become clear and acquire the lock;
+ * halt the vcpu after a short spin.
+ * __pv_queued_spin_unlock() will wake us.
+ *
+ * The current value of the lock will be returned for additional processing.
+ */
+static u32
+pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
+{
+ struct qspinlock **lp = NULL;
+ int waitcnt = 0;
+ int loop;
+
+ /*
+ * If pv_kick_node() already advanced our state, we don't need to
+ * insert ourselves into the hash table anymore.
+ */
+ if (READ_ONCE(node->state) == vcpu_hashed)
+ lp = (struct qspinlock **)1;
+
+ /*
+ * Tracking # of slowpath locking operations
+ */
+ lockevent_inc(lock_slowpath);
+
+ for (;; waitcnt++) {
+ /*
+ * Set correct vCPU state to be used by queue node wait-early
+ * mechanism.
+ */
+ WRITE_ONCE(node->state, vcpu_running);
+
+ /*
+ * Set the pending bit in the active lock spinning loop to
+ * disable lock stealing before attempting to acquire the lock.
+ */
+ set_pending(lock);
+ for (loop = SPIN_THRESHOLD; loop; loop--) {
+ if (trylock_clear_pending(lock))
+ goto gotlock;
+ cpu_relax();
+ }
+ clear_pending(lock);
+
+
+ if (!lp) { /* ONCE */
+ lp = pv_hash(lock, node);
+
+ /*
+ * We must hash before setting _Q_SLOW_VAL, such that
+ * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
+ * we'll be sure to be able to observe our hash entry.
+ *
+ * [S] <hash> [Rmw] l->locked == _Q_SLOW_VAL
+ * MB RMB
+ * [RmW] l->locked = _Q_SLOW_VAL [L] <unhash>
+ *
+ * Matches the smp_rmb() in __pv_queued_spin_unlock().
+ */
+ if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
+ /*
+ * The lock was free and now we own the lock.
+ * Change the lock value back to _Q_LOCKED_VAL
+ * and unhash the table.
+ */
+ WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
+ WRITE_ONCE(*lp, NULL);
+ goto gotlock;
+ }
+ }
+ WRITE_ONCE(node->state, vcpu_hashed);
+ lockevent_inc(pv_wait_head);
+ lockevent_cond_inc(pv_wait_again, waitcnt);
+ pv_wait(&lock->locked, _Q_SLOW_VAL);
+
+ /*
+ * Because of lock stealing, the queue head vCPU may not be
+ * able to acquire the lock before it has to wait again.
+ */
+ }
+
+ /*
+ * The cmpxchg() or xchg() call before coming here provides the
+ * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
+ * here is to indicate to the compiler that the value will always
+ * be nozero to enable better code optimization.
+ */
+gotlock:
+ return (u32)(atomic_read(&lock->val) | _Q_LOCKED_VAL);
+}
+
+/*
+ * PV versions of the unlock fastpath and slowpath functions to be used
+ * instead of queued_spin_unlock().
+ */
+__visible void
+__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
+{
+ struct qnode *node;
+
+ if (unlikely(locked != _Q_SLOW_VAL)) {
+ WARN(!debug_locks_silent,
+ "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
+ (unsigned long)lock, atomic_read(&lock->val));
+ return;
+ }
+
+ /*
+ * A failed cmpxchg doesn't provide any memory-ordering guarantees,
+ * so we need a barrier to order the read of the node data in
+ * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
+ *
+ * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
+ */
+ smp_rmb();
+
+ /*
+ * Since the above failed to release, this must be the SLOW path.
+ * Therefore start by looking up the blocked node and unhashing it.
+ */
+ node = pv_unhash(lock);
+
+ /*
+ * Now that we have a reference to the (likely) blocked qnode,
+ * release the lock.
+ */
+ smp_store_release(&lock->locked, 0);
+
+ /*
+ * At this point the memory pointed at by lock can be freed/reused,
+ * however we can still use the qnode to kick the CPU.
+ * The other vCPU may not really be halted, but kicking an active
+ * vCPU is harmless other than the additional latency in completing
+ * the unlock.
+ */
+ lockevent_inc(pv_kick_unlock);
+ pv_kick(node->cpu);
+}
+
+/*
+ * Include the architecture specific callee-save thunk of the
+ * __pv_queued_spin_unlock(). This thunk is put together with
+ * __pv_queued_spin_unlock() to make the callee-save thunk and the real unlock
+ * function close to each other sharing consecutive instruction cachelines.
+ * Alternatively, architecture specific version of __pv_queued_spin_unlock()
+ * can be defined.
+ */
+#include <asm/qspinlock_paravirt.h>
+
+#ifndef __pv_queued_spin_unlock
+__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
+{
+ u8 locked;
+
+ /*
+ * We must not unlock if SLOW, because in that case we must first
+ * unhash. Otherwise it would be possible to have multiple @lock
+ * entries, which would be BAD.
+ */
+ locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
+ if (likely(locked == _Q_LOCKED_VAL))
+ return;
+
+ __pv_queued_spin_unlock_slowpath(lock, locked);
+}
+#endif
+
#else /* CONFIG_PARAVIRT_SPINLOCKS */
static __always_inline void pv_init_node(struct qnode *node) { }
static __always_inline void pv_wait_node(struct qnode *node,
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
deleted file mode 100644
index f1922e3a0f7d..000000000000
--- a/kernel/locking/qspinlock_paravirt.h
+++ /dev/null
@@ -1,490 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#include <linux/hash.h>
-#include <linux/memblock.h>
-#include <linux/debug_locks.h>
-
-/*
- * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead
- * of spinning them.
- *
- * This relies on the architecture to provide two paravirt hypercalls:
- *
- * pv_wait(u8 *ptr, u8 val) -- suspends the vcpu if *ptr == val
- * pv_kick(cpu) -- wakes a suspended vcpu
- *
- * Using these we implement __pv_queued_spin_lock_slowpath() and
- * __pv_queued_spin_unlock() to replace native_queued_spin_lock_slowpath() and
- * native_queued_spin_unlock().
- */
-
-#define _Q_SLOW_VAL (3U << _Q_LOCKED_OFFSET)
-
-/*
- * Queue Node Adaptive Spinning
- *
- * A queue node vCPU will stop spinning if the vCPU in the previous node is
- * not running. The one lock stealing attempt allowed at slowpath entry
- * mitigates the slight slowdown for non-overcommitted guest with this
- * aggressive wait-early mechanism.
- *
- * The status of the previous node will be checked at fixed interval
- * controlled by PV_PREV_CHECK_MASK. This is to ensure that we won't
- * pound on the cacheline of the previous node too heavily.
- */
-#define PV_PREV_CHECK_MASK 0xff
-
-/*
- * Queue node uses: vcpu_running & vcpu_halted.
- * Queue head uses: vcpu_running & vcpu_hashed.
- */
-enum vcpu_state {
- vcpu_running = 0,
- vcpu_halted, /* Used only in pv_wait_node */
- vcpu_hashed, /* = pv_hash'ed + vcpu_halted */
-};
-
-/*
- * Hybrid PV queued/unfair lock
- *
- * This function is called once when a lock waiter enters the PV slowpath
- * before being queued.
- *
- * The pending bit is set by the queue head vCPU of the MCS wait queue in
- * pv_wait_head_or_lock() to signal that it is ready to spin on the lock.
- * When that bit becomes visible to the incoming waiters, no lock stealing
- * is allowed. The function will return immediately to make the waiters
- * enter the MCS wait queue. So lock starvation shouldn't happen as long
- * as the queued mode vCPUs are actively running to set the pending bit
- * and hence disabling lock stealing.
- *
- * When the pending bit isn't set, the lock waiters will stay in the unfair
- * mode spinning on the lock unless the MCS wait queue is empty. In this
- * case, the lock waiters will enter the queued mode slowpath trying to
- * become the queue head and set the pending bit.
- *
- * This hybrid PV queued/unfair lock combines the best attributes of a
- * queued lock (no lock starvation) and an unfair lock (good performance
- * on not heavily contended locks).
- */
-static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
-{
- /*
- * Stay in unfair lock mode as long as queued mode waiters are
- * present in the MCS wait queue but the pending bit isn't set.
- */
- for (;;) {
- int val = atomic_read(&lock->val);
-
- if (!(val & _Q_LOCKED_PENDING_MASK) &&
- (cmpxchg_acquire(&lock->locked, 0, _Q_LOCKED_VAL) == 0)) {
- lockevent_inc(pv_lock_stealing);
- return true;
- }
- if (!(val & _Q_TAIL_MASK) || (val & _Q_PENDING_MASK))
- break;
-
- cpu_relax();
- }
-
- return false;
-}
-
-/*
- * Lock and MCS node addresses hash table for fast lookup
- *
- * Hashing is done on a per-cacheline basis to minimize the need to access
- * more than one cacheline.
- *
- * Dynamically allocate a hash table big enough to hold at least 4X the
- * number of possible cpus in the system. Allocation is done on page
- * granularity. So the minimum number of hash buckets should be at least
- * 256 (64-bit) or 512 (32-bit) to fully utilize a 4k page.
- *
- * Since we should not be holding locks from NMI context (very rare indeed) the
- * max load factor is 0.75, which is around the point where open addressing
- * breaks down.
- *
- */
-struct pv_hash_entry {
- struct qspinlock *lock;
- struct qnode *node;
-};
-
-#define PV_HE_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_entry))
-#define PV_HE_MIN (PAGE_SIZE / sizeof(struct pv_hash_entry))
-
-static struct pv_hash_entry *pv_lock_hash;
-static unsigned int pv_lock_hash_bits __read_mostly;
-
-/*
- * Allocate memory for the PV qspinlock hash buckets
- *
- * This function should be called from the paravirt spinlock initialization
- * routine.
- */
-void __init __pv_init_lock_hash(void)
-{
- int pv_hash_size = ALIGN(4 * num_possible_cpus(), PV_HE_PER_LINE);
-
- if (pv_hash_size < PV_HE_MIN)
- pv_hash_size = PV_HE_MIN;
-
- /*
- * Allocate space from bootmem which should be page-size aligned
- * and hence cacheline aligned.
- */
- pv_lock_hash = alloc_large_system_hash("PV qspinlock",
- sizeof(struct pv_hash_entry),
- pv_hash_size, 0,
- HASH_EARLY | HASH_ZERO,
- &pv_lock_hash_bits, NULL,
- pv_hash_size, pv_hash_size);
-}
-
-#define for_each_hash_entry(he, offset, hash) \
- for (hash &= ~(PV_HE_PER_LINE - 1), he = &pv_lock_hash[hash], offset = 0; \
- offset < (1 << pv_lock_hash_bits); \
- offset++, he = &pv_lock_hash[(hash + offset) & ((1 << pv_lock_hash_bits) - 1)])
-
-static struct qspinlock **pv_hash(struct qspinlock *lock, struct qnode *node)
-{
- unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
- struct pv_hash_entry *he;
- int hopcnt = 0;
-
- for_each_hash_entry(he, offset, hash) {
- hopcnt++;
- if (!cmpxchg(&he->lock, NULL, lock)) {
- WRITE_ONCE(he->node, node);
- lockevent_pv_hop(hopcnt);
- return &he->lock;
- }
- }
- /*
- * Hard assume there is a free entry for us.
- *
- * This is guaranteed by ensuring every blocked lock only ever consumes
- * a single entry, and since we only have 4 nesting levels per CPU
- * and allocated 4*nr_possible_cpus(), this must be so.
- *
- * The single entry is guaranteed by having the lock owner unhash
- * before it releases.
- */
- BUG();
-}
-
-static struct qnode *pv_unhash(struct qspinlock *lock)
-{
- unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
- struct pv_hash_entry *he;
- struct qnode *node;
-
- for_each_hash_entry(he, offset, hash) {
- if (READ_ONCE(he->lock) == lock) {
- node = READ_ONCE(he->node);
- WRITE_ONCE(he->lock, NULL);
- return node;
- }
- }
- /*
- * Hard assume we'll find an entry.
- *
- * This guarantees a limited lookup time and is itself guaranteed by
- * having the lock owner do the unhash -- IFF the unlock sees the
- * SLOW flag, there MUST be a hash entry.
- */
- BUG();
-}
-
-/*
- * Return true if when it is time to check the previous node which is not
- * in a running state.
- */
-static inline bool
-pv_wait_early(struct qnode *prev, int loop)
-{
- if ((loop & PV_PREV_CHECK_MASK) != 0)
- return false;
-
- return READ_ONCE(prev->state) != vcpu_running;
-}
-
-/*
- * Initialize the PV part of the qnode.
- */
-static void pv_init_node(struct qnode *node)
-{
- node->cpu = smp_processor_id();
- node->state = vcpu_running;
-}
-
-/*
- * Wait for node->locked to become true, halt the vcpu after a short spin.
- * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
- * behalf.
- */
-static void pv_wait_node(struct qnode *node, struct qnode *prev)
-{
- int loop;
- bool wait_early;
-
- for (;;) {
- for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
- if (READ_ONCE(node->locked))
- return;
- if (pv_wait_early(prev, loop)) {
- wait_early = true;
- break;
- }
- cpu_relax();
- }
-
- /*
- * Order node->state vs node->locked thusly:
- *
- * [S] node->state = vcpu_halted [S] next->locked = 1
- * MB MB
- * [L] node->locked [RmW] node->state = vcpu_hashed
- *
- * Matches the cmpxchg() from pv_kick_node().
- */
- smp_store_mb(node->state, vcpu_halted);
-
- if (!READ_ONCE(node->locked)) {
- lockevent_inc(pv_wait_node);
- lockevent_cond_inc(pv_wait_early, wait_early);
- pv_wait(&node->state, vcpu_halted);
- }
-
- /*
- * If pv_kick_node() changed us to vcpu_hashed, retain that
- * value so that pv_wait_head_or_lock() knows to not also try
- * to hash this lock.
- */
- cmpxchg(&node->state, vcpu_halted, vcpu_running);
-
- /*
- * If the locked flag is still not set after wakeup, it is a
- * spurious wakeup and the vCPU should wait again. However,
- * there is a pretty high overhead for CPU halting and kicking.
- * So it is better to spin for a while in the hope that the
- * MCS lock will be released soon.
- */
- lockevent_cond_inc(pv_spurious_wakeup,
- !READ_ONCE(node->locked));
- }
-
- /*
- * By now our node->locked should be 1 and our caller will not actually
- * spin-wait for it. We do however rely on our caller to do a
- * load-acquire for us.
- */
-}
-
-/*
- * Called after setting next->locked = 1 when we're the lock owner.
- *
- * Instead of waking the waiters stuck in pv_wait_node() advance their state
- * such that they're waiting in pv_wait_head_or_lock(), this avoids a
- * wake/sleep cycle.
- */
-static void pv_kick_node(struct qspinlock *lock, struct qnode *node)
-{
- /*
- * If the vCPU is indeed halted, advance its state to match that of
- * pv_wait_node(). If OTOH this fails, the vCPU was running and will
- * observe its next->locked value and advance itself.
- *
- * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
- *
- * The write to next->locked in arch_mcs_spin_unlock_contended()
- * must be ordered before the read of node->state in the cmpxchg()
- * below for the code to work correctly. To guarantee full ordering
- * irrespective of the success or failure of the cmpxchg(),
- * a relaxed version with explicit barrier is used. The control
- * dependency will order the reading of node->state before any
- * subsequent writes.
- */
- smp_mb__before_atomic();
- if (cmpxchg_relaxed(&node->state, vcpu_halted, vcpu_hashed)
- != vcpu_halted)
- return;
-
- /*
- * Put the lock into the hash table and set the _Q_SLOW_VAL.
- *
- * As this is the same vCPU that will check the _Q_SLOW_VAL value and
- * the hash table later on at unlock time, no atomic instruction is
- * needed.
- */
- WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
- (void)pv_hash(lock, node);
-}
-
-/*
- * Wait for l->locked to become clear and acquire the lock;
- * halt the vcpu after a short spin.
- * __pv_queued_spin_unlock() will wake us.
- *
- * The current value of the lock will be returned for additional processing.
- */
-static u32
-pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
-{
- struct qspinlock **lp = NULL;
- int waitcnt = 0;
- int loop;
-
- /*
- * If pv_kick_node() already advanced our state, we don't need to
- * insert ourselves into the hash table anymore.
- */
- if (READ_ONCE(node->state) == vcpu_hashed)
- lp = (struct qspinlock **)1;
-
- /*
- * Tracking # of slowpath locking operations
- */
- lockevent_inc(lock_slowpath);
-
- for (;; waitcnt++) {
- /*
- * Set correct vCPU state to be used by queue node wait-early
- * mechanism.
- */
- WRITE_ONCE(node->state, vcpu_running);
-
- /*
- * Set the pending bit in the active lock spinning loop to
- * disable lock stealing before attempting to acquire the lock.
- */
- set_pending(lock);
- for (loop = SPIN_THRESHOLD; loop; loop--) {
- if (trylock_clear_pending(lock))
- goto gotlock;
- cpu_relax();
- }
- clear_pending(lock);
-
-
- if (!lp) { /* ONCE */
- lp = pv_hash(lock, node);
-
- /*
- * We must hash before setting _Q_SLOW_VAL, such that
- * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
- * we'll be sure to be able to observe our hash entry.
- *
- * [S] <hash> [Rmw] l->locked == _Q_SLOW_VAL
- * MB RMB
- * [RmW] l->locked = _Q_SLOW_VAL [L] <unhash>
- *
- * Matches the smp_rmb() in __pv_queued_spin_unlock().
- */
- if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
- /*
- * The lock was free and now we own the lock.
- * Change the lock value back to _Q_LOCKED_VAL
- * and unhash the table.
- */
- WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
- WRITE_ONCE(*lp, NULL);
- goto gotlock;
- }
- }
- WRITE_ONCE(node->state, vcpu_hashed);
- lockevent_inc(pv_wait_head);
- lockevent_cond_inc(pv_wait_again, waitcnt);
- pv_wait(&lock->locked, _Q_SLOW_VAL);
-
- /*
- * Because of lock stealing, the queue head vCPU may not be
- * able to acquire the lock before it has to wait again.
- */
- }
-
- /*
- * The cmpxchg() or xchg() call before coming here provides the
- * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
- * here is to indicate to the compiler that the value will always
- * be nozero to enable better code optimization.
- */
-gotlock:
- return (u32)(atomic_read(&lock->val) | _Q_LOCKED_VAL);
-}
-
-/*
- * PV versions of the unlock fastpath and slowpath functions to be used
- * instead of queued_spin_unlock().
- */
-__visible void
-__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
-{
- struct qnode *node;
-
- if (unlikely(locked != _Q_SLOW_VAL)) {
- WARN(!debug_locks_silent,
- "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
- (unsigned long)lock, atomic_read(&lock->val));
- return;
- }
-
- /*
- * A failed cmpxchg doesn't provide any memory-ordering guarantees,
- * so we need a barrier to order the read of the node data in
- * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
- *
- * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
- */
- smp_rmb();
-
- /*
- * Since the above failed to release, this must be the SLOW path.
- * Therefore start by looking up the blocked node and unhashing it.
- */
- node = pv_unhash(lock);
-
- /*
- * Now that we have a reference to the (likely) blocked qnode,
- * release the lock.
- */
- smp_store_release(&lock->locked, 0);
-
- /*
- * At this point the memory pointed at by lock can be freed/reused,
- * however we can still use the qnode to kick the CPU.
- * The other vCPU may not really be halted, but kicking an active
- * vCPU is harmless other than the additional latency in completing
- * the unlock.
- */
- lockevent_inc(pv_kick_unlock);
- pv_kick(node->cpu);
-}
-
-/*
- * Include the architecture specific callee-save thunk of the
- * __pv_queued_spin_unlock(). This thunk is put together with
- * __pv_queued_spin_unlock() to make the callee-save thunk and the real unlock
- * function close to each other sharing consecutive instruction cachelines.
- * Alternatively, architecture specific version of __pv_queued_spin_unlock()
- * can be defined.
- */
-#include <asm/qspinlock_paravirt.h>
-
-#ifndef __pv_queued_spin_unlock
-__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
-{
- u8 locked;
-
- /*
- * We must not unlock if SLOW, because in that case we must first
- * unhash. Otherwise it would be possible to have multiple @lock
- * entries, which would be BAD.
- */
- locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
- if (likely(locked == _Q_LOCKED_VAL))
- return;
-
- __pv_queued_spin_unlock_slowpath(lock, locked);
-}
-#endif /* __pv_queued_spin_unlock */
--
2.35.1

2022-07-13 07:24:52

by Nicholas Piggin

Subject: [PATCH v2 02/12] locking/qspinlock: inline mcs_spinlock functions into qspinlock

qspinlock uses mcs_spinlock for the struct type (.next, .locked, and the
misplaced .count), and arch_mcs_spin_{un}lock_contended(). These can be
trivially inlined into qspinlock; the only arch that overrides them is
arm, and it does not implement qspinlock.

The now-unused mcs_spinlock code is removed.
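
To make the mechanical effect concrete, a condensed sketch of what the
inlining amounts to (taken from the diff below, with the PV-only fields
and surrounding code omitted): struct qnode absorbs the mcs_spinlock
fields, and the two arch_mcs_spin_*_contended() uses reduce to their
generic definitions, an acquire wait and a release handoff on the
node's locked word.

	struct qnode {
		struct qnode *next;
		int locked;	/* 1 if lock acquired */
		int count;	/* nesting count */
		/* cpu, state follow when CONFIG_PARAVIRT_SPINLOCKS=y */
	};

	/* waiter: spin until the previous owner hands over the MCS lock */
	smp_cond_load_acquire(&node->locked, VAL);

	/* owner: pass the MCS lock to the next queued waiter */
	smp_store_release(&next->locked, 1);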

Signed-off-by: Nicholas Piggin <[email protected]>
---
arch/alpha/include/asm/Kbuild | 1 -
arch/arc/include/asm/Kbuild | 1 -
arch/arm/include/asm/mcs_spinlock.h | 24 ------
arch/arm64/include/asm/Kbuild | 1 -
arch/hexagon/include/asm/Kbuild | 1 -
arch/ia64/include/asm/Kbuild | 1 -
arch/m68k/include/asm/Kbuild | 1 -
arch/microblaze/include/asm/Kbuild | 1 -
arch/mips/include/asm/Kbuild | 1 -
arch/nios2/include/asm/Kbuild | 1 -
arch/parisc/include/asm/Kbuild | 1 -
arch/powerpc/include/asm/Kbuild | 1 -
arch/s390/include/asm/Kbuild | 1 -
arch/sh/include/asm/Kbuild | 1 -
arch/sparc/include/asm/Kbuild | 1 -
arch/um/include/asm/Kbuild | 1 -
arch/x86/include/asm/Kbuild | 1 -
arch/xtensa/include/asm/Kbuild | 1 -
include/asm-generic/mcs_spinlock.h | 13 ---
kernel/locking/mcs_spinlock.h | 121 ----------------------------
kernel/locking/qspinlock.c | 38 ++++-----
kernel/locking/qspinlock_paravirt.h | 53 ++++++------
22 files changed, 43 insertions(+), 223 deletions(-)
delete mode 100644 arch/arm/include/asm/mcs_spinlock.h
delete mode 100644 include/asm-generic/mcs_spinlock.h
delete mode 100644 kernel/locking/mcs_spinlock.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 42911c8340c7..d21cf7b3173a 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -3,4 +3,3 @@
generated-y += syscall_table.h
generic-y += export.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 3c1afa524b9c..5ae4337a9301 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -1,6 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
generic-y += extable.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += user.h
diff --git a/arch/arm/include/asm/mcs_spinlock.h b/arch/arm/include/asm/mcs_spinlock.h
deleted file mode 100644
index 529d2cf4d06f..000000000000
--- a/arch/arm/include/asm/mcs_spinlock.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __ASM_MCS_LOCK_H
-#define __ASM_MCS_LOCK_H
-
-#ifdef CONFIG_SMP
-#include <asm/spinlock.h>
-
-/* MCS spin-locking. */
-#define arch_mcs_spin_lock_contended(lock) \
-do { \
- /* Ensure prior stores are observed before we enter wfe. */ \
- smp_mb(); \
- while (!(smp_load_acquire(lock))) \
- wfe(); \
-} while (0) \
-
-#define arch_mcs_spin_unlock_contended(lock) \
-do { \
- smp_store_release(lock, 1); \
- dsb_sev(); \
-} while (0)
-
-#endif /* CONFIG_SMP */
-#endif /* __ASM_MCS_LOCK_H */
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 5c8ee5a541d2..57e9ad366d25 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,6 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
generic-y += early_ioremap.h
-generic-y += mcs_spinlock.h
generic-y += qrwlock.h
generic-y += qspinlock.h
generic-y += parport.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3ece3c93fe08..37bbf99f66d4 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -2,4 +2,3 @@
generic-y += extable.h
generic-y += iomap.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index f994c1daf9d4..a0198c12e339 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -1,5 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
generated-y += syscall_table.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += vtime.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 1b720299deb1..8dbef73ce01d 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -3,5 +3,4 @@ generated-y += syscall_table.h
generic-y += export.h
generic-y += extable.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += spinlock.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index a055f5dbe00a..7615a27e0851 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -3,7 +3,6 @@ generated-y += syscall_table.h
generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += syscalls.h
generic-y += tlb.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index dee172716581..65cedca08771 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -9,7 +9,6 @@ generated-y += unistd_nr_o32.h

generic-y += export.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += parport.h
generic-y += qrwlock.h
generic-y += qspinlock.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 7fe7437555fb..5718eee9665c 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -2,6 +2,5 @@
generic-y += cmpxchg.h
generic-y += extable.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += spinlock.h
generic-y += user.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index e6e7f74c8ac9..1f0c28d74c88 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -2,5 +2,4 @@
generated-y += syscall_table_32.h
generated-y += syscall_table_64.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index bcf95ce0964f..813a8c3405ad 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,7 +4,6 @@ generated-y += syscall_table_64.h
generated-y += syscall_table_spu.h
generic-y += export.h
generic-y += kvm_types.h
-generic-y += mcs_spinlock.h
generic-y += qrwlock.h
generic-y += vtime.h
generic-y += early_ioremap.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 1a18d7b82f86..8b036a4ee2ca 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,4 +7,3 @@ generated-y += unistd_nr.h
generic-y += asm-offsets.h
generic-y += export.h
generic-y += kvm_types.h
-generic-y += mcs_spinlock.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index fc44d9c88b41..3192f19bcf85 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -1,5 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
generated-y += syscall_table.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += parport.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 0b9d98ced34a..f0b913f7ba05 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -3,4 +3,3 @@ generated-y += syscall_table_32.h
generated-y += syscall_table_64.h
generic-y += export.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index b2d834a29f3a..04080c0c1aec 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -14,7 +14,6 @@ generic-y += hw_irq.h
generic-y += irq_regs.h
generic-y += irq_work.h
generic-y += kdebug.h
-generic-y += mcs_spinlock.h
generic-y += mmiowb.h
generic-y += module.lds.h
generic-y += param.h
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 1e51650b79d7..beb7683f7b8f 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -10,4 +10,3 @@ generated-y += xen-hypercalls.h

generic-y += early_ioremap.h
generic-y += export.h
-generic-y += mcs_spinlock.h
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index fa07c686cbcc..29ae65cb68c2 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -2,7 +2,6 @@
generated-y += syscall_table.h
generic-y += extable.h
generic-y += kvm_para.h
-generic-y += mcs_spinlock.h
generic-y += param.h
generic-y += parport.h
generic-y += qrwlock.h
diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h
deleted file mode 100644
index 10cd4ffc6ba2..000000000000
--- a/include/asm-generic/mcs_spinlock.h
+++ /dev/null
@@ -1,13 +0,0 @@
-#ifndef __ASM_MCS_SPINLOCK_H
-#define __ASM_MCS_SPINLOCK_H
-
-/*
- * Architectures can define their own:
- *
- * arch_mcs_spin_lock_contended(l)
- * arch_mcs_spin_unlock_contended(l)
- *
- * See kernel/locking/mcs_spinlock.c.
- */
-
-#endif /* __ASM_MCS_SPINLOCK_H */
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
deleted file mode 100644
index 85251d8771d9..000000000000
--- a/kernel/locking/mcs_spinlock.h
+++ /dev/null
@@ -1,121 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * MCS lock defines
- *
- * This file contains the main data structure and API definitions of MCS lock.
- *
- * The MCS lock (proposed by Mellor-Crummey and Scott) is a simple spin-lock
- * with the desirable properties of being fair, and with each cpu trying
- * to acquire the lock spinning on a local variable.
- * It avoids expensive cache bounces that common test-and-set spin-lock
- * implementations incur.
- */
-#ifndef __LINUX_MCS_SPINLOCK_H
-#define __LINUX_MCS_SPINLOCK_H
-
-#include <asm/mcs_spinlock.h>
-
-struct mcs_spinlock {
- struct mcs_spinlock *next;
- int locked; /* 1 if lock acquired */
- int count; /* nesting count, see qspinlock.c */
-};
-
-#ifndef arch_mcs_spin_lock_contended
-/*
- * Using smp_cond_load_acquire() provides the acquire semantics
- * required so that subsequent operations happen after the
- * lock is acquired. Additionally, some architectures such as
- * ARM64 would like to do spin-waiting instead of purely
- * spinning, and smp_cond_load_acquire() provides that behavior.
- */
-#define arch_mcs_spin_lock_contended(l) \
-do { \
- smp_cond_load_acquire(l, VAL); \
-} while (0)
-#endif
-
-#ifndef arch_mcs_spin_unlock_contended
-/*
- * smp_store_release() provides a memory barrier to ensure all
- * operations in the critical section has been completed before
- * unlocking.
- */
-#define arch_mcs_spin_unlock_contended(l) \
- smp_store_release((l), 1)
-#endif
-
-/*
- * Note: the smp_load_acquire/smp_store_release pair is not
- * sufficient to form a full memory barrier across
- * cpus for many architectures (except x86) for mcs_unlock and mcs_lock.
- * For applications that need a full barrier across multiple cpus
- * with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be
- * used after mcs_lock.
- */
-
-/*
- * In order to acquire the lock, the caller should declare a local node and
- * pass a reference of the node to this function in addition to the lock.
- * If the lock has already been acquired, then this will proceed to spin
- * on this node->locked until the previous lock holder sets the node->locked
- * in mcs_spin_unlock().
- */
-static inline
-void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
-{
- struct mcs_spinlock *prev;
-
- /* Init node */
- node->locked = 0;
- node->next = NULL;
-
- /*
- * We rely on the full barrier with global transitivity implied by the
- * below xchg() to order the initialization stores above against any
- * observation of @node. And to provide the ACQUIRE ordering associated
- * with a LOCK primitive.
- */
- prev = xchg(lock, node);
- if (likely(prev == NULL)) {
- /*
- * Lock acquired, don't need to set node->locked to 1. Threads
- * only spin on its own node->locked value for lock acquisition.
- * However, since this thread can immediately acquire the lock
- * and does not proceed to spin on its own node->locked, this
- * value won't be used. If a debug mode is needed to
- * audit lock status, then set node->locked value here.
- */
- return;
- }
- WRITE_ONCE(prev->next, node);
-
- /* Wait until the lock holder passes the lock down. */
- arch_mcs_spin_lock_contended(&node->locked);
-}
-
-/*
- * Releases the lock. The caller should pass in the corresponding node that
- * was used to acquire the lock.
- */
-static inline
-void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
-{
- struct mcs_spinlock *next = READ_ONCE(node->next);
-
- if (likely(!next)) {
- /*
- * Release the lock by setting it to NULL
- */
- if (likely(cmpxchg_release(lock, node, NULL) == node))
- return;
- /* Wait until the next pointer is set */
- while (!(next = READ_ONCE(node->next)))
- cpu_relax();
- }
-
- /* Pass lock to next waiter. */
- arch_mcs_spin_unlock_contended(&next->locked);
-}
-
-#endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index a0fc21d99199..32f401e966ab 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -66,11 +66,10 @@
*
*/

-#include "mcs_spinlock.h"
#define MAX_NODES 4

/*
- * On 64-bit architectures, the mcs_spinlock structure will be 16 bytes in
+ * On 64-bit architectures, the qnode structure will be 16 bytes in
* size and four of them will fit nicely in one 64-byte cacheline. For
* pvqspinlock, however, we need more space for extra data. To accommodate
* that, we insert two more long words to pad it up to 32 bytes. IOW, only
@@ -80,7 +79,9 @@
* qspinlocks.
*/
struct qnode {
- struct mcs_spinlock mcs;
+ struct qnode *next;
+ int locked; /* 1 if lock acquired */
+ int count; /* nesting count */
#ifdef CONFIG_PARAVIRT_SPINLOCKS
int cpu;
u8 state;
@@ -124,18 +125,18 @@ static inline __pure u32 encode_tail(int cpu, int idx)
return tail;
}

-static inline __pure struct mcs_spinlock *decode_tail(u32 tail)
+static inline __pure struct qnode *decode_tail(u32 tail)
{
int cpu = (tail >> _Q_TAIL_CPU_OFFSET) - 1;
int idx = (tail & _Q_TAIL_IDX_MASK) >> _Q_TAIL_IDX_OFFSET;

- return per_cpu_ptr(&qnodes[idx].mcs, cpu);
+ return per_cpu_ptr(&qnodes[idx], cpu);
}

static inline __pure
-struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
+struct qnode *grab_qnode(struct qnode *base, int idx)
{
- return &((struct qnode *)base + idx)->mcs;
+ return &base[idx];
}

#define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
@@ -271,13 +272,13 @@ static __always_inline void set_locked(struct qspinlock *lock)
* all the PV callbacks.
*/

-static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
- struct mcs_spinlock *prev) { }
+static __always_inline void __pv_init_node(struct qnode *node) { }
+static __always_inline void __pv_wait_node(struct qnode *node,
+ struct qnode *prev) { }
static __always_inline void __pv_kick_node(struct qspinlock *lock,
- struct mcs_spinlock *node) { }
+ struct qnode *node) { }
static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
- struct mcs_spinlock *node)
+ struct qnode *node)
{ return 0; }

#define pv_enabled() false
@@ -316,7 +317,7 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
*/
void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
- struct mcs_spinlock *prev, *next, *node;
+ struct qnode *prev, *next, *node;
u32 old, tail;
int idx;

@@ -399,7 +400,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
queue:
lockevent_inc(lock_slowpath);
pv_queue:
- node = this_cpu_ptr(&qnodes[0].mcs);
+ node = this_cpu_ptr(&qnodes[0]);
idx = node->count++;
tail = encode_tail(smp_processor_id(), idx);

@@ -421,7 +422,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
goto release;
}

- node = grab_mcs_node(node, idx);
+ node = grab_qnode(node, idx);

/*
* Keep counts of non-zero index values:
@@ -475,7 +476,8 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
WRITE_ONCE(prev->next, node);

pv_wait_node(node, prev);
- arch_mcs_spin_lock_contended(&node->locked);
+ /* Wait for mcs node lock to be released */
+ smp_cond_load_acquire(&node->locked, VAL);

/*
* While waiting for the MCS lock, the next pointer may have
@@ -554,7 +556,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
if (!next)
next = smp_cond_load_relaxed(&node->next, (VAL));

- arch_mcs_spin_unlock_contended(&next->locked);
+ smp_store_release(&next->locked, 1); /* unlock the mcs node lock */
pv_kick_node(lock, next);

release:
@@ -563,7 +565,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
/*
* release the node
*/
- __this_cpu_dec(qnodes[0].mcs.count);
+ __this_cpu_dec(qnodes[0].count);
}
EXPORT_SYMBOL(queued_spin_lock_slowpath);

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 4efe00e6b441..cce3d3dde216 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -267,14 +267,12 @@ pv_wait_early(struct qnode *prev, int loop)
}

/*
- * Initialize the PV part of the mcs_spinlock node.
+ * Initialize the PV part of the qnode.
*/
-static void pv_init_node(struct mcs_spinlock *node)
+static void pv_init_node(struct qnode *node)
{
- struct qnode *pn = (struct qnode *)node;
-
- pn->cpu = smp_processor_id();
- pn->state = vcpu_running;
+ node->cpu = smp_processor_id();
+ node->state = vcpu_running;
}

/*
@@ -282,10 +280,8 @@ static void pv_init_node(struct mcs_spinlock *node)
* pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
* behalf.
*/
-static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
+static void pv_wait_node(struct qnode *node, struct qnode *prev)
{
- struct qnode *pn = (struct qnode *)node;
- struct qnode *pp = (struct qnode *)prev;
int loop;
bool wait_early;

@@ -293,7 +289,7 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
if (READ_ONCE(node->locked))
return;
- if (pv_wait_early(pp, loop)) {
+ if (pv_wait_early(prev, loop)) {
wait_early = true;
break;
}
@@ -301,20 +297,20 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
}

/*
- * Order pn->state vs pn->locked thusly:
+ * Order node->state vs node->locked thusly:
*
- * [S] pn->state = vcpu_halted [S] next->locked = 1
- * MB MB
- * [L] pn->locked [RmW] pn->state = vcpu_hashed
+ * [S] node->state = vcpu_halted [S] next->locked = 1
+ * MB MB
+ * [L] node->locked [RmW] node->state = vcpu_hashed
*
* Matches the cmpxchg() from pv_kick_node().
*/
- smp_store_mb(pn->state, vcpu_halted);
+ smp_store_mb(node->state, vcpu_halted);

if (!READ_ONCE(node->locked)) {
lockevent_inc(pv_wait_node);
lockevent_cond_inc(pv_wait_early, wait_early);
- pv_wait(&pn->state, vcpu_halted);
+ pv_wait(&node->state, vcpu_halted);
}

/*
@@ -322,7 +318,7 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
* value so that pv_wait_head_or_lock() knows to not also try
* to hash this lock.
*/
- cmpxchg(&pn->state, vcpu_halted, vcpu_running);
+ cmpxchg(&node->state, vcpu_halted, vcpu_running);

/*
* If the locked flag is still not set after wakeup, it is a
@@ -349,10 +345,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
* such that they're waiting in pv_wait_head_or_lock(), this avoids a
* wake/sleep cycle.
*/
-static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
+static void pv_kick_node(struct qspinlock *lock, struct qnode *node)
{
- struct qnode *pn = (struct qnode *)node;
-
/*
* If the vCPU is indeed halted, advance its state to match that of
* pv_wait_node(). If OTOH this fails, the vCPU was running and will
@@ -361,15 +355,15 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
* Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
*
* The write to next->locked in arch_mcs_spin_unlock_contended()
- * must be ordered before the read of pn->state in the cmpxchg()
+ * must be ordered before the read of node->state in the cmpxchg()
* below for the code to work correctly. To guarantee full ordering
* irrespective of the success or failure of the cmpxchg(),
* a relaxed version with explicit barrier is used. The control
- * dependency will order the reading of pn->state before any
+ * dependency will order the reading of node->state before any
* subsequent writes.
*/
smp_mb__before_atomic();
- if (cmpxchg_relaxed(&pn->state, vcpu_halted, vcpu_hashed)
+ if (cmpxchg_relaxed(&node->state, vcpu_halted, vcpu_hashed)
!= vcpu_halted)
return;

@@ -381,7 +375,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
* needed.
*/
WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
- (void)pv_hash(lock, pn);
+ (void)pv_hash(lock, node);
}

/*
@@ -392,9 +386,8 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
* The current value of the lock will be returned for additional processing.
*/
static u32
-pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
+pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
{
- struct qnode *pn = (struct qnode *)node;
struct qspinlock **lp = NULL;
int waitcnt = 0;
int loop;
@@ -403,7 +396,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
* If pv_kick_node() already advanced our state, we don't need to
* insert ourselves into the hash table anymore.
*/
- if (READ_ONCE(pn->state) == vcpu_hashed)
+ if (READ_ONCE(node->state) == vcpu_hashed)
lp = (struct qspinlock **)1;

/*
@@ -416,7 +409,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
* Set correct vCPU state to be used by queue node wait-early
* mechanism.
*/
- WRITE_ONCE(pn->state, vcpu_running);
+ WRITE_ONCE(node->state, vcpu_running);

/*
* Set the pending bit in the active lock spinning loop to
@@ -432,7 +425,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)


if (!lp) { /* ONCE */
- lp = pv_hash(lock, pn);
+ lp = pv_hash(lock, node);

/*
* We must hash before setting _Q_SLOW_VAL, such that
@@ -456,7 +449,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
goto gotlock;
}
}
- WRITE_ONCE(pn->state, vcpu_hashed);
+ WRITE_ONCE(node->state, vcpu_hashed);
lockevent_inc(pv_wait_head);
lockevent_cond_inc(pv_wait_again, waitcnt);
pv_wait(&lock->locked, _Q_SLOW_VAL);
--
2.35.1

2022-07-13 07:31:53

by Nicholas Piggin

[permalink] [raw]
Subject: [PATCH v2 01/12] locking/qspinlock: remove pv_node abstraction

There isn't much point in keeping struct pv_node separate from struct qnode
when struct qnode already has to reserve space for the pv_node fields anyway.
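
The net effect, condensed from the hunks below (declarations only, nothing
beyond what the diff itself changes):

  /* Before: a separate PV view cast onto the same per-CPU storage. */
  struct pv_node {
          struct mcs_spinlock mcs;
          int cpu;
          u8 state;
  };
  struct qnode {
          struct mcs_spinlock mcs;
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
          long reserved[2];       /* space reserved for pv_node's extra fields */
  #endif
  };

  /* After: the PV fields live directly in struct qnode, no cast needed. */
  struct qnode {
          struct mcs_spinlock mcs;
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
          int cpu;
          u8 state;
  #endif
  };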

Acked-by: Boqun Feng <[email protected]>
Signed-off-by: Nicholas Piggin <[email protected]>
---
kernel/locking/qspinlock.c | 3 ++-
kernel/locking/qspinlock_paravirt.h | 34 +++++++++++------------------
2 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 65a9a10caa6f..a0fc21d99199 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -82,7 +82,8 @@
struct qnode {
struct mcs_spinlock mcs;
#ifdef CONFIG_PARAVIRT_SPINLOCKS
- long reserved[2];
+ int cpu;
+ u8 state;
#endif
};

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index e84d21aa0722..4efe00e6b441 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -47,12 +47,6 @@ enum vcpu_state {
vcpu_hashed, /* = pv_hash'ed + vcpu_halted */
};

-struct pv_node {
- struct mcs_spinlock mcs;
- int cpu;
- u8 state;
-};
-
/*
* Hybrid PV queued/unfair lock
*
@@ -170,7 +164,7 @@ static __always_inline int trylock_clear_pending(struct qspinlock *lock)
*/
struct pv_hash_entry {
struct qspinlock *lock;
- struct pv_node *node;
+ struct qnode *node;
};

#define PV_HE_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_entry))
@@ -209,7 +203,7 @@ void __init __pv_init_lock_hash(void)
offset < (1 << pv_lock_hash_bits); \
offset++, he = &pv_lock_hash[(hash + offset) & ((1 << pv_lock_hash_bits) - 1)])

-static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
+static struct qspinlock **pv_hash(struct qspinlock *lock, struct qnode *node)
{
unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
struct pv_hash_entry *he;
@@ -236,11 +230,11 @@ static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
BUG();
}

-static struct pv_node *pv_unhash(struct qspinlock *lock)
+static struct qnode *pv_unhash(struct qspinlock *lock)
{
unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
struct pv_hash_entry *he;
- struct pv_node *node;
+ struct qnode *node;

for_each_hash_entry(he, offset, hash) {
if (READ_ONCE(he->lock) == lock) {
@@ -264,7 +258,7 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
* in a running state.
*/
static inline bool
-pv_wait_early(struct pv_node *prev, int loop)
+pv_wait_early(struct qnode *prev, int loop)
{
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;
@@ -277,9 +271,7 @@ pv_wait_early(struct pv_node *prev, int loop)
*/
static void pv_init_node(struct mcs_spinlock *node)
{
- struct pv_node *pn = (struct pv_node *)node;
-
- BUILD_BUG_ON(sizeof(struct pv_node) > sizeof(struct qnode));
+ struct qnode *pn = (struct qnode *)node;

pn->cpu = smp_processor_id();
pn->state = vcpu_running;
@@ -292,8 +284,8 @@ static void pv_init_node(struct mcs_spinlock *node)
*/
static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
{
- struct pv_node *pn = (struct pv_node *)node;
- struct pv_node *pp = (struct pv_node *)prev;
+ struct qnode *pn = (struct qnode *)node;
+ struct qnode *pp = (struct qnode *)prev;
int loop;
bool wait_early;

@@ -359,7 +351,7 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
*/
static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
{
- struct pv_node *pn = (struct pv_node *)node;
+ struct qnode *pn = (struct qnode *)node;

/*
* If the vCPU is indeed halted, advance its state to match that of
@@ -402,7 +394,7 @@ static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
static u32
pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
{
- struct pv_node *pn = (struct pv_node *)node;
+ struct qnode *pn = (struct qnode *)node;
struct qspinlock **lp = NULL;
int waitcnt = 0;
int loop;
@@ -492,7 +484,7 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
__visible void
__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
{
- struct pv_node *node;
+ struct qnode *node;

if (unlikely(locked != _Q_SLOW_VAL)) {
WARN(!debug_locks_silent,
@@ -517,14 +509,14 @@ __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
node = pv_unhash(lock);

/*
- * Now that we have a reference to the (likely) blocked pv_node,
+ * Now that we have a reference to the (likely) blocked qnode,
* release the lock.
*/
smp_store_release(&lock->locked, 0);

/*
* At this point the memory pointed at by lock can be freed/reused,
- * however we can still use the pv_node to kick the CPU.
+ * however we can still use the qnode to kick the CPU.
* The other vCPU may not really be halted, but kicking an active
* vCPU is harmless other than the additional latency in completing
* the unlock.
--
2.35.1

2022-07-13 07:36:01

by Nicholas Piggin

[permalink] [raw]
Subject: [PATCH v2 05/12] locking/qspinlock: be less clever with the preprocessor

Stop qspinlock.c from including itself and avoid most of the
preprocessor-based function renaming.

This is mostly done by having the common slowpath code take a 'bool
paravirt' argument and branching on it wherever the paravirt and native
paths differ. Marking the common function __always_inline keeps the two
cases in separate compiled copies, in which the compiler can
constant-fold the 'paravirt' tests away.
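
In condensed form the pattern is as follows (a sketch distilled from the
diff below, with the fast paths and the body of the queueing code elided):

  static __always_inline void
  queued_spin_lock_mcs_queue(struct qspinlock *lock, bool paravirt)
  {
          struct qnode *node = this_cpu_ptr(&qnodes[0]);

          if (paravirt)
                  pv_init_node(node);
          /* ... MCS queueing and spinning, as in the full diff ... */
  }

  /* #defined to native_queued_spin_lock_slowpath when PV is configured */
  void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
  {
          queued_spin_lock_mcs_queue(lock, false);  /* PV hooks fold away */
  }

  #ifdef CONFIG_PARAVIRT_SPINLOCKS
  void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
  {
          queued_spin_lock_mcs_queue(lock, true);   /* PV hooks inlined */
  }
  EXPORT_SYMBOL(__pv_queued_spin_lock_slowpath);
  #endif

When CONFIG_PARAVIRT_SPINLOCKS=n the pv_* hooks are empty __always_inline
stubs, so the 'paravirt' tests and calls vanish entirely from the native copy.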

Signed-off-by: Nicholas Piggin <[email protected]>
---
kernel/locking/qspinlock.c | 116 ++++++++++++----------------
kernel/locking/qspinlock_paravirt.h | 10 +--
2 files changed, 52 insertions(+), 74 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 95bf24d276c3..037bd5440cfd 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -11,8 +11,6 @@
* Peter Zijlstra <[email protected]>
*/

-#ifndef _GEN_PV_LOCK_SLOWPATH
-
#include <linux/smp.h>
#include <linux/bug.h>
#include <linux/cpumask.h>
@@ -287,35 +285,21 @@ static __always_inline void set_locked(struct qspinlock *lock)
WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
}

-
-/*
- * Generate the native code for queued_spin_unlock_slowpath(); provide NOPs for
- * all the PV callbacks.
- */
-
-static __always_inline void __pv_init_node(struct qnode *node) { }
-static __always_inline void __pv_wait_node(struct qnode *node,
- struct qnode *prev) { }
-static __always_inline void __pv_kick_node(struct qspinlock *lock,
- struct qnode *node) { }
-static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
- struct qnode *node)
- { return 0; }
-
-#define pv_enabled() false
-
-#define pv_init_node __pv_init_node
-#define pv_wait_node __pv_wait_node
-#define pv_kick_node __pv_kick_node
-#define pv_wait_head_or_lock __pv_wait_head_or_lock
-
#ifdef CONFIG_PARAVIRT_SPINLOCKS
-#define queued_spin_lock_slowpath native_queued_spin_lock_slowpath
-#endif
-
-#endif /* _GEN_PV_LOCK_SLOWPATH */
+#include "qspinlock_paravirt.h"
+#else /* CONFIG_PARAVIRT_SPINLOCKS */
+static __always_inline void pv_init_node(struct qnode *node) { }
+static __always_inline void pv_wait_node(struct qnode *node,
+ struct qnode *prev) { }
+static __always_inline void pv_kick_node(struct qspinlock *lock,
+ struct qnode *node) { }
+static __always_inline u32 pv_wait_head_or_lock(struct qspinlock *lock,
+ struct qnode *node)
+ { return 0; }
+static __always_inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock) { BUILD_BUG(); }
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */

-static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
+static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, bool paravirt)
{
struct qnode *prev, *next, *node;
u32 val, old, tail;
@@ -340,8 +324,13 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
*/
if (unlikely(idx >= MAX_NODES)) {
lockevent_inc(lock_no_node);
- while (!queued_spin_trylock(lock))
- cpu_relax();
+ if (paravirt) {
+ while (!pv_hybrid_queued_unfair_trylock(lock))
+ cpu_relax();
+ } else {
+ while (!queued_spin_trylock(lock))
+ cpu_relax();
+ }
goto release;
}

@@ -361,15 +350,21 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)

node->locked = 0;
node->next = NULL;
- pv_init_node(node);
+ if (paravirt)
+ pv_init_node(node);

/*
* We touched a (possibly) cold cacheline in the per-cpu queue node;
* attempt the trylock once more in the hope someone let go while we
* weren't watching.
*/
- if (queued_spin_trylock(lock))
- goto release;
+ if (paravirt) {
+ if (pv_hybrid_queued_unfair_trylock(lock))
+ goto release;
+ } else {
+ if (queued_spin_trylock(lock))
+ goto release;
+ }

/*
* Ensure that the initialisation of @node is complete before we
@@ -398,7 +393,8 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
/* Link @node into the waitqueue. */
WRITE_ONCE(prev->next, node);

- pv_wait_node(node, prev);
+ if (paravirt)
+ pv_wait_node(node, prev);
/* Wait for mcs node lock to be released */
smp_cond_load_acquire(&node->locked, VAL);

@@ -434,8 +430,10 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
* If PV isn't active, 0 will be returned instead.
*
*/
- if ((val = pv_wait_head_or_lock(lock, node)))
- goto locked;
+ if (paravirt) {
+ if ((val = pv_wait_head_or_lock(lock, node)))
+ goto locked;
+ }

val = atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK));

@@ -480,7 +478,8 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
next = smp_cond_load_relaxed(&node->next, (VAL));

smp_store_release(&next->locked, 1); /* unlock the mcs node lock */
- pv_kick_node(lock, next);
+ if (paravirt)
+ pv_kick_node(lock, next);

release:
trace_contention_end(lock, 0);
@@ -512,13 +511,12 @@ static inline void queued_spin_lock_mcs_queue(struct qspinlock *lock)
* contended : (*,x,y) +--> (*,0,0) ---> (*,0,1) -' :
* queue : ^--' :
*/
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#define queued_spin_lock_slowpath native_queued_spin_lock_slowpath
+#endif
+
void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
- if (pv_enabled()) {
- queued_spin_lock_mcs_queue(lock);
- return;
- }
-
if (virt_spin_lock(lock))
return;

@@ -592,31 +590,17 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
*/
queue:
lockevent_inc(lock_slowpath);
- queued_spin_lock_mcs_queue(lock);
+ queued_spin_lock_mcs_queue(lock, false);
}
EXPORT_SYMBOL(queued_spin_lock_slowpath);

-/*
- * Generate the paravirt code for queued_spin_unlock_slowpath().
- */
-#if !defined(_GEN_PV_LOCK_SLOWPATH) && defined(CONFIG_PARAVIRT_SPINLOCKS)
-#define _GEN_PV_LOCK_SLOWPATH
-
-#undef pv_enabled
-#define pv_enabled() true
-
-#undef pv_init_node
-#undef pv_wait_node
-#undef pv_kick_node
-#undef pv_wait_head_or_lock
-
-#define queued_spin_lock_mcs_queue __pv_queued_spin_lock_mcs_queue
-
-#undef queued_spin_lock_slowpath
-#define queued_spin_lock_slowpath __pv_queued_spin_lock_slowpath
-
-#include "qspinlock_paravirt.h"
-#include "qspinlock.c"
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+#undef queued_spin_lock_slowpath
+void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+{
+ queued_spin_lock_mcs_queue(lock, true);
+}
+EXPORT_SYMBOL(__pv_queued_spin_lock_slowpath);

bool nopvspin __initdata;
static __init int parse_nopvspin(char *arg)
@@ -625,4 +609,4 @@ static __init int parse_nopvspin(char *arg)
return 0;
}
early_param("nopvspin", parse_nopvspin);
-#endif
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 97385861adc2..f1922e3a0f7d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -1,8 +1,4 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _GEN_PV_LOCK_SLOWPATH
-#error "do not include this file"
-#endif
-
#include <linux/hash.h>
#include <linux/memblock.h>
#include <linux/debug_locks.h>
@@ -50,9 +46,8 @@ enum vcpu_state {
/*
* Hybrid PV queued/unfair lock
*
- * By replacing the regular queued_spin_trylock() with the function below,
- * it will be called once when a lock waiter enter the PV slowpath before
- * being queued.
+ * This function is called once when a lock waiter enters the PV slowpath
+ * before being queued.
*
* The pending bit is set by the queue head vCPU of the MCS wait queue in
* pv_wait_head_or_lock() to signal that it is ready to spin on the lock.
@@ -71,7 +66,6 @@ enum vcpu_state {
* queued lock (no lock starvation) and an unfair lock (good performance
* on not heavily contended locks).
*/
-#define queued_spin_trylock(l) pv_hybrid_queued_unfair_trylock(l)
static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
{
/*
--
2.35.1

2022-07-14 13:48:23

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] locking/qspinlock: remove arch qspinlock_paravirt.h includes

Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20220714/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/87679eeea9f1939c252d16df3ac6a01bf9daaa60
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
git checkout 87679eeea9f1939c252d16df3ac6a01bf9daaa60
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/locking/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from kernel/locking/qspinlock.c:29:
kernel/locking/qspinlock_stat.h:36:9: warning: no previous prototype for 'lockevent_read' [-Wmissing-prototypes]
36 | ssize_t lockevent_read(struct file *file, char __user *user_buf,
| ^~~~~~~~~~~~~~
kernel/locking/qspinlock.c:705:1: warning: no previous prototype for '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
705 | __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> kernel/locking/qspinlock.c:749:16: warning: no previous prototype for '__pv_queued_spin_unlock' [-Wmissing-prototypes]
749 | __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
| ^~~~~~~~~~~~~~~~~~~~~~~


vim +/__pv_queued_spin_unlock +749 kernel/locking/qspinlock.c

91668ee1ed703d Nicholas Piggin 2022-07-13 747
91668ee1ed703d Nicholas Piggin 2022-07-13 748 #ifndef __pv_queued_spin_unlock
91668ee1ed703d Nicholas Piggin 2022-07-13 @749 __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
91668ee1ed703d Nicholas Piggin 2022-07-13 750 {
91668ee1ed703d Nicholas Piggin 2022-07-13 751 u8 locked;
91668ee1ed703d Nicholas Piggin 2022-07-13 752
91668ee1ed703d Nicholas Piggin 2022-07-13 753 /*
91668ee1ed703d Nicholas Piggin 2022-07-13 754 * We must not unlock if SLOW, because in that case we must first
91668ee1ed703d Nicholas Piggin 2022-07-13 755 * unhash. Otherwise it would be possible to have multiple @lock
91668ee1ed703d Nicholas Piggin 2022-07-13 756 * entries, which would be BAD.
91668ee1ed703d Nicholas Piggin 2022-07-13 757 */
91668ee1ed703d Nicholas Piggin 2022-07-13 758 locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
91668ee1ed703d Nicholas Piggin 2022-07-13 759 if (likely(locked == _Q_LOCKED_VAL))
91668ee1ed703d Nicholas Piggin 2022-07-13 760 return;
91668ee1ed703d Nicholas Piggin 2022-07-13 761
91668ee1ed703d Nicholas Piggin 2022-07-13 762 __pv_queued_spin_unlock_slowpath(lock, locked);
91668ee1ed703d Nicholas Piggin 2022-07-13 763 }
87679eeea9f193 Nicholas Piggin 2022-07-13 764 EXPORT_SYMBOL(__pv_queued_spin_unlock);
91668ee1ed703d Nicholas Piggin 2022-07-13 765 #endif
91668ee1ed703d Nicholas Piggin 2022-07-13 766

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-07-14 14:55:06

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c

Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
config: i386-randconfig-a004
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e61b9c556267086ef9b743a0b57df302eef831b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/91668ee1ed703d7ea84e314136dc732da05ec9e7
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
git checkout 91668ee1ed703d7ea84e314136dc732da05ec9e7
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/locking/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from kernel/locking/qspinlock.c:29:
kernel/locking/qspinlock_stat.h:36:9: warning: no previous prototype for function 'lockevent_read' [-Wmissing-prototypes]
ssize_t lockevent_read(struct file *file, char __user *user_buf,
^
kernel/locking/qspinlock_stat.h:36:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
ssize_t lockevent_read(struct file *file, char __user *user_buf,
^
static
>> kernel/locking/qspinlock.c:705:1: warning: no previous prototype for function '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
^
kernel/locking/qspinlock.c:704:11: note: declare 'static' if the function is not intended to be used outside of this translation unit
__visible void
^
static
2 warnings generated.


vim +/__pv_queued_spin_unlock_slowpath +705 kernel/locking/qspinlock.c

699
700 /*
701 * PV versions of the unlock fastpath and slowpath functions to be used
702 * instead of queued_spin_unlock().
703 */
704 __visible void
> 705 __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
706 {
707 struct qnode *node;
708
709 if (unlikely(locked != _Q_SLOW_VAL)) {
710 WARN(!debug_locks_silent,
711 "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
712 (unsigned long)lock, atomic_read(&lock->val));
713 return;
714 }
715
716 /*
717 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
718 * so we need a barrier to order the read of the node data in
719 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
720 *
721 * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
722 */
723 smp_rmb();
724
725 /*
726 * Since the above failed to release, this must be the SLOW path.
727 * Therefore start by looking up the blocked node and unhashing it.
728 */
729 node = pv_unhash(lock);
730
731 /*
732 * Now that we have a reference to the (likely) blocked qnode,
733 * release the lock.
734 */
735 smp_store_release(&lock->locked, 0);
736
737 /*
738 * At this point the memory pointed at by lock can be freed/reused,
739 * however we can still use the qnode to kick the CPU.
740 * The other vCPU may not really be halted, but kicking an active
741 * vCPU is harmless other than the additional latency in completing
742 * the unlock.
743 */
744 lockevent_inc(pv_kick_unlock);
745 pv_kick(node->cpu);
746 }
747

--
0-DAY CI Kernel Test Service
https://01.org/lkp


2022-07-14 16:45:43

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] locking/qspinlock: remove arch qspinlock_paravirt.h includes

Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
config: i386-randconfig-a004 (https://download.01.org/0day-ci/archive/20220715/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e61b9c556267086ef9b743a0b57df302eef831b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/87679eeea9f1939c252d16df3ac6a01bf9daaa60
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
git checkout 87679eeea9f1939c252d16df3ac6a01bf9daaa60
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/locking/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from kernel/locking/qspinlock.c:29:
kernel/locking/qspinlock_stat.h:36:9: warning: no previous prototype for function 'lockevent_read' [-Wmissing-prototypes]
ssize_t lockevent_read(struct file *file, char __user *user_buf,
^
kernel/locking/qspinlock_stat.h:36:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
ssize_t lockevent_read(struct file *file, char __user *user_buf,
^
static
kernel/locking/qspinlock.c:705:1: warning: no previous prototype for function '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
^
kernel/locking/qspinlock.c:704:11: note: declare 'static' if the function is not intended to be used outside of this translation unit
__visible void
^
static
>> kernel/locking/qspinlock.c:749:16: warning: no previous prototype for function '__pv_queued_spin_unlock' [-Wmissing-prototypes]
__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
^
kernel/locking/qspinlock.c:749:11: note: declare 'static' if the function is not intended to be used outside of this translation unit
__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
^
static
3 warnings generated.


vim +/__pv_queued_spin_unlock +749 kernel/locking/qspinlock.c

91668ee1ed703d7 Nicholas Piggin 2022-07-13 699
91668ee1ed703d7 Nicholas Piggin 2022-07-13 700 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 701 * PV versions of the unlock fastpath and slowpath functions to be used
91668ee1ed703d7 Nicholas Piggin 2022-07-13 702 * instead of queued_spin_unlock().
91668ee1ed703d7 Nicholas Piggin 2022-07-13 703 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 704 __visible void
91668ee1ed703d7 Nicholas Piggin 2022-07-13 @705 __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
91668ee1ed703d7 Nicholas Piggin 2022-07-13 706 {
91668ee1ed703d7 Nicholas Piggin 2022-07-13 707 struct qnode *node;
91668ee1ed703d7 Nicholas Piggin 2022-07-13 708
91668ee1ed703d7 Nicholas Piggin 2022-07-13 709 if (unlikely(locked != _Q_SLOW_VAL)) {
91668ee1ed703d7 Nicholas Piggin 2022-07-13 710 WARN(!debug_locks_silent,
91668ee1ed703d7 Nicholas Piggin 2022-07-13 711 "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
91668ee1ed703d7 Nicholas Piggin 2022-07-13 712 (unsigned long)lock, atomic_read(&lock->val));
91668ee1ed703d7 Nicholas Piggin 2022-07-13 713 return;
91668ee1ed703d7 Nicholas Piggin 2022-07-13 714 }
91668ee1ed703d7 Nicholas Piggin 2022-07-13 715
91668ee1ed703d7 Nicholas Piggin 2022-07-13 716 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 717 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
91668ee1ed703d7 Nicholas Piggin 2022-07-13 718 * so we need a barrier to order the read of the node data in
91668ee1ed703d7 Nicholas Piggin 2022-07-13 719 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 720 *
91668ee1ed703d7 Nicholas Piggin 2022-07-13 721 * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 722 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 723 smp_rmb();
91668ee1ed703d7 Nicholas Piggin 2022-07-13 724
91668ee1ed703d7 Nicholas Piggin 2022-07-13 725 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 726 * Since the above failed to release, this must be the SLOW path.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 727 * Therefore start by looking up the blocked node and unhashing it.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 728 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 729 node = pv_unhash(lock);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 730
91668ee1ed703d7 Nicholas Piggin 2022-07-13 731 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 732 * Now that we have a reference to the (likely) blocked qnode,
91668ee1ed703d7 Nicholas Piggin 2022-07-13 733 * release the lock.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 734 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 735 smp_store_release(&lock->locked, 0);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 736
91668ee1ed703d7 Nicholas Piggin 2022-07-13 737 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 738 * At this point the memory pointed at by lock can be freed/reused,
91668ee1ed703d7 Nicholas Piggin 2022-07-13 739 * however we can still use the qnode to kick the CPU.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 740 * The other vCPU may not really be halted, but kicking an active
91668ee1ed703d7 Nicholas Piggin 2022-07-13 741 * vCPU is harmless other than the additional latency in completing
91668ee1ed703d7 Nicholas Piggin 2022-07-13 742 * the unlock.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 743 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 744 lockevent_inc(pv_kick_unlock);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 745 pv_kick(node->cpu);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 746 }
91668ee1ed703d7 Nicholas Piggin 2022-07-13 747
91668ee1ed703d7 Nicholas Piggin 2022-07-13 748 #ifndef __pv_queued_spin_unlock
91668ee1ed703d7 Nicholas Piggin 2022-07-13 @749 __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
91668ee1ed703d7 Nicholas Piggin 2022-07-13 750 {
91668ee1ed703d7 Nicholas Piggin 2022-07-13 751 u8 locked;
91668ee1ed703d7 Nicholas Piggin 2022-07-13 752
91668ee1ed703d7 Nicholas Piggin 2022-07-13 753 /*
91668ee1ed703d7 Nicholas Piggin 2022-07-13 754 * We must not unlock if SLOW, because in that case we must first
91668ee1ed703d7 Nicholas Piggin 2022-07-13 755 * unhash. Otherwise it would be possible to have multiple @lock
91668ee1ed703d7 Nicholas Piggin 2022-07-13 756 * entries, which would be BAD.
91668ee1ed703d7 Nicholas Piggin 2022-07-13 757 */
91668ee1ed703d7 Nicholas Piggin 2022-07-13 758 locked = cmpxchg_release(&lock->locked, _Q_LOCKED_VAL, 0);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 759 if (likely(locked == _Q_LOCKED_VAL))
91668ee1ed703d7 Nicholas Piggin 2022-07-13 760 return;
91668ee1ed703d7 Nicholas Piggin 2022-07-13 761
91668ee1ed703d7 Nicholas Piggin 2022-07-13 762 __pv_queued_spin_unlock_slowpath(lock, locked);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 763 }
87679eeea9f1939 Nicholas Piggin 2022-07-13 764 EXPORT_SYMBOL(__pv_queued_spin_unlock);
91668ee1ed703d7 Nicholas Piggin 2022-07-13 765 #endif
91668ee1ed703d7 Nicholas Piggin 2022-07-13 766

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-07-14 16:47:55

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c

Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
config: x86_64-rhel-8.3-func (https://download.01.org/0day-ci/archive/20220715/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/91668ee1ed703d7ea84e314136dc732da05ec9e7
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
git checkout 91668ee1ed703d7ea84e314136dc732da05ec9e7
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/nvme/target/ fs/xfs/ kernel/locking/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

kernel/locking/qspinlock.c: In function 'pv_wait_node':
>> kernel/locking/qspinlock.c:513:14: warning: variable 'wait_early' set but not used [-Wunused-but-set-variable]
513 | bool wait_early;
| ^~~~~~~~~~
kernel/locking/qspinlock.c: At top level:
>> kernel/locking/qspinlock.c:705:1: warning: no previous prototype for '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
705 | __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/wait_early +513 kernel/locking/qspinlock.c

504
505 /*
506 * Wait for node->locked to become true, halt the vcpu after a short spin.
507 * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
508 * behalf.
509 */
510 static void pv_wait_node(struct qnode *node, struct qnode *prev)
511 {
512 int loop;
> 513 bool wait_early;
514
515 for (;;) {
516 for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
517 if (READ_ONCE(node->locked))
518 return;
519 if (pv_wait_early(prev, loop)) {
520 wait_early = true;
521 break;
522 }
523 cpu_relax();
524 }
525
526 /*
527 * Order node->state vs node->locked thusly:
528 *
529 * [S] node->state = vcpu_halted [S] next->locked = 1
530 * MB MB
531 * [L] node->locked [RmW] node->state = vcpu_hashed
532 *
533 * Matches the cmpxchg() from pv_kick_node().
534 */
535 smp_store_mb(node->state, vcpu_halted);
536
537 if (!READ_ONCE(node->locked)) {
538 lockevent_inc(pv_wait_node);
539 lockevent_cond_inc(pv_wait_early, wait_early);
540 pv_wait(&node->state, vcpu_halted);
541 }
542
543 /*
544 * If pv_kick_node() changed us to vcpu_hashed, retain that
545 * value so that pv_wait_head_or_lock() knows to not also try
546 * to hash this lock.
547 */
548 cmpxchg(&node->state, vcpu_halted, vcpu_running);
549
550 /*
551 * If the locked flag is still not set after wakeup, it is a
552 * spurious wakeup and the vCPU should wait again. However,
553 * there is a pretty high overhead for CPU halting and kicking.
554 * So it is better to spin for a while in the hope that the
555 * MCS lock will be released soon.
556 */
557 lockevent_cond_inc(pv_spurious_wakeup,
558 !READ_ONCE(node->locked));
559 }
560
561 /*
562 * By now our node->locked should be 1 and our caller will not actually
563 * spin-wait for it. We do however rely on our caller to do a
564 * load-acquire for us.
565 */
566 }
567
568 /*
569 * Called after setting next->locked = 1 when we're the lock owner.
570 *
571 * Instead of waking the waiters stuck in pv_wait_node() advance their state
572 * such that they're waiting in pv_wait_head_or_lock(), this avoids a
573 * wake/sleep cycle.
574 */
575 static void pv_kick_node(struct qspinlock *lock, struct qnode *node)
576 {
577 /*
578 * If the vCPU is indeed halted, advance its state to match that of
579 * pv_wait_node(). If OTOH this fails, the vCPU was running and will
580 * observe its next->locked value and advance itself.
581 *
582 * Matches with smp_store_mb() and cmpxchg() in pv_wait_node()
583 *
584 * The write to next->locked in arch_mcs_spin_unlock_contended()
585 * must be ordered before the read of node->state in the cmpxchg()
586 * below for the code to work correctly. To guarantee full ordering
587 * irrespective of the success or failure of the cmpxchg(),
588 * a relaxed version with explicit barrier is used. The control
589 * dependency will order the reading of node->state before any
590 * subsequent writes.
591 */
592 smp_mb__before_atomic();
593 if (cmpxchg_relaxed(&node->state, vcpu_halted, vcpu_hashed)
594 != vcpu_halted)
595 return;
596
597 /*
598 * Put the lock into the hash table and set the _Q_SLOW_VAL.
599 *
600 * As this is the same vCPU that will check the _Q_SLOW_VAL value and
601 * the hash table later on at unlock time, no atomic instruction is
602 * needed.
603 */
604 WRITE_ONCE(lock->locked, _Q_SLOW_VAL);
605 (void)pv_hash(lock, node);
606 }
607
608 /*
609 * Wait for l->locked to become clear and acquire the lock;
610 * halt the vcpu after a short spin.
611 * __pv_queued_spin_unlock() will wake us.
612 *
613 * The current value of the lock will be returned for additional processing.
614 */
615 static u32
616 pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
617 {
618 struct qspinlock **lp = NULL;
619 int waitcnt = 0;
620 int loop;
621
622 /*
623 * If pv_kick_node() already advanced our state, we don't need to
624 * insert ourselves into the hash table anymore.
625 */
626 if (READ_ONCE(node->state) == vcpu_hashed)
627 lp = (struct qspinlock **)1;
628
629 /*
630 * Tracking # of slowpath locking operations
631 */
632 lockevent_inc(lock_slowpath);
633
634 for (;; waitcnt++) {
635 /*
636 * Set correct vCPU state to be used by queue node wait-early
637 * mechanism.
638 */
639 WRITE_ONCE(node->state, vcpu_running);
640
641 /*
642 * Set the pending bit in the active lock spinning loop to
643 * disable lock stealing before attempting to acquire the lock.
644 */
645 set_pending(lock);
646 for (loop = SPIN_THRESHOLD; loop; loop--) {
647 if (trylock_clear_pending(lock))
648 goto gotlock;
649 cpu_relax();
650 }
651 clear_pending(lock);
652
653
654 if (!lp) { /* ONCE */
655 lp = pv_hash(lock, node);
656
657 /*
658 * We must hash before setting _Q_SLOW_VAL, such that
659 * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
660 * we'll be sure to be able to observe our hash entry.
661 *
662 * [S] <hash> [Rmw] l->locked == _Q_SLOW_VAL
663 * MB RMB
664 * [RmW] l->locked = _Q_SLOW_VAL [L] <unhash>
665 *
666 * Matches the smp_rmb() in __pv_queued_spin_unlock().
667 */
668 if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
669 /*
670 * The lock was free and now we own the lock.
671 * Change the lock value back to _Q_LOCKED_VAL
672 * and unhash the table.
673 */
674 WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
675 WRITE_ONCE(*lp, NULL);
676 goto gotlock;
677 }
678 }
679 WRITE_ONCE(node->state, vcpu_hashed);
680 lockevent_inc(pv_wait_head);
681 lockevent_cond_inc(pv_wait_again, waitcnt);
682 pv_wait(&lock->locked, _Q_SLOW_VAL);
683
684 /*
685 * Because of lock stealing, the queue head vCPU may not be
686 * able to acquire the lock before it has to wait again.
687 */
688 }
689
690 /*
691 * The cmpxchg() or xchg() call before coming here provides the
692 * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
693 * here is to indicate to the compiler that the value will always
694 * be nozero to enable better code optimization.
695 */
696 gotlock:
697 return (u32)(atomic_read(&lock->val) | _Q_LOCKED_VAL);
698 }
699
700 /*
701 * PV versions of the unlock fastpath and slowpath functions to be used
702 * instead of queued_spin_unlock().
703 */
704 __visible void
> 705 __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
706 {
707 struct qnode *node;
708
709 if (unlikely(locked != _Q_SLOW_VAL)) {
710 WARN(!debug_locks_silent,
711 "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
712 (unsigned long)lock, atomic_read(&lock->val));
713 return;
714 }
715
716 /*
717 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
718 * so we need a barrier to order the read of the node data in
719 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
720 *
721 * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
722 */
723 smp_rmb();
724
725 /*
726 * Since the above failed to release, this must be the SLOW path.
727 * Therefore start by looking up the blocked node and unhashing it.
728 */
729 node = pv_unhash(lock);
730
731 /*
732 * Now that we have a reference to the (likely) blocked qnode,
733 * release the lock.
734 */
735 smp_store_release(&lock->locked, 0);
736
737 /*
738 * At this point the memory pointed at by lock can be freed/reused,
739 * however we can still use the qnode to kick the CPU.
740 * The other vCPU may not really be halted, but kicking an active
741 * vCPU is harmless other than the additional latency in completing
742 * the unlock.
743 */
744 lockevent_inc(pv_kick_unlock);
745 pv_kick(node->cpu);
746 }
747

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-07-14 21:28:28

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c

Hi Nicholas,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/locking/core]
[also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
config: x86_64-randconfig-a003 (https://download.01.org/0day-ci/archive/20220715/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e61b9c556267086ef9b743a0b57df302eef831b)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/91668ee1ed703d7ea84e314136dc732da05ec9e7
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
git checkout 91668ee1ed703d7ea84e314136dc732da05ec9e7
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/locking/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

kernel/locking/qspinlock.c:513:7: warning: variable 'wait_early' set but not used [-Wunused-but-set-variable]
bool wait_early;
^
>> kernel/locking/qspinlock.c:619:6: warning: variable 'waitcnt' set but not used [-Wunused-but-set-variable]
int waitcnt = 0;
^
>> kernel/locking/qspinlock.c:705:1: warning: no previous prototype for function '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
^
kernel/locking/qspinlock.c:704:11: note: declare 'static' if the function is not intended to be used outside of this translation unit
__visible void
^
static
3 warnings generated.


vim +/waitcnt +619 kernel/locking/qspinlock.c

607
608 /*
609 * Wait for l->locked to become clear and acquire the lock;
610 * halt the vcpu after a short spin.
611 * __pv_queued_spin_unlock() will wake us.
612 *
613 * The current value of the lock will be returned for additional processing.
614 */
615 static u32
616 pv_wait_head_or_lock(struct qspinlock *lock, struct qnode *node)
617 {
618 struct qspinlock **lp = NULL;
> 619 int waitcnt = 0;
620 int loop;
621
622 /*
623 * If pv_kick_node() already advanced our state, we don't need to
624 * insert ourselves into the hash table anymore.
625 */
626 if (READ_ONCE(node->state) == vcpu_hashed)
627 lp = (struct qspinlock **)1;
628
629 /*
630 * Tracking # of slowpath locking operations
631 */
632 lockevent_inc(lock_slowpath);
633
634 for (;; waitcnt++) {
635 /*
636 * Set correct vCPU state to be used by queue node wait-early
637 * mechanism.
638 */
639 WRITE_ONCE(node->state, vcpu_running);
640
641 /*
642 * Set the pending bit in the active lock spinning loop to
643 * disable lock stealing before attempting to acquire the lock.
644 */
645 set_pending(lock);
646 for (loop = SPIN_THRESHOLD; loop; loop--) {
647 if (trylock_clear_pending(lock))
648 goto gotlock;
649 cpu_relax();
650 }
651 clear_pending(lock);
652
653
654 if (!lp) { /* ONCE */
655 lp = pv_hash(lock, node);
656
657 /*
658 * We must hash before setting _Q_SLOW_VAL, such that
659 * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
660 * we'll be sure to be able to observe our hash entry.
661 *
662 * [S] <hash> [Rmw] l->locked == _Q_SLOW_VAL
663 * MB RMB
664 * [RmW] l->locked = _Q_SLOW_VAL [L] <unhash>
665 *
666 * Matches the smp_rmb() in __pv_queued_spin_unlock().
667 */
668 if (xchg(&lock->locked, _Q_SLOW_VAL) == 0) {
669 /*
670 * The lock was free and now we own the lock.
671 * Change the lock value back to _Q_LOCKED_VAL
672 * and unhash the table.
673 */
674 WRITE_ONCE(lock->locked, _Q_LOCKED_VAL);
675 WRITE_ONCE(*lp, NULL);
676 goto gotlock;
677 }
678 }
679 WRITE_ONCE(node->state, vcpu_hashed);
680 lockevent_inc(pv_wait_head);
681 lockevent_cond_inc(pv_wait_again, waitcnt);
682 pv_wait(&lock->locked, _Q_SLOW_VAL);
683
684 /*
685 * Because of lock stealing, the queue head vCPU may not be
686 * able to acquire the lock before it has to wait again.
687 */
688 }
689
690 /*
691 * The cmpxchg() or xchg() call before coming here provides the
692 * acquire semantics for locking. The dummy ORing of _Q_LOCKED_VAL
693 * here is to indicate to the compiler that the value will always
694 * be nozero to enable better code optimization.
695 */
696 gotlock:
697 return (u32)(atomic_read(&lock->val) | _Q_LOCKED_VAL);
698 }
699
700 /*
701 * PV versions of the unlock fastpath and slowpath functions to be used
702 * instead of queued_spin_unlock().
703 */
704 __visible void
> 705 __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
706 {
707 struct qnode *node;
708
709 if (unlikely(locked != _Q_SLOW_VAL)) {
710 WARN(!debug_locks_silent,
711 "pvqspinlock: lock 0x%lx has corrupted value 0x%x!\n",
712 (unsigned long)lock, atomic_read(&lock->val));
713 return;
714 }
715
716 /*
717 * A failed cmpxchg doesn't provide any memory-ordering guarantees,
718 * so we need a barrier to order the read of the node data in
719 * pv_unhash *after* we've read the lock being _Q_SLOW_VAL.
720 *
721 * Matches the cmpxchg() in pv_wait_head_or_lock() setting _Q_SLOW_VAL.
722 */
723 smp_rmb();
724
725 /*
726 * Since the above failed to release, this must be the SLOW path.
727 * Therefore start by looking up the blocked node and unhashing it.
728 */
729 node = pv_unhash(lock);
730
731 /*
732 * Now that we have a reference to the (likely) blocked qnode,
733 * release the lock.
734 */
735 smp_store_release(&lock->locked, 0);
736
737 /*
738 * At this point the memory pointed at by lock can be freed/reused,
739 * however we can still use the qnode to kick the CPU.
740 * The other vCPU may not really be halted, but kicking an active
741 * vCPU is harmless other than the additional latency in completing
742 * the unlock.
743 */
744 lockevent_inc(pv_kick_unlock);
745 pv_kick(node->cpu);
746 }
747

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-07-29 04:04:23

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] locking/qspinlock: merge qspinlock_paravirt.h into qspinlock.c

Excerpts from kernel test robot's message of July 15, 2022 12:16 am:
> Hi Nicholas,
>
> I love your patch! Perhaps something to improve:
>
> [auto build test WARNING on tip/locking/core]
> [also build test WARNING on tip/x86/core powerpc/next linus/master v5.19-rc6 next-20220714]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
> base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git eae6d58d67d9739be5f7ae2dbead1d0ef6528243
> config: i386-randconfig-a004
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 5e61b9c556267086ef9b743a0b57df302eef831b)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/intel-lab-lkp/linux/commit/91668ee1ed703d7ea84e314136dc732da05ec9e7
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Nicholas-Piggin/locking-qspinlock-simplify-code-generation/20220713-151009
> git checkout 91668ee1ed703d7ea84e314136dc732da05ec9e7
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash kernel/locking/
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> In file included from kernel/locking/qspinlock.c:29:
> kernel/locking/qspinlock_stat.h:36:9: warning: no previous prototype for function 'lockevent_read' [-Wmissing-prototypes]
> ssize_t lockevent_read(struct file *file, char __user *user_buf,
> ^
> kernel/locking/qspinlock_stat.h:36:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
> ssize_t lockevent_read(struct file *file, char __user *user_buf,
> ^
> static
>>> kernel/locking/qspinlock.c:705:1: warning: no previous prototype for function '__pv_queued_spin_unlock_slowpath' [-Wmissing-prototypes]
> __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
> ^
> kernel/locking/qspinlock.c:704:11: note: declare 'static' if the function is not intended to be used outside of this translation unit
> __visible void
> ^
> static
> 2 warnings generated.

These aren't new warnings btw, just existing W=1 warnings in code that
moved.
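
(For reference, the usual way a W=1 -Wmissing-prototypes warning like the
above is quieted, assuming the function has to stay non-static as its
__visible annotation suggests, is simply to give it a previous prototype,
either in a header or above the definition, e.g.:

  __visible void __pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked);

'static' is only an option when the symbol really is file-local.)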

Thanks,
Nick