There are some code paths in the kernel where arch_spin_lock() will be
called directly when the lock isn't expected to be contended and the
critical section is short. For example, tracing_saved_cmdlines_size_read()
in kernel/trace/trace.c does that.

In most cases, preemption is also not disabled. This creates a problem
for the qspinlock slowpath, which expects preemption to be disabled
to guarantee the safe use of the per-CPU qnodes structure. To work around
these special use cases, add a preemption count check in the slowpath
and do a simple spin-wait when preemption isn't disabled.
Fixes: a33fda35e3a7 ("Introduce a simple generic 4-byte queued spinlock")
Signed-off-by: Waiman Long <[email protected]>
---
[v2] Move the spin-wait down to after the pending bit wait.
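
For context, the problematic caller pattern looks roughly like the sketch
below (the names are illustrative, not the exact code in
kernel/trace/trace.c). The raw lock is taken while preemption is still
enabled, so the slowpath can be preempted while its per-CPU qnode is in
use:

        static arch_spinlock_t example_lock = __ARCH_SPIN_LOCK_UNLOCKED;
        static unsigned int example_count;

        static unsigned int example_read_count(void)
        {
                unsigned int val;

                /* No preempt_disable() here, so the slowpath may be preempted. */
                arch_spin_lock(&example_lock);
                val = example_count;    /* short critical section */
                arch_spin_unlock(&example_lock);

                return val;
        }

The preempt_count() check added below falls back to a plain trylock
spin-wait for exactly this situation.
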
kernel/locking/qspinlock.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 65a9a10caa6f..d0159038084d 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -398,6 +398,23 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
queue:
lockevent_inc(lock_slowpath);
pv_queue:
+#ifdef CONFIG_PREEMPT_COUNT
+ /*
+ * As arch_spin_lock() can be called directly in some use cases
+ * where the lock isn't expected to be contended, the critical
+ * section is short and preemption isn't disabled, we can't use
+ * the qnodes here as their state may be corrupted if preemption
+ * happens, or a preemption warning may be printed
+ * (CONFIG_DEBUG_PREEMPT). Just do a simple spin-wait in this
+ * case as the lock shouldn't be contended for long.
+ */
+ if (unlikely(!preempt_count())) {
+ while (!queued_spin_trylock(lock))
+ cpu_relax();
+ return;
+ }
+#endif
+
node = this_cpu_ptr(&qnodes[0].mcs);
idx = node->count++;
tail = encode_tail(smp_processor_id(), idx);
--
2.31.1
On 9/20/22 15:55, Waiman Long wrote:
> There are some code paths in the kernel where arch_spin_lock() will be
> called directly when the lock isn't expected to be contended and the
> critical section is short. For example, tracing_saved_cmdlines_size_read()
> in kernel/trace/trace.c does that.
>
> In most cases, preemption is also not disabled. This creates a problem
> for the qspinlock slowpath, which expects preemption to be disabled
> to guarantee the safe use of the per-CPU qnodes structure. To work around
> these special use cases, add a preemption count check in the slowpath
> and do a simple spin-wait when preemption isn't disabled.
>
> Fixes: a33fda35e3a7 ("Introduce a simple generic 4-byte queued spinlock")
> Signed-off-by: Waiman Long <[email protected]>
On second thought, I believe the proper way to fix this is to make sure
that all the callers of arch_spin_lock() have preemption properly
disabled. I will work on another patch set to do that. So please ignore
this patch, and sorry for the noise.
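
Roughly, the caller-side change would look like the sketch below (the lock
and function names are illustrative, not an actual patch):

        static arch_spinlock_t example_lock = __ARCH_SPIN_LOCK_UNLOCKED;

        static void example_short_section(void)
        {
                /*
                 * Disable preemption before taking the raw lock so that
                 * the qspinlock slowpath can safely use its per-CPU
                 * qnodes.
                 */
                preempt_disable();
                arch_spin_lock(&example_lock);
                /* ... short critical section ... */
                arch_spin_unlock(&example_lock);
                preempt_enable();
        }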
Cheers,
Longman
On Tue, Sep 20, 2022 at 03:55:42PM -0400, Waiman Long wrote:
> There are some code paths in the kernel where arch_spin_lock() will be
> called directly when the lock isn't expected to be contended and critical
> section is short. For example, tracing_saved_cmdlines_size_read()
> in kernel/trace/trace.c does that.
>
> In most cases, preemption is also not disabled. This creates a problem
> for the qspinlock slowpath which expects preemption to be disabled
Using arch_spin_lock() without disabling preemption is a straight up
bug. Don't work around that.