2017-07-14 13:49:34

by Prateek Sood

Subject: [PATCH] osq_lock: avoid live-lock issue for RT task

A live-lock can occur when a task spinning in osq_lock() is unable to
bail out and unqueue its per-CPU osq_node from the optimistic_spin_queue.
Task T1 on CPU0 decremented the mutex count to acquire the lock, but was
preempted before it could set the owner. On CPU1, task T2 acquired the
osq_lock and started spinning on the mutex owner with preemption
disabled. CPU1's runqueue has only one task, so need_resched will not be
set there. On CPU0, task T3 then tried to take the osq_lock in order to
spin on the same mutex. At this point the following sequence causes a
soft lockup:

After T1 was preempted, RT task T3 tried to acquire the same mutex. It
spins on the osq_lock until either the lock becomes available or
need_resched is set. Since T3 is an RT task and the only runnable task
on CPU0, need_resched will never be set, so T3 can never bail out of the
spin loop, and the preempted T1 never gets to run and set the owner.
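
For reference, this is roughly the wait loop T3 is stuck in before this
patch; apart from the lock becoming available, the only exit is
need_resched() (simplified from kernel/locking/osq_lock.c):

	while (!READ_ONCE(node->locked)) {
		/*
		 * The sole bail-out is need_resched(). When the
		 * spinning RT task is the only runnable task on its
		 * CPU, TIF_NEED_RESCHED is never set, so this loop
		 * cannot terminate while the preempted T1 waits for
		 * the CPU.
		 */
		if (need_resched())
			goto unqueue;

		cpu_relax_lowlatency();
	}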

Signed-off-by: Prateek Sood <[email protected]>
---
kernel/locking/osq_lock.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..99b8d99 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -1,6 +1,7 @@
 #include <linux/percpu.h>
 #include <linux/sched.h>
 #include <linux/osq_lock.h>
+#include <linux/sched/rt.h>
 
 /*
  * An MCS like lock especially tailored for optimistic spinning for sleeping
@@ -85,6 +86,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 {
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
 	struct optimistic_spin_node *prev, *next;
+	struct task_struct *task = current;
 	int curr = encode_cpu(smp_processor_id());
 	int old;
 
@@ -118,8 +120,13 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * If a task spins on owner on a CPU after acquiring
+		 * osq_lock while an RT task spins on another CPU to
+		 * acquire osq_lock, it will starve the owner from
+		 * completing if the owner is to be scheduled on the
+		 * same CPU. It will be a live lock.
 		 */
-		if (need_resched())
+		if (need_resched() || rt_task(task))
 			goto unqueue;
 
 		cpu_relax_lowlatency();
--
--


2017-07-18 11:36:22

by Peter Zijlstra

Subject: Re: [PATCH] osq_lock: avoid live-lock issue for RT task

On Fri, Jul 14, 2017 at 07:19:09PM +0530, Prateek Sood wrote:
> A live-lock can occur when a task spinning in osq_lock() is unable to
> bail out and unqueue its per-CPU osq_node from the optimistic_spin_queue.
> Task T1 on CPU0 decremented the mutex count to acquire the lock, but
> was preempted before it could set the owner.

You've been working on ancient kernels... That can no longer happen.

Please see if this is still an issue after:

3ca0ff571b09 ("locking/mutex: Rework mutex::owner")

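For context: after that commit, mutex ownership lives in a single
atomic_long_t that holds the owning task_struct pointer (with flag bits
in its low bits), so taking the lock and publishing the owner happen in
one atomic operation; the window above (count decremented, owner not yet
set) no longer exists. A simplified sketch of the idea, not the exact
mainline code:

	/*
	 * One cmpxchg both acquires the mutex and makes the owner
	 * visible; there is no intermediate "locked but ownerless"
	 * state left for an optimistic spinner to get stuck on.
	 */
	if (atomic_long_cmpxchg_acquire(&lock->owner, 0UL,
					(unsigned long)current) == 0UL)
		return true;	/* acquired */
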
If so, please write an up-to-date Changelog and patch (it doesn't
apply because of commit 5aff60a191e5 ("locking/osq: Break out of
spin-wait busy waiting loop for a preempted vCPU in osq_lock()")).
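
For reference, after 5aff60a191e5 the bail-out condition in the wait
loop also checks whether the CPU queued ahead of us is a preempted vCPU,
which is where the posted diff conflicts. Roughly (simplified):

	while (!READ_ONCE(node->locked)) {
		/*
		 * Bail if we need to reschedule, or if the vCPU running
		 * the queue node ahead of us has been preempted.
		 */
		if (need_resched() ||
		    vcpu_is_preempted(node_cpu(node->prev)))
			goto unqueue;

		cpu_relax();
	}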