2023-07-07 02:23:04

by Henry Wu

Subject: Fwd: Possible race in rt_mutex_adjust_prio_chain

I forgot to CC linux-kernel; sorry for the duplicate mail.

Hi, Peter.

On Thu, Jul 6, 2023 at 20:01, Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Jul 06, 2023 at 02:08:20PM +0800, Henry Wu wrote:
> > Hi,
> >
> > I found that it's not safe to change waiter->prio after the waiter is
> > dequeued from the mutex's waiter rbtree, because it is still on the
> > owner's pi_waiters rbtree. From my analysis, waiters on a pi_waiters
> > rbtree should be protected by the pi_lock of the task that has the
> > pi_waiters and by the corresponding rt_mutex's wait_lock, but pi_waiters
> > is shared by all locks owned by this task, so in practice changes to
> > pi_waiters are serialized only by pi_lock.
> >
> > `rt_mutex_adjust_prio_chain' changes the key of nodes in the pi_waiters
> > rbtree without holding pi_lock, so the rbtree's invariant is violated.
> > If another CPU is enqueuing a waiter at the same time, the pi_waiters
> > rbtree will be corrupted.
>
> Are you talking about [7];
>
> Where we do waiter_update_prio() while only
> holding [L] rtmutex->wait_lock.
>
> VS
>
> rt_mutex_adjust_prio() / task_top_pi_waiter() that accesses ->pi_waiters
> while holding [P] task->pi_lock.
>
> ?
>
> I'll go stare at that in more detail -- but I wanted to verify that's
> what you're talking about.
>

Yes, I was referring to step [7]. I checked every call site of
rt_mutex_adjust_prio(), and all of them are protected by the right pi_lock.

Imagine a scenario in which we have two rt_mutexes (M1, M2) and three
threads (A, B, C). Both rt_mutexes are owned by thread A. B is blocked
acquiring M1 and C is blocked acquiring M2. We use sched_setattr() to
change the priorities of B and C.

CPU0                                  CPU1
.........                             ..........

rt_mutex_adjust_prio_chain(C)
                                      rt_mutex_adjust_prio_chain(B)
......
[7] update waiter->prio
(Now A's pi_waiters rbtree is
 corrupted temporarily)
......                                [11] enqueue
                                      operation may select
                                      insert position
                                      according to corrupted
                                      waiter node CPU0 created.
[11] Even though we fixed
corrupted waiter now, we
are not sure about pi_waiters's
sanity because other cpu may
create new invariant violation
based on our violation.
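
The interleaving above can be replayed sequentially in userspace. The following is only a simplified sketch, not kernel code: a sorted singly linked list stands in for the pi_waiters rbtree, there are no real threads or locks, and the names `struct node`, `sorted_insert`, `list_remove`, `is_sorted` and `replay_race` are made up for illustration. It shows that once a key is changed in place, a second insertion can land in a position that stays wrong even after the first node is requeued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a waiter; a sorted list replaces the rbtree. */
struct node {
    int prio;
    struct node *next;
};

/* Insert n before the first node whose prio is greater than n->prio. */
void sorted_insert(struct node **head, struct node *n)
{
    struct node **pp = head;

    assert(head != NULL && n != NULL);
    while (*pp && (*pp)->prio <= n->prio)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* Unlink n. Like an rbtree erase, this never looks at the keys. */
void list_remove(struct node **head, struct node *n)
{
    struct node **pp = head;

    while (*pp && *pp != n)
        pp = &(*pp)->next;
    if (*pp)
        *pp = n->next;
    n->next = NULL;
}

/* True if every key is <= its successor. */
bool is_sorted(const struct node *head)
{
    for (; head && head->next; head = head->next)
        if (head->prio > head->next->prio)
            return false;
    return true;
}

/* Replays the interleaving step by step; returns whether the list ends sorted. */
bool replay_race(void)
{
    struct node a = { 20, NULL }, w = { 30, NULL }, c = { 40, NULL };
    struct node x = { 42, NULL };
    struct node *head = NULL;

    sorted_insert(&head, &a);
    sorted_insert(&head, &w);
    sorted_insert(&head, &c);   /* list: 20 30 40 */

    w.prio = 45;                /* "CPU0" [7]: key changed in place: 20 45 40 */
    sorted_insert(&head, &x);   /* "CPU1" [11]: 42 stops at 45: 20 42 45 40 */

    list_remove(&head, &w);     /* "CPU0" [11]: requeue its own node */
    sorted_insert(&head, &w);   /* list: 20 42 40 45 */

    return is_sorted(head);
}
```

In this model `replay_race()` returns false: the node inserted by the second actor stays ahead of a smaller key even though the first actor requeued its own node.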

> > I attach a source file which can trigger this violation. I tested it
> > on Ubuntu 20.04 LTS with 5.4 kernel.
>
> Well, that's a horribly old kernel :-( Please double check on v6.4 and
> consult that code for the discussion above -- I'm really not too
> interested in debugging something ancient.

I revised my code to make it work on 6.4.2 and fixed one logic error.
I tested it on Fedora 38 with kernel 6.4.2-858.vanilla.fc38.x86_64.
You can find the new code in the attachment.

$ sudo ./a.out
.........................
prio: -21
prio: -21
prio: -21
prio: -21
prio: -21
prio: -21
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
prio: -20
PID LWP TTY TIME CMD
4564 4564 pts/0 00:00:01 a.out
4564 4565 pts/0 00:00:00 waiter-0
4564 4566 pts/0 00:00:00 waiter-1
4564 4567 pts/0 00:00:00 waiter-2
4564 4568 pts/0 00:00:00 waiter-3
4564 4569 pts/0 00:00:00 waiter-4
4564 4570 pts/0 00:00:00 waiter-5
4564 4571 pts/0 00:00:00 waiter-6
4564 4572 pts/0 00:00:00 waiter-7
4564 4573 pts/0 00:00:00 waiter-8
4564 4574 pts/0 00:00:00 waiter-9
4564 4575 pts/0 00:00:00 waiter-10
4564 4576 pts/0 00:00:00 waiter-11
4564 4577 pts/0 00:00:00 waiter-12
4564 4578 pts/0 00:00:00 waiter-13
4564 4579 pts/0 00:00:00 waiter-14
4564 4580 pts/0 00:00:00 waiter-15
4564 4581 pts/0 00:00:00 waiter-16
4564 4582 pts/0 00:00:00 waiter-17
4564 4583 pts/0 00:00:00 waiter-18
4564 4584 pts/0 00:00:00 waiter-19
4564 4585 pts/0 00:00:00 changer-0
4564 4586 pts/0 00:00:00 changer-1
4564 4587 pts/0 00:00:00 changer-2
4564 4588 pts/0 00:00:00 changer-3
4564 4589 pts/0 00:00:00 changer-4
4564 4590 pts/0 00:00:00 changer-5
4564 4591 pts/0 00:00:00 changer-6
4564 4592 pts/0 00:00:00 changer-7
4564 4593 pts/0 00:00:00 changer-8
4564 4594 pts/0 00:00:00 changer-9
4564 4595 pts/0 00:00:00 changer-10
4564 4596 pts/0 00:00:00 changer-11
4564 4597 pts/0 00:00:00 changer-12
4564 4598 pts/0 00:00:00 changer-13
4564 4599 pts/0 00:00:00 changer-14
4564 4600 pts/0 00:00:00 changer-15
4564 4601 pts/0 00:00:00 changer-16
4564 4602 pts/0 00:00:00 changer-17
4564 4603 pts/0 00:00:00 changer-18
4564 4604 pts/0 00:00:00 changer-19
found race, hang...

$ sudo crash --no_module
.....................
crash> task -R prio,normal_prio,rt_priority,pi_waiters 4564
PID: 4564 TASK: ffff9b7c8480a8c0 CPU: 3 COMMAND: "a.out"
prio = 80,
normal_prio = 120,
rt_priority = 0,
pi_waiters = {
rb_root = {
rb_node = 0xffffb5bad2ddfcf8
},
rb_leftmost = 0xffffb5bad2da7d98
},

crash> print (struct rb_node *)0xffffb5bad2ddfcf8
$1 = (struct rb_node *) 0xffffb5bad2ddfcf8
crash> print *(struct rt_mutex_waiter *)((void *)$ - 24)
$2 = {
tree_entry = {
__rb_parent_color = 1,
rb_right = 0x0,
rb_left = 0x0
},
pi_tree_entry = {
__rb_parent_color = 1,
rb_right = 0xffffb5bad2df7d28,
rb_left = 0xffffb5bad2dafd68
},
task = 0xffff9b7c80388000,
lock = 0xffff9b7caa2cceb0,
wake_state = 3,
prio = 89,
deadline = 0,
ww_ctx = 0x0
}
crash> print $1->rb_left
$3 = (struct rb_node *) 0xffffb5bad2dafd68
crash> print *(struct rt_mutex_waiter *)((void *)$ - 24)
$4 = {
tree_entry = {
__rb_parent_color = 1,
rb_right = 0x0,
rb_left = 0x0
},
pi_tree_entry = {
__rb_parent_color = 18446662412739149049,
rb_right = 0xffffb5bad2defdb8,
rb_left = 0xffffb5bad2dbfd30
},
task = 0xffff9b7d2bca28c0,
lock = 0xffff9b7d004a2970,
wake_state = 3,
prio = 83,
deadline = 0,
ww_ctx = 0x0
}
crash> print $1->rb_left->rb_left
$5 = (struct rb_node *) 0xffffb5bad2dbfd30
crash> print *(struct rt_mutex_waiter *)((void *)$ - 24)
$6 = {
tree_entry = {
__rb_parent_color = 1,
rb_right = 0x0,
rb_left = 0x0
},
pi_tree_entry = {
__rb_parent_color = 18446662412738952552,
rb_right = 0xffffb5bad2d87d18,
rb_left = 0xffffb5bad2da7d98
},
task = 0xffff9b7cfd55a8c0,
lock = 0xffff9b7caa2cc6d0,
wake_state = 3,
prio = 79,
deadline = 0,
ww_ctx = 0x0
}
crash> print $1->rb_left->rb_left->rb_left
$7 = (struct rb_node *) 0xffffb5bad2da7d98
crash> print *(struct rt_mutex_waiter *)((void *)$ - 24)
$8 = {
tree_entry = {
__rb_parent_color = 1,
rb_right = 0x0,
rb_left = 0x0
},
pi_tree_entry = {
__rb_parent_color = 18446662412739018033,
rb_right = 0x0,
rb_left = 0x0
},
task = 0xffff9b7c80bd0000,
lock = 0xffff9b7caa2ccb50,
wake_state = 3,
prio = 80,
deadline = 0,
ww_ctx = 0x0
}
crash>
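
The `- 24` in the crash commands above is the offset of `pi_tree_entry` inside `struct rt_mutex_waiter` on a 64-bit build: `tree_entry` comes first, and a `struct rb_node` is three machine words. A rough userspace model of that pointer arithmetic follows. The types `rb_node_model` and `waiter_model` and the helper `pi_node_to_waiter` are illustrative names only, and the waiter model keeps just the first two members plus `prio`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the kernel's struct rb_node layout (three words). */
struct rb_node_model {
    unsigned long parent_color;
    struct rb_node_model *right;
    struct rb_node_model *left;
};

/* Only the leading members of the pre-patch waiter are modelled here. */
struct waiter_model {
    struct rb_node_model tree_entry;
    struct rb_node_model pi_tree_entry;
    int prio;
};

/* Same idea as rb_entry()/container_of(): step back from the embedded node. */
struct waiter_model *pi_node_to_waiter(struct rb_node_model *node)
{
    assert(node != NULL);
    return (struct waiter_model *)
        ((char *)node - offsetof(struct waiter_model, pi_tree_entry));
}
```

With 8-byte pointers and 8-byte `unsigned long`, `offsetof(struct waiter_model, pi_tree_entry)` is 24, which matches the constant used in the session.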

The key-order invariant of pi_waiters is violated by the last two
waiters above: the waiter with prio 79 has a left child with prio 80,
and that prio-80 waiter is also rb_leftmost.
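
The same check can be written down mechanically. This is a small sketch that hard-codes the four prio values from the dump (89, 83, 79, 80 along the left spine) and ignores deadlines, which are all 0 there. `struct dumped_node`, `first_left_violation` and `check_dumped_spine` are hypothetical names, not kernel or crash APIs.

```c
#include <assert.h>
#include <stddef.h>

/* One node of the left spine as read from the dump. */
struct dumped_node {
    int prio;
    const struct dumped_node *left;
};

/*
 * Walk the left spine. A left child must not sort after its parent,
 * so its prio must not be larger. Returns the depth of the first
 * offending parent, or -1 if the spine is consistent.
 */
int first_left_violation(const struct dumped_node *root)
{
    int depth = 0;

    assert(root != NULL);
    for (; root->left; root = root->left, depth++)
        if (root->left->prio > root->prio)
            return depth;
    return -1;
}

/* The spine from the crash session: 89 -> 83 -> 79 -> 80. */
int check_dumped_spine(void)
{
    const struct dumped_node n80 = { 80, NULL };
    const struct dumped_node n79 = { 79, &n80 };
    const struct dumped_node n83 = { 83, &n79 };
    const struct dumped_node n89 = { 89, &n83 };

    return first_left_violation(&n89);
}
```

For the dumped values the first offending parent is at depth 2, the prio-79 waiter.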

Thanks.

Henry


Attachments:
pi_642.c (5.26 kB)

2023-07-07 13:29:33

by Peter Zijlstra

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

On Fri, Jul 07, 2023 at 10:09:04AM +0800, Henry Wu wrote:

> Imagine a scenario in which we have two rt_mutexes (M1, M2) and three
> threads (A, B, C). Both rt_mutexes are owned by thread A. B is blocked
> acquiring M1 and C is blocked acquiring M2. We use sched_setattr() to
> change the priorities of B and C.

      A
     / \
   M1   M2
    |    |
    B    C

And in that case L will be two different locks, which I overlooked
yesterday. So L can't save the day.

This means I either need to take P2 earlier in overlap with P1 -- which
is tricky at best -- or duplicate the data to retain consistency.

Duplicating the data would still leave the logical race condition in
that a concurrent observer will not observe the waiter in the 'right'
location, however it should be harmless since both will continue to
propagate their respective state.

The below implements this duplication and seems to not insta-crash.

I'll endeavour to write a few comments and a Changelog to go with it.

---
kernel/locking/rtmutex.c | 93 ++++++++++++++++++++++++-----------------
kernel/locking/rtmutex_api.c | 2 +-
kernel/locking/rtmutex_common.h | 34 +++++++++------
3 files changed, 76 insertions(+), 53 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 728f434de2bb..3af6772cebb6 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -336,18 +336,27 @@ static __always_inline int __waiter_prio(struct task_struct *task)
static __always_inline void
waiter_update_prio(struct rt_mutex_waiter *waiter, struct task_struct *task)
{
- waiter->prio = __waiter_prio(task);
- waiter->deadline = task->dl.deadline;
+ waiter->tree.prio = __waiter_prio(task);
+ waiter->tree.deadline = task->dl.deadline;
+}
+
+static __always_inline void
+waiter_clone_prio(struct rt_mutex_waiter *waiter)
+{
+ waiter->pi_tree.prio = waiter->tree.prio;
+ waiter->pi_tree.deadline = waiter->tree.deadline;
}

/*
- * Only use with rt_mutex_waiter_{less,equal}()
+ * Only use with rt_waiter_node_{less,equal}()
*/
+#define task_to_waiter_node(p) \
+ &(struct rt_waiter_node){ .prio = __waiter_prio(p), .deadline = (p)->dl.deadline }
#define task_to_waiter(p) \
- &(struct rt_mutex_waiter){ .prio = __waiter_prio(p), .deadline = (p)->dl.deadline }
+ &(struct rt_mutex_waiter){ .tree = *task_to_waiter_node(p) }

-static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
- struct rt_mutex_waiter *right)
+static __always_inline int rt_waiter_node_less(struct rt_waiter_node *left,
+ struct rt_waiter_node *right)
{
if (left->prio < right->prio)
return 1;
@@ -364,8 +373,8 @@ static __always_inline int rt_mutex_waiter_less(struct rt_mutex_waiter *left,
return 0;
}

-static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
- struct rt_mutex_waiter *right)
+static __always_inline int rt_waiter_node_equal(struct rt_waiter_node *left,
+ struct rt_waiter_node *right)
{
if (left->prio != right->prio)
return 0;
@@ -385,7 +394,7 @@ static __always_inline int rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
struct rt_mutex_waiter *top_waiter)
{
- if (rt_mutex_waiter_less(waiter, top_waiter))
+ if (rt_waiter_node_less(&waiter->tree, &top_waiter->tree))
return true;

#ifdef RT_MUTEX_BUILD_SPINLOCKS
@@ -393,30 +402,30 @@ static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
* Note that RT tasks are excluded from same priority (lateral)
* steals to prevent the introduction of an unbounded latency.
*/
- if (rt_prio(waiter->prio) || dl_prio(waiter->prio))
+ if (rt_prio(waiter->tree.prio) || dl_prio(waiter->tree.prio))
return false;

- return rt_mutex_waiter_equal(waiter, top_waiter);
+ return rt_waiter_node_equal(&waiter->tree, &top_waiter->tree);
#else
return false;
#endif
}

#define __node_2_waiter(node) \
- rb_entry((node), struct rt_mutex_waiter, tree_entry)
+ rb_entry((node), struct rt_mutex_waiter, tree.entry)

static __always_inline bool __waiter_less(struct rb_node *a, const struct rb_node *b)
{
struct rt_mutex_waiter *aw = __node_2_waiter(a);
struct rt_mutex_waiter *bw = __node_2_waiter(b);

- if (rt_mutex_waiter_less(aw, bw))
+ if (rt_waiter_node_less(&aw->tree, &bw->tree))
return 1;

if (!build_ww_mutex())
return 0;

- if (rt_mutex_waiter_less(bw, aw))
+ if (rt_waiter_node_less(&bw->tree, &aw->tree))
return 0;

/* NOTE: relies on waiter->ww_ctx being set before insertion */
@@ -434,48 +443,54 @@ static __always_inline bool __waiter_less(struct rb_node *a, const struct rb_nod
static __always_inline void
rt_mutex_enqueue(struct rt_mutex_base *lock, struct rt_mutex_waiter *waiter)
{
- rb_add_cached(&waiter->tree_entry, &lock->waiters, __waiter_less);
+ rb_add_cached(&waiter->tree.entry, &lock->waiters, __waiter_less);
}

static __always_inline void
rt_mutex_dequeue(struct rt_mutex_base *lock, struct rt_mutex_waiter *waiter)
{
- if (RB_EMPTY_NODE(&waiter->tree_entry))
+ if (RB_EMPTY_NODE(&waiter->tree.entry))
return;

- rb_erase_cached(&waiter->tree_entry, &lock->waiters);
- RB_CLEAR_NODE(&waiter->tree_entry);
+ rb_erase_cached(&waiter->tree.entry, &lock->waiters);
+ RB_CLEAR_NODE(&waiter->tree.entry);
}

-#define __node_2_pi_waiter(node) \
- rb_entry((node), struct rt_mutex_waiter, pi_tree_entry)
+#define __node_2_rt_node(node) \
+ rb_entry((node), struct rt_waiter_node, entry)

-static __always_inline bool
-__pi_waiter_less(struct rb_node *a, const struct rb_node *b)
+static __always_inline bool __pi_waiter_less(struct rb_node *a, const struct rb_node *b)
{
- return rt_mutex_waiter_less(__node_2_pi_waiter(a), __node_2_pi_waiter(b));
+ return rt_waiter_node_less(__node_2_rt_node(a), __node_2_rt_node(b));
}

static __always_inline void
rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
{
- rb_add_cached(&waiter->pi_tree_entry, &task->pi_waiters, __pi_waiter_less);
+ lockdep_assert_held(&task->pi_lock);
+
+ rb_add_cached(&waiter->pi_tree.entry, &task->pi_waiters, __pi_waiter_less);
}

static __always_inline void
rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
{
- if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
+ lockdep_assert_held(&task->pi_lock);
+
+ if (RB_EMPTY_NODE(&waiter->pi_tree.entry))
return;

- rb_erase_cached(&waiter->pi_tree_entry, &task->pi_waiters);
- RB_CLEAR_NODE(&waiter->pi_tree_entry);
+ rb_erase_cached(&waiter->pi_tree.entry, &task->pi_waiters);
+ RB_CLEAR_NODE(&waiter->pi_tree.entry);
}

-static __always_inline void rt_mutex_adjust_prio(struct task_struct *p)
+static __always_inline void rt_mutex_adjust_prio(struct rt_mutex_base *lock,
+ struct task_struct *p)
{
struct task_struct *pi_task = NULL;

+ lockdep_assert_held(&lock->wait_lock);
+ lockdep_assert(rt_mutex_owner(lock) == p);
lockdep_assert_held(&p->pi_lock);

if (task_has_pi_waiters(p))
@@ -756,7 +771,7 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
* enabled we continue, but stop the requeueing in the chain
* walk.
*/
- if (rt_mutex_waiter_equal(waiter, task_to_waiter(task))) {
+ if (rt_waiter_node_equal(&waiter->tree, task_to_waiter_node(task))) {
if (!detect_deadlock)
goto out_unlock_pi;
else
@@ -874,11 +889,6 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
* or
*
* DL CBS enforcement advancing the effective deadline.
- *
- * Even though pi_waiters also uses these fields, and that tree is only
- * updated in [11], we can do this here, since we hold [L], which
- * serializes all pi_waiters access and rb_erase() does not care about
- * the values of the node being removed.
*/
waiter_update_prio(waiter, task);

@@ -921,8 +931,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
* and adjust the priority of the owner.
*/
rt_mutex_dequeue_pi(task, prerequeue_top_waiter);
+ waiter_clone_prio(waiter);
rt_mutex_enqueue_pi(task, waiter);
- rt_mutex_adjust_prio(task);
+ rt_mutex_adjust_prio(lock, task);

} else if (prerequeue_top_waiter == waiter) {
/*
@@ -937,8 +948,9 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
*/
rt_mutex_dequeue_pi(task, waiter);
waiter = rt_mutex_top_waiter(lock);
+ waiter_clone_prio(waiter);
rt_mutex_enqueue_pi(task, waiter);
- rt_mutex_adjust_prio(task);
+ rt_mutex_adjust_prio(lock, task);
} else {
/*
* Nothing changed. No need to do any priority
@@ -1154,6 +1166,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
waiter->task = task;
waiter->lock = lock;
waiter_update_prio(waiter, task);
+ waiter_clone_prio(waiter);

/* Get the top priority waiter on the lock */
if (rt_mutex_has_waiters(lock))
@@ -1187,7 +1200,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
rt_mutex_dequeue_pi(owner, top_waiter);
rt_mutex_enqueue_pi(owner, waiter);

- rt_mutex_adjust_prio(owner);
+ rt_mutex_adjust_prio(lock, owner);
if (owner->pi_blocked_on)
chain_walk = 1;
} else if (rt_mutex_cond_detect_deadlock(waiter, chwalk)) {
@@ -1234,6 +1247,8 @@ static void __sched mark_wakeup_next_waiter(struct rt_wake_q_head *wqh,
{
struct rt_mutex_waiter *waiter;

+ lockdep_assert_held(&lock->wait_lock);
+
raw_spin_lock(&current->pi_lock);

waiter = rt_mutex_top_waiter(lock);
@@ -1246,7 +1261,7 @@ static void __sched mark_wakeup_next_waiter(struct rt_wake_q_head *wqh,
* task unblocks.
*/
rt_mutex_dequeue_pi(current, waiter);
- rt_mutex_adjust_prio(current);
+ rt_mutex_adjust_prio(lock, current);

/*
* As we are waking up the top waiter, and the waiter stays
@@ -1482,7 +1497,7 @@ static void __sched remove_waiter(struct rt_mutex_base *lock,
if (rt_mutex_has_waiters(lock))
rt_mutex_enqueue_pi(owner, rt_mutex_top_waiter(lock));

- rt_mutex_adjust_prio(owner);
+ rt_mutex_adjust_prio(lock, owner);

/* Store the lock on which owner is blocked or NULL */
next_lock = task_blocked_on_lock(owner);
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index cb9fdff76a8a..a6974d044593 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -459,7 +459,7 @@ void __sched rt_mutex_adjust_pi(struct task_struct *task)
raw_spin_lock_irqsave(&task->pi_lock, flags);

waiter = task->pi_blocked_on;
- if (!waiter || rt_mutex_waiter_equal(waiter, task_to_waiter(task))) {
+ if (!waiter || rt_waiter_node_equal(&waiter->tree, task_to_waiter_node(task))) {
raw_spin_unlock_irqrestore(&task->pi_lock, flags);
return;
}
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index c47e8361bfb5..3d7457bffd43 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -17,27 +17,34 @@
#include <linux/rtmutex.h>
#include <linux/sched/wake_q.h>

+
+/*
+ * @prio: Priority of the waiter
+ * @deadline: Deadline of the waiter if applicable
+ */
+struct rt_waiter_node {
+ struct rb_node entry;
+ int prio;
+ u64 deadline;
+};
+
/*
* This is the control structure for tasks blocked on a rt_mutex,
* which is allocated on the kernel stack on of the blocked task.
*
- * @tree_entry: pi node to enqueue into the mutex waiters tree
- * @pi_tree_entry: pi node to enqueue into the mutex owner waiters tree
+ * @tree: pi node to enqueue into the mutex waiters tree
+ * @pi_tree: pi node to enqueue into the mutex owner waiters tree
* @task: task reference to the blocked task
* @lock: Pointer to the rt_mutex on which the waiter blocks
* @wake_state: Wakeup state to use (TASK_NORMAL or TASK_RTLOCK_WAIT)
- * @prio: Priority of the waiter
- * @deadline: Deadline of the waiter if applicable
* @ww_ctx: WW context pointer
*/
struct rt_mutex_waiter {
- struct rb_node tree_entry;
- struct rb_node pi_tree_entry;
+ struct rt_waiter_node tree;
+ struct rt_waiter_node pi_tree;
struct task_struct *task;
struct rt_mutex_base *lock;
unsigned int wake_state;
- int prio;
- u64 deadline;
struct ww_acquire_ctx *ww_ctx;
};

@@ -105,7 +112,7 @@ static inline bool rt_mutex_waiter_is_top_waiter(struct rt_mutex_base *lock,
{
struct rb_node *leftmost = rb_first_cached(&lock->waiters);

- return rb_entry(leftmost, struct rt_mutex_waiter, tree_entry) == waiter;
+ return rb_entry(leftmost, struct rt_mutex_waiter, tree.entry) == waiter;
}

static inline struct rt_mutex_waiter *rt_mutex_top_waiter(struct rt_mutex_base *lock)
@@ -114,7 +121,7 @@ static inline struct rt_mutex_waiter *rt_mutex_top_waiter(struct rt_mutex_base *
struct rt_mutex_waiter *w = NULL;

if (leftmost) {
- w = rb_entry(leftmost, struct rt_mutex_waiter, tree_entry);
+ w = rb_entry(leftmost, struct rt_mutex_waiter, tree.entry);
BUG_ON(w->lock != lock);
}
return w;
@@ -127,8 +134,9 @@ static inline int task_has_pi_waiters(struct task_struct *p)

static inline struct rt_mutex_waiter *task_top_pi_waiter(struct task_struct *p)
{
+ lockdep_assert_held(&p->pi_lock);
return rb_entry(p->pi_waiters.rb_leftmost, struct rt_mutex_waiter,
- pi_tree_entry);
+ pi_tree.entry);
}

#define RT_MUTEX_HAS_WAITERS 1UL
@@ -190,8 +198,8 @@ static inline void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter)
static inline void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
{
debug_rt_mutex_init_waiter(waiter);
- RB_CLEAR_NODE(&waiter->pi_tree_entry);
- RB_CLEAR_NODE(&waiter->tree_entry);
+ RB_CLEAR_NODE(&waiter->pi_tree.entry);
+ RB_CLEAR_NODE(&waiter->tree.entry);
waiter->wake_state = TASK_NORMAL;
waiter->task = NULL;
}

2023-07-07 16:11:09

by Mike Galbraith

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

On Fri, 2023-07-07 at 14:59 +0200, Peter Zijlstra wrote:
>
> The below implements this duplication and seems to not insta-crash.

RT bits of ww_mutex.h needed tree_entry -> tree.entry. Modulo that, RT
seems content.

-Mike

2023-07-07 16:40:22

by Peter Zijlstra

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

On Fri, Jul 07, 2023 at 05:39:38PM +0200, Mike Galbraith wrote:
> On Fri, 2023-07-07 at 14:59 +0200, Peter Zijlstra wrote:
> >
> > The below implements this duplication and seems to not insta-crash.
>
> RT bits of ww_mutex.h needed tree_entry -> tree.entry. Modulo that, RT
> seems content.

Ah indeed, just in time. Added, I'll post a real patch soon.

2023-07-07 17:43:16

by Henry Wu

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

Hi, Peter and Mike.

On Fri, Jul 7, 2023 at 23:39, Mike Galbraith <[email protected]> wrote:
>
> On Fri, 2023-07-07 at 14:59 +0200, Peter Zijlstra wrote:
> >
> > The below implements this duplication and seems to not insta-crash.
>
> RT bits of ww_mutex.h needed tree_entry -> tree.entry. Modulo that, RT
> seems content.
>
> -Mike

I patched my kernel with Peter's patch and tested it with my test
program. I haven't seen any race so far and I will test more tomorrow.
Fedora's default kernel config doesn't enable CONFIG_PREEMPT_RT, so I
didn't run into the compile error. I will test the final patch when it
is available.

2023-07-13 13:20:40

by Peter Zijlstra

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

On Sat, Jul 08, 2023 at 01:28:25AM +0800, Henry Wu wrote:

> I will test final patch if available.

https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net

Does that work for you; could you reply with a tested-by if possible?

2023-07-14 04:21:46

by Henry Wu

Subject: Re: Fwd: Possible race in rt_mutex_adjust_prio_chain

Peter:

On Thu, Jul 13, 2023 at 20:55, Peter Zijlstra <[email protected]> wrote:
>
> On Sat, Jul 08, 2023 at 01:28:25AM +0800, Henry Wu wrote:
>
> > I will test final patch if available.
>
> https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
>
> Does that work for you; could you reply with a tested-by if possible?

Sorry, I missed your patch. I will test it as soon as possible and
reply within two days.

Sincerely,

Henry