Date: Tue, 5 Apr 2016 11:19:54 +0200
From: Peter Zijlstra
To: xlpang@redhat.com
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Ingo Molnar, Steven Rostedt, Thomas Gleixner
Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks
Message-ID: <20160405091954.GI3448@twins.programming.kicks-ass.net>
In-Reply-To: <57037974.1020002@redhat.com>

On Tue, Apr 05, 2016 at 04:38:12PM +0800, Xunlei Pang wrote:
> On 2016/04/02 at 05:51, Peter Zijlstra wrote:
> > On Fri, Apr 01, 2016 at 09:34:24PM +0800, Xunlei Pang wrote:
> >
> >>>> I checked the code, currently only deadline accesses the
> >>>> pi_waiters/pi_waiters_leftmost
> >>>> without pi_lock held via rt_mutex_get_top_task(), other cases all have
> >>>> pi_lock held.
> >> Any better ideas is welcome.
> > Something like the below _might_ work; but its late and I haven't looked
> > at the PI code in a while. This basically caches a pointer to the top
> > waiter task in the running task_struct, under pi_lock and rq->lock, and
> > therefore we can use it with only rq->lock held.
> >
> > Since the task is blocked, and cannot unblock without taking itself from
> > the block chain -- which would cause rt_mutex_setprio() to set another
> > top waiter task, the lifetime rules should be good.
>
> In rt_mutex_slowunlock(), we release pi_lock and wait_lock first, then
> wake up the top waiter, then call rt_mutex_adjust_prio(), so there is a small
> window without any lock or irq disabled between the top waiter wake up
> and rt_mutex_adjust_prio(), which can cause problems.

That is rt_mutex_fastunlock()'s:

	bool deboost = slowfn(lock, &wake_q); /* -> rt_mutex_slowunlock() */

	wake_up_q(&wake_q);

	if (deboost)
		rt_mutex_adjust_prio(current);

(and the IRQ enabled is irrelevant, SMP can race regardless)

> For example, before calling rt_mutex_adjust_prio() to adjust the cached pointer,
> if current is preempted and the waken top waiter exited, after that, the task is
> back, and it may enter enqueue_task_dl() before entering rt_mutex_adjust_prio(),
> where the cached pointer is updated, so it will access a stale cached pointer.

Hmm, so I would argue that that is a bug in any case. It's an effective
priority 'leak', we should deboost before letting the booster run again.

But it looks like a simple fix, simply call wake_up_q() after the
deboost.

The wake_q has a reference on the task so it cannot go away, which
ensures any dereferences from within the DL code must still be valid.

Or did I miss something (again) ? :-)

---
 kernel/locking/rtmutex.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 3e746607abe5..36eb232bd29f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1390,11 +1390,11 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
 	} else {
 		bool deboost = slowfn(lock, &wake_q);
 
-		wake_up_q(&wake_q);
-
 		/* Undo pi boosting if necessary: */
 		if (deboost)
 			rt_mutex_adjust_prio(current);
+
+		wake_up_q(&wake_q);
 	}
 }
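
For anyone who wants to poke at the ordering argument outside the kernel, here is a rough userspace toy model. It is only a sketch under loud assumptions: every toy_* name is made up, pi_top_task is a hypothetical name for the cached top-waiter pointer, the woken waiter is assumed to exit the instant it is woken, and a "freed" flag stands in for real memory release. It is not kernel code and says nothing about the real locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct; every name here is hypothetical. */
struct toy_task {
	int usage;                    /* reference count */
	bool freed;                   /* set once the last reference is dropped */
	struct toy_task *pi_top_task; /* cached top waiter (assumed field name) */
};

/* Toy wake queue that can hold a single task. */
struct toy_wake_q {
	struct toy_task *task;
};

static void toy_get(struct toy_task *t)
{
	t->usage++;
}

static void toy_put(struct toy_task *t)
{
	if (--t->usage == 0)
		t->freed = true;
}

/* Queueing a task for wake-up takes a reference on it. */
static void toy_wake_q_add(struct toy_wake_q *q, struct toy_task *t)
{
	toy_get(t);
	q->task = t;
}

/*
 * Wake the queued task, then drop the queue's reference. Worst case
 * assumed: the woken task exits at once and drops its own reference.
 */
static void toy_wake_up_q(struct toy_wake_q *q)
{
	struct toy_task *t = q->task;

	q->task = NULL;
	toy_put(t);	/* the woken task exits */
	toy_put(t);	/* the wake queue lets go */
}

/* Models the DL enqueue path looking at the cached pointer. */
static bool toy_enqueue_sees_stale(const struct toy_task *owner)
{
	return owner->pi_top_task != NULL && owner->pi_top_task->freed;
}

/*
 * Simulate an unlock in which the owner gets re-enqueued at every step.
 * Returns true if any of those enqueues saw a freed top waiter.
 */
bool toy_unlock_sees_stale(bool wake_first)
{
	struct toy_task waiter = { .usage = 1 };
	struct toy_task owner = { .usage = 1, .pi_top_task = &waiter };
	struct toy_wake_q q = { NULL };
	bool stale;

	toy_wake_q_add(&q, &waiter);		/* slow path queued the waiter */
	stale = toy_enqueue_sees_stale(&owner);

	if (wake_first) {
		toy_wake_up_q(&q);		/* old order: wake, then deboost */
		stale |= toy_enqueue_sees_stale(&owner);
		owner.pi_top_task = NULL;	/* deboost clears the cache */
	} else {
		owner.pi_top_task = NULL;	/* new order: deboost, then wake */
		stale |= toy_enqueue_sees_stale(&owner);
		toy_wake_up_q(&q);
	}

	return stale | toy_enqueue_sees_stale(&owner);
}
```

In this model toy_unlock_sees_stale(true) reports a stale access, because the cached pointer outlives the waiter's last reference, while toy_unlock_sees_stale(false) does not, because the wake queue's reference is still held until after the cache has been cleared.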