Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks
From: Xunlei Pang <xlpang@redhat.com>
To: Peter Zijlstra
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, Juri Lelli, Ingo Molnar, Thomas Gleixner
Date: Tue, 12 Apr 2016 11:08:04 +0800
Message-ID: <570C6694.50803@redhat.com>
In-Reply-To: <570A0D37.8080400@redhat.com>

On 2016/04/10 at 16:22, Xunlei Pang wrote:
> On 2016/04/09 at 21:29, Peter Zijlstra wrote:
>> On Sat, Apr 09, 2016 at 11:25:39AM +0800, Xunlei Pang wrote:
>>
>>>> In any case, I just realized we do not in fact provide this guarantee
>>>> (of pointing to a blocked task) that needs a bit more work.
>>> The current patch calls rt_mutex_adjust_prio() before wake_up_q() wakes
>>> the wakee; at that moment the wakee has already been removed from
>>> current's pi_waiters by rt_mutex_slowunlock()->mark_wakeup_next_waiter(),
>>> so rt_mutex_adjust_prio() won't see this wakee. I think this should not
>>> be a problem.
>> No, any wakeup after mark_wakeup_next_waiter() will make the task run.
>> And since we must always consider spurious wakeups, we cannot rely on us
>> (eg. our wake_up_q call) being the one and only.
>>
>> Therefore it is possible and the only thing that stands between us and
>> doom is the fact that the wake_q stuff holds a task reference.
>>
>> But we cannot guarantee that the task we have a pointer to is in fact
>> blocked.
> Does that really matter? The pointer is accessed on behalf of current, and
> current will call rt_mutex_adjust_prio() very soon to update the right
> pointer.
>
> Also, the pointer is used to update current's deadline/runtime; we could
> restore these params in rt_mutex_setprio() for deboost cases. I just
> checked the current code: it does nothing to restore current's
> deadline/runtime when deboosting. Maybe we can leave this job to future
> deadline bandwidth inheritance?
>
> Regards,
> Xunlei

I spotted another issue: we access pi_task without holding any lock of that
task in enqueue_task_dl(). The "dl_prio(pi_task->normal_prio)" condition we
check there is not enough, because pi_task's "dl_period" and "dl_runtime"
can still change under us. If pi_task switches to a !deadline class, its
dl_runtime is cleared to 0 and we hit a forever loop in
replenish_dl_entity() below:

	while (dl_se->runtime <= 0) {
		dl_se->deadline += pi_se->dl_period;
		dl_se->runtime += pi_se->dl_runtime;
	}

or we hit "BUG_ON(pi_se->dl_runtime <= 0);".
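To make the race concrete, the unlocked access looks roughly like this (a
trimmed paraphrase of enqueue_task_dl(), not the exact code; unrelated
branches are elided):

	static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
	{
		struct task_struct *pi_task = rt_mutex_get_top_task(p);
		struct sched_dl_entity *pi_se = &p->dl;

		/*
		 * Nothing here pins pi_task's scheduling class or its
		 * dl_runtime/dl_period; they can change right after the check.
		 */
		if (pi_task && p->dl.dl_boosted && dl_prio(pi_task->normal_prio))
			pi_se = &pi_task->dl;

		...
		enqueue_dl_entity(&p->dl, pi_se, flags);
		...
	}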
That's all because, without holding any lock of that task, there is no
guarantee. So I'm thinking of adding more members to rt_mutex_waiter (we
don't lose memory, as it is defined on the stack) and using this structure
instead of task_struct as the top waiter (i.e. using get_pi_top_waiter()
instead of get_pi_top_task()), like:

Step 1:
	struct rt_mutex_waiter {
		int prio;
	+	/* Updated under the waiter's pi_lock and the rt_mutex lock */
	+	u64 dl_runtime, dl_period;
	+	/*
	+	 * Copied directly from dl_runtime, dl_period (under the same
	+	 * rt_mutex lock), while holding the owner's pi_lock, rq lock,
	+	 * and the rt_mutex lock.
	+	 */
	+	u64 dl_runtime_copy, dl_period_copy;

Similarly, add a member to task_struct:

	#ifdef CONFIG_RT_MUTEXES
		/* PI waiters blocked on a rt_mutex held by this task */
		struct rb_root pi_waiters;
		struct rb_node *pi_waiters_leftmost;
	+	struct rb_node *pi_waiters_leftmost_copy; /* updated under pi_lock and rq lock */
		/* Deadlock detection and priority inheritance handling */
		struct rt_mutex_waiter *pi_blocked_on;
	#endif

Then we can update "pi_waiters_leftmost_copy" together with "dl_runtime_copy"
and "dl_period_copy" under the rq lock, so that enqueue_task_dl() can access
them without any problem.

Step 2:
We must update "dl_runtime_copy" and "dl_period_copy" under the rt_mutex
lock, because they are copied from "dl_runtime" and "dl_period" of
rt_mutex_waiter. So we add a new update function, to be called whenever we
hold both the rq lock and the rt_mutex lock; it can mainly be implemented in
rt_mutex_setprio() (see the sketch at the end of this mail).

Step 3:
Note that rt_mutex_setprio() can be called without the rtmutex lock held, by
rt_mutex_adjust_prio(); we can add a parameter to indicate not doing the
copy-updating work at that place. The same applies to rt_mutex_setprio()
(add a new waiter parameter and keep the original "prio" parameter). Then we
can do the copy-updating work in mark_wakeup_next_waiter() before unlocking
current's pi_lock, as long as we hold the rq lock, because the rtmutex lock
and the owner's pi_lock are already held.

This can also solve the issue you mentioned, with only a little overhead
involved.
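To illustrate Step 2, here is a minimal sketch of what such an update helper
could look like. The helper name pi_copy_top_waiter() and the exact field and
rb-node names are only assumptions for illustration, not the final patch:

	/*
	 * Illustrative only: snapshot the top pi-waiter for the owner.
	 * Caller must hold the owner's pi_lock, the owner's rq lock and the
	 * rt_mutex lock, so the *_copy fields later read by
	 * enqueue_task_dl() stay stable.
	 */
	static void pi_copy_top_waiter(struct task_struct *owner)
	{
		struct rt_mutex_waiter *top;

		owner->pi_waiters_leftmost_copy = owner->pi_waiters_leftmost;
		if (!owner->pi_waiters_leftmost_copy)
			return;

		top = rb_entry(owner->pi_waiters_leftmost_copy,
			       struct rt_mutex_waiter, pi_tree_entry);
		/* Snapshot the deadline parameters published in Step 1. */
		top->dl_runtime_copy = top->dl_runtime;
		top->dl_period_copy  = top->dl_period;
	}

rt_mutex_setprio() (and, per Step 3, mark_wakeup_next_waiter() once the rq
lock is taken there) would call something like this, and enqueue_task_dl()
would then read the *_copy fields through get_pi_top_waiter() instead of
dereferencing pi_task.

Regards,
Xunlei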