Date: Tue, 9 Jun 2015 11:29:59 +0200 (CEST)
From: Thomas Gleixner
To: Davidlohr Bueso
Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
    "Paul E. McKenney", Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)

On Mon, 8 Jun 2015, Davidlohr Bueso wrote:
> On Fri, 2015-06-05 at 15:59 +0200, Thomas Gleixner wrote:
> > rt_mutex_has_waiters() looks at the root pointer of the rbtree head
> > whether that's empty. You can do a lockless check of that as well,
> > right? So what's the FAST part of that function and how is that
> > related to a point after we called mark_rt_mutex_waiters()?
>
> You're right, we could use rt_mutex_has_waiters(). When I thought of
> this originally, I was considering something like:
>
>	if (rt_mutex_has_waiters(lock)) {
>		if (current->prio >= rt_mutex_top_waiter(lock)->prio)
>			...
>
> Which obviously requires the wait_lock, but I did not consider just
> using the tree. However, the consequence I see in doing this is that we
> would miss scenarios where mark_rt_mutex_waiters() is called (under nil
> owner, for example), so we would force tasks to block only when there
> are truly waiters.

Fair enough. But we really want a proper comment explaining it.

> > > +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
> > > +{
> > > +	bool taken = false;
> > > +
> > > +	preempt_disable();
> > > +
> > > +	if (!rt_mutex_can_spin_on_owner(lock))
> > > +		goto done;
> > > +	/*
> > > +	 * In order to avoid a stampede of mutex spinners trying to
> > > +	 * acquire the mutex all at once, the spinners need to take a
> > > +	 * MCS (queued) lock first before spinning on the owner field.
> > > +	 */
> > > +	if (!osq_lock(&lock->osq))
> > > +		goto done;
> >
> > Hmm. The queue lock is serializing potential spinners, right?
>
> Yes.
>
> > So that's going to lead to a potential priority ordering problem
> > because if a lower prio task wins the race to the osq_lock queue,
> > then the higher prio waiter will be queued behind and blocked from
> > taking the lock first.
>
> Hmm yes, osq is a fair lock. However I believe this is mitigated by (a)
> the conservative spinning approach, and (b) by osq_lock's need_resched()
> check, so at least a spinner will abort if a higher prio task comes in.
> But of course, this only deals with spinners, and we cannot account for
> a lower prio owner task.
>
> So if this is not acceptable, I guess we'll have to do without the
> MCS-like properties.

I'd say that counts as priority inversion.
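For illustration, a minimal sketch of the priority check described above,
written as a helper. The name rt_mutex_spin_allowed() is made up for this
sketch; rt_mutex_has_waiters() and rt_mutex_top_waiter() are the existing
rtmutex helpers, and, as noted in the exchange, dereferencing the top waiter
safely requires holding lock->wait_lock:

	/*
	 * Sketch only: decide whether the current task should bother
	 * spinning. Caller holds lock->wait_lock, since walking to the
	 * top waiter is not safe against concurrent enqueue/dequeue.
	 */
	static inline bool rt_mutex_spin_allowed(struct rt_mutex *lock)
	{
		/* No waiters enqueued: spinning cannot hurt anybody. */
		if (!rt_mutex_has_waiters(lock))
			return true;

		/* Lower ->prio value means higher priority. */
		return current->prio < rt_mutex_top_waiter(lock)->prio;
	}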
If you look at the RT code, then you'll notice that in the slow lock
path we queue the incoming waiter (including the PI dance) and then
spin only if the waiter is the top waiter on the lock.

Surely it would be nice to avoid the whole PI dance, but OTOH it can
lead to the following issue (and some others):

CPU0				CPU1

T0 prio=10			T1 prio=20

lock(RTM);
				lock(RTM);
				spin()
->preempt()
T2 prio=15
				if (!owner->on_cpu)
					break;
				block_on_rtmutex();
				prio_boost(T0);
->preempt()
T0 prio=20
unlock(RTM);

IIRC, the initial RT attempt was to spin w/o the PI dance but we gave
up on it due to latency and correctness issues.

I wrapped my head around doing the following:

1) Lightweight mark the owner as boosted, w/o actually boosting it, to
   keep it on the cpu

2) Have a priority check in the spin to drop out when a higher prio
   waiter comes in.

But #1 is a nightmare in the scheduler to do, so I gave up on it.

If you have a better idea, you're welcome.

Thanks,

	tglx
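For reference, a hedged sketch of the spin decision that the RT slow lock
path described above makes once the waiter has been enqueued via
task_blocks_on_rt_mutex(), i.e. after the PI dance. It is simplified from
memory of the -rt patch rather than a verbatim copy: the helper name
rt_mutex_may_spin() is invented here, task-state handling and cleanup are
omitted, ->on_cpu assumes an SMP build, and everything is evaluated under
lock->wait_lock before it is dropped to spin or to schedule:

	/*
	 * Sketch only: after the PI enqueue, spinning is allowed solely
	 * for the top waiter and only while the owner runs on a CPU.
	 * When this returns false the task blocks as in the normal
	 * slow path.
	 */
	static bool rt_mutex_may_spin(struct rt_mutex *lock,
				      struct rt_mutex_waiter *waiter)
	{
		struct task_struct *owner = rt_mutex_owner(lock);

		/* Only the top waiter is entitled to spin on the owner... */
		if (rt_mutex_top_waiter(lock) != waiter)
			return false;

		/* ...and only while that owner is actually on a CPU. */
		return owner && owner->on_cpu;
	}

In the -rt patch the equivalent check sits inline in the slow lock loop
rather than being factored out into a helper like this.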