Date: Tue, 9 Jun 2015 11:29:59 +0200 (CEST)
From: Thomas Gleixner
To: Davidlohr Bueso
Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
    "Paul E. McKenney", Sebastian Andrzej Siewior, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)

On Mon, 8 Jun 2015, Davidlohr Bueso wrote:
> On Fri, 2015-06-05 at 15:59 +0200, Thomas Gleixner wrote:
> > rt_mutex_has_waiters() looks at the root pointer of the rbtree head
> > whether that's empty. You can do a lockless check of that as well,
> > right? So what's the FAST part of that function and how is that
> > related to a point after we called mark_rt_mutex_waiters()?
>
> You're right, we could use rt_mutex_has_waiters(). When I thought of
> this originally, I was considering something like:
>
>	if (rt_mutex_has_waiters(lock)) {
>		if (current->prio >= rt_mutex_top_waiter(lock)->prio)
>			...
>
> Which obviously requires the wait_lock, but I did not consider just
> using the tree. However, the consequence I see in doing this is that we
> would miss scenarios where mark_rt_mutex_waiters() is called (under nil
> owner, for example), so we would force tasks to block only when there
> are truly waiters.

Fair enough. But we really want a proper comment explaining it.

> > > +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
> > > +{
> > > +	bool taken = false;
> > > +
> > > +	preempt_disable();
> > > +
> > > +	if (!rt_mutex_can_spin_on_owner(lock))
> > > +		goto done;
> > > +	/*
> > > +	 * In order to avoid a stampede of mutex spinners trying to
> > > +	 * acquire the mutex all at once, the spinners need to take a
> > > +	 * MCS (queued) lock first before spinning on the owner field.
> > > +	 */
> > > +	if (!osq_lock(&lock->osq))
> > > +		goto done;
> >
> > Hmm. The queue lock is serializing potential spinners, right?
>
> Yes.
>
> > So that's going to lead to a potential priority ordering problem
> > because if a lower prio task wins the race to the osq_lock queue,
> > then the higher prio waiter will be queued behind and blocked from
> > taking the lock first.
>
> Hmm yes, osq is a fair lock. However I believe this is mitigated by (a)
> the conservative spinning approach, and (b) by osq_lock's need_resched()
> check, so at least a spinner will abort if a higher prio task comes in.
> But of course, this only deals with spinners, and we cannot account for
> a lower prio owner task.
>
> So if this is not acceptable, I guess we'll have to do without the
> MCS-like properties.

I'd say that counts as priority inversion.
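For illustration, a minimal sketch of the priority check described above,
written as a helper. The name rt_mutex_spin_allowed() is made up for this
sketch; rt_mutex_has_waiters() and rt_mutex_top_waiter() are the existing
rtmutex helpers, and, as noted in the exchange, dereferencing the top waiter
safely requires holding lock->wait_lock:

	/*
	 * Sketch only: decide whether the current task should bother
	 * spinning. Caller holds lock->wait_lock, since walking to the
	 * top waiter is not safe against concurrent enqueue/dequeue.
	 */
	static inline bool rt_mutex_spin_allowed(struct rt_mutex *lock)
	{
		/* No waiters enqueued: spinning cannot hurt anybody. */
		if (!rt_mutex_has_waiters(lock))
			return true;

		/* Lower ->prio value means higher priority. */
		return current->prio < rt_mutex_top_waiter(lock)->prio;
	}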
If you look at the RT code, then you'll notice that in the slow lock
path we queue the incoming waiter (including the PI dance) and then
spin only if the waiter is the top waiter on the lock.

Surely it would be nice to avoid the whole PI dance, but OTOH it can
lead to the following issue (and some others):

CPU0				CPU1

T0 prio=10			T1 prio=20

lock(RTM);
				lock(RTM);
				spin()
->preempt()
T2 prio=15
				if (!owner->on_cpu)
					break;
				block_on_rtmutex();
				prio_boost(T0);
->preempt()
T0 prio=20
unlock(RTM);

IIRC, the initial RT attempt was to spin w/o the PI dance but we gave
up on it due to latency and correctness issues.

I wrapped my head around doing the following:

1) Lightweight mark the owner as boosted, w/o actually boosting it, to
   keep it on the cpu

2) Have a priority check in the spin to drop out when a higher prio
   waiter comes in.

But #1 is a nightmare in the scheduler to do, so I gave up on it.

If you have a better idea, you're welcome.

Thanks,

	tglx
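For reference, a hedged sketch of the spin decision that the RT slow lock
path described above makes once the waiter has been enqueued via
task_blocks_on_rt_mutex(), i.e. after the PI dance. It is simplified from
memory of the -rt patch rather than a verbatim copy: the helper name
rt_mutex_may_spin() is invented here, task-state handling and cleanup are
omitted, ->on_cpu assumes an SMP build, and everything is evaluated under
lock->wait_lock before it is dropped to spin or to schedule:

	/*
	 * Sketch only: after the PI enqueue, spinning is allowed solely
	 * for the top waiter and only while the owner runs on a CPU.
	 * When this returns false the task blocks as in the normal
	 * slow path.
	 */
	static bool rt_mutex_may_spin(struct rt_mutex *lock,
				      struct rt_mutex_waiter *waiter)
	{
		struct task_struct *owner = rt_mutex_owner(lock);

		/* Only the top waiter is entitled to spin on the owner... */
		if (rt_mutex_top_waiter(lock) != waiter)
			return false;

		/* ...and only while that owner is actually on a CPU. */
		return owner && owner->on_cpu;
	}

In the -rt patch the equivalent check sits inline in the slow lock loop
rather than being factored out into a helper like this.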