Subject: Re: [PATCH -tip v2 1/2] locking/rtmutex: Support spin on owner
From: Davidlohr Bueso
To: Thomas Gleixner
Cc: Ingo Molnar, Peter Zijlstra, Darren Hart, Steven Rostedt,
    Mike Galbraith, "Paul E. McKenney", Sebastian Andrzej Siewior,
    linux-kernel@vger.kernel.org
Date: Wed, 01 Jul 2015 17:16:38 -0700

On Thu, 2015-07-02 at 00:27 +0200, Thomas Gleixner wrote:
> On Wed, 1 Jul 2015, Davidlohr Bueso wrote:
>
> > Similar to what we have in other locks, particularly regular mutexes, the
> > idea is that as long as the owner is running, there is a fair chance it'll
> > release the lock soon, and thus a task trying to acquire the rtmutex will
> > be better off spinning instead of blocking immediately after the fastpath.
> > Conditions to stop spinning and enter the slowpath are simple:
> >
> > (1) Upon need_resched()
> > (2) Current lock owner blocks
> >
> > Because rtmutexes track the lock owner atomically, we can extend the fastpath
> > to continue polling on the lock owner via cmpxchg(lock->owner, NULL, current).
> >
> > However, this is a conservative approach, such that if there are any waiters
> > in-line, we stop spinning and immediately take the traditional slowpath. This
> > allows priority boosting to take precedence over spinning, as otherwise we
> > could starve a higher-priority queued-up task (ie: top waiter) if spinners
> > constantly steal the lock.
>
> I'm a bit wary about the whole approach. In the RT tree we spin AFTER
> we've enqueued the waiter and run priority boosting. While I can see
> the charm of your approach, i.e. avoiding the prio boost dance for the
> simple case, this can introduce larger latencies.
>
> T1 (prio = 0)       T2 (prio = 50)
> lock(RTM);
>                     lock(RTM);
>                     spin()
> --> preemption
> T3 (prio = 10)      leave spin, because owner is not on cpu
>
>                     enqueue();
>                     boost();
>                     schedule();
> --> preemption
> T1 (prio = 50)
>
> So we trade two extra context switches in the worst case for an
> enhancement of performance in the normal case. I cannot quantify the
> impact of this, but we really need to evaluate that properly before
> going there.

This is a very good point. My first thought is that for a general
purpose OS, the extra latency is probably ok -- if you _really_ rely
on rt characteristics enough to care about this, you should be using
the rt-patchset to begin with, methinks. Aside from the non-blocking
performance benefits of spinning, avoiding the wait_lock (and the
pi_lock in the case of doing the spinning AFTER the boosting) can ease
a lot of the lock contention.
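To make that concrete, the shape of the proposed spin is roughly the
below. This is only a sketch of the idea, not the patch itself:
rt_mutex_owner(), rt_mutex_has_waiters() and the rt_mutex_cmpxchg()
fastpath helper exist today, while the owner->on_cpu test stands in
for the "owner is running" check (SMP-only, and the real thing would
need RCU protection around the owner dereference):

static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
{
	bool taken = false;

	preempt_disable();
	for (;;) {
		struct task_struct *owner;

		/* (1) Give up the cpu when we are due for a reschedule. */
		if (need_resched())
			break;

		/*
		 * Conservative: any queued waiter sends us straight to
		 * the slowpath, so priority boosting takes precedence.
		 */
		if (rt_mutex_has_waiters(lock))
			break;

		owner = rt_mutex_owner(lock);
		if (!owner) {
			/* Appears free: try to steal it via the fastpath. */
			if (rt_mutex_cmpxchg(lock, NULL, current)) {
				taken = true;
				break;
			}
			continue;	/* lost the race, re-evaluate */
		}

		/* (2) Owner blocked or preempted: spinning is pointless. */
		if (!owner->on_cpu)
			break;

		cpu_relax();
	}
	preempt_enable();

	return taken;
}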
> Aside of that, if the lock is really contended, then you force all
> spinners off the cpu if one of the spinners starts blocking, simply
> because you have no idea which one is the top prio spinner.
>
> T1 (prio = 0)       T2 (prio = 50)      T3 (prio = 10)
> lock(RTM);
>                     lock(RTM);          lock(RTM);
>                     spin()              spin();
>                                         --> preemption
>                                         enqueue()
>                                         boost();
>                                         schedule();
>                     sees waiter bit
>                     enqueue();
>                     boost();
>                     schedule();
>
> T2 could happily keep spinning despite T3 going to sleep. I'm not sure
> if that's what we want to achieve.

Yeah, this is one of the reasons why I was tempted to check the
top-waiter's prio against current's prio to possibly keep spinning.
Now, doing so without holding the wait_lock is obviously racy, but
safe afaict. If the top waiter changes after the spinner checks it and
we get it wrong, at worst we falsely send that thread to sleep,
otherwise we bogus-spin once more -- neither of which is the end of
the world. Ie:

[top waiter prio 10]

T1 (prio 0)          T2                                              T3 (prio 0)
lock(RTM)
                     if (rt_mutex_has_waiters(lock) &&               lock(RTM)
                         rt_mutex_top_waiter()->prio > current->prio) {
                             spin ...
[release lock]
[wakeup top waiter]                                                  [adds itself to the tree]
                     }

But I could be overlooking something, which is why I chose to exclude
it in this patch.
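Fwiw, sketched against the spin loop above, the check would slot in
roughly as follows -- again just a sketch, not tested code.
rt_mutex_top_waiter() is the existing helper, and reading it without
wait_lock is exactly the racy part being discussed:

	/*
	 * Hypothetical replacement for the conservative "any waiter ->
	 * slowpath" test: keep spinning only while we outrank the top
	 * waiter (lower ->prio value means higher priority), so stealing
	 * the lock cannot starve it. Done locklessly, a stale top waiter
	 * costs at worst one false sleep or one more useless spin.
	 */
	if (rt_mutex_has_waiters(lock) &&
	    rt_mutex_top_waiter(lock)->prio <= current->prio)
		break;	/* top waiter outranks us, take the slowpath */

Thanks,
Davidlohr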