Date: Thu, 30 Jul 2015 08:34:52 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@efficios.com, josh@joshtriplett.org,
	tglx@linutronix.de, rostedt@goodmis.org, dhowells@redhat.com,
	edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com,
	oleg@redhat.com, bobby.prani@gmail.com, dave@stgolabs.net,
	waiman.long@hp.com
Subject: Re: [PATCH tip/core/rcu 19/19] rcu: Add fastpath bypassing funnel locking
Message-ID: <20150730153452.GG27280@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150717232901.GA22511@linux.vnet.ibm.com>
	<1437175764-24096-1-git-send-email-paulmck@linux.vnet.ibm.com>
	<1437175764-24096-19-git-send-email-paulmck@linux.vnet.ibm.com>
	<20150730144455.GZ19282@twins.programming.kicks-ass.net>
In-Reply-To: <20150730144455.GZ19282@twins.programming.kicks-ass.net>

On Thu, Jul 30, 2015 at 04:44:55PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 17, 2015 at 04:29:24PM -0700, Paul E. McKenney wrote:
>
> >  	/*
> > +	 * First try directly acquiring the root lock in order to reduce
> > +	 * latency in the common case where expedited grace periods are
> > +	 * rare.  We check mutex_is_locked() to avoid pathological levels of
> > +	 * memory contention on ->exp_funnel_mutex in the heavy-load case.
> > +	 */
> > +	rnp0 = rcu_get_root(rsp);
> > +	if (!mutex_is_locked(&rnp0->exp_funnel_mutex)) {
> > +		if (mutex_trylock(&rnp0->exp_funnel_mutex)) {
> > +			if (sync_exp_work_done(rsp, rnp0, NULL,
> > +					       &rsp->expedited_workdone0, s))
> > +				return NULL;
> > +			return rnp0;
> > +		}
> > +	}
>
> So our 'new' locking primitives do things like:
>
> static __always_inline int queued_spin_trylock(struct qspinlock *lock)
> {
> 	if (!atomic_read(&lock->val) &&
> 	    (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0))
> 		return 1;
> 	return 0;
> }
>
> mutexes do not do this.
>
> Now I suppose the question is: does that extra read slow down the
> (common) uncontended case?  (Remember, we should optimize locks for
> the uncontended case; heavy lock contention should be fixed with
> better locking schemes, not lock implementations.)
>
> Davidlohr, Waiman, do we have data on this?
>
> If the extra read before the cmpxchg() does not hurt, we should do the
> same for mutex and make the above redundant.

I am pretty sure that different hardware wants it done differently. :-/
So I agree that hard data would be good.

I could probably further optimize the RCU code by checking for a
single-node tree, but I am not convinced that this is worthwhile.
However, skipping three cache misses in the uncontended case is
definitely worthwhile, hence this patch.  ;-)
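For concreteness, here is a minimal userspace sketch of the two trylock
fastpaths under discussion, written with C11 atomics rather than the
kernel's actual mutex or qspinlock internals (the function names and the
plain atomic_int lock word are illustrative assumptions, not kernel
APIs):

#include <stdatomic.h>
#include <stdbool.h>

/* Lock word: 0 == unlocked, 1 == locked. */

/* qspinlock-style: read first, cmpxchg only if the lock looks free. */
static bool trylock_testfirst(atomic_int *lock)
{
	int old = 0;

	if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
		return false;	/* Looks held: skip the cmpxchg. */
	return atomic_compare_exchange_strong(lock, &old, 1);
}

/* mutex-style: go straight for the cmpxchg. */
static bool trylock_direct(atomic_int *lock)
{
	int old = 0;

	return atomic_compare_exchange_strong(lock, &old, 1);
}

Under contention, the test-first variant can fail via the plain load
without pulling the lock's cache line in exclusive state; when the lock
is uncontended, it pays for one extra read before the cmpxchg, which is
exactly the cost being asked about above.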
							Thanx, Paul