Date: Fri, 10 Apr 2015 19:44:00 +0200
From: Ingo Molnar <mingo@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
        Jason Low <jason.low2@hp.com>, Peter Zijlstra <peterz@infradead.org>,
        Davidlohr Bueso <dave@stgolabs.net>,
        Tim Chen <tim.c.chen@linux.intel.com>,
        Aswin Chandramouleeswaran <aswin@hp.com>,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mutex: Speed up mutex_spin_on_owner() by not taking the
 RCU lock
Message-ID: <20150410174400.GA6563@gmail.com>
References: <20150409053725.GB13871@gmail.com>
 <1428561611.3506.78.camel@j-VirtualBox>
 <20150409075311.GA4645@gmail.com>
 <CA+55aFz6KKxGVxPAbsmw9GsKJfy85P2C0EmYBrGpn+aJDjZJWw@mail.gmail.com>
 <20150409175652.GI6464@linux.vnet.ibm.com>
 <CA+55aFzXMDjQQ7jTjsPdh1RikXfgV7OCd-+13cz06MOmDBA33w@mail.gmail.com>
 <CA+55aFwZWi6ecDmVsMBQJTrgrW3GD2DaRtpiOspe=5amR1=dNg@mail.gmail.com>
 <20150409183926.GM6464@linux.vnet.ibm.com>
 <20150410090051.GA28549@gmail.com>
 <20150410142024.GY6464@linux.vnet.ibm.com>
In-Reply-To: <20150410142024.GY6464@linux.vnet.ibm.com>


* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > No RCU overhead, and this is the access to owner->on_cpu:
> > 
> >   69:	49 8b 81 10 c0 ff ff 	mov    -0x3ff0(%r9),%rax
> > 
> > Totally untested and all that, I only built the mutex.o.
> > 
> > What do you think? Am I missing anything?
> 
> I suspect it is good, but let's take a look at Linus' summary of the code:
> 
>         rcu_read_lock();
>         while (sem->owner == owner) {
>                 if (!owner->on_cpu || need_resched())
>                         break;
>                 cpu_relax_lowlatency();
>         }
>         rcu_read_unlock();

Note that I patched the mutex case as a prototype, because mutexes are 
more commonly used than rwsem-xadd. The rwsem case is similar.

> The cpu_relax_lowlatency() looks to have barrier() semantics, so the 
> sem->owner should get reloaded every time through the loop.  This is 
> needed, because otherwise the task structure could get freed and 
> reallocated as something else that happened to have the field at the 
> ->on_cpu offset always zero, resulting in an infinite loop.

So at least with the get_kernel(..., &owner->on_cpu) approach, the 
get_kernel() copy has barrier semantics as well (it's implemented in 
assembly), so it will naturally be reloaded on every iteration.
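
To illustrate the reload point, here is a minimal user-space sketch of 
such a spin loop. It is not the kernel code: struct task, struct lock, 
spin_on_owner() and the max_spins bound (a stand-in for need_resched()) 
are hypothetical names, and barrier() assumes GCC/Clang inline asm:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's task_struct and mutex. */
struct task { int on_cpu; };
struct lock { struct task *owner; };

/*
 * Compiler barrier: the "memory" clobber tells the compiler that memory
 * may have changed, so values cached in registers must be re-read after
 * this point. It emits no machine instruction (GCC/Clang syntax).
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Spin while 'owner' still holds 'lock' and is running on a CPU.
 * 'max_spins' is an assumed stand-in for need_resched().
 * Returns true if the owner changed, false if we stopped spinning.
 */
bool spin_on_owner(struct lock *lock, struct task *owner,
		   unsigned long max_spins)
{
	assert(lock != NULL && owner != NULL);

	while (lock->owner == owner) {
		if (!owner->on_cpu || max_spins-- == 0)
			return false;
		/* Forces lock->owner and owner->on_cpu to be re-read. */
		barrier();
	}
	return true;
}
```

Without the barrier() (or an equivalent asm statement in the loop 
body), the compiler would be free to load lock->owner once and spin on 
the cached value.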

Thanks,

	Ingo