Date: Tue, 13 Jun 2017 21:15:47 -0400
From: Steven Rostedt
To: "Paul E. McKenney"
Cc: Peter Zijlstra, Krister Johansen, Ingo Molnar,
 linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Message-ID: <20170613211547.49814d25@gandalf.local.home>
In-Reply-To: <20170613234205.GD3721@linux.vnet.ibm.com>
References: <20170609032546.GF2553@templeofstupid.com>
 <20170609071957.GJ8337@worktop.programming.kicks-ass.net>
 <20170609124554.GF3721@linux.vnet.ibm.com>
 <20170613192308.173dd86a@gandalf.local.home>
 <20170613234205.GD3721@linux.vnet.ibm.com>

On Tue, 13 Jun 2017 16:42:05 -0700
"Paul E. McKenney" wrote:

> On Tue, Jun 13, 2017 at 07:23:08PM -0400, Steven Rostedt wrote:
> > On Fri, 9 Jun 2017 05:45:54 -0700
> > "Paul E. McKenney" wrote:
> >
> > > On Fri, Jun 09, 2017 at 09:19:57AM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jun 08, 2017 at 08:25:46PM -0700, Krister Johansen wrote:
> > > > > The behavior of swake_up() differs from that of wake_up(), and from the
> > > > > swake_up() that came from RT linux. A memory barrier, or some other
> > > > > synchronization, is needed prior to a swake_up so that the waiter sees
> > > > > the condition set by the waker, and so that the waker does not see an
> > > > > empty wait list.
> > > >
> > > > Urgh.. let me stare at that. But it sounds like the wrong solution since
> > > > we wanted to keep the wait and swait APIs as close as possible.
> > >
> > > But don't they both need some sort of ordering, be it memory barriers or
> > > locking, to handle the case where the wait/swait doesn't actually sleep?
> >
> > Looking at an RCU example, and assuming that ordering can move around
> > within a spin lock, and that changes can leak into a spin lock region
> > from both before and after, could we have:
> >
> > (looking at __call_rcu_core() and rcu_gp_kthread())
> >
> >  CPU0                          CPU1
> >  ----                          ----
> >                                __call_rcu_core() {
> >
> >                                 spin_lock(rnp_root)
> >                                 need_wake = __rcu_start_gp() {
> >                                  rcu_start_gp_advanced() {
> >                                   gp_flags = FLAG_INIT
> >                                  }
> >                                 }
> >
> >  rcu_gp_kthread() {
> >   swait_event_interruptible(wq,
> >         gp_flags & FLAG_INIT) {
> >    spin_lock(q->lock)
> >
> >                                * fetch wq->task_list here! *
> >
> >    list_add(wq->task_list, q->task_list)
> >    spin_unlock(q->lock);
> >
> >    * fetch old value of gp_flags here *
>
> Both reads of ->gp_flags are READ_ONCE(), so having seen the new value
> in swait_event_interruptible(), this task/CPU cannot see the old value
> from some later access.  You have to have accesses to two different
> variables to require a memory barrier (at least assuming consistent use
> of READ_ONCE(), WRITE_ONCE(), or equivalent).

If I'm not mistaken, READ_ONCE() and WRITE_ONCE() are just volatile
casts. The compiler may not leak or move the fetches, but what about
the hardware?

A spin_lock() only needs to make sure that what comes after it does not
leak before it. A spin_unlock() only needs to make sure that what comes
before it does not leak after it. From my understanding of
memory-barriers.txt, there's no guarantee that the hardware doesn't let
reads or writes issued before a spin_lock() happen after it, nor that
reads or writes issued after a spin_unlock() happen before it. The spin
locks only need to protect the inside of the critical section, not keep
the outside from leaking in.

I'm looking at this in particular:

====
 (1) ACQUIRE operation implication:

     Memory operations issued after the ACQUIRE will be completed after
     the ACQUIRE operation has completed.

     Memory operations issued before the ACQUIRE may be completed after
     the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
     combined with a following ACQUIRE, orders prior stores against
     subsequent loads and stores.  Note that this is weaker than smp_mb()!
     The smp_mb__before_spinlock() primitive is free on many architectures.

 (2) RELEASE operation implication:

     Memory operations issued before the RELEASE will be completed before
     the RELEASE operation has completed.

     Memory operations issued after the RELEASE may be completed before
     the RELEASE operation has completed.
====
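Putting (2) together with the diagram above, the waker side looks to me
like the classic store-buffering pattern. A hand-simplified sketch of
what I mean (names approximate, not the exact RCU code):

	/* CPU1, the waker: */

	raw_spin_lock_irqsave(&rnp_root->lock, flags);
	WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);	/* store the condition */
	raw_spin_unlock_irqrestore(&rnp_root->lock, flags);	/* RELEASE */

	/*
	 * swake_up() starts with a bare swait_active(), i.e. a plain
	 * list_empty() load. By rule (2) that load may complete before
	 * the RELEASE does -- effectively inside the critical section --
	 * where nothing orders it against the gp_flags store. An
	 * smp_mb() here would forbid that.
	 */
	swake_up(&rsp->gp_wq);

That is, the waker's task_list load can be satisfied while the waiter's
list_add() is still invisible, at the same time as the waiter's
gp_flags load is satisfied with the old value.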
-- Steve

> >                                 spin_unlock(rnp_root)
> >
> >                                 rcu_gp_kthread_wake() {
> >                                  swake_up(wq) {
> >                                   swait_active(wq) {
> >                                    list_empty(wq->task_list)
> >
> >                                   } * return false *
> >
> >   if (condition) * false *
> >     schedule();
> >
> > Looks like a memory barrier is missing. Perhaps we should slap one into
> > swait_active()? I don't think it is wise to let users add their own, as
> > I think we currently have bugs now.
>
> I -know- I have bugs now.  ;-)
>
> But I don't believe this is one of them.  Or am I getting confused?
>
> 							Thanx, Paul
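To be concrete about the "slap one into swait_active()" idea quoted
above, I mean something like this (untested, and it pessimizes every
caller, so maybe the barrier belongs in swake_up() or its callers
instead):

static inline int swait_active(struct swait_queue_head *q)
{
	/*
	 * Order this CPU's prior condition store against the
	 * task_list load below, pairing with the waiter queueing
	 * itself under q->lock. Without this, the load can be
	 * satisfied early and we can miss a just-added waiter.
	 */
	smp_mb();
	return !list_empty(&q->task_list);
}

Whether the waiter side of swait_event_interruptible() then needs a
matching barrier is a separate question.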