Date: Mon, 19 Jun 2017 18:24:28 +0200
From: Andrea Parri <parri.andrea@gmail.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, priyalee.kushwaha@intel.com,
        drozdziak1@gmail.com, arnd@arndb.de, ldr709@gmail.com,
        tglx@linutronix.de, peterz@infradead.org, josh@joshtriplett.org,
        nico@linaro.org, kjlx@templeofstupid.com, vegard.nossum@oracle.com,
        torvalds@linux-foundation.org, dcb314@hotmail.com,
        fengguang.wu@intel.com, fweisbec@gmail.com, riel@redhat.com,
        rostedt@goodmis.org, mingo@kernel.org, stern@rowland.harvard.edu,
        luc.maranget@inria.fr, j.alglave@ucl.ac.uk
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
Message-ID: <20170619162428.GA9632@andrea>
References: <20170612213754.GA7201@linux.vnet.ibm.com>
 <20170614025404.GA2525@andrea>
 <20170614043317.GK3721@linux.vnet.ibm.com>
 <20170614143322.GA3368@andrea>
 <20170614202329.GV3721@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170614202329.GV3721@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 7829
Lines: 175

On Wed, Jun 14, 2017 at 01:23:29PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 14, 2017 at 04:33:22PM +0200, Andrea Parri wrote:
> > On Tue, Jun 13, 2017 at 09:33:17PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 14, 2017 at 04:54:04AM +0200, Andrea Parri wrote:
> > > > On Mon, Jun 12, 2017 at 02:37:55PM -0700, Paul E. McKenney wrote:
> > > > > Hello, Ingo,
> > > > > 
> > > > > This pull request is unusual in being a single linear set of commits,
> > > > > as opposed to my usual topic branches.  This is due to the many
> > > > > large-footprint changes, which means that reasonable topic branches
> > > > > result in large numbers of merge conflicts.  In addition, some commits
> > > > > depend on other commits that should be on different topic branches.
> > > > > I will return to the topic-branch style next time.
> > > > > 
> > > > > The largest feature of this series is shrinking and simplification,
> > > > > with the following diffstat summary:
> > > > > 
> > > > >  79 files changed, 1496 insertions(+), 4211 deletions(-)
> > > > > 
> > > > > In other words, this series represents a net reduction of more than 2700
> > > > > lines of code.
> > > > > 
> > > > > These commits were posted to LKML:
> > > > > 
> > > > > 	http://lkml.kernel.org/r/20170525215934.GA11578@linux.vnet.ibm.com
> > > > 
> > > > I did raise some issues (AFAICT, unresolved) concerning...
> > > > 
> > > > 
> > > > > 
> > > > > Two of these commits (46/88 and 48/88) have been deferred, most likely
> > > > > to v4.14.  All of the remaining commits have been subjected to the 0day
> > > > > Test Robot and -next testing, and are availiable in teh git repository at:
> > > > > 
> > > > >   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git for-mingo
> > > > > 
> > > > > for you to fetch changes up to 6d48152eafde1f0d0a4a9e0584fa7d9ff4fbfdac:
> > > > > 
> > > > >   rcu: Remove RCU CPU stall warnings from Tiny RCU (2017-06-08 18:52:45 -0700)
> > > > > 
> > > > > ----------------------------------------------------------------
> > > > > Arnd Bergmann (1):
> > > > >       bcm47xx: Fix build regression
> > > > > 
> > > > > Paul E. McKenney (83):
> > > > >       rcutorture: Add lockdep to one of the SRCU scenarios
> > > > >       rcutorture: Add three-level tree test for Tree SRCU
> > > > >       rcutorture: Fix bug in reporting Kconfig mis-settings
> > > > >       rcutorture: Add a scenario for Tiny SRCU
> > > > >       rcutorture: Add a scenario for Classic SRCU
> > > > >       rcu: Prevent rcu_barrier() from starting needless grace periods
> > > > >       rcutorture: Correctly handle CONFIG_RCU_TORTURE_TEST_* options
> > > > >       rcutorture: Update test scenarios based on new Kconfig dependencies
> > > > >       srcu: Eliminate possibility of destructive counter overflow
> > > > >       rcu: Complain if blocking in preemptible RCU read-side critical section
> > > > >       rcuperf: Defer expedited/normal check to end of test
> > > > >       rcuperf: Remove conflicting Kconfig options
> > > > >       rcu: Remove obsolete reference to synchronize_kernel()
> > > > >       rcuperf: Add ability to performance-test call_rcu() and friends
> > > > >       rcuperf: Add a Kconfig-fragment file for Classic SRCU
> > > > >       rcu: Make sync_rcu_preempt_exp_done() return bool
> > > > >       checkpatch: Remove checks for expedited grace periods
> > > > >       rcuperf: Add test for dynamically initialized srcu_struct
> > > > >       doc/atomic_ops: Clarify smp_mb__{before,after}_atomic()
> > > > >       atomics: Add header comment so spin_unlock_wait()
> > > > 
> > > > ... this one: c.f.,
> > > > 
> > > >   http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1418503.html
> > > > 
> > > > Any hints about those?
> > > 
> > > I suggest being -extremely- clear.  This is about ARM, correct?
> > > If so, lay out the exact situation (code example, which hardware,
> > > which spin_unlock_wait(), which sequence of events) that could lead to
> > > the failure.
> > > 
> > > The problem here is that no one knows which of the 30 CPU families you
> > > might be talking about, nor do they know exactly what the problem is.
> > > I didn't worry about it at the time because I figured that you had
> > > sent private email to Will with the full story.
> > > 
> > > Yes, the four of us (you, Alan, Luc, and me) discussed it, but we weren't
> > > sure whether it was a bug in the memory model, the spin_unlock_wait()
> > > code, or my description of spin_unlock_wait().  Given that Will didn't
> > > object to my April 13th email (the one that you were not CCed on),
> > > I figured that he wasn't going to claim that the spin_unlock_wait()
> > > description was wrong, especially since he went to so much effort some
> > > years back to make ARM64 meet that description.
> > > 
> > > So again, I recommend replying to your msg1418503.html email with
> > > a code fragment demonstrating the problem, exact identification of
> > > the hardware that might be susceptible (ARM64? ARM32? Which ARM32?),
> > > exact identification of which spin_unlock_wait() function you suspect,
> > > and a clear bullet-form sequence of events that shows how you believe
> > > that the problem can occur.
> > > 
> > > That makes it easy for people to see what your concern is, easy for
> > > them to check their code and hardware, and hard for them to ignore you.
> > > 
> > > Make sense?
> > 
> > My concerns originates from the fact that none of the implementations
> > (of spin_unlock_wait()) for the architectures touched by:
> > 
> >   726328d92a42b6d4b76078e2659f43067f82c4e8
> >   ("locking/spinlock, arch: Update and fix spin_unlock_wait() implementations"
> > 
> > currently contain any traces of that RELEASE/spin_unlock() from your:
> > 
> >   "Semantically this is equivalent to a spin_lock() immediately followed
> >    by a spin_unlock()."
> > 
> > In fact, the header of that commit states:
> > 
> >   "The update is in semantics; where it previously was only a control
> >    dependency, we now upgrade to a full load-acquire [...]"
> > 
> > For an example leveraging this RELEASE, consider:
> > 
> >   [initially: X = 0, s UNLOCKED]
> > 
> >   P0                      P1
> >   X = 1;                  spin_lock(s);
> >   spin_unlock_wait(s);    r0 = X;
> > 
> > According to the "spin_lock(); spin_unlock() semantics" this has one
> > non-deadlocking execution, and the RELEASE from the spin_unlock_wait()
> > (paired with the ACQUIRE from the spin_lock() in P1) guarantees that
> > r0 = 1 in this execution. AFAICT, this same conclusion does not hold
> > according to the "smp_cond_load_acquire() semantics" (726328d92a42b).
> 
> OK.  For exactly which CPU families do you believe that this fails to
> hold.  That is, given the implementations of spin_unlock_wait() and
> spin_lock() for the various CPU families, which will break and why?

Considering the case of x86, the question amounts to ask whether
the "exists" clause of the following test can be satisfied:

X86 TEST
{
 1:EAX=1;
}
 P0          | P1                ;
 MOV [X],$1  | LOCK XCHG [s],EAX ;
 MOV EAX,[s] | MOV EBX,[X]       ;
exists
(0:EAX=0 /\ 1:EBX=0)

The answer is "Yes", as illustrated by the following sequence of
events:

  1) P0's store to X is placed into the store buffer (of P0);
  2) P0 loads 0 from s;
  3) P1 executes LOCK XCHG;
  4) P1 loads X=0 from memory;
  5) P0's store to X is flushed from store buffer to memory.

---

This analysis shows that the x86 implementation needs additional
synchronization to meet a "spin_lock();spin_unlock()" semantics.

I believe that the same conclusion holds for other architectures
(e.g., sparc, ia64, arm(32), alpha).

In fact, the only two implementations I know that _seem_ able to
meet that semantics are arm64's and powerpc's.

  Andrea


> 
> 							Thanx, Paul
>