Date: Sat, 25 Feb 2023 18:58:49 -0800
From: "Paul E. McKenney"
To: Alan Stern
Cc: Jonas Oberhauser, parri.andrea@gmail.com, will@kernel.org,
        peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com,
        dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr,
        akiyks@gmail.com, dlustig@nvidia.com, joel@joelfernandes.org,
        urezki@gmail.com, quic_neeraju@quicinc.com, frederic@kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po
Message-ID: <20230226025849.GA2393840@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20230224135251.24989-1-jonas.oberhauser@huaweicloud.com>
        <20230224183758.GQ2948950@paulmck-ThinkPad-P17-Gen-1>
        <20230226010110.GA1576556@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20230226010110.GA1576556@paulmck-ThinkPad-P17-Gen-1>

On Sat, Feb 25, 2023 at 05:01:10PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 24, 2023 at 10:37:58AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 24, 2023 at 10:32:43AM -0500, Alan Stern wrote:
> > > On Fri, Feb 24, 2023 at 02:52:51PM +0100, Jonas Oberhauser wrote:
> > > > As stated in the documentation and implied by its name, the ppo
> > > > (preserved program order) relation is intended to link po-earlier
> > > > to po-later instructions under certain conditions.
> > > > However, a corner case currently allows instructions to be linked
> > > > by ppo that are not executed by the same thread, i.e., instructions
> > > > are being linked that have no po relation.
> > > >
> > > > This happens due to the mb/strong-fence/fence relations, which (as
> > > > one case) provide order when locks are passed between threads
> > > > followed by an smp_mb__after_unlock_lock() fence. This is
> > > > illustrated in the following litmus test (as can be seen when using
> > > > herd7 with `doshow ppo`):
> > > >
> > > > P0(int *x, int *y)
> > > > {
> > > >         spin_lock(x);
> > > >         spin_unlock(x);
> > > > }
> > > >
> > > > P1(int *x, int *y)
> > > > {
> > > >         spin_lock(x);
> > > >         smp_mb__after_unlock_lock();
> > > >         *y = 1;
> > > > }
> > > >
> > > > The ppo relation will link P0's spin_lock(x) and P1's *y=1, because
> > > > P0 passes a lock to P1 which then uses this fence.
> > > >
> > > > The patch makes ppo a subrelation of po by letting fence contribute
> > > > to ppo only in case the fence links events of the same thread.
> > > >
> > > > Signed-off-by: Jonas Oberhauser
> > > > ---
> > > >  tools/memory-model/linux-kernel.cat | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> > > > index cfc1b8fd46da..adf3c4f41229 100644
> > > > --- a/tools/memory-model/linux-kernel.cat
> > > > +++ b/tools/memory-model/linux-kernel.cat
> > > > @@ -82,7 +82,7 @@ let rwdep = (dep | ctrl) ; [W]
> > > >  let overwrite = co | fr
> > > >  let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
> > > >  let to-r = (addr ; [R]) | (dep ; [Marked] ; rfi)
> > > > -let ppo = to-r | to-w | fence | (po-unlock-lock-po & int)
> > > > +let ppo = to-r | to-w | (fence & int) | (po-unlock-lock-po & int)
> > > >
> > > >  (* Propagation: Ordering from release operations and strong fences. *)
> > > >  let A-cumul(r) = (rfe ; [Marked])? ; r
> > >
> > > Acked-by: Alan Stern
> >
> > Queued for the v6.4 merge window (not the current one), thank you both!
>
> I tested both Alan's and Jonas's commits.  These do not seem to produce
> any significant differences in behavior, which is of course a good thing.
>
> Here are the differences and a few oddities:
>
> auto/C-RR-G+RR-R+RR-G+RR-G+RR-R+RR-R+RR-R+RR-R.litmus
>
>         Timed out with changes, completed without them.  But it completed
>         in 558.29 seconds against a limit of 600 seconds, so never mind.
>
> auto/C-RR-G+RR-R+RR-R+RR-G+RR-R+RR-R+RR-G+RR-R.litmus
>
>         Timed out with changes, completed without them.  But it completed
>         in 580.01 seconds against a limit of 600 seconds, so never mind.
>
> auto/C-RR-G+RR-R+RR-R+RR-R+RR-R+RR-G+RR-R+RR-R.litmus
>
>         Timed out with changes, completed without them.  But it completed
>         in 522.29 seconds against a limit of 600 seconds, so never mind.
>
> auto/C-RR-G+RR-R+RR-R+RR-R+RR-R+RR-G+RR-G+RR-R.litmus
>
>         Timed out with changes, completed without them.  But it completed
>         in 588.70 seconds against a limit of 600 seconds, so never mind.
>
> All tests that didn't time out matched their Results comments.
>
> The reason I am so cavalier about the times is that I was foolishly
> running rcutorture concurrently with the new-version testing.  I re-ran
> them, and of them only auto/C-RR-G+RR-R+RR-R+RR-G+RR-R+RR-R+RR-G+RR-R.litmus
> timed out the second time.  I re-ran it again, but without a time limit,
> and it completed properly in 364.8 seconds compared to 580.  A rerun
> took 360.1 seconds.  So things have slowed down a bit.
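For anyone who wants to reproduce the stray cross-thread ppo edge from
Jonas's fragment above, a self-contained litmus test along the following
lines should do it.  The test name, the spinlock_t parameter type, P1's
trailing spin_unlock(), and the exists clause are illustrative additions,
not taken from the patch:

C passed-lock-ppo

(*
 * Illustrative sketch only: P0 passes the lock to P1, whose
 * smp_mb__after_unlock_lock() then (without this patch) yields a ppo
 * edge from P0's spin_lock(x) to P1's plain store to y.
 *)

{}

P0(spinlock_t *x, int *y)
{
        spin_lock(x);
        spin_unlock(x);
}

P1(spinlock_t *x, int *y)
{
        spin_lock(x);
        smp_mb__after_unlock_lock();
        *y = 1;
        spin_unlock(x);
}

exists (y=1)

Running herd7 from tools/memory-model with something like
"-conf linux-kernel.cfg -show all -doshow ppo -gv" on such a file should
draw a ppo arrow from P0's spin_lock(x) to P1's store without the patch,
and only same-thread ppo edges with it applied.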
>
> A few other oddities:
>
> litmus/auto/C-LB-Lww+R-OC.litmus
>
>         Both versions flag a data race, which I am not seeing.  It appears
>         to me that P1's store to u0 cannot happen unless P0's store
>         has completed.  So what am I missing here?
>
> litmus/auto/C-LB-Lrw+R-OC.litmus
> litmus/auto/C-LB-Lww+R-Oc.litmus
> litmus/auto/C-LB-Lrw+R-Oc.litmus
> litmus/auto/C-LB-Lrw+R-A+R-Oc.litmus
> litmus/auto/C-LB-Lww+R-A+R-OC.litmus
>
>         Ditto.  (There are likely more.)
>
> Thoughts?

And what happened here was that I conflated LKMM with the C++ memory
model, producing something stronger than either.  Never mind!!!

                                                        Thanx, Paul
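A note on the data-race flags discussed above: the auto-generated tests
themselves are not shown here, but LKMM flags a data race whenever a plain
access can execute concurrently with a conflicting access that the model
does not order (see "PLAIN ACCESSES AND DATA RACES" in
tools/memory-model/Documentation/explanation.txt).  A hypothetical minimal
example of the kind of pattern that draws the flag, unrelated to the tests
named above, would be:

C plain-vs-marked

(*
 * Hypothetical example, not one of the auto-generated tests: P0's
 * plain write to x is unordered against P1's READ_ONCE() of x, so
 * LKMM flags a data race.
 *)

{}

P0(int *x)
{
        *x = 1;
}

P1(int *x)
{
        int r0;

        r0 = READ_ONCE(*x);
}

exists (1:r0=1)

Running herd7 -conf linux-kernel.cfg on such a file should print a
"Flag data-race" line in addition to the usual verdict; as noted above,
the flags on the listed tests stemmed from expecting ordering stronger
than either LKMM or the C++ memory model actually provides.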