Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
Date:   Thu, 15 Dec 2022 17:09:14 -0800
From:   "Paul E. McKenney" <paulmck@kernel.org>
To:     Joel Fernandes <joel@joelfernandes.org>
Cc:     Frederic Weisbecker <frederic@kernel.org>, boqun.feng@gmail.com,
        neeraj.iitr10@gmail.com, urezki@gmail.com, rcu@vger.kernel.org,
        linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] srcu: Yet more detail for
 srcu_readers_active_idx_check() comments
Message-ID: <20221216010914.GX4001@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <20221215201356.GM4001@paulmck-ThinkPad-P17-Gen-1>
 <EE4EC3CC-395E-475D-BEBE-545955AB97C8@joelfernandes.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <EE4EC3CC-395E-475D-BEBE-545955AB97C8@joelfernandes.org>
Precedence: bulk

On Thu, Dec 15, 2022 at 05:13:47PM -0500, Joel Fernandes wrote:
> > On Dec 15, 2022, at 3:13 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > ﻿On Thu, Dec 15, 2022 at 05:58:14PM +0000, Joel Fernandes wrote:
> >>> On Thu, Dec 15, 2022 at 5:48 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >>> 
> >>>> On Thu, Dec 15, 2022 at 5:08 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>> 
> >>>>> Scenario for the reader to increment the old idx once:
> >>>>> 
> >>>>> _ Assume ssp->srcu_idx is initially 0.
> >>>>> _ The READER reads idx that is 0
> >>>>> _ The updater runs and flips the idx that is now 1
> >>>>> _ The reader resumes with 0 as an index but on the next srcu_read_lock()
> >>>>>  it will see the new idx which is 1
> >>>>> 
> >>>>> What could be the scenario for it to increment the old idx twice?
> >>>> 
> >>>> Unless I am missing something, the reader must reference the
> >>>> srcu_unlock_count[old_idx] and then do smp_mb() before it will be
> >>>> absolutely guaranteed of seeing the new value of ->srcu_idx.
> >>> 
> >>> I think both of you are right depending on how the flip raced with the
> >>> first reader's unlock in that specific task.
> >>> 
> >>> If the first read section's srcu_read_unlock() and its corresponding
> >>> smp_mb()  happened before the flip, then the increment of old idx
> >>> would happen only once. The next srcu_read_lock() will read the new
> >>> index. If the srcu_read_unlock() and it's corresponding smp_mb()
> >>> happened after the flip, the old_idx will be sampled again and can be
> >>> incremented twice. So it depends on how the flip races with
> >>> srcu_read_unlock().
> >> 
> >> I am sorry this is inverted, but my statement's gist stands I believe:
> >> 
> >> 1. Flip+smp_mb() happened before unlock's smp_mb() -- reader will not
> >> increment old_idx the second time.
> > 
> > By "increment old_idx" you mean "increment ->srcu_lock_count[old_idx]",
> > correct?
> 
> Yes sorry for confusing, i indeed meant lock count increment corresponding to the old index.

I guessed correctly!!!  Don't worry, it won't happen again.  ;-)

> > Again, the important ordering isn't the smp_mb(), but the accesses,
> > in this case, the accesses to ->srcu_unlock_count[idx].
> 
> I was talking about ordering of the flip of index (write) with respect
> to both the reading of the old index  in the rcu_read_lock() and its
> subsequent lock count increment corresponding to that index. I believe
> we are talking her about how this race can effect the wrap around issues
> when scanning for readers in the pre flip index, and we concluded that
> there can be at most 2 of these on the SAME task.

Agreed.

>                                                   The third time, reader
> will always see the new flipped index because of the memory barriers on
> both sides. IOW, the same task cannot overflow the lock counter on the
> preflipped index and cause issues. However there can be Nt different
> tasks so perhaps you can have 2*Nt number of preempted readers that had
> sampled the old index and now will do a lock and unlock on that old index,
> potentially causing a lock==unlock match when there should not be a match.

So each task can do one old-index ->srcu_lock_count[] increment, and Nc of
them can do a second one, where Nc is the number of CPUs.  This is because
a given task's smp_mb() applies to all later code executed by that task
and also to code executed by other tasks running later on that same CPU.

> >> 2. unlock()'s smp_mb() happened before Flip+smp_mb() , now the reader
> >> has no new smp_mb() that happens AFTER the flip happened. So it can
> >> totally sample the old idx again -- that particular reader will
> >> increment twice, but the next time, it will see the flipped one.
> > 
> > I will let you transliterate both.  ;-)
> 
> I think I see what you mean now :)
> 
> I believe the access I am referring to is the read of idx on one side and the write to idx on the other. However that is incomplete and I need to pair that with some of other access on both sides.
> 
> So perhaps this:
> 
> Writer does flip + smp_mb + read unlock counts [1]
> 
> Reader does:
>  read idx + smp_mb() + increment lock counts [2]
> 
> And subsequently reader does
> Smp_mb() + increment unlock count. [3]
> 
> So [1] races with either [2] or [2]+[3].
> 
> Is that fair?

That does look much better, thank you!

> >> Did I get that right? Thanks.
> > 
> > So why am I unhappy with orderings of smp_mb()?
> > 
> > To see this, let's take the usual store-buffering litmus test:
> > 
> >    CPU 0            CPU 1
> >    WRITE_ONCE(x, 1);    WRITE_ONCE(y, 1);
> >    smp_mb();        smp_mb();
> >    r0 = READ_ONCE(y);    r1 = READ_ONCE(x);
> > 
> > Suppose CPU 0's smp_mb() happens before that of CPU 1:
> > 
> >    CPU 0            CPU 1
> >    WRITE_ONCE(x, 1);    WRITE_ONCE(y, 1);
> >    smp_mb();
> >                smp_mb();
> >    r0 = READ_ONCE(y);    r1 = READ_ONCE(x);
> > 
> > We get r0 == r1 == 1.
> > 
> > Compare this to CPU 1's smp_mb() happening before that of CPU 0:
> > 
> >    CPU 0            CPU 1
> >    WRITE_ONCE(x, 1);    WRITE_ONCE(y, 1);
> >                smp_mb();
> >    smp_mb();
> >    r0 = READ_ONCE(y);    r1 = READ_ONCE(x);
> > 
> > We still get r0 == r1 == 1.  Reversing the order of the two smp_mb()
> > calls changed nothing.
> > 
> > But, if we order CPU 1's write to follow CPU 0's read, then we have
> > this:
> > 
> >    CPU 0            CPU 1
> >    WRITE_ONCE(x, 1);
> >    smp_mb();
> >    r0 = READ_ONCE(y);
> >                WRITE_ONCE(y, 1);
> >                smp_mb();
> >                r1 = READ_ONCE(x);
> > 
> > Here, given that r0 had the final value of zero, we know that
> > r1 must have a final value of 1.
> > 
> > And suppose we reverse this:
> > 
> >    CPU 0            CPU 1
> >                WRITE_ONCE(y, 1);
> >                smp_mb();
> >                r1 = READ_ONCE(x);
> >    WRITE_ONCE(x, 1);
> >    smp_mb();
> >    r0 = READ_ONCE(y);
> > 
> > Now there is a software-visible difference in behavior.  The value of
> > r0 is now 1 instead of zero and the value of r1 is now 0 instead of 1.
> > 
> > Does this make sense?
> 
> Yes I see what you mean. In first case, smp_mb() ordering didn’t matter. But in the second case it does.

Yes, there have to be accesses for the software to even see the effects
of a given smp_mb().

							Thanx, Paul