Date: Thu, 19 Oct 2006 15:46:46 -0700
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: David Howells <dhowells@redhat.com>,
       Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: Uses for memory barriers
Message-ID: <20061019224646.GC2265@us.ibm.com>
Reply-To: paulmck@us.ibm.com
References: <20061019192158.GB2265@us.ibm.com> <Pine.LNX.4.44L0.0610191557220.8492-100000@iolanthe.rowland.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.44L0.0610191557220.8492-100000@iolanthe.rowland.org>
User-Agent: Mutt/1.4.1i
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 10117
Lines: 253

On Thu, Oct 19, 2006 at 04:55:16PM -0400, Alan Stern wrote:
> On Thu, 19 Oct 2006, Paul E. McKenney wrote:
> 
> > > I see no reason to think the control dependency in CPU 1's assignment to B 
> > > is any weaker than a memory barrier.
> > 
> > I am assuming that you have in mind a special restricted memory barrier
> > that applies only to the load of A, and not necessarily to any other
> > preceding operations.  Otherwise, these two code sequences would be
> > equivalent, and they are not (as usual, all variables initially zero):
> > 
> > 	CPU 0, Sequence 1	CPU 0, Sequence 2	CPU 1
> > 
> > 	A=1			A=1			while (C==0);
> > 	while (B==0);		while (B==0);		smp_mb();
> > 	C=1			smp_mb();		assert(A==1);
> > 				C=1
> > 
> > In sequence 1, CPU 1's assertion can fail.  Not so with sequence 2.
> 
> Yes, that's a very good point.  Indeed, I meant a restricted memory
> barrier applying only to the two accesses involved.  In the same sort of
> way rmb() is a restricted memory barrier, applying only to pairs of
> loads.

OK.

> > Regardless of your definition of your posited memory barrier corresponding
> > to the control dependency, a counter example:
> > 
> > 	CPU 1			CPU 2
> > 
> > 	A=1;
> > 	...
> > 	while (A==0);		while (B==0);
> > 	B=1			smp_mb()
> > 				assert(A==1) <fails>
> > 
> > Here, placing an smp_mb() after the "while (A==0)" does make a difference.
> > 
> > Degenerate, perhaps, given that the same CPU is assigning and while-ing,
> > but so it goes.
> 
> The smp_mb() does make a difference.  But it doesn't invalidate my notion
> of a dependency acting as a restricted memory barrier.  The notion allows
> you to conclude from this example only that ld_1(A) >v ld_2(A), which is
> meaningless (using your convention for >v).  It doesn't allow you to
> conclude st_1(A) >v ld_2(A).

Yes, assuming that control dependencies result in your restricted memory
barrier.

> > Even assuming a special restricted memory barrier, the example of DEC
> > Alpha and pointer dereferencing gives me pause.  Feel free to berate
> > me for this, as you have done in the past.  ;-)
> 
> Ah, interesting comment.  With the Alpha and pointer dereferencing, the
> problems arise because of failure to respect a data dependency between two
> loads.  Here I am talking about a dependency between a load and a
> subsequent store, so it isn't the same thing at all.  Failure to respect
> this kind of dependency would mean the CPU was writing a value before it
> knew what value to write (or whether to write it, or where to write it).  
> Not even the most aggressively speculative machine will do that!

http://www.tinker.ncsu.edu/techreports/vssepic.pdf

Not exactly the same thing, but certainly a very similar level of
speculative aggression!

> > Seriously, my judgement of this would depend on exactly what part of
> > the smp_mb() semantics you are claiming for the control dependency.
> > I do not believe that we could make progress without appealing to a
> > specific implementation, so I would rather ignore control dependencies,
> > at least for non-MMIO accesses.  MMIO would be another story altogether.
> 
> What I'm claiming is exactly what was written in an earlier email:
> 
> 	st(A) < st(B) >v ac(B) < ac(A)  implies  st(A) >v ac(A), and
> 
> 	ld(A) < st(B) >v ac(B) < st(A)  implies  st(A) !>v ld(A).
> 
> Here I'm using your convention for >v, and < indicates either an explicit
> barrier between two accesses or a dependency between a load and a later
> store.

Your notion of control-dependency barriers makes sense in an intuitive
sense.  Does Linux rely on it, other than for MMIO accesses?

> > > "Sequentially precedes" means that the system behaves as though there were 
> > > a memory barrier between the two accesses.
> > 
> > OK.  As noted above, if I were to interpret "a memory barrier" as really
> > being everything entailed by smp_mb(), I disagree with your statement in an
> > earlier email stating:
> > 
> > 	Similarly, in the program "if (A) B = 2;" the load(A) sequentially
> > 	precedes the store(B) -- thanks to the dependency or (if you
> > 	prefer) the absence of speculative stores.
> > 
> > However, I don't believe that is what you mean by "a memory barrier" in
> > this case -- my guess again is that you mean a special memory barrier that
> > applies only the the load of A in one direction, but that applies to
> > everything following the load in the other direction.
> 
> It applies to the load of A in one direction and to all later stores in
> the other direction.  Not to later loads.

Ah, good point -- I didn't pick up on the fact that it needn't constrain
later loads.

> > I would use ">p" for the program-order relationship, and probably something
> > like ">b" for the memory-barrier relationship.  There are other orderings,
> > including the control-flow ordering discussed earlier, data dependencies,
> > and so on.
> 
> > The literature is quite inconsistent.  The DEC Alpha manual takes your
> > approach, while Gharachorloo's dissertation takes my approach.  Not to
> > be outdone, Steinke and Nutt's JACM paper (written long after the other
> > two) uses different directions for different types of orderings!!!
> > See http://arxiv.org/PS_cache/cs/pdf/0208/0208027.pdf, page 49,
> > Definitions A.5, A.6 on the one hand and Definition A.7 on the other.  ;-)
> 
> > This is the connotation conflict.  For you, it is confusing to write
> > "A > B" when A precedes B.  For me, it is confusing to write "st < ld"
> > when data flows from the "st" to the "ld".  So, the only way to resolve
> > this is to avoid use of ">" like the plague!
> 
> Okay, let's change the notation.  I don't like <v very much.  Let's not
> worry about potential confusion with implication signs, and use
> 
> 	1:st(A) -> 2:st(A)

Would "=>" work, or does that conflict with something else?

And the number before the colon is the CPU #, right?

> to indicate that 1:st occurs earlier than 2:st in the global ordering of 
> all stores to A.  And let's use
> 
> 	3:st(B) -> 4:ld(B)
> 
> to mean that 4:ld returned the value either of 3:st or of some other store 
> to B occuring later in the global ordering of all such stores.

OK...  Though expressing your English description formally is a bit messy,
it does capture a very useful idiom.

> Lastly, let's use
> 
> 	5:ac(A) +> 6:ac(B)
> 
> to indicate either that the two accesses are separated by a memory barrier 
> or that 5:ac is a load and 6:ac is a dependent store (all occurring on the 
> same CPU).

So the number preceding the colon is the value being loaded or stored?

Either way, the symbols seem reasonable.  In a PDF, I would probably
set a symbol indicating the type of flow over a hollow arrow or something.

> > And in a cache-coherent system, there must be.  Or, more precisely,
> > there must not be different sequences of loads that indicate inconsistent
> > orderings of stores to a given single variable.  If the system can
> > prove that there are no concurrent loads during a given period of
> > time, I guess it would be within its rights to ditch cache coherence
> > for that variable during that time...
> 
> What about indirect indications of inconsistency?  See my example below.

I have some questions about that one.

> > > (BTW, can you explain the difference between "cache coherent" and "cache 
> > > consistent"?  I never quite got it straight...)
> > 
> > "Cache coherent" is the preferred term, though "cache consistent" is
> > sometimes used as a synonym.  If you want to be painfully correct, you
> > would say "cache coherent" when talking about stores to a single variable,
> > and "memory consistency model" when talking about ordering of accesses
> > to multiple variables.
> 
> Hmmm.  Then what about "DMA coherent" vs. "DMA consistent"?

No idea.  Having worked with systems where DMA did not play particularly
nicely with the cache-coherence protocol, they both sound like good things,
though.  ;-)

As near as I can tell by looking around, they are synonyms or nearly so.

> > > The analogy breaks down for pairs of stores.  If stores are blind then 
> > > they can't see other stores -- but we need them to.
> > 
> > I would instead say that you need to execute some loads in order to be
> > able to see the effects of your pairs of stores.
> 
> Consider this example:
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 	A = 1;			B = 2;
> 	mb();			mb();
> 	B = 1;			X = A + 1;
> 	...
> 	assert(!(B==2 && X==1));
> 
> The assertion cannot fail.  But to prove it in our formalism requires 
> writing  st_0(B=1) -> st_1(B=2).  In other words, CPU 1's store to B sees 
> (i.e., overwrites) CPU 0's store to B.

Alternatively, we could use a notation that states that a given load gets
exactly the value from a given store, for example "st ==> ld" as opposed
to "st => ld", where there might be an intervening store.

(1)	B==2 -> st_1(B=2) ==> ld_0(B==2)

	Because there is only one store of 2 into B.

(2)	But st_0(B=1) =p> ld_0(B) -> st_0(B=1) => ld_0(B)

	Here I use "=p>" to indicate program order, and rely on the
	fact that a CPU must see its own accesses in order.

(3)	(1) and (2) imply st_0(B=1) => st_1(B=2) ==> ld_0(B==2)

	So, yes, we do end up saying something about the order of the
	stores, but only indirectly, based on other observations -- in
	this case, program order and direct value sequence.  In other
	words, we can sometimes say things about the order of stores
	even though stores are blind.

(4)	By memory-barrier implication:

	(a)	st_0(A=1) +> st_0(B=1) &&
	
	(b)	st_1(B=2) +> ld_1(A) &&
	
	(c)	st_0(B=1) => st_1(B=2)

	-> st_0(A=1) => ld_1(A)

(5)	Since there is only one store to A: st_0(A=1) ==> ld_1(A==1)

(6)	Therefore, X==2 and the assertion cannot fail if B==2.  But
	if the assertion fails, it must be true that B==2, so the
	assertion cannot fail.

Is that more or less what you had in mind?

							Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/