LinuxLists.cc - [PATCH] tools/memory-model: Make ppo a subrelation of po

2023-01-17 21:57:35

Subject: [PATCH] tools/memory-model: Make ppo a subrelation of po

As stated in the documentation and implied by its name, the ppo
(preserved program order) relation is intended to link po-earlier
to po-later instructions under certain conditions. However, a
corner case currently allows instructions to be linked by ppo that
are not executed by the same thread, i.e., instructions are being
linked that have no po relation.

This happens due to the mb/strong-fence relations, which (as one
case) provide order when locks are passed between threads followed
by an smp_mb__after_unlock_lock() fence. This is illustrated in
the following litmus test (as can be seen when using herd7 with
`doshow ppo`):

P0(int *x, int *y)
{
spin_lock(x);
spin_unlock(x);
}

P1(int *x, int *y)
{
spin_lock(x);
smp_mb__after_unlock_lock();
*y = 1;
}

The ppo relation will link P0's spin_lock(x) and P1's *y=1,
because P0 passes a lock to P1 which then uses this fence.

The patch makes ppo a subrelation of po by eliminating this
possibility from mb and strong-fence, and instead introduces the
notion of strong ordering operations, which are allowed to link
events of distinct threads.

Signed-off-by: Jonas Oberhauser <[email protected]>
---
.../Documentation/explanation.txt | 101 +++++++++++-------
tools/memory-model/linux-kernel.cat | 20 ++--
2 files changed, 76 insertions(+), 45 deletions(-)

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index e901b47236c3..4f5f6ac098de 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -865,14 +865,44 @@ propagates stores. When a fence instruction is executed on CPU C:
propagate to all other CPUs before any instructions po-after
the strong fence are executed on C.

-The propagation ordering enforced by release fences and strong fences
-affects stores from other CPUs that propagate to CPU C before the
-fence is executed, as well as stores that are executed on C before the
-fence. We describe this property by saying that release fences and
-strong fences are A-cumulative. By contrast, smp_wmb() fences are not
-A-cumulative; they only affect the propagation of stores that are
-executed on C before the fence (i.e., those which precede the fence in
-program order).
+ Whenever any CPU C' executes an unlock operation U such that
+ CPU C executes a lock operation L followed by a po-later
+ smp_mb__after_unlock_lock() fence, and L is either a later lock
+ operation on the lock released by U or is po-after U, then any
+ store that propagates to C' before U must propagate to all other
+ CPUs before any instructions po-after the fence are executed on C.
+
+While smp_wmb() and release fences only force certain earlier stores
+to propagate to another CPU C' before certain later stores propagate
+to the same CPU C', strong fences and smp_mb__after_unlock_lock()
+force those stores to propagate to all other CPUs before any later
+instruction is executed. We collectively refer to the latter
+operations as strong ordering operations, as they provide much
+stronger ordering in two ways:
+
+ Firstly, strong ordering operations also create order between
+ earlier stores and later reads.
+
+ Secondly, strong ordering operations create a form of global
+ ordering: When an earlier store W propagates to CPU C and is
+ ordered by a strong ordering operation with a store W' of C,
+ and another CPU C' observes W' and in response issues yet
+ another store W'', then W'' also can not propagate to any CPU
+ before W. By contrast, a release fence or smp_wmb() would only
+ order W and W', but not force any ordering between W and W''.
+ To summarize, the ordering forced by strong ordering operations
+ extends to later stores of all CPUs, while other fences only
+ force ordering with relation to stores on the CPU that executes
+ the fence.
+
+The propagation ordering enforced by release fences and strong ordering
+operations affects stores from other CPUs that propagate to CPU C before
+the fence is executed, as well as stores that are executed on C before
+the fence. We describe this property by saying that release fences and
+strong ordering operations are A-cumulative. By contrast, smp_wmb()
+fences are not A-cumulative; they only affect the propagation of stores
+that are executed on C before the fence (i.e., those which precede the
+fence in program order).

rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
other properties which we discuss later.
@@ -1425,36 +1455,36 @@ requirement is the content of the LKMM's "happens-before" axiom.

The LKMM defines yet another relation connected to times of
instruction execution, but it is not included in hb. It relies on the
-particular properties of strong fences, which we cover in the next
-section.
+particular properties of strong ordering operations, which we cover in the
+next section.

THE PROPAGATES-BEFORE RELATION: pb
----------------------------------

The propagates-before (pb) relation capitalizes on the special
-features of strong fences. It links two events E and F whenever some
-store is coherence-later than E and propagates to every CPU and to RAM
-before F executes. The formal definition requires that E be linked to
-F via a coe or fre link, an arbitrary number of cumul-fences, an
-optional rfe link, a strong fence, and an arbitrary number of hb
-links. Let's see how this definition works out.
+features of strong ordering operations. It links two events E and F
+whenever some store is coherence-later than E and propagates to every
+CPU and to RAM before F executes. The formal definition requires that
+E be linked to F via a coe or fre link, an arbitrary number of
+cumul-fences, an optional rfe link, a strong ordering operation, and an
+arbitrary number of hb links. Let's see how this definition works out.

Consider first the case where E is a store (implying that the sequence
of links begins with coe). Then there are events W, X, Y, and Z such
that:

- E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
+ E ->coe W ->cumul-fence* X ->rfe? Y ->strong-order Z ->hb* F,

where the * suffix indicates an arbitrary number of links of the
specified type, and the ? suffix indicates the link is optional (Y may
be equal to X). Because of the cumul-fence links, we know that W will
propagate to Y's CPU before X does, hence before Y executes and hence
-before the strong fence executes. Because this fence is strong, we
-know that W will propagate to every CPU and to RAM before Z executes.
-And because of the hb links, we know that Z will execute before F.
-Thus W, which comes later than E in the coherence order, will
-propagate to every CPU and to RAM before F executes.
+before the strong ordering operation executes. Because of the strong
+ordering, we know that W will propagate to every CPU and to RAM before
+Z executes. And because of the hb links, we know that Z will execute
+before F. Thus W, which comes later than E in the coherence order,
+will propagate to every CPU and to RAM before F executes.

The case where E is a load is exactly the same, except that the first
link in the sequence is fre instead of coe.
@@ -1637,8 +1667,8 @@ does not imply X ->rcu-order V, because the sequence contains only
one rcu-gp link but two rcu-rscsi links.

The rcu-order relation is important because the Grace Period Guarantee
-means that rcu-order links act kind of like strong fences. In
-particular, E ->rcu-order F implies not only that E begins before F
+means that rcu-order links act kind of like a strong ordering operation.
+In particular, E ->rcu-order F implies not only that E begins before F
ends, but also that any write po-before E will propagate to every CPU
before any instruction po-after F can execute. (However, it does not
imply that E must execute before F; in fact, each synchronize_rcu()
@@ -1675,24 +1705,23 @@ The rcu-fence relation is a simple extension of rcu-order. While
rcu-order only links certain fence events (calls to synchronize_rcu(),
rcu_read_lock(), or rcu_read_unlock()), rcu-fence links any events
that are separated by an rcu-order link. This is analogous to the way
-the strong-fence relation links events that are separated by an
+the strong-order relation links events that are separated by an
smp_mb() fence event (as mentioned above, rcu-order links act kind of
-like strong fences). Written symbolically, X ->rcu-fence Y means
-there are fence events E and F such that:
+like strong ordering operations). Written symbolically, X ->rcu-fence Y
+means there are fence events E and F such that:

X ->po E ->rcu-order F ->po Y.

From the discussion above, we see this implies not only that X
executes before Y, but also (if X is a store) that X propagates to
-every CPU before Y executes. Thus rcu-fence is sort of a
-"super-strong" fence: Unlike the original strong fences (smp_mb() and
+every CPU before Y executes. Thus unlike strong fences (smp_mb() and
synchronize_rcu()), rcu-fence is able to link events on different
CPUs. (Perhaps this fact should lead us to say that rcu-fence isn't
really a fence at all!)

Finally, the LKMM defines the RCU-before (rb) relation in terms of
rcu-fence. This is done in essentially the same way as the pb
-relation was defined in terms of strong-fence. We will omit the
+relation was defined in terms of strong-order. We will omit the
details; the end result is that E ->rb F implies E must execute
before F, just as E ->pb F does (and for much the same reasons).

@@ -2134,7 +2163,7 @@ intermediate event Z such that:

and either:

- Z is connected to Y by a strong-fence link followed by a
+ Z is connected to Y by a strong-order link followed by a
possibly empty sequence of xb links,

or:
@@ -2153,10 +2182,10 @@ The motivations behind this definition are straightforward:
from W, which certainly means that W must have propagated to
R's CPU before R executed.

- strong-fence memory barriers force stores that are po-before
- the barrier, or that propagate to the barrier's CPU before the
- barrier executes, to propagate to all CPUs before any events
- po-after the barrier can execute.
+ Strong ordering operations force stores that are po-before
+ the operation or that propagate to the CPU that begins the
+ operation before the operation executes, to propagate to all
+ CPUs before any events po-after the operation execute.

To see how this works out in practice, consider our old friend, the MP
pattern (with fences and statement labels, but without the conditional
@@ -2485,7 +2514,7 @@ sequence or if W can be linked to R by a
sequence. For the cases involving a vis link, the LKMM also accepts
sequences in which W is linked to W' or R by a

- strong-fence ; xb* ; {w and/or r}-pre-bounded
+ strong-order ; xb* ; {w and/or r}-pre-bounded

sequence with no post-bounding, and in every case the LKMM also allows
the link simply to be a fence with no bounding at all. If no sequence
diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index 07f884f9b2bf..1d91edbc10fe 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -36,9 +36,7 @@ let wmb = [W] ; fencerel(Wmb) ; [W]
let mb = ([M] ; fencerel(Mb) ; [M]) |
([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
- ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
- ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
- fencerel(After-unlock-lock) ; [M])
+ ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M])
let gp = po ; [Sync-rcu | Sync-srcu] ; po?
let strong-fence = mb | gp

@@ -91,8 +89,12 @@ acyclic hb as happens-before
(* Write and fence propagation ordering *)
(****************************************)

-(* Propagation: Each non-rf link needs a strong fence. *)
-let pb = prop ; strong-fence ; hb* ; [Marked]
+(* Strong ordering operations *)
+let strong-order = strong-fence | ([M] ; po-unlock-lock-po ;
+ [After-unlock-lock] ; po ; [M])
+
+(* Propagation: Each non-rf link needs a strong ordering operation. *)
+let pb = prop ; strong-order ; hb* ; [Marked]
acyclic pb as propagation

(*******)
@@ -141,7 +143,7 @@ let rec rcu-order = rcu-gp | srcu-gp |
(rcu-order ; rcu-link ; rcu-order)
let rcu-fence = po ; rcu-order ; po?
let fence = fence | rcu-fence
-let strong-fence = strong-fence | rcu-fence
+let strong-order = strong-order | rcu-fence

(* rb orders instructions just as pb does *)
let rb = prop ; rcu-fence ; hb* ; pb* ; [Marked]
@@ -169,7 +171,7 @@ flag ~empty mixed-accesses as mixed-accesses
(* Executes-before and visibility *)
let xbstar = (hb | pb | rb)*
let vis = cumul-fence* ; rfe? ; [Marked] ;
- ((strong-fence ; [Marked] ; xbstar) | (xbstar & int))
+ ((strong-order ; [Marked] ; xbstar) | (xbstar & int))

(* Boundaries for lifetimes of plain accesses *)
let w-pre-bounded = [Marked] ; (addr | fence)?
@@ -180,9 +182,9 @@ let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
[Marked]

(* Visibility and executes-before for plain accesses *)
-let ww-vis = fence | (strong-fence ; xbstar ; w-pre-bounded) |
+let ww-vis = fence | (strong-order ; xbstar ; w-pre-bounded) |
(w-post-bounded ; vis ; w-pre-bounded)
-let wr-vis = fence | (strong-fence ; xbstar ; r-pre-bounded) |
+let wr-vis = fence | (strong-order ; xbstar ; r-pre-bounded) |
(w-post-bounded ; vis ; r-pre-bounded)
let rw-xbstar = fence | (r-post-bounded ; xbstar ; w-pre-bounded)

--
2.17.1

2023-01-18 01:07:15

by Boqun Feng

[permalink] [raw]

Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

On Tue, Jan 17, 2023 at 08:31:59PM +0100, Jonas Oberhauser wrote:
> As stated in the documentation and implied by its name, the ppo
> (preserved program order) relation is intended to link po-earlier
> to po-later instructions under certain conditions. However, a
> corner case currently allows instructions to be linked by ppo that
> are not executed by the same thread, i.e., instructions are being
> linked that have no po relation.
>
> This happens due to the mb/strong-fence relations, which (as one
> case) provide order when locks are passed between threads followed
> by an smp_mb__after_unlock_lock() fence. This is illustrated in
> the following litmus test (as can be seen when using herd7 with
> `doshow ppo`):
>
> P0(int *x, int *y)
> {
> spin_lock(x);
> spin_unlock(x);
> }
>
> P1(int *x, int *y)
> {
> spin_lock(x);
> smp_mb__after_unlock_lock();
> *y = 1;
> }
>
> The ppo relation will link P0's spin_lock(x) and P1's *y=1,
> because P0 passes a lock to P1 which then uses this fence.
>
> The patch makes ppo a subrelation of po by eliminating this
> possibility from mb and strong-fence, and instead introduces the
> notion of strong ordering operations, which are allowed to link
> events of distinct threads.
>
> Signed-off-by: Jonas Oberhauser <[email protected]>
> ---
> .../Documentation/explanation.txt | 101 +++++++++++-------
> tools/memory-model/linux-kernel.cat | 20 ++--

I've reviewed the cat change part, and it looks good to me. Now going to
catch up with the interesting discussion on rcu-order...

One more thing, do we want something in the cat file to describe our
"internal axioms" as follow?

(* Model internal invariants *)
(* ppo is the subset of po *)
flag ~empty (ppo \ po) as INTERNAL-ERROR-PPO

Maybe it's helpful for people working on other tools to understand LKMM
cat file?

Anyway, with or without it:

Acked-by: Boqun Feng <[email protected]>

Regards,
Boqun

> 2 files changed, 76 insertions(+), 45 deletions(-)
>
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
> index e901b47236c3..4f5f6ac098de 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.txt
> @@ -865,14 +865,44 @@ propagates stores. When a fence instruction is executed on CPU C:
> propagate to all other CPUs before any instructions po-after
> the strong fence are executed on C.
>
> -The propagation ordering enforced by release fences and strong fences
> -affects stores from other CPUs that propagate to CPU C before the
> -fence is executed, as well as stores that are executed on C before the
> -fence. We describe this property by saying that release fences and
> -strong fences are A-cumulative. By contrast, smp_wmb() fences are not
> -A-cumulative; they only affect the propagation of stores that are
> -executed on C before the fence (i.e., those which precede the fence in
> -program order).
> + Whenever any CPU C' executes an unlock operation U such that
> + CPU C executes a lock operation L followed by a po-later
> + smp_mb__after_unlock_lock() fence, and L is either a later lock
> + operation on the lock released by U or is po-after U, then any
> + store that propagates to C' before U must propagate to all other
> + CPUs before any instructions po-after the fence are executed on C.
> +
> +While smp_wmb() and release fences only force certain earlier stores
> +to propagate to another CPU C' before certain later stores propagate
> +to the same CPU C', strong fences and smp_mb__after_unlock_lock()
> +force those stores to propagate to all other CPUs before any later
> +instruction is executed. We collectively refer to the latter
> +operations as strong ordering operations, as they provide much
> +stronger ordering in two ways:
> +
> + Firstly, strong ordering operations also create order between
> + earlier stores and later reads.
> +
> + Secondly, strong ordering operations create a form of global
> + ordering: When an earlier store W propagates to CPU C and is
> + ordered by a strong ordering operation with a store W' of C,
> + and another CPU C' observes W' and in response issues yet
> + another store W'', then W'' also can not propagate to any CPU
> + before W. By contrast, a release fence or smp_wmb() would only
> + order W and W', but not force any ordering between W and W''.
> + To summarize, the ordering forced by strong ordering operations
> + extends to later stores of all CPUs, while other fences only
> + force ordering with relation to stores on the CPU that executes
> + the fence.
> +
> +The propagation ordering enforced by release fences and strong ordering
> +operations affects stores from other CPUs that propagate to CPU C before
> +the fence is executed, as well as stores that are executed on C before
> +the fence. We describe this property by saying that release fences and
> +strong ordering operations are A-cumulative. By contrast, smp_wmb()
> +fences are not A-cumulative; they only affect the propagation of stores
> +that are executed on C before the fence (i.e., those which precede the
> +fence in program order).
>
> rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
> other properties which we discuss later.
> @@ -1425,36 +1455,36 @@ requirement is the content of the LKMM's "happens-before" axiom.
>
> The LKMM defines yet another relation connected to times of
> instruction execution, but it is not included in hb. It relies on the
> -particular properties of strong fences, which we cover in the next
> -section.
> +particular properties of strong ordering operations, which we cover in the
> +next section.
>
>
> THE PROPAGATES-BEFORE RELATION: pb
> ----------------------------------
>
> The propagates-before (pb) relation capitalizes on the special
> -features of strong fences. It links two events E and F whenever some
> -store is coherence-later than E and propagates to every CPU and to RAM
> -before F executes. The formal definition requires that E be linked to
> -F via a coe or fre link, an arbitrary number of cumul-fences, an
> -optional rfe link, a strong fence, and an arbitrary number of hb
> -links. Let's see how this definition works out.
> +features of strong ordering operations. It links two events E and F
> +whenever some store is coherence-later than E and propagates to every
> +CPU and to RAM before F executes. The formal definition requires that
> +E be linked to F via a coe or fre link, an arbitrary number of
> +cumul-fences, an optional rfe link, a strong ordering operation, and an
> +arbitrary number of hb links. Let's see how this definition works out.
>
> Consider first the case where E is a store (implying that the sequence
> of links begins with coe). Then there are events W, X, Y, and Z such
> that:
>
> - E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,
> + E ->coe W ->cumul-fence* X ->rfe? Y ->strong-order Z ->hb* F,
>
> where the * suffix indicates an arbitrary number of links of the
> specified type, and the ? suffix indicates the link is optional (Y may
> be equal to X). Because of the cumul-fence links, we know that W will
> propagate to Y's CPU before X does, hence before Y executes and hence
> -before the strong fence executes. Because this fence is strong, we
> -know that W will propagate to every CPU and to RAM before Z executes.
> -And because of the hb links, we know that Z will execute before F.
> -Thus W, which comes later than E in the coherence order, will
> -propagate to every CPU and to RAM before F executes.
> +before the strong ordering operation executes. Because of the strong
> +ordering, we know that W will propagate to every CPU and to RAM before
> +Z executes. And because of the hb links, we know that Z will execute
> +before F. Thus W, which comes later than E in the coherence order,
> +will propagate to every CPU and to RAM before F executes.
>
> The case where E is a load is exactly the same, except that the first
> link in the sequence is fre instead of coe.
> @@ -1637,8 +1667,8 @@ does not imply X ->rcu-order V, because the sequence contains only
> one rcu-gp link but two rcu-rscsi links.
>
> The rcu-order relation is important because the Grace Period Guarantee
> -means that rcu-order links act kind of like strong fences. In
> -particular, E ->rcu-order F implies not only that E begins before F
> +means that rcu-order links act kind of like a strong ordering operation.
> +In particular, E ->rcu-order F implies not only that E begins before F
> ends, but also that any write po-before E will propagate to every CPU
> before any instruction po-after F can execute. (However, it does not
> imply that E must execute before F; in fact, each synchronize_rcu()
> @@ -1675,24 +1705,23 @@ The rcu-fence relation is a simple extension of rcu-order. While
> rcu-order only links certain fence events (calls to synchronize_rcu(),
> rcu_read_lock(), or rcu_read_unlock()), rcu-fence links any events
> that are separated by an rcu-order link. This is analogous to the way
> -the strong-fence relation links events that are separated by an
> +the strong-order relation links events that are separated by an
> smp_mb() fence event (as mentioned above, rcu-order links act kind of
> -like strong fences). Written symbolically, X ->rcu-fence Y means
> -there are fence events E and F such that:
> +like strong ordering operations). Written symbolically, X ->rcu-fence Y
> +means there are fence events E and F such that:
>
> X ->po E ->rcu-order F ->po Y.
>
> From the discussion above, we see this implies not only that X
> executes before Y, but also (if X is a store) that X propagates to
> -every CPU before Y executes. Thus rcu-fence is sort of a
> -"super-strong" fence: Unlike the original strong fences (smp_mb() and
> +every CPU before Y executes. Thus unlike strong fences (smp_mb() and
> synchronize_rcu()), rcu-fence is able to link events on different
> CPUs. (Perhaps this fact should lead us to say that rcu-fence isn't
> really a fence at all!)
>
> Finally, the LKMM defines the RCU-before (rb) relation in terms of
> rcu-fence. This is done in essentially the same way as the pb
> -relation was defined in terms of strong-fence. We will omit the
> +relation was defined in terms of strong-order. We will omit the
> details; the end result is that E ->rb F implies E must execute
> before F, just as E ->pb F does (and for much the same reasons).
>
> @@ -2134,7 +2163,7 @@ intermediate event Z such that:
>
> and either:
>
> - Z is connected to Y by a strong-fence link followed by a
> + Z is connected to Y by a strong-order link followed by a
> possibly empty sequence of xb links,
>
> or:
> @@ -2153,10 +2182,10 @@ The motivations behind this definition are straightforward:
> from W, which certainly means that W must have propagated to
> R's CPU before R executed.
>
> - strong-fence memory barriers force stores that are po-before
> - the barrier, or that propagate to the barrier's CPU before the
> - barrier executes, to propagate to all CPUs before any events
> - po-after the barrier can execute.
> + Strong ordering operations force stores that are po-before
> + the operation or that propagate to the CPU that begins the
> + operation before the operation executes, to propagate to all
> + CPUs before any events po-after the operation execute.
>
> To see how this works out in practice, consider our old friend, the MP
> pattern (with fences and statement labels, but without the conditional
> @@ -2485,7 +2514,7 @@ sequence or if W can be linked to R by a
> sequence. For the cases involving a vis link, the LKMM also accepts
> sequences in which W is linked to W' or R by a
>
> - strong-fence ; xb* ; {w and/or r}-pre-bounded
> + strong-order ; xb* ; {w and/or r}-pre-bounded
>
> sequence with no post-bounding, and in every case the LKMM also allows
> the link simply to be a fence with no bounding at all. If no sequence
> diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
> index 07f884f9b2bf..1d91edbc10fe 100644
> --- a/tools/memory-model/linux-kernel.cat
> +++ b/tools/memory-model/linux-kernel.cat
> @@ -36,9 +36,7 @@ let wmb = [W] ; fencerel(Wmb) ; [W]
> let mb = ([M] ; fencerel(Mb) ; [M]) |
> ([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
> ([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
> - ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
> - ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
> - fencerel(After-unlock-lock) ; [M])
> + ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M])
> let gp = po ; [Sync-rcu | Sync-srcu] ; po?
> let strong-fence = mb | gp
>
> @@ -91,8 +89,12 @@ acyclic hb as happens-before
> (* Write and fence propagation ordering *)
> (****************************************)
>
> -(* Propagation: Each non-rf link needs a strong fence. *)
> -let pb = prop ; strong-fence ; hb* ; [Marked]
> +(* Strong ordering operations *)
> +let strong-order = strong-fence | ([M] ; po-unlock-lock-po ;
> + [After-unlock-lock] ; po ; [M])
> +
> +(* Propagation: Each non-rf link needs a strong ordering operation. *)
> +let pb = prop ; strong-order ; hb* ; [Marked]
> acyclic pb as propagation
>
> (*******)
> @@ -141,7 +143,7 @@ let rec rcu-order = rcu-gp | srcu-gp |
> (rcu-order ; rcu-link ; rcu-order)
> let rcu-fence = po ; rcu-order ; po?
> let fence = fence | rcu-fence
> -let strong-fence = strong-fence | rcu-fence
> +let strong-order = strong-order | rcu-fence
>
> (* rb orders instructions just as pb does *)
> let rb = prop ; rcu-fence ; hb* ; pb* ; [Marked]
> @@ -169,7 +171,7 @@ flag ~empty mixed-accesses as mixed-accesses
> (* Executes-before and visibility *)
> let xbstar = (hb | pb | rb)*
> let vis = cumul-fence* ; rfe? ; [Marked] ;
> - ((strong-fence ; [Marked] ; xbstar) | (xbstar & int))
> + ((strong-order ; [Marked] ; xbstar) | (xbstar & int))
>
> (* Boundaries for lifetimes of plain accesses *)
> let w-pre-bounded = [Marked] ; (addr | fence)?
> @@ -180,9 +182,9 @@ let r-post-bounded = (nonrw-fence | ([~Noreturn] ; fencerel(Rmb) ; [R4rmb]))? ;
> [Marked]
>
> (* Visibility and executes-before for plain accesses *)
> -let ww-vis = fence | (strong-fence ; xbstar ; w-pre-bounded) |
> +let ww-vis = fence | (strong-order ; xbstar ; w-pre-bounded) |
> (w-post-bounded ; vis ; w-pre-bounded)
> -let wr-vis = fence | (strong-fence ; xbstar ; r-pre-bounded) |
> +let wr-vis = fence | (strong-order ; xbstar ; r-pre-bounded) |
> (w-post-bounded ; vis ; r-pre-bounded)
> let rw-xbstar = fence | (r-post-bounded ; xbstar ; w-pre-bounded)
>
> --
> 2.17.1
>

2023-01-18 13:09:36

by Jonas Oberhauser

[permalink] [raw]

Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

On 1/18/2023 12:45 AM, Boqun Feng wrote:
> On Tue, Jan 17, 2023 at 08:31:59PM +0100, Jonas Oberhauser wrote:
>> ---
>> .../Documentation/explanation.txt | 101 +++++++++++-------
>> tools/memory-model/linux-kernel.cat | 20 ++--
> I've reviewed the cat change part, and it looks good to me.

It's not the cat part that worries me! I'm worried that I made the
explanation inconsistent somewhere, or that it's not very
understandable. If you have time, please take a look at that as well.

> One more thing, do we want something in the cat file to describe our
> "internal axioms" as follow?
>
> (* Model internal invariants *)
> (* ppo is the subset of po *)
> flag ~empty (ppo \ po) as INTERNAL-ERROR-PPO
>
> Maybe it's helpful for people working on other tools to understand LKMM
> cat file?

I've thought that there should be a way to continuously test/prove
properties of LKMM so that future changes to LKMM can't silently
invalidate these properites (like what happened when this fence was
added to LKMM).

I'm not sure yet what's the best way to do this. I suppose a simple
first step could be to have an additional cat file lkmm-properties.cat
that flags all violations of the properties in the litmus tests, but I'm
not sure how much that really helps: I think many violations are not
present in any of the existing litmus tests. And some properties are
hard to state as a cat flag.

best wishes and thanks for reviewing,
jonas

2023-01-18 20:44:32

by Alan Stern

[permalink] [raw]

Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

2023-01-18 21:31:28

by Andrea Parri

[permalink] [raw]

Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

> It would be nice here to have a separate term for a potentially
> cross-CPU fence.
>
> In fact, why don't we make a concerted effort to straighten out the
> terminology more fully? Now seems like a good time to do it.
>
> To begin with, let's be more careful about the difference between an
> order-inducing object (an event or pair of events) and the relation of
> being ordered by such an object. For instance, given:
>
> A: WRITE_ONCE(x, 1);
> B: smp_mb();
> C: r1 = READ_ONCE(y);
>
> then B is an order-inducing object (a memory barrier), and (A,C) is a
> pair of events ordered by that object. In general, an order is related
> to an order-inducing object by:
>
> order = po ; [order-inducing object] ; po
>
> with suitable modifications for things like smp_store_release where
> one of the events being ordered _is_ the order-inducing event.
>
> So for example, we could consistently refer to all order-inducing events
> as either barriers or fences, and all order-reflecting relations as
> orders. This would require widespread changes to the .cat file, but I
> think it would be worthwhile.
>
> (Treating "barrier" and "fence" as synonyms seems to be too deeply
> entrenched to try and fight against.)
>
> Once that is straightened out, we can distinguish between fences or
> orders that are weak vs. strong. And then we can divide up strong
> fences/orders into single-CPU vs. cross-CPU, if we want to.
>
> How does that sound?

Sounds like a lot of work, renaming and review, for no clear win
to me. :-) But hey, if other are into it...

Andrea

2023-01-18 21:40:59

by Andrea Parri

[permalink] [raw]

On 1/25/2023 3:57 AM, Alan Stern wrote:
> On Tue, Jan 24, 2023 at 09:23:02PM +0100, Jonas Oberhauser wrote:
>>
>> On 1/24/2023 6:14 PM, Alan Stern wrote:
>>> On Tue, Jan 24, 2023 at 02:14:03PM +0100, Jonas Oberhauser wrote:
>>>> After mulling it over a bit in my big old head, I consider that even though
>>>> dropping the [W] may be shorter, it might make for the simpler model by
>>>> excluding lots of cases.
>>>> That makes me think you should do it for real in the definition of prop. And
>>>> not just at the very end, because in fact each cumul-fence link might come
>>>> from a non-A-cumulative fence. So the same argument you are giving should be
>>>> applied recursively.
>>>> Either
>>>>
>>>> prop = (overwrite & ext)? ; (cumul-fence; [W])* ; rfe?
>>>>
>>>> or integrate it directly into cumul-fence.
>>> I dislike this sort of argument. I understand the formal memory model
>>> by relating it to the informal operational model. Thus, cumul-fence
>>> links a write W to another event E when the fence guarantees that W will
>>> propagate to E's CPU before E executes.
>> I later wondered why it's not defined like this and realized that prop means
>> that it's before E executes.
>>
>>> That's how the memory model
>>> expresses the propagation properties of these fences.
>> I don't think that's really a perfect match though.
>> For example, W ->wmb E (and thus cumul-fence) does guarantee that W
>> propagates to E's CPU before E executes.
>> But the propagation property of wmb is that W propagates to every CPU before
>> E propagates to that CPU.
>> It just so happens that the time E propagates to E's CPU is the time it
>> executes.
>>
>> Indeed, looking at the non-strong properties of fences only, should give
>> rise to a relation that only says "W propagates to any CPU before E
>> propagates to that CPU" and that is a relation between stores. And quite
>> different from "W propagates to E's CPU before E executes".
>>
>> I believe that relation is (cumul-fence;[W])+.
> Add an rfe? to the end and you get the "before E executes" version.

Yes, but with the minor caveat that this is only for the "because of the
weak propagation ordering of fences (pfence)" case.
Current prop also includes some other "before E executes" cases, e.g.,
when the last fence is po-unlock-lock-po or a strong-fence.

> Or more accurately (rfe? ; ppo*). Hmmm, the only reason for omitting that
> ppo* term in the model is that it would never be needed. So maybe we
> should after all do the same for the hb* term at the end of pb and the
> (hb* | pb*) part at the end of rb.
>
>
> Starting from first principles, it's apparent that each of these types
> of propagation fences is associated with two relations: one involving
> propagation order and a companion relation involving execution order.
>
> Here's what I mean. For the sake of discussion let's define several
> classes of fences:
>
> efences are those which constrain execution order;
>
> pfences are those which constrain propagation order;
>
> sfences are those which strongly constrain propagation order.
>
> Each class includes the following ones. (And if you like, you can
> insert afences between pfences and sfences -- they would be the
> A-cumulative fences.)
>
> Now, the memory model builds up successively more inclusive notions of
> execution order. This process starts with execution of instructions in
> the same CPU not involving fences. Thus we have the ppo relations:
> dependencies and a few oddball things like ((overwrite ; rfe) & int) or
> ([UL] ; po ; [LKR]).
>
> Next, the efences also restrict single-CPU execution order. These
> fences only need to have one associated relation since they don't
> specifically involve propagation. Adding rfe to the list gives us
> inter-CPU ordering.
>
> Then associated with pfences we have the relation you've been talking
> about:
>
> W propagates to each CPU before W' does.
>
> This is (cumul-fence ; [W]). Perhaps a better name for it would be
> wprop. Given this relation, we obtain a companion relation that
> restricts execution order:
>
> ((overwrite & ext) ; wprop+ ; rfe) & int.
>
> (Note that the overall form is the same for afences as for pfences.)
> Adding this companion relation into the mix gives us essentially hb.
>
> For sfences the associated relation expresses:
>
> W propagates to every CPU before Y executes.
>
> This is basically (wprop* ; rfe? ; sfence) (using the fact that all
> sfences are A-cumulative) -- or if you prefer, (wprop* ; cumul-sfence).
> We can call this sprop. Then the companion relation restricting
> execution order is:
>
> (overwrite & ext) ; sprop
>
> For RCU, the associated relation expressing t2(A) < t1(B) is rcu-order
> and the companion relation is rcu-fence.
Do we put rcu-order under sprop as well? Otherwise you need

(overwrite & ext)? ; rcu-fence

to express the full companion relation.

>
> Putting all those execution-order relations together gives us xb, the
> executes-before relation. Then the only axiom we need for all of this
> that xb is acyclic.
>
> Of course, I have left out a lot of details. Still, how does that sound
> as a scheme for rationalizing the memory model?

It seems like we're on the same page!
It would be an honor for me to fill in the details and propose a patch,
if you're interested.

Have fun, jonas

2023-01-25 15:56:55

by Alan Stern

[permalink] [raw]

Subject: Re: [PATCH] tools/memory-model: Make ppo a subrelation of po

On Wed, Jan 25, 2023 at 02:06:01PM +0100, Jonas Oberhauser wrote:
>
>
> On 1/25/2023 3:57 AM, Alan Stern wrote:
> > Starting from first principles, it's apparent that each of these types
> > of propagation fences is associated with two relations: one involving
> > propagation order and a companion relation involving execution order.
> >
> > Here's what I mean. For the sake of discussion let's define several
> > classes of fences:
> >
> > efences are those which constrain execution order;
> >
> > pfences are those which constrain propagation order;
> >
> > sfences are those which strongly constrain propagation order.
> >
> > Each class includes the following ones. (And if you like, you can
> > insert afences between pfences and sfences -- they would be the
> > A-cumulative fences.)
> >
> > Now, the memory model builds up successively more inclusive notions of
> > execution order. This process starts with execution of instructions in
> > the same CPU not involving fences. Thus we have the ppo relations:
> > dependencies and a few oddball things like ((overwrite ; rfe) & int) or
> > ([UL] ; po ; [LKR]).
> >
> > Next, the efences also restrict single-CPU execution order. These
> > fences only need to have one associated relation since they don't
> > specifically involve propagation. Adding rfe to the list gives us
> > inter-CPU ordering.
> >
> > Then associated with pfences we have the relation you've been talking
> > about:
> >
> > W propagates to each CPU before W' does.
> >
> > This is (cumul-fence ; [W]). Perhaps a better name for it would be
> > wprop. Given this relation, we obtain a companion relation that
> > restricts execution order:
> >
> > ((overwrite & ext) ; wprop+ ; rfe) & int.
> >
> > (Note that the overall form is the same for afences as for pfences.)
> > Adding this companion relation into the mix gives us essentially hb.
> >
> > For sfences the associated relation expresses:
> >
> > W propagates to every CPU before Y executes.
> >
> > This is basically (wprop* ; rfe? ; sfence) (using the fact that all
> > sfences are A-cumulative) -- or if you prefer, (wprop* ; cumul-sfence).
> > We can call this sprop. Then the companion relation restricting
> > execution order is:
> >
> > (overwrite & ext) ; sprop
> >
> > For RCU, the associated relation expressing t2(A) < t1(B) is rcu-order
> > and the companion relation is rcu-fence.
> Do we put rcu-order under sprop as well? Otherwise you need
>
> (overwrite & ext)? ; rcu-fence
>
> to express the full companion relation.

My mistake; what I meant was something a little smaller than rb. That
is,

(overwrite & ext) ; wprop* ; rfe? ; rcu-fence

In other words, a relation expressing that rcu-fence acts as a strong
fence. But also something expressing that rcu-fence acts as an efence?
-- I guess this is covered if the (overwrite & ext) part above is made
optional, although that feels a little artificial.

I don't think we will be able to include rcu-fence in sprop, because we
will need to use sprop in the definition of rcu-fence.

Oh yes, one other piece of terminology I forgot to mention. The things
you referred to before as "ordering operations" could instead be called
"extended fences". What do you think?

> > Putting all those execution-order relations together gives us xb, the
> > executes-before relation. Then the only axiom we need for all of this
> > that xb is acyclic.
> >
> > Of course, I have left out a lot of details. Still, how does that sound
> > as a scheme for rationalizing the memory model?
>
> It seems like we're on the same page!
> It would be an honor for me to fill in the details and propose a patch, if
> you're interested.

An invasive, multi-faceted change like this has to be broken down into
multiple patches, each doing only one thing and each easily verified as
not changing the meaning of the code. Feel free to go ahead and work
out a proposal.

Alan