Hello!
This series contains LKMM documentation updates:
1. Documentation/litmus-tests: Add locking tests to README.
2. Documentation/litmus-tests: Demonstrate unordered failing cmpxchg.
3. Documentation/atomic_t: Emphasize that failed atomic operations
give no ordering.
Thanx, Paul
------------------------------------------------------------------------
Documentation/litmus-tests/README | 48 ++++++----
b/Documentation/atomic_t.txt | 4
b/Documentation/litmus-tests/README | 29 ++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus | 34 +++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus | 30 ++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus | 33 ++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus | 30 ++++++
7 files changed, 190 insertions(+), 18 deletions(-)
Hello!
This series contains LKMM documentation updates:
1. Documentation/litmus-tests: Add locking tests to README.
2. Documentation/litmus-tests: Demonstrate unordered failing cmpxchg.
3. Documentation/atomic_t: Emphasize that failed atomic operations
give no ordering.
4. Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus.
Changes since V1:
o Apply formatting feedback from Andrea Parri
o Add patch 4 that makes tests safe for klitmus, also based on
feedback from Andrea Parri.
Thanx, Paul
------------------------------------------------------------------------
Documentation/litmus-tests/README | 16 ++++
Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus | 1
Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus | 4 -
Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus | 1
Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus | 4 -
b/Documentation/atomic_t.txt | 4 -
b/Documentation/litmus-tests/README | 29 ++++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus | 34 ++++++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus | 30 ++++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus | 33 +++++++++
b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus | 30 ++++++++
11 files changed, 180 insertions(+), 6 deletions(-)
The four litmus tests in Documentation/litmus-tests/atomic do not
declare all of their local variables. Although this is just fine for LKMM
analysis by herd7, it causes build failures when run in-kernel by klitmus.
This commit therefore adjusts these tests to declare all local variables.
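For anyone wanting to reproduce the build failure, here is a sketch of
one possible klitmus7 workflow.  The flags and generated file names are
assumptions based on the herdtools7 documentation rather than anything
quoted in this thread, so please double-check against your klitmus7
version:
	# Convert the litmus tests into kernel modules (hypothetical paths).
	$ klitmus7 -o mymodules.tar \
		Documentation/litmus-tests/atomic/cmpxchg-fail-*.litmus
	$ tar xf mymodules.tar && cd mymodules
	# Building is where an undeclared local variable shows up as an error.
	$ make
	# Insert the modules and collect the test histograms.
	$ sudo sh run.sh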
Reported-by: Andrea Parri <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
.../litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus | 1 +
.../litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus | 4 ++--
.../litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus | 1 +
.../litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus | 4 ++--
4 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
index 3df1d140b189b..c0f93dc07105e 100644
--- a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
@@ -23,6 +23,7 @@ P0(int *x, int *y, int *z)
P1(int *x, int *y, int *z)
{
int r0;
+ int r1;
WRITE_ONCE(*y, 1);
r1 = cmpxchg(z, 1, 0);
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
index 54146044a16f6..5c06054f46947 100644
--- a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
@@ -11,7 +11,6 @@ C cmpxchg-fail-ordered-2
P0(int *x, int *y)
{
- int r0;
int r1;
WRITE_ONCE(*x, 1);
@@ -20,7 +19,8 @@ P0(int *x, int *y)
P1(int *x, int *y)
{
- int r0;
+ int r1;
+ int r2;
r1 = cmpxchg(y, 0, 1);
smp_mb__after_atomic();
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
index a727ce23b1a6e..39ea1f56a28d2 100644
--- a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
@@ -23,6 +23,7 @@ P0(int *x, int *y, int *z)
P1(int *x, int *y, int *z)
{
int r0;
+ int r1;
WRITE_ONCE(*y, 1);
r1 = cmpxchg(z, 1, 0);
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
index a245bac55b578..61aab24a4a643 100644
--- a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
@@ -12,7 +12,6 @@ C cmpxchg-fail-unordered-2
P0(int *x, int *y)
{
- int r0;
int r1;
WRITE_ONCE(*x, 1);
@@ -21,7 +20,8 @@ P0(int *x, int *y)
P1(int *x, int *y)
{
- int r0;
+ int r1;
+ int r2;
r1 = cmpxchg(y, 0, 1);
r2 = READ_ONCE(*x);
--
2.40.1
This commit documents the litmus tests in the "locking" directory.
[ paulmck: Apply formatting feedback from Andrea Parri. ]
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
---
Documentation/litmus-tests/README | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/Documentation/litmus-tests/README b/Documentation/litmus-tests/README
index 658d37860d397..26ca56df02125 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -22,6 +22,35 @@ Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
NOTE: Require herd7 7.56 or later which supports "(void)expr".
+locking (/locking directory)
+----------------------------
+
+DCL-broken.litmus
+ Demonstrates that double-checked locking needs more than just
+ the obvious lock acquisitions and releases.
+
+DCL-fixed.litmus
+ Demonstrates corrected double-checked locking that uses
+ smp_store_release() and smp_load_acquire() in addition to the
+ obvious lock acquisitions and releases.
+
+RM-broken.litmus
+ Demonstrates problems with "roach motel" locking, where code is
+ freely moved into lock-based critical sections. This example also
+ shows how to use the "filter" clause to discard executions that
+ would be excluded by other code not modeled in the litmus test.
+ Note also that this "roach motel" optimization is emulated by
+ physically moving P1()'s two reads from x under the lock.
+
+ What is a roach motel? This is from an old advertisement for
+ a cockroach trap, much later featured in one of the "Men in
+ Black" movies. "The roaches check in. They don't check out."
+
+RM-fixed.litmus
+ The counterpart to RM-broken.litmus, showing P0()'s two loads from
+ x safely outside of the critical section.
+
+
RCU (/rcu directory)
--------------------
--
2.40.1
The ORDERING section of Documentation/atomic_t.txt can easily be read as
saying that conditional atomic RMW operations that fail are ordered when
those operations have the _acquire() or _release() suffixes. This is
not the case, therefore update this section to make it clear that failed
conditional atomic RMW operations provide no ordering.
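As a concrete illustration, consider a sketch along these lines (all
names here are hypothetical, invented for this example):
	atomic_t owner = ATOMIC_INIT(0);
	int shared_state;
	int me = 1;	/* hypothetical owner ID */
	int r;

	if (atomic_cmpxchg_acquire(&owner, 0, me) != 0) {
		/*
		 * The cmpxchg failed, so despite the _acquire suffix it
		 * provides no ordering: this load may be reordered before
		 * the failed RMW.  An smp_mb__after_atomic() would be
		 * needed here to get ordering.
		 */
		r = READ_ONCE(shared_state);
	}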
Reported-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Acked-by: Andrea Parri <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
Documentation/atomic_t.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index d7adc6d543db4..bee3b1bca9a7b 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -171,14 +171,14 @@ The rule of thumb:
- RMW operations that are conditional are unordered on FAILURE,
otherwise the above rules apply.
-Except of course when an operation has an explicit ordering like:
+Except of course when a successful operation has an explicit ordering like:
{}_relaxed: unordered
{}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
{}_release: the W of the RMW (or atomic_set) is a RELEASE
Where 'unordered' is against other memory locations. Address dependencies are
-not defeated.
+not defeated. Conditional operations are still unordered on FAILURE.
Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
--
2.40.1
This commit adds four litmus tests showing that a failing cmpxchg()
operation is unordered unless followed by an smp_mb__after_atomic()
operation.
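These can be checked with herd7 in the usual way, for example (a sketch
assuming the tests are run against tools/memory-model/ in the kernel
tree):
	$ cd tools/memory-model
	$ herd7 -conf linux-kernel.cfg \
		../../Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
The "Observation ... Sometimes" (or "Never") line in the output should
match the "Result:" comment at the top of each test.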
Suggested-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
---
Documentation/litmus-tests/README | 16 +++++++++
.../atomic/cmpxchg-fail-ordered-1.litmus | 34 +++++++++++++++++++
.../atomic/cmpxchg-fail-ordered-2.litmus | 30 ++++++++++++++++
.../atomic/cmpxchg-fail-unordered-1.litmus | 33 ++++++++++++++++++
.../atomic/cmpxchg-fail-unordered-2.litmus | 30 ++++++++++++++++
5 files changed, 143 insertions(+)
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
create mode 100644 Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
diff --git a/Documentation/litmus-tests/README b/Documentation/litmus-tests/README
index 26ca56df02125..6c666f3422ea3 100644
--- a/Documentation/litmus-tests/README
+++ b/Documentation/litmus-tests/README
@@ -21,6 +21,22 @@ Atomic-RMW-ops-are-atomic-WRT-atomic_set.litmus
Test that atomic_set() cannot break the atomicity of atomic RMWs.
NOTE: Require herd7 7.56 or later which supports "(void)expr".
+cmpxchg-fail-ordered-1.litmus
+ Demonstrate that a failing cmpxchg() operation acts as a full barrier
+ when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-ordered-2.litmus
+ Demonstrate that a failing cmpxchg() operation acts as an acquire
+ operation when followed by smp_mb__after_atomic().
+
+cmpxchg-fail-unordered-1.litmus
+ Demonstrate that a failing cmpxchg() operation does not act as a
+ full barrier.
+
+cmpxchg-fail-unordered-2.litmus
+ Demonstrate that a failing cmpxchg() operation does not act as an
+ acquire operation.
+
locking (/locking directory)
----------------------------
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
new file mode 100644
index 0000000000000..3df1d140b189b
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-1.litmus
@@ -0,0 +1,34 @@
+C cmpxchg-fail-ordered-1
+
+(*
+ * Result: Never
+ *
+ * Demonstrate that a failing cmpxchg() operation will act as a full
+ * barrier when followed by smp_mb__after_atomic().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(z, 1, 0);
+ smp_mb__after_atomic();
+ r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+ int r0;
+
+ WRITE_ONCE(*y, 1);
+ r1 = cmpxchg(z, 1, 0);
+ smp_mb__after_atomic();
+ r0 = READ_ONCE(*x);
+}
+
+locations[0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
new file mode 100644
index 0000000000000..54146044a16f6
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-ordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-ordered-2
+
+(*
+ * Result: Never
+ *
+ * Demonstrate use of smp_mb__after_atomic() to make a failing cmpxchg
+ * operation have acquire ordering.
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ r1 = cmpxchg(y, 0, 1);
+ smp_mb__after_atomic();
+ r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
new file mode 100644
index 0000000000000..a727ce23b1a6e
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-1.litmus
@@ -0,0 +1,33 @@
+C cmpxchg-fail-unordered-1
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as a
+ * full barrier. (In contrast, a successful cmpxchg() does act as a
+ * full barrier.)
+ *)
+
+{}
+
+P0(int *x, int *y, int *z)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(z, 1, 0);
+ r0 = READ_ONCE(*y);
+}
+
+P1(int *x, int *y, int *z)
+{
+ int r0;
+
+ WRITE_ONCE(*y, 1);
+ r1 = cmpxchg(z, 1, 0);
+ r0 = READ_ONCE(*x);
+}
+
+locations[0:r1;1:r1]
+exists (0:r0=0 /\ 1:r0=0)
diff --git a/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
new file mode 100644
index 0000000000000..a245bac55b578
--- /dev/null
+++ b/Documentation/litmus-tests/atomic/cmpxchg-fail-unordered-2.litmus
@@ -0,0 +1,30 @@
+C cmpxchg-fail-unordered-2
+
+(*
+ * Result: Sometimes
+ *
+ * Demonstrate that a failing cmpxchg() operation does not act as either
+ * an acquire release operation. (In contrast, a successful cmpxchg()
+ * does act as both an acquire and a release operation.)
+ *)
+
+{}
+
+P0(int *x, int *y)
+{
+ int r0;
+ int r1;
+
+ WRITE_ONCE(*x, 1);
+ r1 = cmpxchg(y, 0, 1);
+}
+
+P1(int *x, int *y)
+{
+ int r0;
+
+ r1 = cmpxchg(y, 0, 1);
+ r2 = READ_ONCE(*x);
+}
+
+exists (0:r1=0 /\ 1:r1=1 /\ 1:r2=0)
--
2.40.1
> This series contains LKMM documentation updates:
>
> 1. Documentation/litmus-tests: Add locking tests to README.
>
> 2. Documentation/litmus-tests: Demonstrate unordered failing cmpxchg.
>
> 3. Documentation/atomic_t: Emphasize that failed atomic operations
> give no ordering.
>
> 4. Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus.
For the series,
Acked-by: Andrea Parri <[email protected]>
Andrea
On Thu, May 02, 2024 at 11:36:51AM +0200, Andrea Parri wrote:
> > This series contains LKMM documentation updates:
> >
> > 1. Documentation/litmus-tests: Add locking tests to README.
> >
> > 2. Documentation/litmus-tests: Demonstrate unordered failing cmpxchg.
> >
> > 3. Documentation/atomic_t: Emphasize that failed atomic operations
> > give no ordering.
> >
> > 4. Documentation/litmus-tests: Make cmpxchg() tests safe for klitmus.
>
> For the series,
>
> Acked-by: Andrea Parri <[email protected]>
Thank you! I will apply these on my next rebase.
Thanx, Paul
On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
> This commit adds four litmus tests showing that a failing cmpxchg()
> operation is unordered unless followed by an smp_mb__after_atomic()
> operation.
So far, my understanding was that all RMW operations without suffix
(xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
I guess this shows again how important it is to model these full
barriers explicitly inside the cat model, instead of relying on implicit
conversions internal to herd.
I'd like to propose a patch to this effect.
What is the intended behavior of a failed cmpxchg()? Is it the same as a
relaxed one?
My suggestion would be in the direction of marking read and write events
of these operations as Mb, and then defining
(* full barrier events that appear in non-failing RMW *)
let RMW_MB = Mb & (dom(rmw) | range(rmw))
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; (po \ si ; rmw) ; [RMW_MB] ; po^? ; [M]
| [M] ; po^? ; [RMW_MB] ; (po \ rmw ; si) ; [M]
| ...
The po \ si;rmw is because ordering is not provided internally of the
rmw, although I suspect that after we added release sequences it could
perhaps be simplified to
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; po ; [RMW_MB] ; po^? ; [M]
| [M] ; po^? ; [RMW_MB] ; po ; [M]
| ...
or
let mb =
[M] ; fencerel(Mb) ; [M]
| [M] ; po & (po^? ; [RMW_MB] ; po^?) ; [M]
| ...
(the po & is necessary to avoid trivial hb cycles of an RMW event
happening before itself)
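For anyone wanting to experiment before this becomes a proper patch,
a sketch of one way to try an alternative mb definition (the -model
flag and the copied file name are from memory, so treat them as
assumptions):
	$ cp linux-kernel.cat linux-kernel-rmwmb.cat   # edit "let mb" here
	$ herd7 -conf linux-kernel.cfg -model linux-kernel-rmwmb.cat \
		cmpxchg-fail-ordered-1.litmus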
Any interest?
Have fun,
jonas
On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
>
>
> On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
>> This commit adds four litmus tests showing that a failing cmpxchg()
>> operation is unordered unless followed by an smp_mb__after_atomic()
>> operation.
>
> So far, my understanding was that all RMW operations without suffix
> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>
> I guess this shows again how important it is to model these full
> barriers explicitly inside the cat model, instead of relying on implicit
> conversions internal to herd.
>
> I'd like to propose a patch to this effect.
>
> What is the intended behavior of a failed cmpxchg()? Is it the same as a
> relaxed one?
>
> My suggestion would be in the direction of marking read and write events
> of these operations as Mb, and then defining
>
> (* full barrier events that appear in non-failing RMW *)
> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>
>
> let mb =
> [M] ; fencerel(Mb) ; [M]
> | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> | ...
>
> The po \ rmw is because ordering is not provided internally of the
> rmw
(removed the unnecessary si since LKMM is still non-mixed-accesses)
This could also be written with a single rule:
| [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> I suspect that after we added [rmw] sequences it could
> perhaps be simplified [...]
>
No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
act like strong fences when they appear in an rmw sequence.
if (z==1) || x = 2;     || xchg(&y,2) || if (y==2)
   x = 1; || y =_rel 1; ||            ||   z=1;
right now, we allow x=2 overwriting x=1 (in case the last thread does
not propagate x=2 along with z=1) because on power, the xchg might be
implemented with a sync that doesn't get executed until the very end
of the program run.
Instead of its negative form (everything other than inside the rmw),
it could also be rewritten positively. Here's a somewhat short form:
let mb =
[M] ; fencerel(Mb) ; [M]
(* everything across a full barrier RMW is ordered. This includes up
to one event inside the RMW. *)
| [M] ; po ; [RMW_MB] ; po ; [M]
(* full barrier RMW writes are ordered with everything behind the RMW *)
| [W & RMW_MB] ; po ; [M]
(* full barrier RMW reads are ordered with everything before the RMW *)
| [M] ; po ; [R & RMW_MB]
| ...
Good luck,
jonas
On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
> > On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
> > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > operation.
> >
> > So far, my understanding was that all RMW operations without suffix
> > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> >
> > I guess this shows again how important it is to model these full
> > barriers explicitly inside the cat model, instead of relying on implicit
> > conversions internal to herd.
> >
> > I'd like to propose a patch to this effect.
> >
> > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > relaxed one?
Yes, and unless I am too confused, LKMM currently does implement this.
Please let me know if I am missing something.
> > My suggestion would be in the direction of marking read and write events
> > of these operations as Mb, and then defining
> >
> > (* full barrier events that appear in non-failing RMW *)
> > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> >
> >
> > let mb =
> > [M] ; fencerel(Mb) ; [M]
> > | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> > | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> > | ...
> >
> > The po \ rmw is because ordering is not provided internally of the rmw
>
> (removed the unnecessary si since LKMM is still non-mixed-accesses)
Addition of mixed-access support would be quite welcome!
> This could also be written with a single rule:
>
> | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>
> > I suspect that after we added [rmw] sequences it could perhaps be
> > simplified [...]
>
> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> act like strong fences when they appear in an rmw sequence.
>
> if (z==1) || x = 2;     || xchg(&y,2) || if (y==2)
>    x = 1; || y =_rel 1; ||            ||   z=1;
>
>
> right now, we allow x=2 overwriting x=1 (in case the last thread does not
> propagate x=2 along with z=1) because on power, the xchg might be
> implemented with a sync that doesn't get executed until the very end
> of the program run.
>
>
> Instead of its negative form (everything other than inside the rmw),
> it could also be rewritten positively. Here's a somewhat short form:
>
> let mb =
> [M] ; fencerel(Mb) ; [M]
> (* everything across a full barrier RMW is ordered. This includes up to
> one event inside the RMW. *)
> | [M] ; po ; [RMW_MB] ; po ; [M]
> (* full barrier RMW writes are ordered with everything behind the RMW *)
> | [W & RMW_MB] ; po ; [M]
> (* full barrier RMW reads are ordered with everything before the RMW *)
> | [M] ; po ; [R & RMW_MB]
> | ...
Does this produce the results expected by the litmus tests in the Linux
kernel source tree and also those at https://github.com/paulmckrcu/litmus?
Thanx, Paul
On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> > On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
> > > On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
> > > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > > operation.
> > >
> > > So far, my understanding was that all RMW operations without suffix
> > > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
It's more accurate to say that RMW operations without a suffix that
return a value will be interpreted that way. So for example,
atomic_inc() doesn't imply any ordering, because it doesn't return a
value.
> > > barriers explicitly inside the cat model, instead of relying on implicit
> > > conversions internal to herd.
Don't the annotations in linux-kernel.def and linux-kernel.bell (like
"noreturn") already make this explicit?
I guess the part that is still implicit is that herd7 doesn't regard
failed RMW operations as actual RMWs (they don't have a store part).
Alan
On 5/6/2024 at 9:21 PM, Alan Stern wrote:
> On Mon, May 06, 2024 at 11:00:42AM -0700, Paul E. McKenney wrote:
>> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>>> On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
>>>> On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
>>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>>> operation.
>>>>
>>>> So far, my understanding was that all RMW operations without suffix
>>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>
> It's more accurate to say that RMW operations without a suffix that
> return a value will be interpreted that way. So for example,
> atomic_inc() doesn't imply any ordering, because it doesn't return a
> value.
>
I see, thanks.
>>>> barriers explicitly inside the cat model, instead of relying on implicit
>>>> conversions internal to herd.
>
> Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> "noreturn") already make this explicit?
Not that I'm aware. All I can see there is that according to .bell RMW
don't have an mb mode, but according to .def they do.
How this mb disappears between parsing the code (.def) and interpreting
it (.bell) is totally implicit. Including how noreturn affects this
disappearance.
In fact most tool developers that support LKMM (Viktor, Hernan, and Luc)
were at least once confused about it. And I think they read those files
more carefully than I.
https://github.com/herd/herdtools7/issues/384#issuecomment-1132859904
Note that while there's no explicit annotation of noreturn in the .def
file, at least I can guess based on context that it should be annotated
on all the functions that don't have _return and for which a _return
version also exists.
have fun,
jonas
On 5/6/2024 at 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
>>> On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>>>
>>> So far, my understanding was that all RMW operations without suffix
>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>>>
>>> I guess this shows again how important it is to model these full
>>> barriers explicitly inside the cat model, instead of relying on implicit
>>> conversions internal to herd.
>>>
>>> I'd like to propose a patch to this effect.
>>>
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
>
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.
At least the herd and Dat3M implementations seem to be doing that, at
least according to this thread sent to me by Hernan.
https://github.com/herd/herdtools7/issues/384#issue-1243049709
>
>>> My suggestion would be in the direction of marking read and write events
>>> of these operations as Mb, and then defining
>>>
>>> (* full barrier events that appear in non-failing RMW *)
>>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>>
>>>
>>> let mb =
>>> [M] ; fencerel(Mb) ; [M]
>>> | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>>> | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>>> | ...
>>>
>>> The po \ rmw is because ordering is not provided internally of the rmw
>>
>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
>
> Addition of mixed-access support would be quite welcome!
:P
>> This could also be written with a single rule:
>>
>> | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>>
>>> I suspect that after we added [rmw] sequences it could perhaps be
>>> simplified [...]
>>
>> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
>> act like strong fences when they appear in an rmw sequence.
>>
>> if (z==1) || x = 2;     || xchg(&y,2) || if (y==2)
>>    x = 1; || y =_rel 1; ||            ||   z=1;
>>
>>
>> right now, we allow x=2 overwriting x=1 (in case the last thread does not
>> propagate x=2 along with z=1) because on power, the xchg might be
>> implemented with a sync that doesn't get executed until the very end
>> of the program run.
>>
>>
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>>
>> let mb =
>> [M] ; fencerel(Mb) ; [M]
>> (* everything across a full barrier RMW is ordered. This includes up to
>> one event inside the RMW. *)
>> | [M] ; po ; [RMW_MB] ; po ; [M]
>> (* full barrier RMW writes are ordered with everything behind the RMW *)
>> | [W & RMW_MB] ; po ; [M]
>> (* full barrier RMW reads are ordered with everything before the RMW *)
>> | [M] ; po ; [R & RMW_MB]
>> | ...
>
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?
I suspect that it doesn't work out of the box because of some of the
implicit magic herd is doing that could get in the way, so I'd need some
help from Luc to actually turn this into a patch that can be tested.
(or at least confirmation that just by changing a few things in the .def
& .bell files we can sidestep the implicit behaviors).
But at least in my proofs it seems to be equivalent.
(there may still be differences in opinion on what some herd things
mean, so what I/Viktor have formalized as the semantics of the herd
model may not be exactly the behavior of LKMM in herd. Hence testing is
necessary too as a sanity check)
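(For that sanity check, the kernel tree already carries a harness that
should help; this is a sketch from memory, so verify the script name:
	$ cd tools/memory-model
	$ sh scripts/checkalllitmus.sh
which reruns herd7 on the in-tree litmus tests and compares the outcome
of each against its "Result:" comment.)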
best wishes,
jonas
> > Don't the annotations in linux-kernel.def and linux-kernel.bell (like
> > "noreturn") already make this explicit?
>
> Not that I'm aware. All I can see there is that according to .bell RMW don't
> have an mb mode, but according to .def they do.
>
> How this mb disappears between parsing the code (.def) and interpreting it
> (.bell) is totally implicit. Including how noreturn affects this
> disappearance.
IIRC, that's more or less implicit ;-) in the herd macros of the .def file;
for example,
(noreturn)
- atomic_add(V,X) { __atomic_op(X,+,V); }
Generates a pair of R*[noreturn] and W*[once] events
(w/ return)
- atomic_add_return_relaxed(V,X) __atomic_op_return{once}(X,+,V)
Generates a pair of R*[once] and W*[once] events
- atomic_add_return_acquire(V,X) __atomic_op_return{acquire}(X,+,V)
Generates a pair of R*[acquire] and W*[once] events
- atomic_add_return_release(V,X) __atomic_op_return{release}(X,+,V)
Generates a pair of R*[once] and W*[release] events
- atomic_add_return(V,X) __atomic_op_return{mb}(X,+,V)
Generates a pair of R*[once] and W*[once] events plus two F[mb] events
(possibly failing)
- atomic_cmpxchg_relaxed(X,V,W) __cmpxchg{once}(X,V,W)
Generates a pair of R*[once] and W*[once] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg_acquire(X,V,W) __cmpxchg{acquire}(X,V,W)
Generates a pair of R*[acquire] and W*[once] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg_release(X,V,W) __cmpxchg{release}(X,V,W)
Generates a pair of R*[once] and W*[release] events if successful;
a single R*[once] event otherwise.
- atomic_cmpxchg(X,V,W) __cmpxchg{mb}(X,V,W)
Generates a pair of R*[once] and W*[once] events plus two F[mb] events
if successful; a single R*[once] event otherwise.
The line
instructions RMW[{'once,'acquire,'release}]
in the .bell file seems to be effectively redundant (perhaps kept for LISA
backward compatibility?): consider
$ cat rmw.litmus
C rmw
{}
P0(atomic_t *x)
{
int r0;
r0 = atomic_inc_return_release(x);
}
exists (x=1)
Some experiments:
- Upon removing 'release from "(instructions) RMW"
$ herd7 -conf linux-kernel.cfg rmw.litmus
Test rmw Allowed
States 1
[x]=1;
Ok
Witnesses
Positive: 1 Negative: 0
Condition exists ([x]=1)
Observation rmw Always 1 0
Time rmw 0.00
Hash=3a2dd354c250206d993d31f05f3f595c
- Upon restoring 'release in RMW and removing it from W
$ herd7 -conf linux-kernel.cfg rmw.litmus
Test rmw Allowed
States 1
[x]=1;
Ok
Witnesses
Positive: 1 Negative: 0
Condition exists ([x]=1)
Observation rmw Always 1 0
Time rmw 0.00
Hash=3a2dd354c250206d993d31f05f3f595c
- Upon removing 'release from both W and RMW
$ herd7 -conf linux-kernel.cfg rmw.litmus (herd complains... )
File "./linux-kernel.bell", line 76, characters 32-39: unbound var: Release
But I'd have to defer to Luc/Jade about herd internals and/or recommended style
on this matter.
Andrea
On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
>> On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
>>> On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
>>>> This commit adds four litmus tests showing that a failing cmpxchg()
>>>> operation is unordered unless followed by an smp_mb__after_atomic()
>>>> operation.
>>>
>>> So far, my understanding was that all RMW operations without suffix
>>> (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
>>>
>>> I guess this shows again how important it is to model these full
>>> barriers explicitly inside the cat model, instead of relying on implicit
>>> conversions internal to herd.
>>>
>>> I'd like to propose a patch to this effect.
>>>
>>> What is the intended behavior of a failed cmpxchg()? Is it the same as a
>>> relaxed one?
>
> Yes, and unless I am too confused, LKMM currently does implement this.
> Please let me know if I am missing something.
>
>>> My suggestion would be in the direction of marking read and write events
>>> of these operations as Mb, and then defining
>>>
>>> (* full barrier events that appear in non-failing RMW *)
>>> let RMW_MB = Mb & (dom(rmw) | range(rmw))
>>>
>>>
>>> let mb =
>>> [M] ; fencerel(Mb) ; [M]
>>> | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
>>> | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
>>> | ...
>>>
>>> The po \ rmw is because ordering is not provided internally of the rmw
>>
>> (removed the unnecessary si since LKMM is still non-mixed-accesses)
>
> Addition of mixed-access support would be quite welcome!
>
>> This could also be written with a single rule:
>>
>> | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
>>
>>> I suspect that after we added [rmw] sequences it could perhaps be
>>> simplified [...]
>>
>> No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
>> act like strong fences when they appear in an rmw sequence.
>>
>> if (z==1) || x = 2;     || xchg(&y,2) || if (y==2)
>>    x = 1; || y =_rel 1; ||            ||   z=1;
>>
>>
>> right now, we allow x=2 overwriting x=1 (in case the last thread does not
>> propagate x=2 along with z=1) because on power, the xchg might be
>> implemented with a sync that doesn't get executed until the very end
>> of the program run.
>>
>>
>> Instead of its negative form (everything other than inside the rmw),
>> it could also be rewritten positively. Here's a somewhat short form:
>>
>> let mb =
>> [M] ; fencerel(Mb) ; [M]
>> (* everything across a full barrier RMW is ordered. This includes up to
>> one event inside the RMW. *)
>> | [M] ; po ; [RMW_MB] ; po ; [M]
>> (* full barrier RMW writes are ordered with everything behind the RMW *)
>> | [W & RMW_MB] ; po ; [M]
>> (* full barrier RMW reads are ordered with everything before the RMW *)
>> | [M] ; po ; [R & RMW_MB]
>> | ...
>
> Does this produce the results expected by the litmus tests in the Linux
> kernel source tree and also those at https://github.com/paulmckrcu/litmus?
>
> Thanx, Paul
I implemented the changes proposed by Jonas in the dartagnan tool (i.e.,
changing the mb definition in the cat model and removing the fences that
were added programmatically).
I ran this using the ~5K litmus tests I have (they should include
everything from the source tree plus the non-LISA ones from your repo).
I also checked with the version of qspinlock discussed in [1].
I do get the expected results.
Hernan
[1] https://lkml.org/lkml/2022/8/26/597
On Wed, May 15, 2024 at 08:44:30AM +0200, Hernan Ponce de Leon wrote:
> On 5/6/2024 8:00 PM, Paul E. McKenney wrote:
> > On Mon, May 06, 2024 at 06:30:45PM +0200, Jonas Oberhauser wrote:
> > > On 5/6/2024 at 12:05 PM, Jonas Oberhauser wrote:
> > > > On 5/2/2024 at 1:21 AM, Paul E. McKenney wrote:
> > > > > This commit adds four litmus tests showing that a failing cmpxchg()
> > > > > operation is unordered unless followed by an smp_mb__after_atomic()
> > > > > operation.
> > > >
> > > > So far, my understanding was that all RMW operations without suffix
> > > > (xchg(), cmpxchg(), ...) will be interpreted as F[Mb];...;F[Mb].
> > > >
> > > > I guess this shows again how important it is to model these full
> > > > barriers explicitly inside the cat model, instead of relying on implicit
> > > > conversions internal to herd.
> > > >
> > > > I'd like to propose a patch to this effect.
> > > >
> > > > What is the intended behavior of a failed cmpxchg()? Is it the same as a
> > > > relaxed one?
> >
> > Yes, and unless I am too confused, LKMM currently does implement this.
> > Please let me know if I am missing something.
> >
> > > > My suggestion would be in the direction of marking read and write events
> > > > of these operations as Mb, and then defining
> > > >
> > > > (* full barrier events that appear in non-failing RMW *)
> > > > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > > >
> > > >
> > > > let mb =
> > > > [M] ; fencerel(Mb) ; [M]
> > > > | [M] ; (po \ rmw) ; [RMW_MB] ; po^? ; [M]
> > > > | [M] ; po^? ; [RMW_MB] ; (po \ rmw) ; [M]
> > > > | ...
> > > >
> > > > The po \ rmw is because ordering is not provided internally of the rmw
> > >
> > > (removed the unnecessary si since LKMM is still non-mixed-accesses)
> >
> > Addition of mixed-access support would be quite welcome!
> >
> > > This could also be written with a single rule:
> > >
> > > | [M] ; (po \ rmw) & (po^?; [RMW_MB] ; po^?) ; [M]
> > >
> > > > I suspect that after we added [rmw] sequences it could perhaps be
> > > > simplified [...]
> > >
> > > No, my suspicion is wrong - this would incorrectly let full-barrier RMWs
> > > act like strong fences when they appear in an rmw sequence.
> > >
> > > if (z==1) || x = 2;     || xchg(&y,2) || if (y==2)
> > >    x = 1; || y =_rel 1; ||            ||   z=1;
> > >
> > >
> > > right now, we allow x=2 overwriting x=1 (in case the last thread does not
> > > propagate x=2 along with z=1) because on power, the xchg might be
> > > implemented with a sync that doesn't get executed until the very end
> > > of the program run.
> > >
> > >
> > > Instead of its negative form (everything other than inside the rmw),
> > > it could also be rewritten positively. Here's a somewhat short form:
> > >
> > > let mb =
> > > [M] ; fencerel(Mb) ; [M]
> > > (* everything across a full barrier RMW is ordered. This includes up to
> > > one event inside the RMW. *)
> > > | [M] ; po ; [RMW_MB] ; po ; [M]
> > > (* full barrier RMW writes are ordered with everything behind the RMW *)
> > > | [W & RMW_MB] ; po ; [M]
> > > (* full barrier RMW reads are ordered with everything before the RMW *)
> > > | [M] ; po ; [R & RMW_MB]
> > > | ...
> >
> > Does this produce the results expected by the litmus tests in the Linux
> > kernel source tree and also those at https://github.com/paulmckrcu/litmus?
> >
> > Thanx, Paul
>
> I implemented the changes proposed by Jonas in the dartagnan tool (i.e.,
> changing the mb definition in the cat model and removing the fences that
> were added programmatically).
>
> I ran this using the ~5K litmus tests I have (they should include
> everything from the source tree plus the non-LISA ones from your repo).
> I also checked with the version of qspinlock discussed in [1].
>
> I do get the expected results.
Thank you very much, Hernan!!! And good show, Puranjay!!!
Thanx, Paul
> Hernan
>
> [1] https://lkml.org/lkml/2022/8/26/597
>