2021-10-25 18:21:18

by Boqun Feng

[permalink] [raw]
Subject: [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso

Hi,

Just a new version trying to make forward progress on this ;-)

v1: https://lore.kernel.org/lkml/[email protected]/

Changes since v1:

* Split the patch into three to help resolve the litmus test
addition discussion.
* Add some explanation in patch #2 on the requirement of tests in
litmus-tests directory.

To summarize the change in memory model, we now guarantee in the
following code:

<memory access M>
spin_unlock(A);
spin_lock(B);
<memory access N>

M is ordered against N unless M is a store and N is a load. More
detailed examples of this guarantee can be found in patch #3.

Architecture maintainers, appreciate it that you can take a look at
patch #3 and rest of whole set to confirm this guarantee works on your
architectures.

Alan, I split the patchset into three patches because I do think we need
some sort of patch #2 so that we can have consensus about whether merge
patch #3 or not. I know you want to keep litmus-tests directory as
simple as possible, but it won't hurt to document the requirement.
Looking forwards to your thoughts ;-)

Suggestion and comments are welcome!

Regards,
Boqun


Boqun Feng (3):
tools/memory-model: Provide extra ordering for unlock+lock pair on the
same CPU
tools/memory-model: doc: Describe the requirement of the litmus-tests
directory
tools/memory-model: litmus: Add two tests for unlock(A)+lock(B)
ordering

.../Documentation/explanation.txt | 44 +++++++++++--------
tools/memory-model/README | 12 +++++
tools/memory-model/linux-kernel.cat | 6 +--
...LB+unlocklockonceonce+poacquireonce.litmus | 33 ++++++++++++++
...unlocklockonceonce+fencermbonceonce.litmus | 33 ++++++++++++++
tools/memory-model/litmus-tests/README | 8 ++++
6 files changed, 114 insertions(+), 22 deletions(-)
create mode 100644 tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
create mode 100644 tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus

--
2.33.0


2021-10-25 21:23:06

by Boqun Feng

[permalink] [raw]
Subject: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

The memory model has been updated to provide a stronger ordering
guarantee for unlock(A)+lock(B) on the same CPU/thread. Therefore add
two litmus tests describing this new guarantee, these tests are simple
yet can clearly show the usage of the new guarantee, also they can serve
as the self tests for the modification in the model.

Co-developed-by: Alan Stern <[email protected]>
Signed-off-by: Alan Stern <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
...LB+unlocklockonceonce+poacquireonce.litmus | 33 +++++++++++++++++++
...unlocklockonceonce+fencermbonceonce.litmus | 33 +++++++++++++++++++
tools/memory-model/litmus-tests/README | 8 +++++
3 files changed, 74 insertions(+)
create mode 100644 tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
create mode 100644 tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus

diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
new file mode 100644
index 000000000000..955b9c7cdc7f
--- /dev/null
+++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
@@ -0,0 +1,33 @@
+C LB+unlocklockonceonce+poacquireonce
+
+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, all accesses
+ * in the first must execute before any accesses in the second, even if
+ * the critical sections are protected by different locks.
+ *)
+
+{}
+
+P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
+{
+ int r1;
+
+ spin_lock(s);
+ r1 = READ_ONCE(*x);
+ spin_unlock(s);
+ spin_lock(t);
+ WRITE_ONCE(*y, 1);
+ spin_unlock(t);
+}
+
+P1(int *x, int *y)
+{
+ int r2;
+
+ r2 = smp_load_acquire(y);
+ WRITE_ONCE(*x, 1);
+}
+
+exists (0:r1=1 /\ 1:r2=1)
diff --git a/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus
new file mode 100644
index 000000000000..2feb1398be71
--- /dev/null
+++ b/tools/memory-model/litmus-tests/MP+unlocklockonceonce+fencermbonceonce.litmus
@@ -0,0 +1,33 @@
+C MP+unlocklockonceonce+fencermbonceonce
+
+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, stores in the
+ * first must propagate to each CPU before stores in the second do, even if
+ * the critical sections are protected by different locks.
+ *)
+
+{}
+
+P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
+{
+ spin_lock(s);
+ WRITE_ONCE(*x, 1);
+ spin_unlock(s);
+ spin_lock(t);
+ WRITE_ONCE(*y, 1);
+ spin_unlock(t);
+}
+
+P1(int *x, int *y)
+{
+ int r1;
+ int r2;
+
+ r1 = READ_ONCE(*y);
+ smp_rmb();
+ r2 = READ_ONCE(*x);
+}
+
+exists (1:r1=1 /\ 1:r2=0)
diff --git a/tools/memory-model/litmus-tests/README b/tools/memory-model/litmus-tests/README
index 681f9067fa9e..d311a0ff1ae6 100644
--- a/tools/memory-model/litmus-tests/README
+++ b/tools/memory-model/litmus-tests/README
@@ -63,6 +63,10 @@ LB+poonceonces.litmus
As above, but with store-release replaced with WRITE_ONCE()
and load-acquire replaced with READ_ONCE().

+LB+unlocklockonceonce+poacquireonce.litmus
+ Does a unlock+lock pair provides ordering guarantee between a
+ load and a store?
+
MP+onceassign+derefonce.litmus
As below, but with rcu_assign_pointer() and an rcu_dereference().

@@ -90,6 +94,10 @@ MP+porevlocks.litmus
As below, but with the first access of the writer process
and the second access of reader process protected by a lock.

+MP+unlocklockonceonce+fencermbonceonce.litmus
+ Does a unlock+lock pair provides ordering guarantee between a
+ store and another store?
+
MP+fencewmbonceonce+fencermbonceonce.litmus
Does a smp_wmb() (between the stores) and an smp_rmb() (between
the loads) suffice for the message-passing litmus test, where one
--
2.33.0

2021-10-26 08:54:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> new file mode 100644
> index 000000000000..955b9c7cdc7f
> --- /dev/null
> +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> @@ -0,0 +1,33 @@
> +C LB+unlocklockonceonce+poacquireonce
> +
> +(*
> + * Result: Never
> + *
> + * If two locked critical sections execute on the same CPU, all accesses
> + * in the first must execute before any accesses in the second, even if
> + * the critical sections are protected by different locks.

One small nit; the above "all accesses" reads as if:

spin_lock(s);
WRITE_ONCE(*x, 1);
spin_unlock(s);
spin_lock(t);
r1 = READ_ONCE(*y);
spin_unlock(t);

would also work, except of course that's the one reorder allowed by TSO.

> + *)
> +
> +{}
> +
> +P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> +{
> + int r1;
> +
> + spin_lock(s);
> + r1 = READ_ONCE(*x);
> + spin_unlock(s);
> + spin_lock(t);
> + WRITE_ONCE(*y, 1);
> + spin_unlock(t);
> +}
> +
> +P1(int *x, int *y)
> +{
> + int r2;
> +
> + r2 = smp_load_acquire(y);
> + WRITE_ONCE(*x, 1);
> +}
> +
> +exists (0:r1=1 /\ 1:r2=1)

2021-10-26 09:53:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC v2 0/3] memory model: Make unlock(A)+lock(B) on the same CPU RCtso

On Mon, Oct 25, 2021 at 10:54:13PM +0800, Boqun Feng wrote:
> Boqun Feng (3):
> tools/memory-model: Provide extra ordering for unlock+lock pair on the
> same CPU
> tools/memory-model: doc: Describe the requirement of the litmus-tests
> directory
> tools/memory-model: litmus: Add two tests for unlock(A)+lock(B)
> ordering

I'm obviously all in favour of this :-)

Acked-by: Peter Zijlstra (Intel) <[email protected]>

2021-10-28 19:13:04

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > new file mode 100644
> > index 000000000000..955b9c7cdc7f
> > --- /dev/null
> > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > @@ -0,0 +1,33 @@
> > +C LB+unlocklockonceonce+poacquireonce
> > +
> > +(*
> > + * Result: Never
> > + *
> > + * If two locked critical sections execute on the same CPU, all accesses
> > + * in the first must execute before any accesses in the second, even if
> > + * the critical sections are protected by different locks.
>
> One small nit; the above "all accesses" reads as if:
>
> spin_lock(s);
> WRITE_ONCE(*x, 1);
> spin_unlock(s);
> spin_lock(t);
> r1 = READ_ONCE(*y);
> spin_unlock(t);
>
> would also work, except of course that's the one reorder allowed by TSO.

I applied this series with Peter's Acked-by, and with the above comment
reading as follows:

+(*
+ * Result: Never
+ *
+ * If two locked critical sections execute on the same CPU, all accesses
+ * in the first must execute before any accesses in the second, even if the
+ * critical sections are protected by different locks. The one exception
+ * to this rule is that (consistent with TSO) a prior write can be reordered
+ * with a later read from the viewpoint of a process not holding both locks.
+ *)

Thank you all!

Thanx, Paul

2021-10-28 23:54:32

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > new file mode 100644
> > > index 000000000000..955b9c7cdc7f
> > > --- /dev/null
> > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > @@ -0,0 +1,33 @@
> > > +C LB+unlocklockonceonce+poacquireonce
> > > +
> > > +(*
> > > + * Result: Never
> > > + *
> > > + * If two locked critical sections execute on the same CPU, all accesses
> > > + * in the first must execute before any accesses in the second, even if
> > > + * the critical sections are protected by different locks.
> >
> > One small nit; the above "all accesses" reads as if:
> >
> > spin_lock(s);
> > WRITE_ONCE(*x, 1);
> > spin_unlock(s);
> > spin_lock(t);
> > r1 = READ_ONCE(*y);
> > spin_unlock(t);
> >
> > would also work, except of course that's the one reorder allowed by TSO.
>
> I applied this series with Peter's Acked-by, and with the above comment

Thanks!

> reading as follows:
>
> +(*
> + * Result: Never
> + *
> + * If two locked critical sections execute on the same CPU, all accesses
> + * in the first must execute before any accesses in the second, even if the
> + * critical sections are protected by different locks. The one exception
> + * to this rule is that (consistent with TSO) a prior write can be reordered
> + * with a later read from the viewpoint of a process not holding both locks.

Just want to be accurate, in our memory model "execute" means a CPU
commit an memory access instruction to the Memory Subsystem, so if we
have a store W and a load R, where W executes before R, it doesn't mean
the memory effect of W is observed before the memory effect of R by
other CPUs, consider the following case


CPU0 Memory Subsystem CPU1
==== ====
WRITE_ONCE(*x,1); // W ---------->|
spin_unlock(s); |
spin_lock(t); |
r1 = READ_ONCE(*y); // R -------->|
// R reads 0 |
|<----------------WRITR_ONCE(*y, 1); // W'
W' propagates to CPU0 |
<-------------------------|
| smp_mb();
|<----------------r1 = READ_ONCE(*x); // R' reads 0
|
| W progrates to CPU 1
|----------------->

The "->" from CPU0 to the Memory Subsystem shows that W executes before
R, however the memory effect of a store can be observed only after the
Memory Subsystem propagates it to another CPU, as a result CPU1 doesn't
observe W before R is executed. So the original version of the comments
is correct in our memory model terminology, at least that's how I
understand it, Alan can correct me if I'm wrong.

Maybe it's better to replace the sentence starting with "The one
exception..." into:

One thing to notice is that even though a write executes by a read, the
memory effects can still be reordered from the viewpoint of a process
not holding both locks, similar to TSO ordering.

Thoughts?

Apologies for responsing late...

("Memory Subsystem" is an abstraction in our memory model, which doesn't
mean hardware implements things in the same way.).

Regards,
Boqun

> + *)
>
> Thank you all!
>
> Thanx, Paul

2021-10-29 14:37:54

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

On Fri, Oct 29, 2021 at 07:51:39AM +0800, Boqun Feng wrote:
> On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > new file mode 100644
> > > > index 000000000000..955b9c7cdc7f
> > > > --- /dev/null
> > > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > @@ -0,0 +1,33 @@
> > > > +C LB+unlocklockonceonce+poacquireonce
> > > > +
> > > > +(*
> > > > + * Result: Never
> > > > + *
> > > > + * If two locked critical sections execute on the same CPU, all accesses
> > > > + * in the first must execute before any accesses in the second, even if
> > > > + * the critical sections are protected by different locks.
> > >
> > > One small nit; the above "all accesses" reads as if:
> > >
> > > spin_lock(s);
> > > WRITE_ONCE(*x, 1);
> > > spin_unlock(s);
> > > spin_lock(t);
> > > r1 = READ_ONCE(*y);
> > > spin_unlock(t);
> > >
> > > would also work, except of course that's the one reorder allowed by TSO.
> >
> > I applied this series with Peter's Acked-by, and with the above comment
>
> Thanks!
>
> > reading as follows:
> >
> > +(*
> > + * Result: Never
> > + *
> > + * If two locked critical sections execute on the same CPU, all accesses
> > + * in the first must execute before any accesses in the second, even if the
> > + * critical sections are protected by different locks. The one exception
> > + * to this rule is that (consistent with TSO) a prior write can be reordered
> > + * with a later read from the viewpoint of a process not holding both locks.
>
> Just want to be accurate, in our memory model "execute" means a CPU
> commit an memory access instruction to the Memory Subsystem, so if we
> have a store W and a load R, where W executes before R, it doesn't mean
> the memory effect of W is observed before the memory effect of R by
> other CPUs, consider the following case
>
>
> CPU0 Memory Subsystem CPU1
> ==== ====
> WRITE_ONCE(*x,1); // W ---------->|
> spin_unlock(s); |
> spin_lock(t); |
> r1 = READ_ONCE(*y); // R -------->|
> // R reads 0 |
> |<----------------WRITR_ONCE(*y, 1); // W'
> W' propagates to CPU0 |
> <-------------------------|
> | smp_mb();
> |<----------------r1 = READ_ONCE(*x); // R' reads 0
> |
> | W progrates to CPU 1
> |----------------->
>
> The "->" from CPU0 to the Memory Subsystem shows that W executes before
> R, however the memory effect of a store can be observed only after the
> Memory Subsystem propagates it to another CPU, as a result CPU1 doesn't
> observe W before R is executed. So the original version of the comments
> is correct in our memory model terminology, at least that's how I
> understand it, Alan can correct me if I'm wrong.

Indeed, that is correct.

It is an unfortunate inconsistency with the terminology in
Documentation/memory-barriers.txt. I suspect most people think of a
write as executing when it is observed by another CPU, even though that
really isn't a coherent concept. (For example, it could easily lead
somebody to think that a write observed at different times by different
CPUs has executed more than once!)

> Maybe it's better to replace the sentence starting with "The one
> exception..." into:
>
> One thing to notice is that even though a write executes by a read, the
> memory effects can still be reordered from the viewpoint of a process
> not holding both locks, similar to TSO ordering.
>
> Thoughts?

Or more briefly:

Note: Even when a write executes before a read, their memory
effects can be reordered from the viewpoint of another CPU (the
kind of reordering allowed by TSO).

Alan

> Apologies for responsing late...
>
> ("Memory Subsystem" is an abstraction in our memory model, which doesn't
> mean hardware implements things in the same way.).
>
> Regards,
> Boqun
>
> > + *)
> >
> > Thank you all!
> >
> > Thanx, Paul

2021-10-29 15:03:43

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC v2 3/3] tools/memory-model: litmus: Add two tests for unlock(A)+lock(B) ordering

On Fri, Oct 29, 2021 at 10:34:42AM -0400, Alan Stern wrote:
> On Fri, Oct 29, 2021 at 07:51:39AM +0800, Boqun Feng wrote:
> > On Thu, Oct 28, 2021 at 12:11:29PM -0700, Paul E. McKenney wrote:
> > > On Tue, Oct 26, 2021 at 09:01:00AM +0200, Peter Zijlstra wrote:
> > > > On Mon, Oct 25, 2021 at 10:54:16PM +0800, Boqun Feng wrote:
> > > > > diff --git a/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > > new file mode 100644
> > > > > index 000000000000..955b9c7cdc7f
> > > > > --- /dev/null
> > > > > +++ b/tools/memory-model/litmus-tests/LB+unlocklockonceonce+poacquireonce.litmus
> > > > > @@ -0,0 +1,33 @@
> > > > > +C LB+unlocklockonceonce+poacquireonce
> > > > > +
> > > > > +(*
> > > > > + * Result: Never
> > > > > + *
> > > > > + * If two locked critical sections execute on the same CPU, all accesses
> > > > > + * in the first must execute before any accesses in the second, even if
> > > > > + * the critical sections are protected by different locks.
> > > >
> > > > One small nit; the above "all accesses" reads as if:
> > > >
> > > > spin_lock(s);
> > > > WRITE_ONCE(*x, 1);
> > > > spin_unlock(s);
> > > > spin_lock(t);
> > > > r1 = READ_ONCE(*y);
> > > > spin_unlock(t);
> > > >
> > > > would also work, except of course that's the one reorder allowed by TSO.
> > >
> > > I applied this series with Peter's Acked-by, and with the above comment
> >
> > Thanks!
> >
> > > reading as follows:
> > >
> > > +(*
> > > + * Result: Never
> > > + *
> > > + * If two locked critical sections execute on the same CPU, all accesses
> > > + * in the first must execute before any accesses in the second, even if the
> > > + * critical sections are protected by different locks. The one exception
> > > + * to this rule is that (consistent with TSO) a prior write can be reordered
> > > + * with a later read from the viewpoint of a process not holding both locks.
> >
> > Just want to be accurate, in our memory model "execute" means a CPU
> > commit an memory access instruction to the Memory Subsystem, so if we
> > have a store W and a load R, where W executes before R, it doesn't mean
> > the memory effect of W is observed before the memory effect of R by
> > other CPUs, consider the following case
> >
> >
> > CPU0 Memory Subsystem CPU1
> > ==== ====
> > WRITE_ONCE(*x,1); // W ---------->|
> > spin_unlock(s); |
> > spin_lock(t); |
> > r1 = READ_ONCE(*y); // R -------->|
> > // R reads 0 |
> > |<----------------WRITR_ONCE(*y, 1); // W'
> > W' propagates to CPU0 |
> > <-------------------------|
> > | smp_mb();
> > |<----------------r1 = READ_ONCE(*x); // R' reads 0
> > |
> > | W progrates to CPU 1
> > |----------------->
> >
> > The "->" from CPU0 to the Memory Subsystem shows that W executes before
> > R, however the memory effect of a store can be observed only after the
> > Memory Subsystem propagates it to another CPU, as a result CPU1 doesn't
> > observe W before R is executed. So the original version of the comments
> > is correct in our memory model terminology, at least that's how I
> > understand it, Alan can correct me if I'm wrong.
>
> Indeed, that is correct.
>
> It is an unfortunate inconsistency with the terminology in
> Documentation/memory-barriers.txt. I suspect most people think of a
> write as executing when it is observed by another CPU, even though that
> really isn't a coherent concept. (For example, it could easily lead
> somebody to think that a write observed at different times by different
> CPUs has executed more than once!)

Agreed, the terminology is odd. But the fact that different CPUs can
see writes in different orders is probably always going to be a bit
counter-intuitive, so it is good to avoid giving that intuition any
support.

> > Maybe it's better to replace the sentence starting with "The one
> > exception..." into:
> >
> > One thing to notice is that even though a write executes by a read, the
> > memory effects can still be reordered from the viewpoint of a process
> > not holding both locks, similar to TSO ordering.
> >
> > Thoughts?
>
> Or more briefly:
>
> Note: Even when a write executes before a read, their memory
> effects can be reordered from the viewpoint of another CPU (the
> kind of reordering allowed by TSO).

Very good! I took this verbatim in a fixup patch to be combined
with the original on my next rebase.

Thanx, Paul

> Alan
>
> > Apologies for responsing late...
> >
> > ("Memory Subsystem" is an abstraction in our memory model, which doesn't
> > mean hardware implements things in the same way.).
> >
> > Regards,
> > Boqun
> >
> > > + *)
> > >
> > > Thank you all!
> > >
> > > Thanx, Paul