2020-08-31 18:23:08

by Paul E. McKenney

Subject: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

From: "Paul E. McKenney" <[email protected]>

Most Linux-kernel uses of locking are straightforward, but there are
corner-case uses that rely on less well-known aspects of the lock and
unlock primitives. This commit therefore adds a locking.txt and litmus
tests in Documentation/litmus-tests/locking to explain these corner-case
uses.

Signed-off-by: Paul E. McKenney <[email protected]>
---
.../litmus-tests/locking/DCL-broken.litmus | 55 ++++
.../litmus-tests/locking/DCL-fixed.litmus | 56 ++++
.../litmus-tests/locking/RM-broken.litmus | 42 +++
Documentation/litmus-tests/locking/RM-fixed.litmus | 42 +++
tools/memory-model/Documentation/locking.txt | 320 +++++++++++++++++++++
5 files changed, 515 insertions(+)
create mode 100644 Documentation/litmus-tests/locking/DCL-broken.litmus
create mode 100644 Documentation/litmus-tests/locking/DCL-fixed.litmus
create mode 100644 Documentation/litmus-tests/locking/RM-broken.litmus
create mode 100644 Documentation/litmus-tests/locking/RM-fixed.litmus
create mode 100644 tools/memory-model/Documentation/locking.txt

diff --git a/Documentation/litmus-tests/locking/DCL-broken.litmus b/Documentation/litmus-tests/locking/DCL-broken.litmus
new file mode 100644
index 0000000..cfaa25f
--- /dev/null
+++ b/Documentation/litmus-tests/locking/DCL-broken.litmus
@@ -0,0 +1,55 @@
+C DCL-broken
+
+(*
+ * Result: Sometimes
+ *
+ * This litmus test demonstrates that more than just locking is
+ * required to correctly implement double-checked locking.
+ *)
+
+{
+ int flag;
+ int data;
+ int lck;
+}
+
+P0(int *flag, int *data, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = READ_ONCE(*flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ WRITE_ONCE(*flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+}
+
+P1(int *flag, int *data, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = READ_ONCE(*flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ WRITE_ONCE(*flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+}
+
+locations [flag;data;lck;0:r0;0:r1;1:r0;1:r1]
+exists (0:r2=0 \/ 1:r2=0)
diff --git a/Documentation/litmus-tests/locking/DCL-fixed.litmus b/Documentation/litmus-tests/locking/DCL-fixed.litmus
new file mode 100644
index 0000000..579d6c2
--- /dev/null
+++ b/Documentation/litmus-tests/locking/DCL-fixed.litmus
@@ -0,0 +1,56 @@
+C DCL-fixed
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that double-checked locking can be
+ * reliable given proper use of smp_load_acquire() and smp_store_release()
+ * in addition to the locking.
+ *)
+
+{
+ int flag;
+ int data;
+ int lck;
+}
+
+P0(int *flag, int *data, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = smp_load_acquire(flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ smp_store_release(flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+}
+
+P1(int *flag, int *data, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = smp_load_acquire(flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ smp_store_release(flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+}
+
+locations [flag;data;lck;0:r0;0:r1;1:r0;1:r1]
+exists (0:r2=0 \/ 1:r2=0)
diff --git a/Documentation/litmus-tests/locking/RM-broken.litmus b/Documentation/litmus-tests/locking/RM-broken.litmus
new file mode 100644
index 0000000..c586ae4
--- /dev/null
+++ b/Documentation/litmus-tests/locking/RM-broken.litmus
@@ -0,0 +1,42 @@
+C RM-broken
+
+(*
+ * Result: DEADLOCK
+ *
+ * This litmus test demonstrates that the old "roach motel" approach
+ * to locking, where code can be freely moved into critical sections,
+ * cannot be used in the Linux kernel.
+ *)
+
+{
+ int lck;
+ int x;
+ int y;
+}
+
+P0(int *x, int *y, int *lck)
+{
+ int r2;
+
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ WRITE_ONCE(*x, 1);
+ spin_unlock(lck);
+}
+
+P1(int *x, int *y, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ spin_lock(lck);
+ r0 = READ_ONCE(*x);
+ r1 = READ_ONCE(*x);
+ r2 = atomic_inc_return(y);
+ spin_unlock(lck);
+}
+
+locations [x;lck;0:r2;1:r0;1:r1;1:r2]
+filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
+exists (1:r2=1)
diff --git a/Documentation/litmus-tests/locking/RM-fixed.litmus b/Documentation/litmus-tests/locking/RM-fixed.litmus
new file mode 100644
index 0000000..6728567
--- /dev/null
+++ b/Documentation/litmus-tests/locking/RM-fixed.litmus
@@ -0,0 +1,42 @@
+C RM-fixed
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that the old "roach motel" approach
+ * to locking, where code can be freely moved into critical sections,
+ * cannot be used in the Linux kernel.
+ *)
+
+{
+ int lck;
+ int x;
+ int y;
+}
+
+P0(int *x, int *y, int *lck)
+{
+ int r2;
+
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ WRITE_ONCE(*x, 1);
+ spin_unlock(lck);
+}
+
+P1(int *x, int *y, int *lck)
+{
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = READ_ONCE(*x);
+ r1 = READ_ONCE(*x);
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ spin_unlock(lck);
+}
+
+locations [x;lck;0:r2;1:r0;1:r1;1:r2]
+filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
+exists (1:r2=1)
diff --git a/tools/memory-model/Documentation/locking.txt b/tools/memory-model/Documentation/locking.txt
new file mode 100644
index 0000000..a6ad6aa
--- /dev/null
+++ b/tools/memory-model/Documentation/locking.txt
@@ -0,0 +1,320 @@
+Locking
+=======
+
+Locking is well-known and the common use cases are straightforward: Any
+CPU holding a given lock sees any changes previously seen or made by any
+CPU before it previously released that same lock. This last sentence
+is the only part of this document that most developers will need to read.
+
+However, developers who would like to also access lock-protected shared
+variables outside of their corresponding locks should continue reading.
+
+
+Locking and Prior Accesses
+--------------------------
+
+The basic rule of locking is worth repeating:
+
+ Any CPU holding a given lock sees any changes previously seen
+ or made by any CPU before it previously released that same lock.
+
+Note that this statement is a bit stronger than "Any CPU holding a
+given lock sees all changes made by any CPU during the time that CPU was
+previously holding this same lock". For example, consider the following
+pair of code fragments:
+
+ /* See MP+polocks.litmus. */
+ void CPU0(void)
+ {
+ WRITE_ONCE(x, 1);
+ spin_lock(&mylock);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ r0 = READ_ONCE(y);
+ spin_unlock(&mylock);
+ r1 = READ_ONCE(x);
+ }
+
+The basic rule guarantees that if CPU0() acquires mylock before CPU1(),
+then both r0 and r1 must be set to the value 1. This also has the
+consequence that if the final value of r0 is equal to 1, then the final
+value of r1 must also be equal to 1. In contrast, the weaker rule would
+say nothing about the final value of r1.
+
+
+Locking and Subsequent Accesses
+-------------------------------
+
+The converse to the basic rule also holds: Any CPU holding a given lock
+will not see any changes that another CPU will make after that other
+CPU subsequently acquires this same lock. This converse statement is
+illustrated by the following litmus test:
+
+ /* See MP+porevlocks.litmus. */
+ void CPU0(void)
+ {
+ r0 = READ_ONCE(y);
+ spin_lock(&mylock);
+ r1 = READ_ONCE(x);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ spin_unlock(&mylock);
+ WRITE_ONCE(y, 1);
+ }
+
+This converse to the basic rule guarantees that if CPU0() acquires
+mylock before CPU1(), then both r0 and r1 must be set to the value 0.
+This also has the consequence that if the final value of r1 is equal
+to 0, then the final value of r0 must also be equal to 0. In contrast,
+the weaker rule would say nothing about the final value of r0.
+
+These examples show only a single pair of CPUs, but the effects of the
+basic rule of locking extend across multiple acquisitions of a given
+lock by multiple CPUs.
+
+
+Double-Checked Locking
+----------------------
+
+It is well known that more than just a lock is required to make
+double-checked locking work correctly. This litmus test illustrates
+one incorrect approach:
+
+ /* See Documentation/litmus-tests/locking/DCL-broken.litmus. */
+ P0(int *flag, int *data, int *lck)
+ {
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = READ_ONCE(*flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ WRITE_ONCE(*flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+ }
+ /* P1() is exactly the same as P0(). */
+
+There are two problems. First, there is no ordering between the first
+READ_ONCE() of "flag" and the READ_ONCE() of "data". Second, there is
+no ordering between the two WRITE_ONCE() calls. It should therefore be
+no surprise that "r2" can be zero, and a quick herd7 run confirms this.
+
+One way to fix this is to use smp_load_acquire() and smp_store_release()
+as shown in this corrected version:
+
+ /* See Documentation/litmus-tests/locking/DCL-fixed.litmus. */
+ P0(int *flag, int *data, int *lck)
+ {
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = smp_load_acquire(flag);
+ if (r0 == 0) {
+ spin_lock(lck);
+ r1 = READ_ONCE(*flag);
+ if (r1 == 0) {
+ WRITE_ONCE(*data, 1);
+ smp_store_release(flag, 1);
+ }
+ spin_unlock(lck);
+ }
+ r2 = READ_ONCE(*data);
+ }
+ /* P1() is exactly the same as P0(). */
+
+The smp_load_acquire() guarantees that its load from "flag" will
+be ordered before the READ_ONCE() from "data", thus solving the first
+problem. The smp_store_release() guarantees that its store will be
+ordered after the WRITE_ONCE() to "data", solving the second problem.
+The smp_store_release() pairs with the smp_load_acquire(), thus ensuring
+that the ordering provided by each actually takes effect. Again, a
+quick herd7 run confirms this.
+
+In short, if you access a lock-protected variable without holding the
+corresponding lock, you will need to provide additional ordering, in
+this case, via the smp_load_acquire() and the smp_store_release().
+
+
+Ordering Provided by a Lock to CPUs Not Holding That Lock
+---------------------------------------------------------
+
+It is not necessarily the case that accesses ordered by locking will be
+seen as ordered by CPUs not holding that lock. Consider this example:
+
+ /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
+ void CPU0(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ r0 = READ_ONCE(y);
+ WRITE_ONCE(z, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU2(void)
+ {
+ WRITE_ONCE(z, 2);
+ smp_mb();
+ r1 = READ_ONCE(x);
+ }
+
+Counter-intuitive though it might be, it is quite possible to have
+the final value of r0 be 1, the final value of z be 2, and the final
+value of r1 be 0. The reason for this surprising outcome is that CPU2()
+never acquired the lock, and thus did not fully benefit from the lock's
+ordering properties.
+
+Ordering can be extended to CPUs not holding the lock by careful use
+of smp_mb__after_spinlock():
+
+ /* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
+ void CPU0(void)
+ {
+ spin_lock(&mylock);
+ WRITE_ONCE(x, 1);
+ WRITE_ONCE(y, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU1(void)
+ {
+ spin_lock(&mylock);
+ smp_mb__after_spinlock();
+ r0 = READ_ONCE(y);
+ WRITE_ONCE(z, 1);
+ spin_unlock(&mylock);
+ }
+
+ void CPU2(void)
+ {
+ WRITE_ONCE(z, 2);
+ smp_mb();
+ r1 = READ_ONCE(x);
+ }
+
+This addition of smp_mb__after_spinlock() strengthens the lock
+acquisition sufficiently to rule out the counter-intuitive outcome.
+In other words, the addition of the smp_mb__after_spinlock() prohibits
+the counter-intuitive result where the final value of r0 is 1, the final
+value of z is 2, and the final value of r1 is 0.
+
+
+No Roach-Motel Locking!
+-----------------------
+
+This example requires familiarity with the herd7 "filter" clause, so
+please read up on that topic in litmus-tests.txt.
+
+It is tempting to allow memory-reference instructions to be pulled
+into a critical section, but this cannot be allowed in the general case.
+For example, consider a spin loop preceding a lock-based critical section.
+Now, herd7 does not model spin loops, but we can emulate one with two
+loads, with a "filter" clause to constrain the first to return the
+initial value and the second to return the updated value, as shown below:
+
+ /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
+ P0(int *x, int *y, int *lck)
+ {
+ int r2;
+
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ WRITE_ONCE(*x, 1);
+ spin_unlock(lck);
+ }
+
+ P1(int *x, int *y, int *lck)
+ {
+ int r0;
+ int r1;
+ int r2;
+
+ r0 = READ_ONCE(*x);
+ r1 = READ_ONCE(*x);
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ spin_unlock(lck);
+ }
+
+ filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
+ exists (1:r2=1)
+
+The variable "x" is the control variable for the emulated spin loop.
+P0() sets it to "1" while holding the lock, and P1() emulates the
+spin loop by reading it twice, first into "1:r0" (which should get the
+initial value "0") and then into "1:r1" (which should get the updated
+value "1").
+
+The purpose of the variable "y" is to reject deadlocked executions.
+Only those executions in which the final value of "y" is 2 have avoided
+deadlock.
+
+The "filter" clause takes all this into account, constraining "y" to
+equal "2", "1:r0" to equal "0", and "1:r1" to equal "1".
+
+Then the "exists" clause checks to see if P1() acquired its lock first,
+which should not happen given the filter clause because P0() updates
+"x" while holding the lock. And herd7 confirms this.
+
+But suppose that the compiler was permitted to reorder the spin loop
+into P1()'s critical section, like this:
+
+ /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
+ P0(int *x, int *y, int *lck)
+ {
+ int r2;
+
+ spin_lock(lck);
+ r2 = atomic_inc_return(y);
+ WRITE_ONCE(*x, 1);
+ spin_unlock(lck);
+ }
+
+ P1(int *x, int *y, int *lck)
+ {
+ int r0;
+ int r1;
+ int r2;
+
+ spin_lock(lck);
+ r0 = READ_ONCE(*x);
+ r1 = READ_ONCE(*x);
+ r2 = atomic_inc_return(y);
+ spin_unlock(lck);
+ }
+
+ locations [x;lck;0:r2;1:r0;1:r1;1:r2]
+ filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
+ exists (1:r2=1)
+
+If "1:r0" is equal to "0", "1:r1" can never equal "1" because P0()
+cannot update "x" while P1() holds the lock. And herd7 confirms this,
+showing zero executions matching the "filter" criteria.
+
+And this is why Linux-kernel lock and unlock primitives must prevent
+code from entering critical sections. It is not sufficient to only
+prevent code from leaving them.
--
2.9.5


2020-08-31 22:50:41

by Alan Stern

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Mon, Aug 31, 2020 at 11:20:37AM -0700, [email protected] wrote:
> +No Roach-Motel Locking!
> +-----------------------
> +
> +This example requires familiarity with the herd7 "filter" clause, so
> +please read up on that topic in litmus-tests.txt.
> +
> +It is tempting to allow memory-reference instructions to be pulled
> +into a critical section, but this cannot be allowed in the general case.
> +For example, consider a spin loop preceding a lock-based critical section.
> +Now, herd7 does not model spin loops, but we can emulate one with two
> +loads, with a "filter" clause to constrain the first to return the
> +initial value and the second to return the updated value, as shown below:
> +
> + /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
> + P0(int *x, int *y, int *lck)
> + {
> + int r2;
> +
> + spin_lock(lck);
> + r2 = atomic_inc_return(y);
> + WRITE_ONCE(*x, 1);
> + spin_unlock(lck);
> + }
> +
> + P1(int *x, int *y, int *lck)
> + {
> + int r0;
> + int r1;
> + int r2;
> +
> + r0 = READ_ONCE(*x);
> + r1 = READ_ONCE(*x);
> + spin_lock(lck);
> + r2 = atomic_inc_return(y);
> + spin_unlock(lck);
> + }
> +
> + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> + exists (1:r2=1)
> +
> +The variable "x" is the control variable for the emulated spin loop.
> +P0() sets it to "1" while holding the lock, and P1() emulates the
> +spin loop by reading it twice, first into "1:r0" (which should get the
> +initial value "0") and then into "1:r1" (which should get the updated
> +value "1").
> +
> +The purpose of the variable "y" is to reject deadlocked executions.
> +Only those executions in which the final value of "y" is 2 have avoided
> +deadlock.
> +
> +The "filter" clause takes all this into account, constraining "y" to
> +equal "2", "1:r0" to equal "0", and "1:r1" to equal 1.
> +
> +Then the "exists" clause checks to see if P1() acquired its lock first,
> +which should not happen given the filter clause because P0() updates
> +"x" while holding the lock. And herd7 confirms this.
> +
> +But suppose that the compiler was permitted to reorder the spin loop
> +into P1()'s critical section, like this:
> +
> + /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
> + P0(int *x, int *y, int *lck)
> + {
> + int r2;
> +
> + spin_lock(lck);
> + r2 = atomic_inc_return(y);
> + WRITE_ONCE(*x, 1);
> + spin_unlock(lck);
> + }
> +
> + P1(int *x, int *y, int *lck)
> + {
> + int r0;
> + int r1;
> + int r2;
> +
> + spin_lock(lck);
> + r0 = READ_ONCE(*x);
> + r1 = READ_ONCE(*x);
> + r2 = atomic_inc_return(y);
> + spin_unlock(lck);
> + }
> +
> + locations [x;lck;0:r2;1:r0;1:r1;1:r2]
> + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> + exists (1:r2=1)
> +
> +If "1:r0" is equal to "0", "1:r1" can never equal "1" because P0()
> +cannot update "x" while P1() holds the lock. And herd7 confirms this,
> +showing zero executions matching the "filter" criteria.
> +
> +And this is why Linux-kernel lock and unlock primitives must prevent
> +code from entering critical sections. It is not sufficient to only
> +prevent code from leaving them.

Is this discussion perhaps overkill?

Let's put it this way: Suppose we have the following code:

P0(int *x, int *lck)
{
spin_lock(lck);
WRITE_ONCE(*x, 1);
do_something();
spin_unlock(lck);
}

P1(int *x, int *lck)
{
while (READ_ONCE(*x) == 0)
;
spin_lock(lck);
do_something_else();
spin_unlock(lck);
}

It's obvious that this test won't deadlock. But if P1 is changed to:

P1(int *x, int *lck)
{
spin_lock(lck);
while (READ_ONCE(*x) == 0)
;
do_something_else();
spin_unlock(lck);
}

then it's equally obvious that the test can deadlock. No need for
fancy memory models or litmus tests or anything else.

Alan

2020-08-31 23:00:07

by Paul E. McKenney

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Mon, Aug 31, 2020 at 04:17:01PM -0400, Alan Stern wrote:
> On Mon, Aug 31, 2020 at 11:20:37AM -0700, [email protected] wrote:
> > +No Roach-Motel Locking!
> > +-----------------------
> > +
> > +This example requires familiarity with the herd7 "filter" clause, so
> > +please read up on that topic in litmus-tests.txt.
> > +
> > +It is tempting to allow memory-reference instructions to be pulled
> > +into a critical section, but this cannot be allowed in the general case.
> > +For example, consider a spin loop preceding a lock-based critical section.
> > +Now, herd7 does not model spin loops, but we can emulate one with two
> > +loads, with a "filter" clause to constrain the first to return the
> > +initial value and the second to return the updated value, as shown below:
> > +
> > + /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
> > + P0(int *x, int *y, int *lck)
> > + {
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + WRITE_ONCE(*x, 1);
> > + spin_unlock(lck);
> > + }
> > +
> > + P1(int *x, int *y, int *lck)
> > + {
> > + int r0;
> > + int r1;
> > + int r2;
> > +
> > + r0 = READ_ONCE(*x);
> > + r1 = READ_ONCE(*x);
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + spin_unlock(lck);
> > + }
> > +
> > + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> > + exists (1:r2=1)
> > +
> > +The variable "x" is the control variable for the emulated spin loop.
> > +P0() sets it to "1" while holding the lock, and P1() emulates the
> > +spin loop by reading it twice, first into "1:r0" (which should get the
> > +initial value "0") and then into "1:r1" (which should get the updated
> > +value "1").
> > +
> > +The purpose of the variable "y" is to reject deadlocked executions.
> > +Only those executions in which the final value of "y" is 2 have avoided
> > +deadlock.
> > +
> > +The "filter" clause takes all this into account, constraining "y" to
> > +equal "2", "1:r0" to equal "0", and "1:r1" to equal 1.
> > +
> > +Then the "exists" clause checks to see if P1() acquired its lock first,
> > +which should not happen given the filter clause because P0() updates
> > +"x" while holding the lock. And herd7 confirms this.
> > +
> > +But suppose that the compiler was permitted to reorder the spin loop
> > +into P1()'s critical section, like this:
> > +
> > + /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
> > + P0(int *x, int *y, int *lck)
> > + {
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r2 = atomic_inc_return(y);
> > + WRITE_ONCE(*x, 1);
> > + spin_unlock(lck);
> > + }
> > +
> > + P1(int *x, int *y, int *lck)
> > + {
> > + int r0;
> > + int r1;
> > + int r2;
> > +
> > + spin_lock(lck);
> > + r0 = READ_ONCE(*x);
> > + r1 = READ_ONCE(*x);
> > + r2 = atomic_inc_return(y);
> > + spin_unlock(lck);
> > + }
> > +
> > + locations [x;lck;0:r2;1:r0;1:r1;1:r2]
> > + filter (y=2 /\ 1:r0=0 /\ 1:r1=1)
> > + exists (1:r2=1)
> > +
> > +If "1:r0" is equal to "0", "1:r1" can never equal "1" because P0()
> > +cannot update "x" while P1() holds the lock. And herd7 confirms this,
> > +showing zero executions matching the "filter" criteria.
> > +
> > +And this is why Linux-kernel lock and unlock primitives must prevent
> > +code from entering critical sections. It is not sufficient to only
> > +prevent code from leaving them.
>
> Is this discussion perhaps overkill?
>
> Let's put it this way: Suppose we have the following code:
>
> P0(int *x, int *lck)
> {
> spin_lock(lck);
> WRITE_ONCE(*x, 1);
> do_something();
> spin_unlock(lck);
> }
>
> P1(int *x, int *lck)
> {
> while (READ_ONCE(*x) == 0)
> ;
> spin_lock(lck);
> do_something_else();
> spin_unlock(lck);
> }
>
> It's obvious that this test won't deadlock. But if P1 is changed to:
>
> P1(int *x, int *lck)
> {
> spin_lock(lck);
> while (READ_ONCE(*x) == 0)
> ;
> do_something_else();
> spin_unlock(lck);
> }
>
> then it's equally obvious that the test can deadlock. No need for
> fancy memory models or litmus tests or anything else.

For people like you and me, who have been thinking about memory ordering
for longer than either of us care to admit, this level of exposition is
most definitely -way- overkill!!!

But I have had people be very happy and grateful that I explained this to
them at this level of detail. Yes, I started parallel programming before
some of them were born, but they are definitely within our target audience
for this particular document. And it is not just Linux kernel hackers
who need this level of detail. A roughly similar transactional-memory
scenario proved to be so non-obvious to any number of noted researchers
that Blundell, Lewis, and Martin needed to feature it in this paper:
https://ieeexplore.ieee.org/abstract/document/4069174
(Alternative source: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1344&context=cis_papers)

Please note that I am -not- advocating making (say) explanation.txt or
recipes.txt more newbie-accessible than they already are. After all,
the point of the README file in that same directory is to direct people
to the documentation files that are the best fit for them, and both
explanation.txt and recipes.txt contain advanced material, and thus
require similarly advanced prerequisites.

Seem reasonable, or am I missing your point?

Thanx, Paul

2020-09-01 01:48:16

by Alan Stern

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Mon, Aug 31, 2020 at 02:47:38PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2020 at 04:17:01PM -0400, Alan Stern wrote:

> > Is this discussion perhaps overkill?
> >
> > Let's put it this way: Suppose we have the following code:
> >
> > P0(int *x, int *lck)
> > {
> > spin_lock(lck);
> > WRITE_ONCE(*x, 1);
> > do_something();
> > spin_unlock(lck);
> > }
> >
> > P1(int *x, int *lck)
> > {
> > while (READ_ONCE(*x) == 0)
> > ;
> > spin_lock(lck);
> > do_something_else();
> > spin_unlock(lck);
> > }
> >
> > It's obvious that this test won't deadlock. But if P1 is changed to:
> >
> > P1(int *x, int *lck)
> > {
> > spin_lock(lck);
> > while (READ_ONCE(*x) == 0)
> > ;
> > do_something_else();
> > spin_unlock(lck);
> > }
> >
> > then it's equally obvious that the test can deadlock. No need for
> > fancy memory models or litmus tests or anything else.
>
> For people like you and me, who have been thinking about memory ordering
> for longer than either of us care to admit, this level of exposition is
> most definitely -way- overkill!!!
>
> But I have had people be very happy and grateful that I explained this to
> them at this level of detail. Yes, I started parallel programming before
> some of them were born, but they are definitely within our target audience
> for this particular document. And it is not just Linux kernel hackers
> who need this level of detail. A roughly similar transactional-memory
> scenario proved to be so non-obvious to any number of noted researchers
> that Blundell, Lewis, and Martin needed to feature it in this paper:
> https://ieeexplore.ieee.org/abstract/document/4069174
> (Alternative source: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1344&context=cis_papers)
>
> Please note that I am -not- advocating making (say) explanation.txt or
> recipes.txt more newbie-accessible than they already are. After all,
> the point of the README file in that same directory is to direct people
> to the documentation files that are the best fit for them, and both
> explanation.txt and recipes.txt contain advanced material, and thus
> require similarly advanced prerequisites.
>
> Seem reasonable, or am I missing your point?

The question is, what are you trying to accomplish in this section? Are
you trying to demonstrate that it isn't safe to allow arbitrary code to
leak into a critical section? If so then you don't need to present an
LKMM litmus test to make the point; the example I gave here will do
quite as well. Perhaps even better, since it doesn't drag in all sorts
of extraneous concepts like limitations of litmus tests or how to
emulate a spin loop.

On the other hand, if your goal is to show how to construct a litmus
test that will model a particular C language test case (such as the one
I gave), then the text does a reasonable job -- although I do think it
could be clarified somewhat. For instance, it wouldn't hurt to include
the real C code before giving the corresponding litmus test, so that the
reader will have a clear idea of what you're trying to model.

Just what you want to achieve here is not clear from the context.

Besides, the example is in any case a straw man. The text starts out
saying "It is tempting to allow memory-reference instructions to be
pulled into a critical section", but then the example pulls an entire
spin loop inside -- not just the memory references but also the
conditional branch instruction at the bottom of the loop! I can't
imagine anyone would think it was safe to allow branches to leak into a
critical section, particularly when doing so would break a control
dependency (as it does here).

Alan

2020-09-01 17:05:32

by Paul E. McKenney

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Mon, Aug 31, 2020 at 09:45:04PM -0400, Alan Stern wrote:
> On Mon, Aug 31, 2020 at 02:47:38PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2020 at 04:17:01PM -0400, Alan Stern wrote:
>
> > > Is this discussion perhaps overkill?
> > >
> > > Let's put it this way: Suppose we have the following code:
> > >
> > > P0(int *x, int *lck)
> > > {
> > > spin_lock(lck);
> > > WRITE_ONCE(*x, 1);
> > > do_something();
> > > spin_unlock(lck);
> > > }
> > >
> > > P1(int *x, int *lck)
> > > {
> > > while (READ_ONCE(*x) == 0)
> > > ;
> > > spin_lock(lck);
> > > do_something_else();
> > > spin_unlock(lck);
> > > }
> > >
> > > It's obvious that this test won't deadlock. But if P1 is changed to:
> > >
> > > P1(int *x, int *lck)
> > > {
> > > spin_lock(lck);
> > > while (READ_ONCE(*x) == 0)
> > > ;
> > > do_something_else();
> > > spin_unlock(lck);
> > > }
> > >
> > > then it's equally obvious that the test can deadlock. No need for
> > > fancy memory models or litmus tests or anything else.
> >
> > For people like you and me, who have been thinking about memory ordering
> > for longer than either of us care to admit, this level of exposition is
> > most definitely -way- overkill!!!
> >
> > But I have had people be very happy and grateful that I explained this to
> > them at this level of detail. Yes, I started parallel programming before
> > some of them were born, but they are definitely within our target audience
> > for this particular document. And it is not just Linux kernel hackers
> > who need this level of detail. A roughly similar transactional-memory
> > scenario proved to be so non-obvious to any number of noted researchers
> > that Blundell, Lewis, and Martin needed to feature it in this paper:
> > https://ieeexplore.ieee.org/abstract/document/4069174
> > (Alternative source: https://repository.upenn.edu/cgi/viewcontent.cgi?article=1344&context=cis_papers)
> >
> > Please note that I am -not- advocating making (say) explanation.txt or
> > recipes.txt more newbie-accessible than they already are. After all,
> > the point of the README file in that same directory is to direct people
> > to the documentation files that are the best fit for them, and both
> > explanation.txt and recipes.txt contain advanced material, and thus
> > require similarly advanced prerequisites.
> >
> > Seem reasonable, or am I missing your point?
>
> The question is, what are you trying to accomplish in this section? Are
> you trying to demonstrate that it isn't safe to allow arbitrary code to
> leak into a critical section? If so then you don't need to present an
> LKMM litmus test to make the point; the example I gave here will do
> quite as well. Perhaps even better, since it doesn't drag in all sorts
> of extraneous concepts like limitations of litmus tests or how to
> emulate a spin loop.
>
> On the other hand, if your goal is to show how to construct a litmus
> test that will model a particular C language test case (such as the one
> I gave), then the text does a reasonable job -- although I do think it
> could be clarified somewhat. For instance, it wouldn't hurt to include
> the real C code before giving the corresponding litmus test, so that the
> reader will have a clear idea of what you're trying to model.

Makes sense. I can apply this at some point, but I would welcome a patch
from you, which I would be happy to fold in with your Codeveloped-by.

> Just what you want to achieve here is not clear from the context.

People who have internalized the "roach motel" model of locking
(https://www.cs.umd.edu/~pugh/java/memoryModel/BidirectionalMemoryBarrier.html)
need their internalization adjusted.

> Besides, the example is in any case a straw man. The text starts out
> saying "It is tempting to allow memory-reference instructions to be
> pulled into a critical section", but then the example pulls an entire
> spin loop inside -- not just the memory references but also the
> conditional branch instruction at the bottom of the loop! I can't
> imagine anyone would think it was safe to allow branches to leak into a
> critical section, particularly when doing so would break a control
> dependency (as it does here).

Most people outside of a few within the Linux kernel community and within
the various hardware memory-ordering communities don't know that control
dependencies even exist, so could not be expected to see any danger
in rather thoroughly folding, spindling, or otherwise mutilating them,
let alone pulling them into a lock-based critical section. And many in
the various toolchain communities see dependencies of any sort as an
impediment to performance that should be broken wherever and whenever
possible.

That said, a less prejudicial introduction to this example might be good.
What did you have in mind?

Thanx, Paul

2020-09-01 20:12:07

by Alan Stern

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Tue, Sep 01, 2020 at 10:04:21AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2020 at 09:45:04PM -0400, Alan Stern wrote:

> > The question is, what are you trying to accomplish in this section? Are
> > you trying to demonstrate that it isn't safe to allow arbitrary code to
> > leak into a critical section? If so then you don't need to present an
> > LKMM litmus test to make the point; the example I gave here will do
> > quite as well. Perhaps even better, since it doesn't drag in all sorts
> > of extraneous concepts like limitations of litmus tests or how to
> > emulate a spin loop.
> >
> > On the other hand, if your goal is to show how to construct a litmus
> > test that will model a particular C language test case (such as the one
> > I gave), then the text does a reasonable job -- although I do think it
> > could be clarified somewhat. For instance, it wouldn't hurt to include
> > the real C code before giving the corresponding litmus test, so that the
> > reader will have a clear idea of what you're trying to model.
>
> Makes sense. I can apply this at some point, but I would welcome a patch
> from you, which I would be happy to fold in with your Co-developed-by.

I don't have time to work on these documents now. Maybe later on. They
do need some serious editing. (You could try reading through them
carefully yourself; I'm sure you'd find a lot of things to fix up.)

Incidentally, your patch bomb from yesterday was the first time I had
seen these things (except for the litmus-test format document).

> > Just what you want to achieve here is not clear from the context.
>
> People who have internalized the "roach motel" model of locking
> (https://www.cs.umd.edu/~pugh/java/memoryModel/BidirectionalMemoryBarrier.html)
> need their internalization adjusted.

Shucks, if you only want to show that letting arbitrary code (i.e.,
branches) migrate into a critical section is unsafe, all you need is
this uniprocessor example:

P0(int *sl)
{
	goto Skip;
	spin_lock(sl);
	spin_unlock(sl);
Skip:
	spin_lock(sl);
	spin_unlock(sl);
}

This does nothing, but it runs fine. Letting the branch move into the first
critical section gives:

P0(int *sl)
{
	spin_lock(sl);
	goto Skip;
	spin_unlock(sl);
Skip:
	spin_lock(sl);
	spin_unlock(sl);
}

which self-deadlocks 100% of the time. You don't need to know anything
about memory models or concurrency to understand this.

On the other hand, if you want to show that letting memory accesses leak
into a critical section is unsafe then you need a different example:
spin loops won't do it.

> > Besides, the example is in any case a straw man. The text starts out
> > saying "It is tempting to allow memory-reference instructions to be
> > pulled into a critical section", but then the example pulls an entire
> > spin loop inside -- not just the memory references but also the
> > conditional branch instruction at the bottom of the loop! I can't
> > imagine anyone would think it was safe to allow branches to leak into a
> > critical section, particularly when doing so would break a control
> > dependency (as it does here).
>
> Most people outside of a few within the Linux kernel community and within
> the various hardware memory-ordering communities don't know that control
> dependencies even exist, so could not be expected to see any danger
> in rather thoroughly folding, spindling, or otherwise mutilating them,
> let alone pulling them into a lock-based critical section. And many in
> the various toolchain communities see dependencies of any sort as an
> impediment to performance that should be broken wherever and whenever
> possible.
>
> That said, a less prejudicial introduction to this example might be good.
> What did you have in mind?

Again, it depends on what example is intended to accomplish (which you
still haven't said explicitly). Whatever it is, I don't think the
current text is a good way to do it.

Alan

2020-09-03 23:48:40

by Paul E. McKenney

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Tue, Sep 01, 2020 at 04:11:10PM -0400, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 10:04:21AM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2020 at 09:45:04PM -0400, Alan Stern wrote:
>
> > > The question is, what are you trying to accomplish in this section? Are
> > > you trying to demonstrate that it isn't safe to allow arbitrary code to
> > > leak into a critical section? If so then you don't need to present an
> > > LKMM litmus test to make the point; the example I gave here will do
> > > quite as well. Perhaps even better, since it doesn't drag in all sorts
> > > of extraneous concepts like limitations of litmus tests or how to
> > > emulate a spin loop.
> > >
> > > On the other hand, if your goal is to show how to construct a litmus
> > > test that will model a particular C language test case (such as the one
> > > I gave), then the text does a reasonable job -- although I do think it
> > > could be clarified somewhat. For instance, it wouldn't hurt to include
> > > the real C code before giving the corresponding litmus test, so that the
> > > reader will have a clear idea of what you're trying to model.
> >
> > Makes sense. I can apply this at some point, but I would welcome a patch
> > from you, which I would be happy to fold in with your Co-developed-by.
>
> I don't have time to work on these documents now. Maybe later on. They
> do need some serious editing. (You could try reading through them
> carefully yourself; I'm sure you'd find a lot of things to fix up.)
>
> Incidentally, your patch bomb from yesterday was the first time I had
> seen these things (except for the litmus-test format document).

The hope was to have a good version of them completed some weeks ago,
but life intervened.

My current thought is to move these four patches out of my queue for
v5.10 to try again in v5.11:

0b8c06b75ea1 ("tools/memory-model: Add a simple entry point document")
dc372dc0dc89 ("tools/memory-model: Move Documentation description to Documentation/README")
0d9aaf8df7cb ("tools/memory-model: Document categories of ordering primitives")
35dd5f6d17a0 ("tools/memory-model: Document locking corner cases")

These would remain in my v5.10 queue:

1e44e6e82e7b ("Replace HTTP links with HTTPS ones: LKMM")
cc9628b45c9f ("tools/memory-model: Update recipes.txt prime_numbers.c path")
984f272be9d7 ("tools/memory-model: Improve litmus-test documentation")
7c22cf3b731f ("tools/memory-model: Expand the cheatsheet.txt notion of relaxed")
(But with the updates from the other thread.)

Does that work? If not, what would?

> > > Just what you want to achieve here is not clear from the context.
> >
> > People who have internalized the "roach motel" model of locking
> > (https://www.cs.umd.edu/~pugh/java/memoryModel/BidirectionalMemoryBarrier.html)
> > need their internalization adjusted.
>
> Shucks, if you only want to show that letting arbitrary code (i.e.,
> branches) migrate into a critical section is unsafe, all you need is
> this uniprocessor example:
>
> P0(int *sl)
> {
> 	goto Skip;
> 	spin_lock(sl);
> 	spin_unlock(sl);
> Skip:
> 	spin_lock(sl);
> 	spin_unlock(sl);
> }
>
> This does nothing, but it runs fine. Letting the branch move into the first
> critical section gives:
>
> P0(int *sl)
> {
> 	spin_lock(sl);
> 	goto Skip;
> 	spin_unlock(sl);
> Skip:
> 	spin_lock(sl);
> 	spin_unlock(sl);
> }
>
> which self-deadlocks 100% of the time. You don't need to know anything
> about memory models or concurrency to understand this.

Although your example does an excellent job of illustrating the general
point about branches, I am not convinced that it would be seen as
demonstrating the dangers of moving an entire loop into a critical
section.

> On the other hand, if you want to show that letting memory accesses leak
> into a critical section is unsafe then you need a different example:
> spin loops won't do it.

I am not immediately coming up with an example that is broken by leaking
isolated memory accesses into a critical section. I will give it some
more thought.

> > > Besides, the example is in any case a straw man. The text starts out
> > > saying "It is tempting to allow memory-reference instructions to be
> > > pulled into a critical section", but then the example pulls an entire
> > > spin loop inside -- not just the memory references but also the
> > > conditional branch instruction at the bottom of the loop! I can't
> > > imagine anyone would think it was safe to allow branches to leak into a
> > > critical section, particularly when doing so would break a control
> > > dependency (as it does here).
> >
> > Most people outside of a few within the Linux kernel community and within
> > the various hardware memory-ordering communities don't know that control
> > dependencies even exist, so could not be expected to see any danger
> > in rather thoroughly folding, spindling, or otherwise mutilating them,
> > let alone pulling them into a lock-based critical section. And many in
> > the various toolchain communities see dependencies of any sort as an
> > impediment to performance that should be broken wherever and whenever
> > possible.
> >
> > That said, a less prejudicial introduction to this example might be good.
> > What did you have in mind?
>
> Again, it depends on what example is intended to accomplish (which you
> still haven't said explicitly). Whatever it is, I don't think the
> current text is a good way to do it.

OK, as noted above, I will move this one out of the v5.10 queue into the
v5.11 queue, which should provide time to refine it, one way or another.

Thanx, Paul

2020-09-04 19:53:55

by Alan Stern

Subject: Re: [PATCH kcsan 9/9] tools/memory-model: Document locking corner cases

On Thu, Sep 03, 2020 at 04:45:07PM -0700, Paul E. McKenney wrote:

> The hope was to have a good version of them completed some weeks ago,
> but life intervened.
>
> My current thought is to move these four patches out of my queue for
> v5.10 to try again in v5.11:
>
> 0b8c06b75ea1 ("tools/memory-model: Add a simple entry point document")
> dc372dc0dc89 ("tools/memory-model: Move Documentation description to Documentation/README")
> 0d9aaf8df7cb ("tools/memory-model: Document categories of ordering primitives")
> 35dd5f6d17a0 ("tools/memory-model: Document locking corner cases")
>
> These would remain in my v5.10 queue:
>
> 1e44e6e82e7b ("Replace HTTP links with HTTPS ones: LKMM")
> cc9628b45c9f ("tools/memory-model: Update recipes.txt prime_numbers.c path")
> 984f272be9d7 ("tools/memory-model: Improve litmus-test documentation")
> 7c22cf3b731f ("tools/memory-model: Expand the cheatsheet.txt notion of relaxed")
> (But with the updates from the other thread.)
>
> Does that work? If not, what would?

That sounds reasonable.

> > > > Just what you want to achieve here is not clear from the context.
> > >
> > > People who have internalized the "roach motel" model of locking
> > > (https://www.cs.umd.edu/~pugh/java/memoryModel/BidirectionalMemoryBarrier.html)
> > > need their internalization adjusted.
> >
> > Shucks, if you only want to show that letting arbitrary code (i.e.,
> > branches) migrate into a critical section is unsafe, all you need is
> > this uniprocessor example:
> >
> > P0(int *sl)
> > {
> > 	goto Skip;
> > 	spin_lock(sl);
> > 	spin_unlock(sl);
> > Skip:
> > 	spin_lock(sl);
> > 	spin_unlock(sl);
> > }
> >
> > This does nothing, but it runs fine. Letting the branch move into the first
> > critical section gives:
> >
> > P0(int *sl)
> > {
> > 	spin_lock(sl);
> > 	goto Skip;
> > 	spin_unlock(sl);
> > Skip:
> > 	spin_lock(sl);
> > 	spin_unlock(sl);
> > }
> >
> > which self-deadlocks 100% of the time. You don't need to know anything
> > about memory models or concurrency to understand this.
>
> Although your example does an excellent job of illustrating the general
> point about branches, I am not convinced that it would be seen as
> demonstrating the dangers of moving an entire loop into a critical
> section.

All right, how about this?

P0(int *sl)
{
	while (spin_is_locked(sl))
		cpu_relax();
	spin_lock(sl);
	spin_unlock(sl);
}

Runs normally, even if other threads are doing unknown locking and
unlocking at the same time. But:

P0(int *sl)
{
	spin_lock(sl);
	while (spin_is_locked(sl))
		cpu_relax();
	spin_unlock(sl);
}

always goes into an infinite loop.

> > On the other hand, if you want to show that letting memory accesses leak
> > into a critical section is unsafe then you need a different example:
> > spin loops won't do it.
>
> I am not immediately coming up with an example that is broken by leaking
> isolated memory accesses into a critical section. I will give it some
> more thought.

It may turn out to be a hard challenge. As far as I know, there are no
such examples, unless you want to count something like this:

	spin_lock(sl);
	spin_unlock(sl);
	spin_lock(sl);
	spin_unlock(sl);

transformed to:

	spin_lock(sl);
	spin_lock(sl);
	spin_unlock(sl);
	spin_unlock(sl);

You could view this transformation as moving the second spin_lock up
into the first critical section (obviously dangerous since spin_lock
involves a loop), or as moving the first spin_unlock down into the
second critical section (not so obvious since spin_unlock is just a
memory access).

Okay, so let's restrict ourselves to memory accesses and loops that
don't touch the spinlock variable itself. Then we would need something
more similar to the original example, like this:

P0(int *sl, int *x)
{
	while (READ_ONCE(*x) == 0)
		cpu_relax();
	spin_lock(sl);
	spin_unlock(sl);
}

P1(int *sl, int *x)
{
	spin_lock(sl);
	WRITE_ONCE(*x, 1);
	spin_unlock(sl);
}

This will always run to completion. But if the loop in P0 is moved into
the critical section, the test may never end. Again, you don't need
fancy memory models to understand this; you just need to know that
critical sections are mutually exclusive.

But if this example didn't have a loop, allowing the memory access to
leak into the critical section would be fine.

Alan