2018-06-28 11:58:09

by Andrea Parri

Subject: [PATCH 0/3] sched/locking/doc: Miscellaneous fixes

Hi,

This is a follow-up on the discussion started in [1].

Applies on -rcu/dev.

Cheers,
Andrea

[1] http://lkml.kernel.org/r/1529918258-7295-1-git-send-email-andrea.parri@amarulasolutions.com

Andrea Parri (3):
sched: Use smp_mb() in wake_woken_function()
locking: Clarify requirements for smp_mb__after_spinlock()
doc: Update wake_up() & co. memory-barrier guarantees

Documentation/memory-barriers.txt | 43 ++++++++++++++++++------------
include/linux/sched.h | 4 +--
include/linux/spinlock.h | 25 ++----------------
kernel/sched/completion.c | 8 +++---
kernel/sched/core.c | 30 +++++++++------------
kernel/sched/wait.c | 55 ++++++++++++++++++---------------------
6 files changed, 72 insertions(+), 93 deletions(-)

--
2.7.4



2018-06-28 13:04:22

by Matthew Wilcox

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Thu, Jun 28, 2018 at 12:41:19PM +0200, Andrea Parri wrote:
> + * smp_mb__after_spinlock() provides a full memory barrier between po-earlier
> + * lock acquisitions and po-later memory accesses.

What does po-earlier mean? Partial order?


2018-06-28 13:51:18

by Alan Stern

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Thu, 28 Jun 2018, Andrea Parri wrote:

> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -114,29 +114,8 @@ do { \
> #endif /*arch_spin_is_contended*/
>
> /*
> - * This barrier must provide two things:
> - *
> - * - it must guarantee a STORE before the spin_lock() is ordered against a
> - * LOAD after it, see the comments at its two usage sites.
> - *
> - * - it must ensure the critical section is RCsc.
> - *
> - * The latter is important for cases where we observe values written by other
> - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> - *
> - * CPU0 CPU1 CPU2
> - *
> - * for (;;) {
> - * if (READ_ONCE(X))
> - * break;
> - * }
> - * X=1
> - * <sched-out>
> - * <sched-in>
> - * r = X;
> - *
> - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> - * we get migrated and CPU2 sees X==0.
> + * smp_mb__after_spinlock() provides a full memory barrier between po-earlier
> + * lock acquisitions and po-later memory accesses.

How about saying "provides the equivalent of a full memory barrier"?

The point being that smp_mb__after_spinlock doesn't have to provide an
actual barrier; it just has to ensure the behavior is the same as if a
full barrier were present.
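
(For reference, a sketch of how this equivalence is realized in practice; this
reflects how the existing definitions are structured and is not part of the
patch under discussion. An architecture whose spin_lock() already behaves as a
full barrier keeps the generic no-op, while a weakly ordered architecture such
as powerpc overrides it with a real smp_mb():

/* arch/*/include/asm/spinlock.h on a weakly ordered architecture: */
#define smp_mb__after_spinlock()        smp_mb()

/*
 * Generic fallback in include/linux/spinlock.h: nothing extra is needed
 * when lock acquisition already gives full-barrier behaviour, e.g. TSO,
 * or LL/SC followed by smp_mb() as noted further down in that file's
 * comment:
 */
#ifndef smp_mb__after_spinlock
#define smp_mb__after_spinlock()        do { } while (0)
#endif

Either way, callers may rely on full-barrier behaviour without knowing which
of the two definitions is in effect.)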

Alan


2018-06-28 13:53:47

by Andrea Parri

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

> > + * smp_mb__after_spinlock() provides a full memory barrier between po-earlier
> > + * lock acquisitions and po-later memory accesses.
>
> How about saying "provides the equivalent of a full memory barrier"?
>
> The point being that smp_mb__after_spinlock doesn't have to provide an
> actual barrier; it just has to ensure the behavior is the same as if a
> full barrier were present.

Agreed; will fix in the next version. Thanks,

Andrea


>
> Alan
>

2018-06-28 15:23:57

by Andrea Parri

Subject: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

There are 11 interpretations of the requirements described in the header
comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
one currently encoded in the Cat file. Stick to the latter (until a more
satisfactory solution is presented/agreed).

Signed-off-by: Andrea Parri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
---
include/linux/spinlock.h | 25 ++-----------------------
1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1e8a464358384..6737ee2381d50 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -114,29 +114,8 @@ do { \
#endif /*arch_spin_is_contended*/

/*
- * This barrier must provide two things:
- *
- * - it must guarantee a STORE before the spin_lock() is ordered against a
- * LOAD after it, see the comments at its two usage sites.
- *
- * - it must ensure the critical section is RCsc.
- *
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
- *
- * CPU0 CPU1 CPU2
- *
- * for (;;) {
- * if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * <sched-out>
- * <sched-in>
- * r = X;
- *
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ * smp_mb__after_spinlock() provides a full memory barrier between po-earlier
+ * lock acquisitions and po-later memory accesses.
*
* Since most load-store architectures implement ACQUIRE with an smp_mb() after
* the LL/SC loop, they need no further barriers. Similarly all our TSO
--
2.7.4


2018-06-28 16:22:34

by Andrea Parri

Subject: [PATCH 1/3] sched: Use smp_mb() in wake_woken_function()

wake_woken_function() synchronizes with wait_woken() as follows:

[wait_woken]                        [wake_woken_function]

entry->flags &= ~wq_flag_woken;     condition = true;
smp_mb();                           smp_wmb();
if (condition)                      wq_entry->flags |= wq_flag_woken;
   break;

This commit replaces the above smp_wmb() with an smp_mb() in order to
guarantee that either wait_woken() sees the wait condition being true
or the store to wq_entry->flags in woken_wake_function() follows the
store in wait_woken() in the coherence order (so that the former can
eventually be observed by wait_woken()).

The commit also fixes a comment associated to set_current_state() in
wait_woken(): the comment pairs the barrier in set_current_state() to
the above smp_wmb(), while the actual pairing involves the barrier in
set_current_state() and the barrier executed by the try_to_wake_up()
in wake_woken_function().

Signed-off-by: Andrea Parri <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Paul E. McKenney" <<[email protected]>
---
kernel/sched/wait.c | 47 +++++++++++++++++++++--------------------------
1 file changed, 21 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 928be527477eb..a7a2aaa3026a6 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -392,35 +392,36 @@ static inline bool is_kthread_should_stop(void)
* if (condition)
* break;
*
- * p->state = mode; condition = true;
- * smp_mb(); // A smp_wmb(); // C
- * if (!wq_entry->flags & WQ_FLAG_WOKEN) wq_entry->flags |= WQ_FLAG_WOKEN;
- * schedule() try_to_wake_up();
- * p->state = TASK_RUNNING; ~~~~~~~~~~~~~~~~~~
- * wq_entry->flags &= ~WQ_FLAG_WOKEN; condition = true;
- * smp_mb() // B smp_wmb(); // C
- * wq_entry->flags |= WQ_FLAG_WOKEN;
- * }
- * remove_wait_queue(&wq_head, &wait);
+ * // in wait_woken() // in woken_wake_function()
*
+ * p->state = mode; wq_entry->flags |= WQ_FLAG_WOKEN;
+ * smp_mb(); // A try_to_wake_up():
+ * if (!(wq_entry->flags & WQ_FLAG_WOKEN)) <full barrier>
+ * schedule() if (p->state & mode)
+ * p->state = TASK_RUNNING; p->state = TASK_RUNNING;
+ * wq_entry->flags &= ~WQ_FLAG_WOKEN; ~~~~~~~~~~~~~~~~~~
+ * smp_mb(); // B condition = true;
+ * } smp_mb(); // C
+ * remove_wait_queue(&wq_head, &wait); wq_entry->flags |= WQ_FLAG_WOKEN;
*/
long wait_woken(struct wait_queue_entry *wq_entry, unsigned mode, long timeout)
{
- set_current_state(mode); /* A */
/*
- * The above implies an smp_mb(), which matches with the smp_wmb() from
- * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
- * also observe all state before the wakeup.
+ * The below executes an smp_mb(), which matches with the full barrier
+ * executed by the try_to_wake_up() in woken_wake_function() such that
+ * either we see the store to wq_entry->flags in woken_wake_function()
+ * or woken_wake_function() sees our store to current->state.
*/
+ set_current_state(mode); /* A */
if (!(wq_entry->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
timeout = schedule_timeout(timeout);
__set_current_state(TASK_RUNNING);

/*
- * The below implies an smp_mb(), it too pairs with the smp_wmb() from
- * woken_wake_function() such that we must either observe the wait
- * condition being true _OR_ WQ_FLAG_WOKEN such that we will not miss
- * an event.
+ * The below executes an smp_mb(), which matches with the smp_mb() (C)
+ * in woken_wake_function() such that either we see the wait condition
+ * being true or the store to wq_entry->flags in woken_wake_function()
+ * follows ours in the coherence order.
*/
smp_store_mb(wq_entry->flags, wq_entry->flags & ~WQ_FLAG_WOKEN); /* B */

@@ -430,14 +431,8 @@ EXPORT_SYMBOL(wait_woken);

int woken_wake_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync, void *key)
{
- /*
- * Although this function is called under waitqueue lock, LOCK
- * doesn't imply write barrier and the users expects write
- * barrier semantics on wakeup functions. The following
- * smp_wmb() is equivalent to smp_wmb() in try_to_wake_up()
- * and is paired with smp_store_mb() in wait_woken().
- */
- smp_wmb(); /* C */
+ /* Pairs with the smp_store_mb() in wait_woken(). */
+ smp_mb(); /* C */
wq_entry->flags |= WQ_FLAG_WOKEN;

return default_wake_function(wq_entry, mode, sync, key);
--
2.7.4
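
(Postscript, for illustration only: a minimal userspace analogue of the
wait_woken()/woken_wake_function() pairing described in the changelog above,
written with C11 atomics and POSIX threads. All names here are invented and
this is not kernel code; it is only meant to show the property the two full
barriers provide: either the waiter observes the condition, or the waker's
WOKEN store is coherence-after the waiter's clear, so the flag stays set and
the wakeup is not lost.

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int condition;    /* the wait condition ("condition = true")       */
static atomic_int woken;        /* stands in for wq_entry->flags & WQ_FLAG_WOKEN */
static int r_condition;         /* what the waiter observed                      */

static void *waker(void *unused)
{
        (void)unused;
        atomic_store_explicit(&condition, 1, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* the new smp_mb(); C */
        atomic_store_explicit(&woken, 1, memory_order_relaxed);
        return NULL;
}

static void *waiter(void *unused)
{
        (void)unused;
        atomic_store_explicit(&woken, 0, memory_order_relaxed); /* clear WOKEN */
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb(); B */
        r_condition = atomic_load_explicit(&condition, memory_order_relaxed);
        return NULL;
}

int main(void)
{
        pthread_t t0, t1;

        pthread_create(&t0, NULL, waker, NULL);
        pthread_create(&t1, NULL, waiter, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        /*
         * Either the waiter saw the condition, or the waker's store to
         * "woken" is coherence-after the waiter's clear, so the flag is
         * still set and the "wakeup" is not lost.
         */
        assert(r_condition == 1 || atomic_load(&woken) == 1);
        return 0;
}

Build with "cc -pthread". With the waker's barrier weakened to a store-store
barrier, as in the code being replaced, the analogous kernel-level guarantee
does not hold on weakly ordered architectures, which is what motivates the
smp_mb() in this patch.)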


2018-06-28 16:22:41

by Andrea Parri

Subject: [PATCH 3/3] doc: Update wake_up() & co. memory-barrier guarantees

Both the implementation and the users' expectation [1] for the various
wakeup primitives have evolved over time, but the documentation has not
kept up with these changes: this commit brings it into 2018.

[1] http://lkml.kernel.org/r/[email protected]

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
[ aparri: Apply feedback from Alan Stern. ]
Cc: Alan Stern <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: David Howells <[email protected]>
Cc: Jade Alglave <[email protected]>
Cc: Luc Maranget <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Akira Yokosawa <[email protected]>
Cc: Daniel Lustig <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
Documentation/memory-barriers.txt | 43 ++++++++++++++++++++++++---------------
include/linux/sched.h | 4 ++--
kernel/sched/completion.c | 8 ++++----
kernel/sched/core.c | 30 +++++++++++----------------
kernel/sched/wait.c | 8 ++++----
5 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a02d6bbfc9d0a..0d8d7ef131e9a 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2179,32 +2179,41 @@ or:
event_indicated = 1;
wake_up_process(event_daemon);

-A write memory barrier is implied by wake_up() and co. if and only if they
-wake something up. The barrier occurs before the task state is cleared, and so
-sits between the STORE to indicate the event and the STORE to set TASK_RUNNING:
+A general memory barrier is executed by wake_up() if it wakes something up.
+If it doesn't wake anything up then a memory barrier may or may not be
+executed; you must not rely on it. The barrier occurs before the task state
+is accessed, in particular, it sits between the STORE to indicate the event
+and the STORE to set TASK_RUNNING:

- CPU 1 CPU 2
+ CPU 1 (Sleeper) CPU 2 (Waker)
=============================== ===============================
set_current_state(); STORE event_indicated
smp_store_mb(); wake_up();
- STORE current->state <write barrier>
- <general barrier> STORE current->state
- LOAD event_indicated
+ STORE current->state ...
+ <general barrier> <general barrier>
+ LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
+ STORE task->state

-To repeat, this write memory barrier is present if and only if something
-is actually awakened. To see this, consider the following sequence of
-events, where X and Y are both initially zero:
+where "task" is the thread being woken up and it equals CPU 1's "current".
+
+To repeat, a general memory barrier is guaranteed to be executed by wake_up()
+if something is actually awakened, but otherwise there is no such guarantee.
+To see this, consider the following sequence of events, where X and Y are both
+initially zero:

CPU 1 CPU 2
=============================== ===============================
- X = 1; STORE event_indicated
+ X = 1; Y = 1;
smp_mb(); wake_up();
- Y = 1; wait_event(wq, Y == 1);
- wake_up(); load from Y sees 1, no memory barrier
- load from X might see 0
+ LOAD Y LOAD X
+
+If a wakeup does occur, one (at least) of the two loads must see 1. If, on
+the other hand, a wakeup does not occur, both loads might see 0.

-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
+wake_up_process() always executes a general memory barrier. The barrier again
+occurs before the task state is accessed. In particular, if the wake_up() in
+the previous snippet were replaced by a call to wake_up_process() then one of
+the two loads would be guaranteed to see 1.

The available waker functions include:

@@ -2224,6 +2233,8 @@ The available waker functions include:
wake_up_poll();
wake_up_process();

+In terms of memory ordering, these functions all provide the same guarantees of
+a wake_up() (or stronger).

[!] Note that the memory barriers implied by the sleeper and the waker do _not_
order multiple stores before the wake-up with respect to loads of those stored
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 87bf02d93a279..ddfdeb632f748 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -167,8 +167,8 @@ struct task_group;
* need_sleep = false;
* wake_up_state(p, TASK_UNINTERRUPTIBLE);
*
- * Where wake_up_state() (and all other wakeup primitives) imply enough
- * barriers to order the store of the variable against wakeup.
+ * where wake_up_state() executes a full memory barrier before accessing the
+ * task state.
*
* Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
* once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index e426b0cb9ac63..a1ad5b7d5521b 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -22,8 +22,8 @@
*
* See also complete_all(), wait_for_completion() and related routines.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void complete(struct completion *x)
{
@@ -44,8 +44,8 @@ EXPORT_SYMBOL(complete);
*
* This will wake up all threads waiting on this particular completion event.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*
* Since complete_all() sets the completion of @x permanently to done
* to allow multiple waiters to finish, a call to reinit_completion()
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bfd49a932bdb6..3579fc45fbeb8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -413,8 +413,8 @@ void wake_q_add(struct wake_q_head *head, struct task_struct *task)
* its already queued (either by us or someone else) and will get the
* wakeup due to that.
*
- * This cmpxchg() implies a full barrier, which pairs with the write
- * barrier implied by the wakeup in wake_up_q().
+ * This cmpxchg() executes a full barrier, which pairs with the full
+ * barrier executed by the wakeup in wake_up_q().
*/
if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
return;
@@ -442,8 +442,8 @@ void wake_up_q(struct wake_q_head *head)
task->wake_q.next = NULL;

/*
- * wake_up_process() implies a wmb() to pair with the queueing
- * in wake_q_add() so as not to miss wakeups.
+ * wake_up_process() executes a full barrier, which pairs with
+ * the queueing in wake_q_add() so as not to miss wakeups.
*/
wake_up_process(task);
put_task_struct(task);
@@ -1880,8 +1880,7 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
* rq(c1)->lock (if not at the same time, then in that order).
* C) LOCK of the rq(c1)->lock scheduling in task
*
- * Transitivity guarantees that B happens after A and C after B.
- * Note: we only require RCpc transitivity.
+ * Release/acquire chaining guarantees that B happens after A and C after B.
* Note: the CPU doing B need not be c0 or c1
*
* Example:
@@ -1943,16 +1942,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
* UNLOCK rq(0)->lock
*
*
- * However; for wakeups there is a second guarantee we must provide, namely we
- * must observe the state that lead to our wakeup. That is, not only must our
- * task observe its own prior state, it must also observe the stores prior to
- * its wakeup.
- *
- * This means that any means of doing remote wakeups must order the CPU doing
- * the wakeup against the CPU the task is going to end up running on. This,
- * however, is already required for the regular Program-Order guarantee above,
- * since the waking CPU is the one issueing the ACQUIRE (smp_cond_load_acquire).
- *
+ * However, for wakeups there is a second guarantee we must provide, namely we
+ * must ensure that CONDITION=1 done by the caller can not be reordered with
+ * accesses to the task state; see try_to_wake_up() and set_current_state().
*/

/**
@@ -1968,6 +1960,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
* Atomic against schedule() which would dequeue a task, also see
* set_current_state().
*
+ * This function executes a full memory barrier before accessing the task
+ * state; see set_current_state().
+ *
* Return: %true if @p->state changes (an actual wakeup was done),
* %false otherwise.
*/
@@ -2141,8 +2136,7 @@ static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
*
* Return: 1 if the process was woken up, 0 if it was already running.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * This function executes a full memory barrier before accessing the task state.
*/
int wake_up_process(struct task_struct *p)
{
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index a7a2aaa3026a6..870f97b313e38 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -134,8 +134,8 @@ static void __wake_up_common_lock(struct wait_queue_head *wq_head, unsigned int
* @nr_exclusive: how many wake-one or wake-many threads to wake up
* @key: is directly passed to the wakeup function
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void __wake_up(struct wait_queue_head *wq_head, unsigned int mode,
int nr_exclusive, void *key)
@@ -180,8 +180,8 @@ EXPORT_SYMBOL_GPL(__wake_up_locked_key_bookmark);
*
* On UP it can prevent extra preemption.
*
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * If this function wakes up a task, it executes a full memory barrier before
+ * accessing the task state.
*/
void __wake_up_sync_key(struct wait_queue_head *wq_head, unsigned int mode,
int nr_exclusive, void *key)
--
2.7.4
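
(For reference, the usage pattern these guarantees are written for, sketched
in the style of memory-barriers.txt; the READ_ONCE()/WRITE_ONCE() wrappers are
added here for clarity and "event_daemon" is the sleeper's task, as in the
snippet being updated above.

/* Sleeper */
for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);        /* full barrier */
        if (READ_ONCE(event_indicated))
                break;
        schedule();
}
__set_current_state(TASK_RUNNING);

/* Waker */
WRITE_ONCE(event_indicated, 1);
wake_up_process(event_daemon);  /* full barrier before reading ->state */

Per the updated text, either the sleeper's load of event_indicated sees 1, or
the load of ->state inside wake_up_process() sees TASK_UNINTERRUPTIBLE and the
task is woken; the event and the wakeup cannot both be missed.)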


2018-06-28 17:42:54

by Andrea Parri

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

Hi Matthew,

On Thu, Jun 28, 2018 at 06:02:58AM -0700, Matthew Wilcox wrote:
> On Thu, Jun 28, 2018 at 12:41:19PM +0200, Andrea Parri wrote:
> > + * smp_mb__after_spinlock() provides a full memory barrier between po-earlier
> > + * lock acquisitions and po-later memory accesses.
>
> What does po-earlier mean? Partial order?

"po" was intended as an abbreviation for "program order"; I should have
kept the whole form..., thank you for the feedback.

Andrea
>

2018-06-28 19:34:43

by Andrea Parri

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Thu, Jun 28, 2018 at 05:05:50PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 28, 2018 at 12:41:19PM +0200, Andrea Parri wrote:
> > - * This barrier must provide two things:
> > - *
> > - * - it must guarantee a STORE before the spin_lock() is ordered against a
> > - * LOAD after it, see the comments at its two usage sites.
> > - *
> > - * - it must ensure the critical section is RCsc.
> > - *
> > - * The latter is important for cases where we observe values written by other
> > - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> > - *
> > - * CPU0 CPU1 CPU2
> > - *
> > - * for (;;) {
> > - * if (READ_ONCE(X))
> > - * break;
> > - * }
> > - * X=1
> > - * <sched-out>
> > - * <sched-in>
> > - * r = X;
> > - *
> > - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> > - * we get migrated and CPU2 sees X==0.
>
> Please don't remove that; that explains _why_ we need a full memory
> barrier here.

Peter:

Both you and Boqun stated that the above snippet is "bad":

http://lkml.kernel.org/r/[email protected]

and I do agree with your assessment! ;-)

I've no objection to keep that comment (together with the
"clarification" suggested in this patch) _once_ replaced
that snippet with something else (say, with the snippet
Boqun suggested in:

http://lkml.kernel.org/r/20180312085600.aczjkpn73axzs2sb@tardis ):

is this what you mean?

Andrea

2018-06-28 22:19:12

by Peter Zijlstra

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Thu, Jun 28, 2018 at 12:41:19PM +0200, Andrea Parri wrote:
> - * This barrier must provide two things:
> - *
> - * - it must guarantee a STORE before the spin_lock() is ordered against a
> - * LOAD after it, see the comments at its two usage sites.
> - *
> - * - it must ensure the critical section is RCsc.
> - *
> - * The latter is important for cases where we observe values written by other
> - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> - *
> - * CPU0 CPU1 CPU2
> - *
> - * for (;;) {
> - * if (READ_ONCE(X))
> - * break;
> - * }
> - * X=1
> - * <sched-out>
> - * <sched-in>
> - * r = X;
> - *
> - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> - * we get migrated and CPU2 sees X==0.

Please don't remove that; that explains _why_ we need a full memory
barrier here.

If anything, move it into __schedule() to explain the
smp_mb__after_spinlock() usage there.

2018-07-02 12:53:07

by Peter Zijlstra

Subject: Re: [PATCH 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Thu, Jun 28, 2018 at 07:30:45PM +0200, Andrea Parri wrote:
> On Thu, Jun 28, 2018 at 05:05:50PM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 28, 2018 at 12:41:19PM +0200, Andrea Parri wrote:
> > > - * This barrier must provide two things:
> > > - *
> > > - * - it must guarantee a STORE before the spin_lock() is ordered against a
> > > - * LOAD after it, see the comments at its two usage sites.
> > > - *
> > > - * - it must ensure the critical section is RCsc.
> > > - *
> > > - * The latter is important for cases where we observe values written by other
> > > - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> > > - *
> > > - * CPU0 CPU1 CPU2
> > > - *
> > > - * for (;;) {
> > > - * if (READ_ONCE(X))
> > > - * break;
> > > - * }
> > > - * X=1
> > > - * <sched-out>
> > > - * <sched-in>
> > > - * r = X;
> > > - *
> > > - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> > > - * we get migrated and CPU2 sees X==0.
> >
> > Please don't remove that; that explains _why_ we need a full memory
> > barrier here.
>
> Peter:
>
> Both you and Boqun stated that the above snippet is "bad":
>
> http://lkml.kernel.org/r/[email protected]
>
> and I do agree with your assessment! ;-)

Right..

> I've no objection to keep that comment (together with the
> "clarification" suggested in this patch) _once_ replaced
> that snippet with something else (say, with the snippet
> Boqun suggested in:
>
> http://lkml.kernel.org/r/20180312085600.aczjkpn73axzs2sb@tardis ):
>
> is this what you mean?

Yes. I much prefer to explain the why behind rules rather than just state them.

2018-07-02 15:13:30

by Andrea Parri

Subject: [PATCH v2 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

There are 11 interpretations of the requirements described in the header
comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
one currently encoded in the Cat file. Stick to the latter (until a more
satisfactory solution is available).

This also reworks some snippets related to the barrier to illustrate the
requirements and to link them to the idioms which are relied upon at its
call sites.

Suggested-by: Boqun Feng <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
---
Changes since v1:

- reworked the snippets (Peter Zijlstra)
- style fixes (Alan Stern and Matthew Wilcox)
- added Boqun's Suggested-by: tag

include/linux/spinlock.h | 51 ++++++++++++++++++++++++++++++++----------------
kernel/sched/core.c | 41 +++++++++++++++++++-------------------
2 files changed, 55 insertions(+), 37 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1e8a464358384..0b46efca659f9 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -114,29 +114,46 @@ do { \
#endif /*arch_spin_is_contended*/

/*
- * This barrier must provide two things:
+ * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
+ * between program-order earlier lock acquisitions and program-order later
+ * memory accesses.
*
- * - it must guarantee a STORE before the spin_lock() is ordered against a
- * LOAD after it, see the comments at its two usage sites.
+ * This guarantees that the following two properties hold:
*
- * - it must ensure the critical section is RCsc.
+ * 1) Given the snippet:
*
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
+ * { X = 0; Y = 0; }
*
- * CPU0 CPU1 CPU2
+ * CPU0 CPU1
*
- * for (;;) {
- * if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * <sched-out>
- * <sched-in>
- * r = X;
+ * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
+ * spin_lock(S); smp_mb();
+ * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
+ * r0 = READ_ONCE(Y);
+ * spin_unlock(S);
*
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
+ * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
+ * preceding the call to smp_mb__after_spinlock() in __schedule() and in
+ * try_to_wake_up().
+ *
+ * 2) Given the snippet:
+ *
+ * { X = 0; Y = 0; }
+ *
+ * CPU0 CPU1 CPU2
+ *
+ * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
+ * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
+ * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
+ * WRITE_ONCE(Y, 1);
+ * spin_unlock(S);
+ *
+ * it is forbidden that CPU0's critical section executes before CPU1's
+ * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
+ * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
+ * preceding the calls to smp_rmb() in try_to_wake_up() for similar
+ * snippets but "projected" onto two CPUs.
*
* Since most load-store architectures implement ACQUIRE with an smp_mb() after
* the LL/SC loop, they need no further barriers. Similarly all our TSO
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da8f12119a127..ec9ef0aec71ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* be possible to, falsely, observe p->on_rq == 0 and get stuck
* in smp_cond_load_acquire() below.
*
- * sched_ttwu_pending() try_to_wake_up()
- * [S] p->on_rq = 1; [L] P->state
- * UNLOCK rq->lock -----.
- * \
- * +--- RMB
- * schedule() /
- * LOCK rq->lock -----'
- * UNLOCK rq->lock
+ * sched_ttwu_pending() try_to_wake_up()
+ * STORE p->on_rq = 1 LOAD p->state
+ * UNLOCK rq->lock
+ *
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * UNLOCK rq->lock
*
* [task p]
- * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
+ * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
*
- * Pairs with the UNLOCK+LOCK on rq->lock from the
- * last wakeup of our task and the schedule that got our task
- * current.
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();
if (p->on_rq && ttwu_remote(p, wake_flags))
@@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* One must be running (->on_cpu == 1) in order to remove oneself
* from the runqueue.
*
- * [S] ->on_cpu = 1; [L] ->on_rq
- * UNLOCK rq->lock
- * RMB
- * LOCK rq->lock
- * [S] ->on_rq = 0; [L] ->on_cpu
+ * __schedule() (switch to task 'p') try_to_wake_up()
+ * STORE p->on_cpu = 1 LOAD p->on_rq
+ * UNLOCK rq->lock
+ *
+ * __schedule() (put 'p' to sleep)
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * STORE p->on_rq = 0 LOAD p->on_cpu
*
- * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
- * from the consecutive calls to schedule(); the first switching to our
- * task, the second putting it to sleep.
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();

--
2.7.4


2018-07-02 15:41:16

by Peter Zijlstra

Subject: Re: [PATCH v2 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Mon, Jul 02, 2018 at 05:11:55PM +0200, Andrea Parri wrote:
> /*
> + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
> + * between program-order earlier lock acquisitions and program-order later
> + * memory accesses.
> *
> + * This guarantees that the following two properties hold:
> *
> + * 1) Given the snippet:
> *
> + * { X = 0; Y = 0; }
> *
> + * CPU0 CPU1
> *
> + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
> + * spin_lock(S); smp_mb();
> + * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
> + * r0 = READ_ONCE(Y);
> + * spin_unlock(S);
> *
> + * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
> + * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
> + * preceding the call to smp_mb__after_spinlock() in __schedule() and in
> + * try_to_wake_up().
> + *
> + * 2) Given the snippet:
> + *
> + * { X = 0; Y = 0; }
> + *
> + * CPU0 CPU1 CPU2
> + *
> + * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
> + * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
> + * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
> + * WRITE_ONCE(Y, 1);
> + * spin_unlock(S);
> + *
> + * it is forbidden that CPU0's critical section executes before CPU1's
> + * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
> + * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
> + * preceding the calls to smp_rmb() in try_to_wake_up() for similar
> + * snippets but "projected" onto two CPUs.

Maybe explicitly note that 2) is the RCsc lock upgrade.


> * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> * the LL/SC loop, they need no further barriers. Similarly all our TSO
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da8f12119a127..ec9ef0aec71ac 100644
> +++ b/kernel/sched/core.c
> @@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * be possible to, falsely, observe p->on_rq == 0 and get stuck
> * in smp_cond_load_acquire() below.
> *
> + * sched_ttwu_pending() try_to_wake_up()
> + * STORE p->on_rq = 1 LOAD p->state
> + * UNLOCK rq->lock
> + *
> + * __schedule() (switch to task 'p')
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * UNLOCK rq->lock
> *
> * [task p]
> + * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
> *
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();
> if (p->on_rq && ttwu_remote(p, wake_flags))
> @@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * One must be running (->on_cpu == 1) in order to remove oneself
> * from the runqueue.
> *
> + * __schedule() (switch to task 'p') try_to_wake_up()
> + * STORE p->on_cpu = 1 LOAD p->on_rq
> + * UNLOCK rq->lock
> + *
> + * __schedule() (put 'p' to sleep)
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * STORE p->on_rq = 0 LOAD p->on_cpu
> *
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();

Ah yes, good.

Ack!

2018-07-03 08:50:34

by Andrea Parri

Subject: Re: [PATCH v2 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Mon, Jul 02, 2018 at 05:37:35PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 02, 2018 at 05:11:55PM +0200, Andrea Parri wrote:
> > /*
> > + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
> > + * between program-order earlier lock acquisitions and program-order later
> > + * memory accesses.
> > *
> > + * This guarantees that the following two properties hold:
> > *
> > + * 1) Given the snippet:
> > *
> > + * { X = 0; Y = 0; }
> > *
> > + * CPU0 CPU1
> > *
> > + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
> > + * spin_lock(S); smp_mb();
> > + * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
> > + * r0 = READ_ONCE(Y);
> > + * spin_unlock(S);
> > *
> > + * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
> > + * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
> > + * preceding the call to smp_mb__after_spinlock() in __schedule() and in
> > + * try_to_wake_up().
> > + *
> > + * 2) Given the snippet:
> > + *
> > + * { X = 0; Y = 0; }
> > + *
> > + * CPU0 CPU1 CPU2
> > + *
> > + * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
> > + * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
> > + * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
> > + * WRITE_ONCE(Y, 1);
> > + * spin_unlock(S);
> > + *
> > + * it is forbidden that CPU0's critical section executes before CPU1's
> > + * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
> > + * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
> > + * preceding the calls to smp_rmb() in try_to_wake_up() for similar
> > + * snippets but "projected" onto two CPUs.
>
> Maybe explicitly note that 2) is the RCsc lock upgrade.

Yes, I'll do a respin to add this note and the below Ack shortly.

Thanks,
Andrea


>
>
> > * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> > * the LL/SC loop, they need no further barriers. Similarly all our TSO
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index da8f12119a127..ec9ef0aec71ac 100644
> > +++ b/kernel/sched/core.c
> > @@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > * be possible to, falsely, observe p->on_rq == 0 and get stuck
> > * in smp_cond_load_acquire() below.
> > *
> > + * sched_ttwu_pending() try_to_wake_up()
> > + * STORE p->on_rq = 1 LOAD p->state
> > + * UNLOCK rq->lock
> > + *
> > + * __schedule() (switch to task 'p')
> > + * LOCK rq->lock smp_rmb();
> > + * smp_mb__after_spinlock();
> > + * UNLOCK rq->lock
> > *
> > * [task p]
> > + * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
> > *
> > + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> > + * __schedule(). See the comment for smp_mb__after_spinlock().
> > */
> > smp_rmb();
> > if (p->on_rq && ttwu_remote(p, wake_flags))
> > @@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> > * One must be running (->on_cpu == 1) in order to remove oneself
> > * from the runqueue.
> > *
> > + * __schedule() (switch to task 'p') try_to_wake_up()
> > + * STORE p->on_cpu = 1 LOAD p->on_rq
> > + * UNLOCK rq->lock
> > + *
> > + * __schedule() (put 'p' to sleep)
> > + * LOCK rq->lock smp_rmb();
> > + * smp_mb__after_spinlock();
> > + * STORE p->on_rq = 0 LOAD p->on_cpu
> > *
> > + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> > + * __schedule(). See the comment for smp_mb__after_spinlock().
> > */
> > smp_rmb();
>
> Ah yes, good.
>
> Ack!

2018-07-03 14:55:20

by Andrea Parri

Subject: [PATCH v3 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

There are 11 interpretations of the requirements described in the header
comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
one currently encoded in the Cat file. Stick to the latter (until a more
satisfactory solution is available).

This also reworks some snippets related to the barrier to illustrate the
requirements and to link them to the idioms which are relied upon at its
call sites.

Suggested-by: Boqun Feng <[email protected]>
Signed-off-by: Andrea Parri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
---
Changes since v2:
- restore note about RCsc lock (Peter Zijlstra)
- add Peter's Acked-by: tag

Changes since v1:
- rework the snippets (Peter Zijlstra)
- style fixes (Alan Stern and Matthew Wilcox)
- add Boqun's Suggested-by: tag

include/linux/spinlock.h | 53 ++++++++++++++++++++++++++++++++----------------
kernel/sched/core.c | 41 +++++++++++++++++++------------------
2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 1e8a464358384..d70a06ff2bdd2 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -114,29 +114,48 @@ do { \
#endif /*arch_spin_is_contended*/

/*
- * This barrier must provide two things:
+ * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
+ * between program-order earlier lock acquisitions and program-order later
+ * memory accesses.
*
- * - it must guarantee a STORE before the spin_lock() is ordered against a
- * LOAD after it, see the comments at its two usage sites.
+ * This guarantees that the following two properties hold:
*
- * - it must ensure the critical section is RCsc.
+ * 1) Given the snippet:
*
- * The latter is important for cases where we observe values written by other
- * CPUs in spin-loops, without barriers, while being subject to scheduling.
+ * { X = 0; Y = 0; }
*
- * CPU0 CPU1 CPU2
+ * CPU0 CPU1
*
- * for (;;) {
- * if (READ_ONCE(X))
- * break;
- * }
- * X=1
- * <sched-out>
- * <sched-in>
- * r = X;
+ * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
+ * spin_lock(S); smp_mb();
+ * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
+ * r0 = READ_ONCE(Y);
+ * spin_unlock(S);
*
- * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
- * we get migrated and CPU2 sees X==0.
+ * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
+ * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
+ * preceding the call to smp_mb__after_spinlock() in __schedule() and in
+ * try_to_wake_up().
+ *
+ * 2) Given the snippet:
+ *
+ * { X = 0; Y = 0; }
+ *
+ * CPU0 CPU1 CPU2
+ *
+ * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
+ * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
+ * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
+ * WRITE_ONCE(Y, 1);
+ * spin_unlock(S);
+ *
+ * it is forbidden that CPU0's critical section executes before CPU1's
+ * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
+ * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
+ * preceding the calls to smp_rmb() in try_to_wake_up() for similar
+ * snippets but "projected" onto two CPUs.
+ *
+ * Property (2) upgrades the lock to an RCsc lock.
*
* Since most load-store architectures implement ACQUIRE with an smp_mb() after
* the LL/SC loop, they need no further barriers. Similarly all our TSO
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da8f12119a127..ec9ef0aec71ac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* be possible to, falsely, observe p->on_rq == 0 and get stuck
* in smp_cond_load_acquire() below.
*
- * sched_ttwu_pending() try_to_wake_up()
- * [S] p->on_rq = 1; [L] P->state
- * UNLOCK rq->lock -----.
- * \
- * +--- RMB
- * schedule() /
- * LOCK rq->lock -----'
- * UNLOCK rq->lock
+ * sched_ttwu_pending() try_to_wake_up()
+ * STORE p->on_rq = 1 LOAD p->state
+ * UNLOCK rq->lock
+ *
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * UNLOCK rq->lock
*
* [task p]
- * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
+ * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
*
- * Pairs with the UNLOCK+LOCK on rq->lock from the
- * last wakeup of our task and the schedule that got our task
- * current.
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();
if (p->on_rq && ttwu_remote(p, wake_flags))
@@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
* One must be running (->on_cpu == 1) in order to remove oneself
* from the runqueue.
*
- * [S] ->on_cpu = 1; [L] ->on_rq
- * UNLOCK rq->lock
- * RMB
- * LOCK rq->lock
- * [S] ->on_rq = 0; [L] ->on_cpu
+ * __schedule() (switch to task 'p') try_to_wake_up()
+ * STORE p->on_cpu = 1 LOAD p->on_rq
+ * UNLOCK rq->lock
+ *
+ * __schedule() (put 'p' to sleep)
+ * LOCK rq->lock smp_rmb();
+ * smp_mb__after_spinlock();
+ * STORE p->on_rq = 0 LOAD p->on_cpu
*
- * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
- * from the consecutive calls to schedule(); the first switching to our
- * task, the second putting it to sleep.
+ * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
+ * __schedule(). See the comment for smp_mb__after_spinlock().
*/
smp_rmb();

--
2.7.4
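
(For reference, snippet (1) can be fed to herd7 against the Linux-kernel
memory model; a litmus-test rendering follows, with an invented test name.
Per the Cat file that the changelog refers to, the "exists" condition should
be reported as never holding. Snippet (2) can be encoded analogously, with a
third process performing the two reads separated by smp_rmb().

C SB+spin_lock-mb_after_spinlock+mb

(* Sketch of snippet (1); expected result: the condition never holds. *)

{}

P0(int *x, int *y, spinlock_t *s)
{
        int r0;

        WRITE_ONCE(*x, 1);
        spin_lock(s);
        smp_mb__after_spinlock();
        r0 = READ_ONCE(*y);
        spin_unlock(s);
}

P1(int *x, int *y)
{
        int r1;

        WRITE_ONCE(*y, 1);
        smp_mb();
        r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
)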


2018-07-03 15:39:22

by Paul E. McKenney

Subject: Re: [PATCH v3 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Tue, Jul 03, 2018 at 04:53:59PM +0200, Andrea Parri wrote:
> There are 11 interpretations of the requirements described in the header
> comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and
> one currently encoded in the Cat file. Stick to the latter (until a more
> satisfactory solution is available).
>
> This also reworks some snippets related to the barrier to illustrate the
> requirements and to link them to the idioms which are relied upon at its
> call sites.
>
> Suggested-by: Boqun Feng <[email protected]>
> Signed-off-by: Andrea Parri <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>

Looks good, a couple of changes suggested below.

Thanx, Paul

> ---
> Changes since v2:
> - restore note about RCsc lock (Peter Zijlstra)
> - add Peter's Acked-by: tag
>
> Changes since v1:
> - rework the snippets (Peter Zijlstra)
> - style fixes (Alan Stern and Matthew Wilcox)
> - add Boqun's Suggested-by: tag
>
> include/linux/spinlock.h | 53 ++++++++++++++++++++++++++++++++----------------
> kernel/sched/core.c | 41 +++++++++++++++++++------------------
> 2 files changed, 57 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
> index 1e8a464358384..d70a06ff2bdd2 100644
> --- a/include/linux/spinlock.h
> +++ b/include/linux/spinlock.h
> @@ -114,29 +114,48 @@ do { \
> #endif /*arch_spin_is_contended*/
>
> /*
> - * This barrier must provide two things:
> + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
> + * between program-order earlier lock acquisitions and program-order later

Not just the earlier lock acquisition, but also all program-order earlier
memory accesses, correct?

> + * memory accesses.
> *
> - * - it must guarantee a STORE before the spin_lock() is ordered against a
> - * LOAD after it, see the comments at its two usage sites.
> + * This guarantees that the following two properties hold:
> *
> - * - it must ensure the critical section is RCsc.
> + * 1) Given the snippet:
> *
> - * The latter is important for cases where we observe values written by other
> - * CPUs in spin-loops, without barriers, while being subject to scheduling.
> + * { X = 0; Y = 0; }
> *
> - * CPU0 CPU1 CPU2
> + * CPU0 CPU1
> *
> - * for (;;) {
> - * if (READ_ONCE(X))
> - * break;
> - * }
> - * X=1
> - * <sched-out>
> - * <sched-in>
> - * r = X;
> + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
> + * spin_lock(S); smp_mb();
> + * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
> + * r0 = READ_ONCE(Y);
> + * spin_unlock(S);
> *
> - * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
> - * we get migrated and CPU2 sees X==0.
> + * it is forbidden that CPU0 does not observe CPU1's store to Y (r0 = 0)
> + * and CPU1 does not observe CPU0's store to X (r1 = 0); see the comments
> + * preceding the call to smp_mb__after_spinlock() in __schedule() and in
> + * try_to_wake_up().

Should we say that this is an instance of the SB pattern? (Am OK either
way, just asking the question.)

> + *
> + * 2) Given the snippet:
> + *
> + * { X = 0; Y = 0; }
> + *
> + * CPU0 CPU1 CPU2
> + *
> + * spin_lock(S); spin_lock(S); r1 = READ_ONCE(Y);
> + * WRITE_ONCE(X, 1); smp_mb__after_spinlock(); smp_rmb();
> + * spin_unlock(S); r0 = READ_ONCE(X); r2 = READ_ONCE(X);
> + * WRITE_ONCE(Y, 1);
> + * spin_unlock(S);
> + *
> + * it is forbidden that CPU0's critical section executes before CPU1's
> + * critical section (r0 = 1), CPU2 observes CPU1's store to Y (r1 = 1)
> + * and CPU2 does not observe CPU0's store to X (r2 = 0); see the comments
> + * preceding the calls to smp_rmb() in try_to_wake_up() for similar
> + * snippets but "projected" onto two CPUs.
> + *
> + * Property (2) upgrades the lock to an RCsc lock.
> *
> * Since most load-store architectures implement ACQUIRE with an smp_mb() after
> * the LL/SC loop, they need no further barriers. Similarly all our TSO
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da8f12119a127..ec9ef0aec71ac 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1999,21 +1999,20 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * be possible to, falsely, observe p->on_rq == 0 and get stuck
> * in smp_cond_load_acquire() below.
> *
> - * sched_ttwu_pending() try_to_wake_up()
> - * [S] p->on_rq = 1; [L] P->state
> - * UNLOCK rq->lock -----.
> - * \
> - * +--- RMB
> - * schedule() /
> - * LOCK rq->lock -----'
> - * UNLOCK rq->lock
> + * sched_ttwu_pending() try_to_wake_up()
> + * STORE p->on_rq = 1 LOAD p->state
> + * UNLOCK rq->lock
> + *
> + * __schedule() (switch to task 'p')
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * UNLOCK rq->lock
> *
> * [task p]
> - * [S] p->state = UNINTERRUPTIBLE [L] p->on_rq
> + * STORE p->state = UNINTERRUPTIBLE LOAD p->on_rq
> *
> - * Pairs with the UNLOCK+LOCK on rq->lock from the
> - * last wakeup of our task and the schedule that got our task
> - * current.
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();
> if (p->on_rq && ttwu_remote(p, wake_flags))
> @@ -2027,15 +2026,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * One must be running (->on_cpu == 1) in order to remove oneself
> * from the runqueue.
> *
> - * [S] ->on_cpu = 1; [L] ->on_rq
> - * UNLOCK rq->lock
> - * RMB
> - * LOCK rq->lock
> - * [S] ->on_rq = 0; [L] ->on_cpu
> + * __schedule() (switch to task 'p') try_to_wake_up()
> + * STORE p->on_cpu = 1 LOAD p->on_rq
> + * UNLOCK rq->lock
> + *
> + * __schedule() (put 'p' to sleep)
> + * LOCK rq->lock smp_rmb();
> + * smp_mb__after_spinlock();
> + * STORE p->on_rq = 0 LOAD p->on_cpu
> *
> - * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
> - * from the consecutive calls to schedule(); the first switching to our
> - * task, the second putting it to sleep.
> + * Pairs with the LOCK+smp_mb__after_spinlock() on rq->lock in
> + * __schedule(). See the comment for smp_mb__after_spinlock().
> */
> smp_rmb();
>
> --
> 2.7.4
>


2018-07-03 17:09:41

by Andrea Parri

Subject: Re: [PATCH v3 2/3] locking: Clarify requirements for smp_mb__after_spinlock()

On Tue, Jul 03, 2018 at 08:39:10AM -0700, Paul E. McKenney wrote:

[...]

> > + * smp_mb__after_spinlock() provides the equivalent of a full memory barrier
> > + * between program-order earlier lock acquisitions and program-order later
>
> Not just the earlier lock acquisition, but also all program-order earlier
> memory accesses, correct?

I understand: "but also all memory accesses program-order before that lock
acquisition ...". Yes, but:

- I considered this as implied by the above (L ->mb M2 and M1 ->po L implies
M1 ->mb M2, where M1, M2 are memory accesses and L is a lock acquisition);

- my prose abilities are limited ;-), and I was/am unable to come up with an
(to me) acceptable or readable enough way to make it explicit; some ideas?


> > + * WRITE_ONCE(X, 1); WRITE_ONCE(Y, 1);
> > + * spin_lock(S); smp_mb();
> > + * smp_mb__after_spinlock(); r1 = READ_ONCE(X);
> > + * r0 = READ_ONCE(Y);
> > + * spin_unlock(S);
>
> Should we say that this is an instance of the SB pattern? (Am OK either
> way, just asking the question.)

I don't think we *should* ;-), but I'm also OK either way.

Andrea

2018-07-05 22:29:48

by Andrea Parri

Subject: Re: [PATCH 0/3] sched/locking/doc: Miscellaneous fixes

On Thu, Jun 28, 2018 at 12:41:17PM +0200, Andrea Parri wrote:
> Hi,
>
> This is a follow-up on the discussion started in [1].
>
> Applies on -rcu/dev.
>
> Cheers,
> Andrea
>
> [1] http://lkml.kernel.org/r/1529918258-7295-1-git-send-email-andrea.parri@amarulasolutions.com
>
> Andrea Parri (3):
> sched: Use smp_mb() in wake_woken_function()
> locking: Clarify requirements for smp_mb__after_spinlock()
> doc: Update wake_up() & co. memory-barrier guarantees

What is the plan for this series?

We are currently discussing locking changes that might affect 2/3, but
I'd still consider this patch and the overall series as an improvement
over the current status (1/3 is a fix to the code) and I'd really like
future changes to be based on this series...

I expected this series to go via "-rcu -> tip -> ..."; 2/3 had Peter's
Ack, but 1/3 and 3/3 are missing any comments (or tags): please let me
know if any action on the series is required from me (the series still
applies on the latest "dev" branch of -rcu).

Andrea


>
> Documentation/memory-barriers.txt | 43 ++++++++++++++++++------------
> include/linux/sched.h | 4 +--
> include/linux/spinlock.h | 25 ++----------------
> kernel/sched/completion.c | 8 +++---
> kernel/sched/core.c | 30 +++++++++------------
> kernel/sched/wait.c | 55 ++++++++++++++++++---------------------
> 6 files changed, 72 insertions(+), 93 deletions(-)
>
> --
> 2.7.4
>

2018-07-06 10:38:42

by Peter Zijlstra

Subject: Re: [PATCH 0/3] sched/locking/doc: Miscellaneous fixes

On Fri, Jul 06, 2018 at 12:28:12AM +0200, Andrea Parri wrote:
> What is the plan for this series?
>
> We are currently discussing locking changes that might affect 2/3, but
> I'd still consider this patch and the overall series as an improvement
> over the current status (1/3 is a fix to the code) and I'd really like
> future changes to be based on this series...
>
> I expected this series to go via "-rcu -> tip -> ..."; 2/3 had Peter's
> Ack, but 1/3 and 3/3 are missing any comments (or tags): please let me
> know if any action on the series is required from me (the series still
> applies on the latest "dev" branch of -rcu).

You can add my ack:

Acked-by: Peter Zijlstra (Intel) <[email protected]>

to the others as well, they look like a definite improvement. I'm fine
with these going through Paul's tree, Ingo?

2018-07-06 14:42:22

by Paul E. McKenney

Subject: Re: [PATCH 0/3] sched/locking/doc: Miscellaneous fixes

On Fri, Jul 06, 2018 at 12:36:11PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 06, 2018 at 12:28:12AM +0200, Andrea Parri wrote:
> > What is the plan for this series?
> >
> > We are currently discussing locking changes that might affect 2/3, but
> > I'd still consider this patch and the overall series as an improvement
> > over the current status (1/3 is a fix to the code) and I'd really like
> > future changes to be based on this series...
> >
> > I expected this series to go via "-rcu -> tip -> ..."; 2/3 had Peter's
> > Ack, but 1/3 and 3/3 are missing any comments (or tags): please let me
> > know if any action on the series is required from me (the series still
> > applies on the latest "dev" branch of -rcu).
>
> You can add my ack:
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
>
> to the others as well, they look like a definite improvement. I'm fine
> with these going through Paul's tree, Ingo?

I have pulled them in and will submit them to the upcoming merge window,
thank you all! If someone else would prefer to take them, just let
me know.

Thanx, Paul