2015-11-02 13:53:33

by Peter Zijlstra

Subject: [PATCH 2/4] sched: Document Program-Order guarantees

These are some notes on the scheduler locking and how it provides
program order guarantees on SMP systems.

Cc: Linus Torvalds <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 142 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1905,6 +1905,148 @@ static void ttwu_queue(struct task_struc
raw_spin_unlock(&rq->lock);
}

+/*
+ * Notes on Program-Order guarantees on SMP systems.
+ *
+ *
+ * PREEMPTION/MIGRATION
+ *
+ * Regular preemption/migration is safe because, as long as the task is
+ * runnable, migrations involve both rq locks, albeit not (necessarily) at the
+ * same time.
+ *
+ * So we get (allowing migrations to involve 3 CPUs):
+ *
+ * CPU0 CPU1 CPU2
+ *
+ * LOCK rq(0)->lock
+ * sched-out X
+ * sched-in Y
+ * UNLOCK rq(0)->lock
+ *
+ * LOCK rq(0)->lock // MB against CPU0
+ * dequeue X
+ * UNLOCK rq(0)->lock
+ *
+ * LOCK rq(1)->lock
+ * enqueue X
+ * UNLOCK rq(1)->lock
+ *
+ * LOCK rq(1)->lock // MB against CPU2
+ * sched-out Z
+ * sched-in X
+ * UNLOCK rq(1)->lock
+ *
+ * and the first LOCK rq(0) on CPU2 gives a full order against the UNLOCK rq(0)
+ * on CPU0. Similarly the LOCK rq(1) on CPU1 provides full order against the
+ * UNLOCK rq(1) on CPU2, therefore by the time task X runs on CPU1 it must
+ * observe the state it left behind on CPU0.
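+ *
+ * In code, the migration step in the middle is roughly what
+ * move_queued_task() does (a simplified sketch; the task is dequeued under
+ * the old rq lock and enqueued under the new one):
+ *
+ *	raw_spin_lock(&old_rq->lock);
+ *	dequeue_task(old_rq, p, 0);
+ *	set_task_cpu(p, new_cpu);
+ *	raw_spin_unlock(&old_rq->lock);
+ *
+ *	raw_spin_lock(&new_rq->lock);
+ *	enqueue_task(new_rq, p, 0);
+ *	raw_spin_unlock(&new_rq->lock);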
+ *
+ *
+ * BLOCKING -- aka. SLEEP + WAKEUP
+ *
+ * For blocking things are a little more interesting, because when we dequeue
+ * the task, we don't need to acquire the old rq lock in order to migrate it.
+ *
+ * Say CPU0 does a wait_event() and CPU1 does the wake() and migrates the task
+ * to CPU2 (the most complex example):
+ *
+ * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (sched_ttwu_pending)
+ *
+ * X->state = UNINTERRUPTIBLE
+ * MB
+ * if (cond)
+ * break
+ * cond = true
+ *
+ * LOCK rq(0)->lock LOCK X->pi_lock
+ *
+ * dequeue X
+ * while (X->on_cpu)
+ * cpu_relax()
+ * sched-out X
+ * RELEASE
+ * X->on_cpu = 0
+ * RMB
+ * X->state = WAKING
+ * set_task_cpu(X,2)
+ * WMB
+ * ti(X)->cpu = 2
+ *
+ * llist_add(X, rq(2)) // MB
+ * llist_del_all() // MB
+ *
+ * LOCK rq(2)->lock
+ * enqueue X
+ * X->state = RUNNING
+ * UNLOCK rq(2)->lock
+ *
+ *                                        LOCK rq(2)->lock
+ *                                        sched-out Z
+ *                                        sched-in X
+ *                                        UNLOCK rq(2)->lock
+ *
+ * if (cond) // _TRUE_
+ * UNLOCK X->pi_lock
+ * UNLOCK rq(0)->lock
+ *
+ * So in this case the scheduler does not provide an obvious full barrier; but
+ * the smp_store_release() in finish_lock_switch(), paired with the control-dep
+ * and smp_rmb() in try_to_wake_up(), forms a release-acquire pair and fully
+ * orders things between CPU0 and CPU1.
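+ *
+ * In code, that pairing looks roughly like so (a simplified sketch of
+ * finish_lock_switch() and try_to_wake_up() as they stand around this patch):
+ *
+ *	// finish_lock_switch() -- the RELEASE half
+ *	smp_store_release(&prev->on_cpu, 0);
+ *
+ *	// try_to_wake_up() -- control dependency + RMB act as the ACQUIRE half
+ *	while (p->on_cpu)
+ *		cpu_relax();
+ *	smp_rmb();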
+ *
+ * The llist primitives order things between CPU1 and CPU2 -- the alternative
+ * is CPU1 doing the remote enqueue (the alternative path in ttwu_queue()) in
+ * which case the rq(2)->lock release/acquire will order things between them.
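+ *
+ * A sketch of the queued wakeup path that provides this order (simplified;
+ * around this time the remote enqueue lives in ttwu_queue_remote() and the
+ * dequeue in sched_ttwu_pending(); both llist operations imply a full
+ * barrier by virtue of being built on cmpxchg/xchg):
+ *
+ *	// CPU1: ttwu_queue_remote()
+ *	llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list);
+ *
+ *	// CPU2: sched_ttwu_pending()
+ *	struct llist_node *llist = llist_del_all(&rq->wake_list);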
+ *
+ * Which again leads to the guarantee that by the time X gets to run on CPU2
+ * it must observe the state it left behind on CPU0.
+ *
+ * However; for blocking there is a second guarantee we must provide, namely we
+ * must observe the state that led to our wakeup. That is, not only must X
+ * observe its own prior state, it must also observe the @cond store.
+ *
+ * This too is achieved in the above: since CPU1 does the waking, we only need
+ * the ordering between CPU1 and CPU2, which is the same as above.
+ *
+ *
+ * There is however a much more interesting case for this guarantee, where X
+ * never makes it off CPU0:
+ *
+ * CPU0 (schedule) CPU1 (try_to_wake_up)
+ *
+ * X->state = UNINTERRUPTIBLE
+ * MB
+ * if (cond)
+ * break
+ * cond = true
+ *
+ * WMB (aka smp_mb__before_spinlock)
+ * LOCK X->pi_lock
+ *
+ * if (X->on_rq)
+ * LOCK rq(0)->lock
+ * X->state = RUNNING
+ * UNLOCK rq(0)->lock
+ *
+ * LOCK rq(0)->lock // MB against CPU1
+ * UNLOCK rq(0)->lock
+ *
+ * if (cond) // _TRUE_
+ *
+ * UNLOCK X->pi_lock
+ *
+ * Here our task X never quite leaves CPU0: the wakeup happens before we can
+ * dequeue and schedule someone else. In this case we must still observe cond
+ * after our call to schedule() completes.
+ *
+ * This is achieved by the smp_mb__before_spinlock() WMB which ensures the store
+ * cannot leak inside the LOCK, and LOCK rq(0)->lock on CPU0 provides full order
+ * against the UNLOCK rq(0)->lock from CPU1. Furthermore our load of cond cannot
+ * happen before this same LOCK.
+ *
+ * Therefore, again, we're good.
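+ *
+ * For reference, a sketch of that barrier and its use site as they stand at
+ * this time (most architectures define smp_mb__before_spinlock() as a WMB):
+ *
+ *	// include/linux/spinlock.h
+ *	#ifndef smp_mb__before_spinlock
+ *	#define smp_mb__before_spinlock()	smp_wmb()
+ *	#endif
+ *
+ *	// try_to_wake_up(), before taking the pi_lock
+ *	smp_mb__before_spinlock();
+ *	raw_spin_lock_irqsave(&p->pi_lock, flags);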
+ */
+
/**
* try_to_wake_up - wake up a thread
* @p: the thread to be awakened


2015-11-02 20:27:39

by Paul Turner

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Mon, Nov 2, 2015 at 5:29 AM, Peter Zijlstra <[email protected]> wrote:
> These are some notes on the scheduler locking and how it provides
> program order guarantees on SMP systems.
>
> Cc: Linus Torvalds <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: David Howells <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/sched/core.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 142 insertions(+)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1905,6 +1905,148 @@ static void ttwu_queue(struct task_struc
> raw_spin_unlock(&rq->lock);
> }
>
> +/*
> + * Notes on Program-Order guarantees on SMP systems.
> + *
> + *
> + * PREEMPTION/MIGRATION
> + *
> + * Regular preemption/migration is safe because, as long as the task is
> + * runnable, migrations involve both rq locks, albeit not (necessarily) at the
> + * same time.
> + *
> + * So we get (allowing migrations to involve 3 CPUs):
> + *
> + * CPU0 CPU1 CPU2
> + *
> + * LOCK rq(0)->lock
> + * sched-out X
> + * sched-in Y
> + * UNLOCK rq(0)->lock
> + *
> + * LOCK rq(0)->lock // MB against CPU0
> + * dequeue X
> + * UNLOCK rq(0)->lock
> + *
> + * LOCK rq(1)->lock
> + * enqueue X
> + * UNLOCK rq(1)->lock
> + *
> + * LOCK rq(1)->lock // MB against CPU2
> + * sched-out Z
> + * sched-in X
> + * UNLOCK rq(1)->lock
> + *
> + * and the first LOCK rq(0) on CPU2 gives a full order against the UNLOCK rq(0)
> + * on CPU0. Similarly the LOCK rq(1) on CPU1 provides full order against the
> + * UNLOCK rq(1) on CPU2, therefore by the time task X runs on CPU1 it must
> + * observe the state it left behind on CPU0.
> + *

I suspect this part might be more explicitly expressed by specifying
the requirements that migration satisfies; then providing an example.
This makes it easier for others to reason about the locks and saves
worrying about whether the examples hit our 3 million sub-cases.

I'd also propose just dropping preemption from this part; we only need
memory order to be correct on migration, whether it's scheduled or not
[it also invites confusion with the wake-up case].

Something like:
When any task 't' migrates, all activity on its prior cpu [c1] is
guaranteed to be happens-before any subsequent execution on its new
cpu [c2]. There are 3 components to enforcing this.

[c1] 1) Sched-out of t requires rq(c1)->lock
[any cpu] 2) Any migration of t, by any cpu, is required to synchronize
on *both* rq(c1)->lock and rq(c2)->lock
[c2] 3) Sched-in of t requires rq(c2)->lock

Transitivity guarantees that (2) orders after (1) and (3) after (2).
Note that in some cases (e.g. active, or idle cpu) the balancing cpu
in (2) may be c1 or c2.

[Follow example]


> + *
> + * BLOCKING -- aka. SLEEP + WAKEUP
> + *
> + * For blocking things are a little more interesting, because when we dequeue
> + * the task, we don't need to acquire the old rq lock in order to migrate it.
> + *
> + * Say CPU0 does a wait_event() and CPU1 does the wake() and migrates the task
> + * to CPU2 (the most complex example):
> + *
> + * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (sched_ttwu_pending)
> + *
> + * X->state = UNINTERRUPTIBLE
> + * MB
> + * if (cond)
> + * break
> + * cond = true
> + *
> + * LOCK rq(0)->lock LOCK X->pi_lock
> + *
> + * dequeue X
> + * while (X->on_cpu)
> + * cpu_relax()
> + * sched-out X
> + * RELEASE
> + * X->on_cpu = 0
> + * RMB
> + * X->state = WAKING
> + * set_task_cpu(X,2)
> + * WMB
> + * ti(X)->cpu = 2
> + *
> + * llist_add(X, rq(2)) // MB
> + * llist_del_all() // MB
> + *
> + * LOCK rq(2)->lock
> + * enqueue X
> + * X->state = RUNNING
> + * UNLOCK rq(2)->lock
> + *
> + *                                        LOCK rq(2)->lock
> + *                                        sched-out Z
> + *                                        sched-in X
> + *                                        UNLOCK rq(2)->lock
> + *
> + * if (cond) // _TRUE_
> + * UNLOCK X->pi_lock
> + * UNLOCK rq(0)->lock
> + *
> + * So in this case the scheduler does not provide an obvious full barrier; but
> + * the smp_store_release() in finish_lock_switch(), paired with the control-dep
> + * and smp_rmb() in try_to_wake_up(), forms a release-acquire pair and fully
> + * orders things between CPU0 and CPU1.
> + *
> + * The llist primitives order things between CPU1 and CPU2 -- the alternative
> + * is CPU1 doing the remote enqueue (the alternative path in ttwu_queue()) in
> + * which case the rq(2)->lock release/acquire will order things between them.

I'd vote for similar treatment here. We can make the dependency on
on_cpu much more obvious, something that is a reasonably subtle detail
today.

> + *
> + * Which again leads to the guarantee that by the time X gets to run on CPU2
> + * it must observe the state it left behind on CPU0.
> + *
> + * However; for blocking there is a second guarantee we must provide, namely we
> + * must observe the state that led to our wakeup. That is, not only must X
> + * observe its own prior state, it must also observe the @cond store.
> + *
> + * This too is achieved in the above: since CPU1 does the waking, we only need
> + * the ordering between CPU1 and CPU2, which is the same as above.
> + *
> + *
> + * There is however a much more interesting case for this guarantee, where X
> + * never makes it off CPU0:
> + *
> + * CPU0 (schedule) CPU1 (try_to_wake_up)
> + *
> + * X->state = UNINTERRUPTIBLE
> + * MB
> + * if (cond)
> + * break
> + * cond = true
> + *
> + * WMB (aka smp_mb__before_spinlock)
> + * LOCK X->pi_lock
> + *
> + * if (X->on_rq)
> + * LOCK rq(0)->lock
> + * X->state = RUNNING
> + * UNLOCK rq(0)->lock
> + *
> + * LOCK rq(0)->lock // MB against CPU1
> + * UNLOCK rq(0)->lock
> + *
> + * if (cond) // _TRUE_
> + *
> + * UNLOCK X->pi_lock
> + *
> + * Here our task X never quite leaves CPU0: the wakeup happens before we can
> + * dequeue and schedule someone else. In this case we must still observe cond
> + * after our call to schedule() completes.
> + *
> + * This is achieved by the smp_mb__before_spinlock() WMB which ensures the store
> + * cannot leak inside the LOCK, and LOCK rq(0)->lock on CPU0 provides full order
> + * against the UNLOCK rq(0)->lock from CPU1. Furthermore our load of cond cannot
> + * happen before this same LOCK.
> + *
> + * Therefore, again, we're good.
> + */
> +
> /**
> * try_to_wake_up - wake up a thread
> * @p: the thread to be awakened

2015-11-02 20:34:53

by Peter Zijlstra

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Mon, Nov 02, 2015 at 12:27:05PM -0800, Paul Turner wrote:
> I suspect this part might be more explicitly expressed by specifying
> the requirements that migration satisfies; then providing an example.
> This makes it easier for others to reason about the locks and saves
> worrying about whether the examples hit our 3 million sub-cases.
>
> I'd also propose just dropping preemption from this part; we only need
> memory order to be correct on migration, whether it's scheduled or not
> [it also invites confusion with the wake-up case].
>
> Something like:
> When any task 't' migrates, all activity on its prior cpu [c1] is
> guaranteed to be happens-before any subsequent execution on its new
> cpu [c2]. There are 3 components to enforcing this.
>
> [c1] 1) Sched-out of t requires rq(c1)->lock
> [any cpu] 2) Any migration of t, by any cpu, is required to synchronize
> on *both* rq(c1)->lock and rq(c2)->lock
> [c2] 3) Sched-in of t requires rq(c2)->lock
>
> Transitivity guarantees that (2) orders after (1) and (3) after (2).
> Note that in some cases (e.g. active, or idle cpu) the balancing cpu
> in (2) may be c1 or c2.
>
> [Follow example]

Makes sense; I'll try and reword things like that.

Note that we don't actually need the strong transitivity here (RCsc);
weak transitivity (RCpc) is in fact sufficient.

2015-11-02 22:09:52

by Paul Turner

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Mon, Nov 2, 2015 at 12:34 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, Nov 02, 2015 at 12:27:05PM -0800, Paul Turner wrote:
>> I suspect this part might be more explicitly expressed by specifying
>> the requirements that migration satisfies; then providing an example.
>> This makes it easier for others to reason about the locks and saves
>> worrying about whether the examples hit our 3 million sub-cases.
>>
>> I'd also propose just dropping preemption from this part; we only need
>> memory order to be correct on migration, whether it's scheduled or not
>> [it also invites confusion with the wake-up case].
>>
>> Something like:
>> When any task 't' migrates, all activity on its prior cpu [c1] is
>> guaranteed to be happens-before any subsequent execution on its new
>> cpu [c2]. There are 3 components to enforcing this.
>>
>> [c1] 1) Sched-out of t requires rq(c1)->lock
>> [any cpu] 2) Any migration of t, by any cpu, is required to synchronize
>> on *both* rq(c1)->lock and rq(c2)->lock
>> [c2] 3) Sched-in of t requires rq(c2)->lock
>>
>> Transitivity guarantees that (2) orders after (1) and (3) after (2).
>> Note that in some cases (e.g. active, or idle cpu) the balancing cpu
>> in (2) may be c1 or c2.
>>
>> [Follow example]
>
> Makes sense; I'll try and reword things like that.
>
> Note that we don't actually need the strong transitivity here (RCsc);
> weak transitivity (RCpc) is in fact sufficient.

Yeah, I thought about just using acquire/release to talk about the
interplay, in particular with the release in (1) and the acquire in (3),
which would make some of this much more explicit and highlight that we
only need RCpc. We have not been very consistent at using this
terminology, although this could be a good starting point.

If we went this route, we could do something like:

+ * So in this case the scheduler does not provide an obvious full barrier; but
+ * the smp_store_release() in finish_lock_switch(), paired with the control-dep
+ * and smp_rmb() in try_to_wake_up(), forms a release-acquire pair and fully
+ * orders things between CPU0 and CPU1.

Instead of having this, which is complete but hard to keep synchronized
with the points at which it actually matters, just use acquire and
release above; then at the actual site, e.g. in try_to_wake_up(),
document how we deliver the acquire required by the higher-level
documentation/requirements.

This makes it easier to maintain the stupidly racy documentation
consistency property in the future.

2015-11-02 22:12:33

by Peter Zijlstra

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Mon, Nov 02, 2015 at 02:09:20PM -0800, Paul Turner wrote:
> If we went this route, we could do something like:
>
> + * So in this case the scheduler does not provide an obvious full barrier; but
> + * the smp_store_release() in finish_lock_switch(), paired with the control-dep
> + * and smp_rmb() in try_to_wake_up(), forms a release-acquire pair and fully
> + * orders things between CPU0 and CPU1.
>
> Instead of having this, which is complete but hard to keep synchronized
> with the points at which it actually matters, just use acquire and
> release above; then at the actual site, e.g. in try_to_wake_up(),
> document how we deliver the acquire required by the higher-level
> documentation/requirements.

Right, which was most of the point of trying to introduce
smp_cond_acquire(): abstract out the tricky cond-dep and rmb trickery
so we can indeed talk about release+acquire like normal people ;-)

2015-11-20 10:02:41

by Peter Zijlstra

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Mon, Nov 02, 2015 at 12:27:05PM -0800, Paul Turner wrote:
> I suspect this part might be more explicitly expressed by specifying
> the requirements that migration satisfies; then providing an example.
> This makes it easier for others to reason about the locks and saves
> worrying about whether the examples hit our 3 million sub-cases.

Something like so then? Note that this patch (obviously) comes after
introducing smp_cond_acquire(), simplifying the RELEASE/ACQUIRE on
p->on_cpu.

---
Subject: sched: Document Program-Order guarantees
From: Peter Zijlstra <[email protected]>
Date: Tue Nov 17 19:01:11 CET 2015

These are some notes on the scheduler locking and how it provides
program order guarantees on SMP systems.

Cc: "Paul E. McKenney" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Howells <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 91 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1916,6 +1916,97 @@ static void ttwu_queue(struct task_struc
raw_spin_unlock(&rq->lock);
}

+/*
+ * Notes on Program-Order guarantees on SMP systems.
+ *
+ * MIGRATION
+ *
+ * The basic program-order guarantee on SMP systems is that when a task [t]
+ * migrates, all its activity on its old cpu [c0] happens-before any subsequent
+ * execution on its new cpu [c1].
+ *
+ * For migration (of runnable tasks) this is provided by the following means:
+ *
+ * A) UNLOCK of the rq(c0)->lock scheduling out task t
+ * B) migration for t is required to synchronize *both* rq(c0)->lock and
+ * rq(c1)->lock (if not at the same time, then in that order).
+ * C) LOCK of the rq(c1)->lock scheduling in task t
+ *
+ * Transitivity guarantees that B happens after A and C after B.
+ * Note: we only require RCpc transitivity.
+ * Note: the cpu doing B need not be c0 or c1
+ *
+ * Example:
+ *
+ * CPU0 CPU1 CPU2
+ *
+ * LOCK rq(0)->lock
+ * sched-out X
+ * sched-in Y
+ * UNLOCK rq(0)->lock
+ *
+ * LOCK rq(0)->lock // orders against CPU0
+ * dequeue X
+ * UNLOCK rq(0)->lock
+ *
+ * LOCK rq(1)->lock
+ * enqueue X
+ * UNLOCK rq(1)->lock
+ *
+ * LOCK rq(1)->lock // orders against CPU2
+ * sched-out Z
+ * sched-in X
+ * UNLOCK rq(1)->lock
+ *
+ *
+ * BLOCKING -- aka. SLEEP + WAKEUP
+ *
+ * For blocking we (obviously) need to provide the same guarantee as for
+ * migration. However the means are completely different as there is no lock
+ * chain to provide order. Instead we do:
+ *
+ * 1) smp_store_release(X->on_cpu, 0)
+ * 2) smp_cond_acquire(!X->on_cpu)
+ *
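+ * where smp_cond_acquire() is, at the time of this patch, roughly the
+ * following (the ctrl-dep + rmb pair upgrades the polling loop to an
+ * ACQUIRE):
+ *
+ *	#define smp_cond_acquire(cond)	do {		\
+ *		while (!(cond))				\
+ *			cpu_relax();			\
+ *		smp_rmb();				\
+ *	} while (0)
+ *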
+ * Example:
+ *
+ * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (schedule)
+ *
+ * LOCK rq(0)->lock LOCK X->pi_lock
+ * dequeue X
+ * sched-out X
+ * smp_store_release(X->on_cpu, 0);
+ *
+ * smp_cond_acquire(!X->on_cpu);
+ * X->state = WAKING
+ * set_task_cpu(X,2)
+ *
+ * LOCK rq(2)->lock
+ * enqueue X
+ * X->state = RUNNING
+ * UNLOCK rq(2)->lock
+ *
+ *                                LOCK rq(2)->lock // orders against CPU1
+ *                                sched-out Z
+ *                                sched-in X
+ *                                UNLOCK rq(2)->lock
+ *
+ * UNLOCK X->pi_lock
+ * UNLOCK rq(0)->lock
+ *
+ *
+ * However; for wakeups there is a second guarantee we must provide, namely we
+ * must observe the state that led to our wakeup. That is, not only must our
+ * task observe its own prior state, it must also observe the stores prior to
+ * its wakeup.
+ *
+ * This means that any means of doing remote wakeups must order the CPU doing
+ * the wakeup against the CPU the task is going to end up running on. This,
+ * however, is already required for the regular Program-Order guarantee above,
+ * since the waking CPU is the one issuing the ACQUIRE (2).
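+ *
+ * A sketch of the canonical sleep/wakeup pattern that relies on both
+ * guarantees (wait_event()/wake_up() style code; signals and error handling
+ * elided):
+ *
+ *	// sleeper
+ *	for (;;) {
+ *		set_current_state(TASK_UNINTERRUPTIBLE);
+ *		if (cond)
+ *			break;
+ *		schedule();
+ *	}
+ *	__set_current_state(TASK_RUNNING);
+ *
+ *	// waker
+ *	cond = true;
+ *	wake_up_process(X);
+ *
+ * where the sleeper must observe cond == true once schedule() returns after
+ * the wakeup.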
+ *
+ */
+
/**
* try_to_wake_up - wake up a thread
* @p: the thread to be awakened

2015-11-20 14:09:25

by Boqun Feng

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

Hi Peter,

On Fri, Nov 20, 2015 at 11:02:30AM +0100, Peter Zijlstra wrote:
[snip]
> + * BLOCKING -- aka. SLEEP + WAKEUP
> + *
> + * For blocking we (obviously) need to provide the same guarantee as for
> + * migration. However the means are completely different as there is no lock
> + * chain to provide order. Instead we do:
> + *
> + * 1) smp_store_release(X->on_cpu, 0)
> + * 2) smp_cond_acquire(!X->on_cpu)
> + *
> + * Example:
> + *
> + * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (schedule)
> + *
> + * LOCK rq(0)->lock LOCK X->pi_lock
> + * dequeue X
> + * sched-out X
> + * smp_store_release(X->on_cpu, 0);
> + *
> + * smp_cond_acquire(!X->on_cpu);
> + * X->state = WAKING
> + * set_task_cpu(X,2)
> + *
> + * LOCK rq(2)->lock
> + * enqueue X
> + * X->state = RUNNING
> + * UNLOCK rq(2)->lock
> + *
> + *                                LOCK rq(2)->lock // orders against CPU1
> + *                                sched-out Z
> + *                                sched-in X
> + *                                UNLOCK rq(2)->lock
> + *
> + * UNLOCK X->pi_lock
> + * UNLOCK rq(0)->lock
> + *
> + *
> + * However; for wakeups there is a second guarantee we must provide, namely we
> + * must observe the state that led to our wakeup. That is, not only must our
> + * task observe its own prior state, it must also observe the stores prior to
> + * its wakeup.
> + *
> + * This means that any means of doing remote wakeups must order the CPU doing
> + * the wakeup against the CPU the task is going to end up running on. This,
> + * however, is already required for the regular Program-Order guarantee above,
> + * since the waking CPU is the one issuing the ACQUIRE (2).
> + *

Hope I'm the only one who got confused about the "2" in "ACQUIRE (2)";
what does that refer to? "2) smp_cond_acquire(!X->on_cpu)"?

The comments are great, I'm just trying to understand your meaning here ;-)

Regards,
Boqun

> + */
> +
> /**
> * try_to_wake_up - wake up a thread
> * @p: the thread to be awakened



2015-11-20 14:18:21

by Peter Zijlstra

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Fri, Nov 20, 2015 at 10:08:50PM +0800, Boqun Feng wrote:
> Hi Peter,
>
> On Fri, Nov 20, 2015 at 11:02:30AM +0100, Peter Zijlstra wrote:
> [snip]
> > + * BLOCKING -- aka. SLEEP + WAKEUP
> > + *
> > + * For blocking we (obviously) need to provide the same guarantee as for
> > + * migration. However the means are completely different as there is no lock
> > + * chain to provide order. Instead we do:
> > + *
> > + * 1) smp_store_release(X->on_cpu, 0)
> > + * 2) smp_cond_acquire(!X->on_cpu)
> > + *
> > + * Example:
> > + *
> > + * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (schedule)
> > + *
> > + * LOCK rq(0)->lock LOCK X->pi_lock
> > + * dequeue X
> > + * sched-out X
> > + * smp_store_release(X->on_cpu, 0);
> > + *
> > + * smp_cond_acquire(!X->on_cpu);
> > + * X->state = WAKING
> > + * set_task_cpu(X,2)
> > + *
> > + * LOCK rq(2)->lock
> > + * enqueue X
> > + * X->state = RUNNING
> > + * UNLOCK rq(2)->lock
> > + *
> > + *                                LOCK rq(2)->lock // orders against CPU1
> > + *                                sched-out Z
> > + *                                sched-in X
> > + *                                UNLOCK rq(2)->lock
> > + *
> > + * UNLOCK X->pi_lock
> > + * UNLOCK rq(0)->lock
> > + *
> > + *
> > + * However; for wakeups there is a second guarantee we must provide, namely we
> > + * must observe the state that led to our wakeup. That is, not only must our
> > + * task observe its own prior state, it must also observe the stores prior to
> > + * its wakeup.
> > + *
> > + * This means that any means of doing remote wakeups must order the CPU doing
> > + * the wakeup against the CPU the task is going to end up running on. This,
> > + * however, is already required for the regular Program-Order guarantee above,
> > + * since the waking CPU is the one issuing the ACQUIRE (2).
> > + *
>
> > Hope I'm the only one who got confused about the "2" in "ACQUIRE (2)";
> > what does that refer to? "2) smp_cond_acquire(!X->on_cpu)"?

Yes, exactly that. Would an unadorned 2 be clearer?

2015-11-20 14:21:40

by Boqun Feng

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Fri, Nov 20, 2015 at 03:18:12PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 20, 2015 at 10:08:50PM +0800, Boqun Feng wrote:
> > Hi Peter,
> >
> > On Fri, Nov 20, 2015 at 11:02:30AM +0100, Peter Zijlstra wrote:
> > [snip]
> > > + * BLOCKING -- aka. SLEEP + WAKEUP
> > > + *
> > > + * For blocking we (obviously) need to provide the same guarantee as for
> > > + * migration. However the means are completely different as there is no lock
> > > + * chain to provide order. Instead we do:
> > > + *
> > > + * 1) smp_store_release(X->on_cpu, 0)
> > > + * 2) smp_cond_acquire(!X->on_cpu)
> > > + *
> > > + * Example:
> > > + *
> > > + * CPU0 (schedule) CPU1 (try_to_wake_up) CPU2 (schedule)
> > > + *
> > > + * LOCK rq(0)->lock LOCK X->pi_lock
> > > + * dequeue X
> > > + * sched-out X
> > > + * smp_store_release(X->on_cpu, 0);
> > > + *
> > > + * smp_cond_acquire(!X->on_cpu);
> > > + * X->state = WAKING
> > > + * set_task_cpu(X,2)
> > > + *
> > > + * LOCK rq(2)->lock
> > > + * enqueue X
> > > + * X->state = RUNNING
> > > + * UNLOCK rq(2)->lock
> > > + *
> > > + *                                LOCK rq(2)->lock // orders against CPU1
> > > + *                                sched-out Z
> > > + *                                sched-in X
> > > + *                                UNLOCK rq(2)->lock
> > > + *
> > > + * UNLOCK X->pi_lock
> > > + * UNLOCK rq(0)->lock
> > > + *
> > > + *
> > > + * However; for wakeups there is a second guarantee we must provide, namely we
> > > + * must observe the state that led to our wakeup. That is, not only must our
> > > + * task observe its own prior state, it must also observe the stores prior to
> > > + * its wakeup.
> > > + *
> > > + * This means that any means of doing remote wakeups must order the CPU doing
> > > + * the wakeup against the CPU the task is going to end up running on. This,
> > > + * however, is already required for the regular Program-Order guarantee above,
> > > + * since the waking CPU is the one issuing the ACQUIRE (2).
> > > + *
> >
> > Hope I'm the only one who got confused about the "2" in "ACQUIRE (2)";
> > what does that refer to? "2) smp_cond_acquire(!X->on_cpu)"?
>
> Yes, exactly that. Would an unadorned 2 be clearer?

How about "the one issuing the ACQUIRE (smp_cond_acquire)"?



2015-11-20 19:41:47

by Peter Zijlstra

Subject: Re: [PATCH 2/4] sched: Document Program-Order guarantees

On Fri, Nov 20, 2015 at 10:21:11PM +0800, Boqun Feng wrote:
> > Yes, exactly that. Would an unadorned 2 be clearer?
>
> How about "the one issuing the ACQUIRE (smp_cond_acquire)"?

Sure, that works. Thanks