2019-10-12 05:51:25

by Manfred Spraul

Subject: [PATCH 1/6] wake_q: Cleanup + Documentation update.

1) wake_q_add() contains a memory barrier, and callers such as
ipc/mqueue.c rely on this barrier.
Unfortunately, this is documented in ipc/mqueue.c, and not in the
description of wake_q_add().
Therefore: Update the documentation.
Removing/updating the comment in ipc/mqueue.c will happen with the next
patch in the series.

2) wake_q_add() ends with get_task_struct(), which is an
unordered refcount increase. Add a clear comment that the callers
are responsible for a barrier: most likely spin_unlock() or
smp_store_release().

3) wake_up_q() relies on the memory barrier in try_to_wake_up().
Add a comment to simplify searching.

4) wake_q.next is accessed without synchronization by wake_q_add(),
using cmpxchg_relaxed(), and by wake_up_q().
Therefore: Use WRITE_ONCE() in wake_up_q(), to ensure that the
compiler doesn't perform any tricks.
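
To illustrate point 4, a sketch of the two accesses that race on
wake_q.next (condensed from kernel/sched/core.c, not the complete
functions):

        /* wake_up_q(): a plain store would allow the compiler to tear
         * or re-issue the write, hence the annotation:
         */
        WRITE_ONCE(task->wake_q.next, NULL);

        /* __wake_q_add(), possibly running concurrently on another
         * CPU; node points at task->wake_q:
         */
        if (cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL))
                return false;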

Signed-off-by: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
---
kernel/sched/core.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd05a378631a..60ae574317fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
* @task: the task to queue for 'later' wakeup
*
* Queue a task for later wakeup, most likely by the wake_up_q() call in the
- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
- * instantly.
+ * same context, _HOWEVER_ this is not guaranteed. In particular, the wakeup
+ * may happen before the function returns.
+ *
+ * What is guaranteed is that there is a memory barrier before the wakeup;
+ * callers may rely on this barrier.
+ *
+ * On the other hand, the caller must guarantee that @task does not disappear
+ * before wake_q_add() completes. wake_q_add() does not contain any memory
+ * barrier to ensure ordering, thus the caller may need to use
+ * smp_store_release().
*
* This function must be used as-if it were wake_up_process(); IOW the task
* must be ready to be woken at this location.
@@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
BUG_ON(!task);
/* Task can safely be re-inserted now: */
node = node->next;
- task->wake_q.next = NULL;
+
+ WRITE_ONCE(task->wake_q.next, NULL);

/*
* wake_up_process() executes a full barrier, which pairs with
* the queueing in wake_q_add() so as not to miss wakeups.
+ * The barrier is the smp_mb__after_spinlock() in
+ * try_to_wake_up().
*/
wake_up_process(task);
put_task_struct(task);
--
2.21.0


2019-10-14 06:36:20

by Davidlohr Bueso

Subject: Re: [PATCH 1/6] wake_q: Cleanup + Documentation update.

On Sat, 12 Oct 2019, Manfred Spraul wrote:

>1) wake_q_add() contains a memory barrier, and callers such as
>ipc/mqueue.c rely on this barrier.
>Unfortunately, this is documented in ipc/mqueue.c, and not in the
>description of wake_q_add().
>Therefore: Update the documentation.
>Removing/updating the comment in ipc/mqueue.c will happen with the next
>patch in the series.
>
>2) wake_q_add() ends with get_task_struct(), which is an
>unordered refcount increase. Add a clear comment that the callers
>are responsible for a barrier: most likely spin_unlock() or
>smp_store_release().
>
>3) wake_up_q() relies on the memory barrier in try_to_wake_up().
>Add a comment to simplify searching.
>
>4) wake_q.next is accessed without synchronization by wake_q_add(),
>using cmpxchg_relaxed(), and by wake_up_q().
>Therefore: Use WRITE_ONCE() in wake_up_q(), to ensure that the
>compiler doesn't perform any tricks.
>
>Signed-off-by: Manfred Spraul <[email protected]>
>Cc: Davidlohr Bueso <[email protected]>
>---
> kernel/sched/core.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index dd05a378631a..60ae574317fd 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
> * @task: the task to queue for 'later' wakeup
> *
> * Queue a task for later wakeup, most likely by the wake_up_q() call in the
>- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
>- * instantly.
>+ * same context, _HOWEVER_ this is not guaranteed. In particular, the wakeup
>+ * may happen before the function returns.
>+ *
>+ * What is guaranteed is that there is a memory barrier before the wakeup;
>+ * callers may rely on this barrier.
>+ *
>+ * On the other hand, the caller must guarantee that @task does not disappear
>+ * before wake_q_add() completes. wake_q_add() does not contain any memory
>+ * barrier to ensure ordering, thus the caller may need to use
>+ * smp_store_release().

This is why we have wake_q_add_safe(). I think this last paragraph is unnecessary
and confusing.
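
For reference, a sketch of the two caller patterns (not code from this
patch; names like 'lock' are illustrative; wake_q_add_safe() consumes a
reference the caller already holds, while wake_q_add() grabs its own):

        /* wake_q_add(): the caller must keep @task alive until the
         * call returns, e.g. by queueing under the lock that @task
         * blocks on:
         */
        spin_lock(&lock);
        wake_q_add(&wake_q, task);
        spin_unlock(&lock);

        /* wake_q_add_safe(): the caller hands over its reference, so
         * @task may disappear from the caller's view right after the
         * call:
         */
        get_task_struct(task);
        spin_lock(&lock);
        /* ... unlink task from the wait structure ... */
        spin_unlock(&lock);
        wake_q_add_safe(&wake_q, task); /* consumes the reference */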

Thanks,
Davidlohr

> *
> * This function must be used as-if it were wake_up_process(); IOW the task
> * must be ready to be woken at this location.
>@@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
> BUG_ON(!task);
> /* Task can safely be re-inserted now: */
> node = node->next;
>- task->wake_q.next = NULL;
>+
>+ WRITE_ONCE(task->wake_q.next, NULL);
>
> /*
> * wake_up_process() executes a full barrier, which pairs with
> * the queueing in wake_q_add() so as not to miss wakeups.
>+ * The barrier is the smp_mb__after_spinlock() in
>+ * try_to_wake_up().
> */
> wake_up_process(task);
> put_task_struct(task);
>--
>2.21.0
>

2019-10-14 12:06:01

by Peter Zijlstra

Subject: Re: [PATCH 1/6] wake_q: Cleanup + Documentation update.

On Sat, Oct 12, 2019 at 07:49:53AM +0200, Manfred Spraul wrote:
> 1) wake_q_add() contains a memory barrier, and callers such as
> ipc/mqueue.c rely on this barrier.
> Unfortunately, this is documented in ipc/mqueue.c, and not in the
> description of wake_q_add().
> Therefore: Update the documentation.
> Removing/updating the comment in ipc/mqueue.c will happen with the next
> patch in the series.
>
> 2) wake_q_add() ends with get_task_struct(), which is an
> unordered refcount increase. Add a clear comment that the callers
> are responsible for a barrier: most likely spin_unlock() or
> smp_store_release().
>
> 3) wake_up_q() relies on the memory barrier in try_to_wake_up().
> Add a comment to simplify searching.
>
> 4) wake_q.next is accessed without synchronization by wake_q_add(),
> using cmpxchg_relaxed(), and by wake_up_q().
> Therefore: Use WRITE_ONCE() in wake_up_q(), to ensure that the
> compiler doesn't perform any tricks.
>
> Signed-off-by: Manfred Spraul <[email protected]>
> Cc: Davidlohr Bueso <[email protected]>
> ---
> kernel/sched/core.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dd05a378631a..60ae574317fd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
> * @task: the task to queue for 'later' wakeup
> *
> * Queue a task for later wakeup, most likely by the wake_up_q() call in the
> - * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
> - * instantly.
> + * same context, _HOWEVER_ this is not guaranteed. In particular, the wakeup
> + * may happen before the function returns.
> + *
> + * What is guaranteed is that there is a memory barrier before the wakeup;
> + * callers may rely on this barrier.

I would like to stress that this (wake_q_add) provides the same ordering
guarantees as a 'normal' wakeup.

> + *
> + * On the other hand, the caller must guarantee that @task does not disappear
> + * before wake_q_add() completes. wake_q_add() does not contain any memory
> + * barrier to ensure ordering, thus the caller may need to use
> + * smp_store_release().

Like Davidlohr, I'm slightly puzzled by this last paragraph.

> *
> * This function must be used as-if it were wake_up_process(); IOW the task
> * must be ready to be woken at this location.
> @@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
> BUG_ON(!task);
> /* Task can safely be re-inserted now: */
> node = node->next;
> - task->wake_q.next = NULL;
> +
> + WRITE_ONCE(task->wake_q.next, NULL);
>
> /*
> * wake_up_process() executes a full barrier, which pairs with
> * the queueing in wake_q_add() so as not to miss wakeups.
> + * The barrier is the smp_mb__after_spinlock() in
> + * try_to_wake_up().

We already have wake_up_process() documented as implying this barrier;
why do we want to mention the specifics here? That is, have a look at
the comments before try_to_wake_up().
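
For reference, the pairing those comments describe, condensed (the
canonical sleeper/waker pattern from the scheduler documentation;
'cond' and 'p' are illustrative):

        /* Sleeper */
        for (;;) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                if (cond)
                        break;
                schedule();
        }
        __set_current_state(TASK_RUNNING);

        /* Waker: wake_up_process() implies a full barrier between the
         * store to cond and the load of the task state, so the wakeup
         * cannot be missed; which primitive provides that barrier is
         * an implementation detail of try_to_wake_up().
         */
        cond = true;
        wake_up_process(p);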

> */
> wake_up_process(task);
> put_task_struct(task);
> --
> 2.21.0
>