2017-09-05 19:01:06

by Davidlohr Bueso

Subject: [PATCH 0/6] swait: Introduce and use swq_has_sleepers

Hi.

Recently[1] Nick mentioned that a lot of swait_active() callers look
fishy, because swait inherited bad habits from regular waitqueues.
Other than rcu, kvm is one of the main callers, which I audited.

The following patches fix and/or justify (in baby steps) some of the
callers. Note that s390 and mips are other offenders, but I have no
idea whether the race can actually occur there -- i.e.:

CPU0					CPU1
kvm_vcpu_block				kvm_mips_comparecount_func
  [S] prepare_to_swait(&vcpu->wq)
					[L] swait_active(&vcpu->wq)
					[S] queue_timer_int(vcpu)
  [L] if (!kvm_vcpu_has_pending_timer(vcpu))
	schedule()


[1] swait: add missing barrier to swake_up: https://lkml.org/lkml/2017/9/1/165
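To illustrate the pattern the series fixes, here is a rough userspace model (a sketch, not kernel code: C11 seq_cst atomics stand in for the smp_mb() pairing, and two flags stand in for the wait list and the condition):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy model of the waker/waiter pairing: the waiter publishes itself
 * (the "[S] prepare_to_swait" store) and then loads the condition; the
 * waker stores the condition and then loads the waiter flag.  Without a
 * full barrier between the waker's store and load, both sides can miss
 * each other; seq_cst atomics here play the role of smp_mb().
 */
static atomic_bool queued;   /* stands in for the swait task_list */
static atomic_bool cond_set; /* stands in for the wakeup condition */
static atomic_int  wakeups;

static void waiter_prepare(void)
{
	/* [S] prepare_to_swait(): make the waiter visible to wakers */
	atomic_store(&queued, true);
}

static bool waiter_should_sleep(void)
{
	/* [L] only go to sleep if the condition is still false */
	return !atomic_load(&cond_set);
}

static void waker(void)
{
	/* [S] set the condition ... */
	atomic_store(&cond_set, true);
	/*
	 * ... then test for waiters.  The seq_cst load models the
	 * smp_mb() that swq_has_sleeper() performs before checking the
	 * list, so the check cannot be hoisted above the store.
	 */
	if (atomic_load(&queued))
		atomic_fetch_add(&wakeups, 1); /* models swake_up() */
}
```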

Thanks!

Davidlohr Bueso (6):
sched/wait: Add swq_has_sleepers()
kvm,async_pf: Use swq_has_sleepers()
kvm,lapic: Justify use of swait_active()
x86,kvm: Fix apf_task_wake_one() serialization
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick

arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kvm/lapic.c | 6 +++++
include/linux/swait.h | 57 ++++++++++++++++++++++++++++++++++++++++++--
virt/kvm/async_pf.c | 6 +----
virt/kvm/kvm_main.c | 2 +-
6 files changed, 65 insertions(+), 10 deletions(-)

--
2.12.0


2017-09-05 19:01:12

by Davidlohr Bueso

Subject: [PATCH 1/6] sched/wait: Add swq_has_sleepers()

This is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but it also helps bring both
APIs closer together -- which, iirc, originally comes from the
-net folks.

We also duplicate the comments for the lockless swait_active()
from wait.h. Future users will make use of this interface.
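As a rough userspace illustration (not the kernel implementation: a toy list and __sync_synchronize() stand in for task_list and smp_mb()), the shape of the helper is:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list standing in for task_list. */
struct node { struct node *next, *prev; };

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static bool list_empty(const struct node *head)
{
	return head->next == head;
}

/* swait_active() analogue: lockless; the caller must order it. */
static bool swait_active_model(const struct node *q)
{
	return !list_empty(q);
}

/*
 * swq_has_sleeper() analogue: issue a full barrier before the check,
 * pairing with the barrier on the waiter side (set_current_state() in
 * prepare_to_swait()).  __sync_synchronize() stands in for smp_mb().
 */
static bool swq_has_sleeper_model(const struct node *q)
{
	__sync_synchronize();
	return swait_active_model(q);
}
```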

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/swait.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 4a4e180d0a35..ce4c2582fd42 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -79,9 +79,62 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
DECLARE_SWAIT_QUEUE_HEAD(name)
#endif

-static inline int swait_active(struct swait_queue_head *q)
+/**
+ * swait_active -- locklessly test for waiters on the queue
+ * @wq: the waitqueue to test for waiters
+ *
+ * returns true if the wait list is not empty
+ *
+ * NOTE: this function is lockless and requires care, incorrect usage _will_
+ * lead to sporadic and non-obvious failure.
+ *
+ * NOTE2: this function has the same above implications as regular waitqueues.
+ *
+ * Use either while holding swait_queue_head::lock or when used for wakeups
+ * with an extra smp_mb() like:
+ *
+ *      CPU0 - waker                    CPU1 - waiter
+ *
+ *                                      for (;;) {
+ *      @cond = true;                     prepare_to_wait(&wq_head, &wait, state);
+ *      smp_mb();                         // smp_mb() from set_current_state()
+ *      if (swait_active(wq_head))        if (@cond)
+ *        wake_up(wq_head);                 break;
+ *                                        schedule();
+ *                                      }
+ *                                      finish_wait(&wq_head, &wait);
+ *
+ * Because without the explicit smp_mb() it's possible for the
+ * swait_active() load to get hoisted over the @cond store such that we'll
+ * observe an empty wait list while the waiter might not observe @cond.
+ *
+ * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
+ * which (when the lock is uncontended) are of roughly equal cost.
+ */
+static inline int swait_active(struct swait_queue_head *wq)
+{
+ return !list_empty(&wq->task_list);
+}
+
+/**
+ * swq_has_sleeper - check if there are any waiting processes
+ * @wq: wait queue head
+ *
+ * Returns true if @wq has waiting processes
+ *
+ * Please refer to the comment for swait_active.
+ */
+static inline bool swq_has_sleeper(struct swait_queue_head *wq)
{
- return !list_empty(&q->task_list);
+ /*
+ * We need to be sure we are in sync with the list_add()
+ * modifications to the wait queue (task_list).
+ *
+ * This memory barrier should be paired with one on the
+ * waiting side.
+ */
+ smp_mb();
+ return swait_active(wq);
}

extern void swake_up(struct swait_queue_head *q);
--
2.12.0

2017-09-05 19:01:19

by Davidlohr Bueso

Subject: [PATCH 6/6] kvm,powerpc: Serialize wq active checks in ops->vcpu_kick

Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/powerpc/kvm/book3s_hv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 359c79cdf0cc..7ffc757ef978 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -181,7 +181,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
struct swait_queue_head *wqp;

wqp = kvm_arch_vcpu_wq(vcpu);
- if (swait_active(wqp)) {
+ if (swq_has_sleeper(wqp)) {
swake_up(wqp);
++vcpu->stat.halt_wakeup;
}
--
2.12.0

2017-09-05 19:01:57

by Davidlohr Bueso

Subject: [PATCH 5/6] kvm: Serialize wq active checks in kvm_vcpu_wake_up()

This is a generic call and can be susceptible to races when
reading the wq task_list while another task is adding itself
to the list. Add a full barrier by using the swq_has_sleeper()
helper.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4d81f6ded88e..4e76e6d15ce0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2185,7 +2185,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
struct swait_queue_head *wqp;

wqp = kvm_arch_vcpu_wq(vcpu);
- if (swait_active(wqp)) {
+ if (swq_has_sleeper(wqp)) {
swake_up(wqp);
++vcpu->stat.halt_wakeup;
return true;
--
2.12.0

2017-09-05 19:02:29

by Davidlohr Bueso

Subject: [PATCH 4/6] x86,kvm: Fix apf_task_wake_one() serialization

During code inspection, the following potential race was seen:

CPU0					CPU1
kvm_async_pf_task_wait			apf_task_wake_one
  [S] prepare_to_swait(&n.wq)
					[L] swait_active(&n->wq)
					[S] hlist_del_init(&n->link);
  [L] if (!hlist_unhashed(&n.link))
	schedule()

Properly serialize swait_active() checks such that a wakeup is
not missed.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 874827b0d7ca..aa60a08b65b1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -180,7 +180,7 @@ static void apf_task_wake_one(struct kvm_task_sleep_node *n)
hlist_del_init(&n->link);
if (n->halted)
smp_send_reschedule(n->cpu);
- else if (swait_active(&n->wq))
+ else if (swq_has_sleeper(&n->wq))
swake_up(&n->wq);
}

--
2.12.0

2017-09-05 19:03:05

by Davidlohr Bueso

Subject: [PATCH 3/6] kvm,lapic: Justify use of swait_active()

A comment might serve future readers.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/x86/kvm/lapic.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 589dcc117086..fb8be28c7093 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1326,6 +1326,12 @@ static void apic_timer_expired(struct kvm_lapic *apic)
atomic_inc(&apic->lapic_timer.pending);
kvm_set_pending_timer(vcpu);

+ /*
+ * The above kvm_set_pending_timer implies a wmb
+ * which pairs with the swaiter side. Either way,
+ * the atomic_inc() is also serialized so using
+ * swait_active() is safe.
+ */
if (swait_active(q))
swake_up(q);

--
2.12.0

2017-09-05 19:03:25

by Davidlohr Bueso

Subject: [PATCH 2/6] kvm,async_pf: Use swq_has_sleepers()

... as we've got the new helper now. This caller already
does the right thing, hence no changes in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
virt/kvm/async_pf.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index bb298a200cd3..57bcb27dcf30 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -106,11 +106,7 @@ static void async_pf_execute(struct work_struct *work)

trace_kvm_async_pf_completed(addr, gva);

- /*
- * This memory barrier pairs with prepare_to_wait's set_current_state()
- */
- smp_mb();
- if (swait_active(&vcpu->wq))
+ if (swq_has_sleeper(&vcpu->wq))
swake_up(&vcpu->wq);

mmput(mm);
--
2.12.0

2017-09-10 09:20:45

by Paolo Bonzini

Subject: Re: [PATCH 4/6] x86,kvm: Fix apf_task_wake_one() serialization

On 05/09/2017 21:00, Davidlohr Bueso wrote:
> During code inspection, the following potential race was seen:
>
> CPU0					CPU1
> kvm_async_pf_task_wait		apf_task_wake_one
>   [S] prepare_to_swait(&n.wq)
>					[L] swait_active(&n->wq)
>					[S] hlist_del_init(&n->link);
>   [L] if (!hlist_unhashed(&n.link))
>	schedule()
>
> Properly serialize swait_active() checks such that a wakeup is
> not missed.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> arch/x86/kernel/kvm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 874827b0d7ca..aa60a08b65b1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -180,7 +180,7 @@ static void apf_task_wake_one(struct kvm_task_sleep_node *n)
> hlist_del_init(&n->link);
> if (n->halted)
> smp_send_reschedule(n->cpu);
> - else if (swait_active(&n->wq))
> + else if (swq_has_sleeper(&n->wq))
> swake_up(&n->wq);
> }

After Nick's patch, swake_up starts with:

smp_mb();
if (!swait_active(q))
return;

so we can just remove the test here (and in patch 2).

The other patches could also use a better swait API, for example:

1) add a public __swake_up routine that omits the memory barrier, and
which can be used in patch 3. Perhaps better: omit the out-of-lock
check in __swake_up: then the caller can use it if it knows there is a
waiter. In those cases the memory barrier is expensive.

2) change swake_up and __swake_up to return true if they woke up a
process (or alternatively 0/-EAGAIN). Patches 5 and 6 would then no
longer need to call either swq_has_sleeper or swait_active, and that
saves a memory barrier too.
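Suggestion 2 might look roughly like this (hypothetical sketch: the real swake_up() currently returns void, and a plain counter stands in for the wait list, which would really be checked under its lock):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of a swake_up() that reports whether it woke anyone, so that
 * callers such as kvm_vcpu_wake_up() need neither swait_active() nor
 * swq_has_sleeper() beforehand.  Hypothetical API; nr_waiters models
 * the task_list length.
 */
struct swq_model { int nr_waiters; };

static bool swake_up_model(struct swq_model *q)
{
	/* barrier + list check + actual wakeup would live in here */
	if (q->nr_waiters == 0)
		return false;
	q->nr_waiters--;
	return true;
}

/* Caller pattern from patch 5, with no separate active check: */
static bool vcpu_wake_up_model(struct swq_model *wq, int *halt_wakeup)
{
	if (swake_up_model(wq)) {
		++*halt_wakeup;	/* ++vcpu->stat.halt_wakeup */
		return true;
	}
	return false;
}
```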

What do you think?

Thanks,

Paolo

2017-09-10 09:24:02

by Paolo Bonzini

Subject: Re: [PATCH 4/6] x86,kvm: Fix apf_task_wake_one() serialization

On 10/09/2017 11:20, Paolo Bonzini wrote:
> On 05/09/2017 21:00, Davidlohr Bueso wrote:
>> During code inspection, the following potential race was seen:
>>
>> CPU0					CPU1
>> kvm_async_pf_task_wait		apf_task_wake_one
>>   [S] prepare_to_swait(&n.wq)
>>					[L] swait_active(&n->wq)
>>					[S] hlist_del_init(&n->link);
>>   [L] if (!hlist_unhashed(&n.link))
>>	schedule()
>>
>> Properly serialize swait_active() checks such that a wakeup is
>> not missed.
>>
>> Signed-off-by: Davidlohr Bueso <[email protected]>
>> ---
>> arch/x86/kernel/kvm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 874827b0d7ca..aa60a08b65b1 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -180,7 +180,7 @@ static void apf_task_wake_one(struct kvm_task_sleep_node *n)
>> hlist_del_init(&n->link);
>> if (n->halted)
>> smp_send_reschedule(n->cpu);
>> - else if (swait_active(&n->wq))
>> + else if (swq_has_sleeper(&n->wq))
>> swake_up(&n->wq);
>> }
>
> After Nick's patch, swake_up starts with:
>
> smp_mb();
> if (!swait_active(q))
> return;
>
> so we can just remove the test here (and in patch 2).
>
> The other patches could also use a better swait API, for example:
>
> 1) add a public __swake_up routine that omits the memory barrier, and
> which can be used in patch 3. Perhaps better: omit the out-of-lock
> check in __swake_up: then the caller can use it if it knows there is a
> waiter. In those cases the memory barrier is expensive.
>
> 2) change swake_up and __swake_up to return true if they woke up a
> process (or alternatively 0/-EAGAIN). Patches 5 and 6 would then no
> longer need to call either swq_has_sleeper or swait_active, and that
> saves a memory barrier too.
>
> What do you think?

... doh, I missed PeterZ's remark that the early test is gone in tip.
Then the series makes total sense. Peter, if you ack patch 1 I can push
it through the KVM tree.

Paolo