2017-09-13 20:08:36

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH v2 0/7] swait: Introduce and use swq_has_sleeper()

Changes from v1: https://lkml.org/lkml/2017/9/5/622
- Added patch 7 (mips)
- Small comment fixlets in patch 1.

Hi,

Recently[1] Nick mentioned that a lot of swait_active() callers look fishy.
This is because it inherited bad habits from regular waitqueues. Other than rcu,
kvm is one of the main callers, which I audited.

The following patches fix and/or justify (in baby steps) some of the
callers. The main exception is s390, which I didn't follow how ->valid_wakeup
can get hoisted as kvm_vcpu_block does not use that in the wait loop.

Thanks!

Davidlohr Bueso (7):
sched/wait: Add swq_has_sleepers()
kvm,async_pf: Use swq_has_sleepers()
kvm,lapic: Justify use of swait_activate()
kvm,x86: Fix apf_task_wake_one() wq serialization
kvm: Serialize wq active checks in kvm_vcpu_wake_up()
kvm,powerpc: Serialize wq active checks in ops->vcpu_kick
kvm,mips: Fix potential swait_active() races

arch/mips/kvm/mips.c | 4 +--
arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kvm/lapic.c | 4 +++
include/linux/swait.h | 58 ++++++++++++++++++++++++++++++++++++++++++--
virt/kvm/async_pf.c | 6 +----
virt/kvm/kvm_main.c | 2 +-
7 files changed, 66 insertions(+), 12 deletions(-)

--
2.12.0


2017-09-13 20:08:45

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 6/7] kvm,powerpc: Serialize wq active checks in ops->vcpu_kick

Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/powerpc/kvm/book3s_hv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 18e974a34fce..473e831d1038 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -181,7 +181,7 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
struct swait_queue_head *wqp;

wqp = kvm_arch_vcpu_wq(vcpu);
- if (swait_active(wqp)) {
+ if (swq_has_sleeper(wqp)) {
swake_up(wqp);
++vcpu->stat.halt_wakeup;
}
--
2.12.0

2017-09-13 20:08:38

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 1/7] sched/wait: Add swq_has_sleepers()

Which is the equivalent of what we have in regular waitqueues.
I'm not crazy about the name, but this also helps us get both
apis closer -- which iirc comes originally from the -net folks.

We also duplicate the comments for the lockless swait_active(),
from wait.h. Future users will make use of this interface.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
include/linux/swait.h | 58 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 4a4e180d0a35..73e97a08d3d0 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -79,9 +79,63 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
DECLARE_SWAIT_QUEUE_HEAD(name)
#endif

-static inline int swait_active(struct swait_queue_head *q)
+/**
+ * swait_active -- locklessly test for waiters on the queue
+ * @wq: the waitqueue to test for waiters
+ *
+ * returns true if the wait list is not empty
+ *
+ * NOTE: this function is lockless and requires care, incorrect usage _will_
+ * lead to sporadic and non-obvious failure.
+ *
+ * NOTE2: this function has the same above implications as regular waitqueues.
+ *
+ * Use either while holding swait_queue_head::lock or when used for wakeups
+ * with an extra smp_mb() like:
+ *
+ * CPU0 - waker CPU1 - waiter
+ *
+ * for (;;) {
+ * @cond = true; prepare_to_swait(&wq_head, &wait, state);
+ * smp_mb(); // smp_mb() from set_current_state()
+ * if (swait_active(wq_head)) if (@cond)
+ * wake_up(wq_head); break;
+ * schedule();
+ * }
+ * finish_swait(&wq_head, &wait);
+ *
+ * Because without the explicit smp_mb() it's possible for the
+ * swait_active() load to get hoisted over the @cond store such that we'll
+ * observe an empty wait list while the waiter might not observe @cond.
+ * This, in turn, can trigger missing wakeups.
+ *
+ * Also note that this 'optimization' trades a spin_lock() for an smp_mb(),
+ * which (when the lock is uncontended) are of roughly equal cost.
+ */
+static inline int swait_active(struct swait_queue_head *wq)
+{
+ return !list_empty(&wq->task_list);
+}
+
+/**
+ * swq_has_sleeper - check if there are any waiting processes
+ * @wq: the waitqueue to test for waiters
+ *
+ * Returns true if @wq has waiting processes
+ *
+ * Please refer to the comment for swait_active.
+ */
+static inline bool swq_has_sleeper(struct swait_queue_head *wq)
{
- return !list_empty(&q->task_list);
+ /*
+ * We need to be sure we are in sync with the list_add()
+ * modifications to the wait queue (task_list).
+ *
+ * This memory barrier should be paired with one on the
+ * waiting side.
+ */
+ smp_mb();
+ return swait_active(wq);
}

extern void swake_up(struct swait_queue_head *q);
--
2.12.0

2017-09-13 20:09:06

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 7/7] kvm,mips: Fix potential swait_active() races

For example, the following could occur, making us miss a wakeup:

CPU0 CPU1
kvm_vcpu_block kvm_mips_comparecount_func
[L] swait_active(&vcpu->wq)
[S] prepare_to_swait(&vcpu->wq)
[L] if (!kvm_vcpu_has_pending_timer(vcpu))
schedule() [S] queue_timer_int(vcpu)

Ensure that the swait_active() check is not hoisted over the interrupt.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/mips/kvm/mips.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index bce2a6431430..d535edc01434 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -514,7 +514,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,

dvcpu->arch.wait = 0;

- if (swait_active(&dvcpu->wq))
+ if (swq_has_sleeper(&dvcpu->wq))
swake_up(&dvcpu->wq);

return 0;
@@ -1179,7 +1179,7 @@ static void kvm_mips_comparecount_func(unsigned long data)
kvm_mips_callbacks->queue_timer_int(vcpu);

vcpu->arch.wait = 0;
- if (swait_active(&vcpu->wq))
+ if (swq_has_sleeper(&vcpu->wq))
swake_up(&vcpu->wq);
}

--
2.12.0

2017-09-13 20:09:33

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 5/7] kvm: Serialize wq active checks in kvm_vcpu_wake_up()

This is a generic call and can be suceptible to races
in reading the wq task_list while another task is adding
itself to the list. Add a full barrier by using the
swq_has_sleeper() helper.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6ed1c2021198..c41d903c5783 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2186,7 +2186,7 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
struct swait_queue_head *wqp;

wqp = kvm_arch_vcpu_wq(vcpu);
- if (swait_active(wqp)) {
+ if (swq_has_sleeper(wqp)) {
swake_up(wqp);
++vcpu->stat.halt_wakeup;
return true;
--
2.12.0

2017-09-13 20:10:20

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 4/7] kvm,x86: Fix apf_task_wake_one() wq serialization

During code inspection, the following potential race was seen:

CPU0 CPU1
kvm_async_pf_task_wait apf_task_wake_one
[L] swait_active(&n->wq)
[S] prepare_to_swait(&n.wq)
[L] if (!hlist_unhahed(&n.link))
schedule() [S] hlist_del_init(&n->link);

Properly serialize swait_active() checks such that a wakeup is
not missed.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/x86/kernel/kvm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 874827b0d7ca..aa60a08b65b1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -180,7 +180,7 @@ static void apf_task_wake_one(struct kvm_task_sleep_node *n)
hlist_del_init(&n->link);
if (n->halted)
smp_send_reschedule(n->cpu);
- else if (swait_active(&n->wq))
+ else if (swq_has_sleeper(&n->wq))
swake_up(&n->wq);
}

--
2.12.0

2017-09-13 20:10:39

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 3/7] kvm,lapic: Justify use of swait_activate()

A comment might serve future readers.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
arch/x86/kvm/lapic.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index aaf10b6f5380..69c5612be786 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1324,6 +1324,10 @@ static void apic_timer_expired(struct kvm_lapic *apic)
atomic_inc(&apic->lapic_timer.pending);
kvm_set_pending_timer(vcpu);

+ /*
+ * For x86, the atomic_inc() is serialized, thus
+ * using swait_active() is safe.
+ */
if (swait_active(q))
swake_up(q);

--
2.12.0

2017-09-13 20:10:56

by Davidlohr Bueso

[permalink] [raw]
Subject: [PATCH 2/7] kvm,async_pf: Use swq_has_sleepers()

... as we've got the new helper now. This caller already
does the right thing, hence no changes in semantics.

Signed-off-by: Davidlohr Bueso <[email protected]>
---
virt/kvm/async_pf.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index bb298a200cd3..57bcb27dcf30 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -106,11 +106,7 @@ static void async_pf_execute(struct work_struct *work)

trace_kvm_async_pf_completed(addr, gva);

- /*
- * This memory barrier pairs with prepare_to_wait's set_current_state()
- */
- smp_mb();
- if (swait_active(&vcpu->wq))
+ if (swq_has_sleeper(&vcpu->wq))
swake_up(&vcpu->wq);

mmput(mm);
--
2.12.0

2017-09-13 20:35:53

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 7/7] kvm,mips: Fix potential swait_active() races

On 13/09/2017 22:08, Davidlohr Bueso wrote:
> For example, the following could occur, making us miss a wakeup:
>
> CPU0 CPU1
> kvm_vcpu_block kvm_mips_comparecount_func
> [L] swait_active(&vcpu->wq)
> [S] prepare_to_swait(&vcpu->wq)
> [L] if (!kvm_vcpu_has_pending_timer(vcpu))
> schedule() [S] queue_timer_int(vcpu)
>
> Ensure that the swait_active() check is not hoisted over the interrupt.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> arch/mips/kvm/mips.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index bce2a6431430..d535edc01434 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -514,7 +514,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
>
> dvcpu->arch.wait = 0;
>
> - if (swait_active(&dvcpu->wq))
> + if (swq_has_sleeper(&dvcpu->wq))
> swake_up(&dvcpu->wq);
>
> return 0;
> @@ -1179,7 +1179,7 @@ static void kvm_mips_comparecount_func(unsigned long data)
> kvm_mips_callbacks->queue_timer_int(vcpu);
>
> vcpu->arch.wait = 0;
> - if (swait_active(&vcpu->wq))
> + if (swq_has_sleeper(&vcpu->wq))
> swake_up(&vcpu->wq);
> }
>
>

has_sleeper*s*. Can fix when committing.

Paolo

2017-09-13 22:22:49

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH 7/7] kvm,mips: Fix potential swait_active() races

On Wed, 13 Sep 2017, Paolo Bonzini wrote:
>has_sleeper*s*. Can fix when committing.

So for regular waitqueues we have it singular, which is why I kept
that name -- albeit sleepers() being better suited, yes. I don't think
we want to rename it unless we rename all wq_has_sleeper() callers as
well.

Thanks,
Davidlohr

2017-09-15 11:33:08

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 7/7] kvm,mips: Fix potential swait_active() races

On 14/09/2017 00:22, Davidlohr Bueso wrote:
> On Wed, 13 Sep 2017, Paolo Bonzini wrote:
>> has_sleeper*s*. Can fix when committing.
>
> So for regular waitqueues we have it singular, which is why I kept
> that name -- albeit sleepers() being better suited, yes. I don't think
> we want to rename it unless we rename all wq_has_sleeper() callers as
> well.

The typo is in patches 1 and 2. Fixed those!

Paolo

2017-09-15 11:47:15

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/7] kvm,lapic: Justify use of swait_activate()

On 13/09/2017 22:08, Davidlohr Bueso wrote:
> A comment might serve future readers.
>
> Signed-off-by: Davidlohr Bueso <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index aaf10b6f5380..69c5612be786 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1324,6 +1324,10 @@ static void apic_timer_expired(struct kvm_lapic *apic)
> atomic_inc(&apic->lapic_timer.pending);
> kvm_set_pending_timer(vcpu);
>
> + /*
> + * For x86, the atomic_inc() is serialized, thus
> + * using swait_active() is safe.
> + */
> if (swait_active(q))
> swake_up(q);
>
>

Better add an smp_mb__after_atomic() for documentation purposes.

Paolo

2017-09-15 11:53:21

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] swait: Introduce and use swq_has_sleeper()

On 13/09/2017 22:08, Davidlohr Bueso wrote:
> The following patches fix and/or justify (in baby steps) some of the
> callers. The main exception is s390, which I didn't follow how ->valid_wakeup
> can get hoisted as kvm_vcpu_block does not use that in the wait loop.

valid_wakeup is just an optimization, so it's not a problem.

There seems to be always an atomic_or or set_bit before
kvm_s390_vcpu_wakeup is called (except kvm_s390_idle_wakeup which has no
store at all and doesn't need any serialization). So my suggestion is
to add an smp__mb_after_atomic in kvm_s390_vcpu_wakeup; I'll let the
s390 guys do it.

Paolo

2017-09-19 08:25:19

by Christian Borntraeger

[permalink] [raw]
Subject: Re: [PATCH v2 0/7] swait: Introduce and use swq_has_sleeper()



On 09/15/2017 01:53 PM, Paolo Bonzini wrote:
> On 13/09/2017 22:08, Davidlohr Bueso wrote:
>> The following patches fix and/or justify (in baby steps) some of the
>> callers. The main exception is s390, which I didn't follow how ->valid_wakeup
>> can get hoisted as kvm_vcpu_block does not use that in the wait loop.
>
> valid_wakeup is just an optimization, so it's not a problem.
>
> There seems to be always an atomic_or or set_bit before
> kvm_s390_vcpu_wakeup is called (except kvm_s390_idle_wakeup which has no
> store at all and doesn't need any serialization). So my suggestion is
> to add an smp__mb_after_atomic in kvm_s390_vcpu_wakeup; I'll let the
> s390 guys do it.


I will queue something like this


diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index a832ad0..44239b5 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1074,6 +1074,12 @@ void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu)
* in kvm_vcpu_block without having the waitqueue set (polling)
*/
vcpu->valid_wakeup = true;
+ /*
+ * This is mostly to document, that the read in swait_active could
+ * be moved before other stores, leading to subtle races.
+ * All current users do not store or use an atomic like update
+ */
+ __smp_mb__after_atomic();
if (swait_active(&vcpu->wq)) {
/*
* The vcpu gave up the cpu voluntarily, mark it as a good





but I am asking myself if it is "safer" to make this function use swq_has_sleepers
in case we add in a distant future another user to kvm_s390_vcpu_wakeup that does
use a normal store and everybody has already forgotten this?