2010-02-23 19:19:18

by Jan Kiszka

Subject: Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

Thomas Gleixner wrote:
> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
> them to raw_spinlock. No change for !RT kernels.

Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
while atomic.
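To illustrate why this ends in "scheduling while atomic", here is a toy model in C. It assumes, as confirmed further down in this thread, that on -rt the lock embedded in a wait queue is a sleeping lock, while the raw PIC lock keeps the context atomic. All toy_* names are hypothetical; this is not the kernel's preempt-count or wait-queue API, only a sketch of the call shape described above.

```c
#include <assert.h>

/* Toy model, not kernel code: atomic_depth stands in for the preempt
 * count raised by a raw spinlock; sleep_in_atomic_bugs counts would-be
 * "scheduling while atomic" reports. */
static int atomic_depth;
static int sleep_in_atomic_bugs;

static void toy_raw_spin_lock(void)   { atomic_depth++; }
static void toy_raw_spin_unlock(void) { atomic_depth--; }

/* Assumption: on -rt the wait queue's lock may sleep, which is only
 * legal outside atomic context. */
static void toy_sleeping_lock(void)
{
	if (atomic_depth > 0)
		sleep_in_atomic_bugs++;
}

/* Hypothetical stand-in for wake_up() on a wait queue under -rt. */
static void toy_wake_up(void)
{
	toy_sleeping_lock();
	/* ... wake the waiter, release the wait queue lock ... */
}

/* Shape of the failing path: the kick happens with the raw PIC lock held. */
static void toy_kick_under_pic_lock(void)
{
	toy_raw_spin_lock();
	toy_wake_up();
	toy_raw_spin_unlock();
}
```

Calling toy_wake_up() on its own is fine; only the variant that runs it inside the raw-locked section is flagged.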

This used to work up to 956f97cf. -rt for 2.6.31 is not yet affected,
but 2.6.33 should be broken (haven't checked, using kvm-kmod over 2.6.31
ATM). I can provide a patch that restores the deferred kicking if it's
acceptable for upstream.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


2010-02-23 22:24:05

by Thomas Gleixner

Subject: Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

On Tue, 23 Feb 2010, Jan Kiszka wrote:

> Thomas Gleixner wrote:
> > The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
> > them to raw_spinlock. No change for !RT kernels.
>
> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
> while atomic.

Hmm, a wakeup itself is fine. Is that code waking a wake queue?

> This used to work up to 956f97cf. -rt for 2.6.31 is not yet affected,
> but 2.6.33 should be broken (haven't checked, using kvm-kmod over 2.6.31
> ATM). I can provide a patch that restores the deferred kicking if it's
> acceptable for upstream.

Well, at least it would be nice to have one for -rt.

Thanks,

tglx

2010-02-24 09:42:21

by Jan Kiszka

Subject: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Thomas Gleixner wrote:
> On Tue, 23 Feb 2010, Jan Kiszka wrote:
>
>> Thomas Gleixner wrote:
>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>> them to raw_spinlock. No change for !RT kernels.
>> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
>> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
>> while atomic.
>
> Hmm, a wakeup itself is fine. Is that code waking a wake queue?

Yes, it's a wake queue.

>
>> This used to work up to 956f97cf. -rt for 2.6.31 is not yet affected,
>> but 2.6.33 should be broken (haven't checked, using kvm-kmod over 2.6.31
>> ATM). I can provide a patch that restores the deferred kicking if it's
>> acceptable for upstream.
>
> Well, at least it would be nice to have one for -rt.
>

Here we go. I haven't run kvm.git over -rt long enough yet to say that
this was the only remaining issue, but at least it no longer complains
instantly when starting a VM.

Jan

---------->

This restores the deferred VCPU kicking that was in place before
956f97cf. We need this over -rt because wake_up* requires a non-atomic
context in that configuration.

Signed-off-by: Jan Kiszka <[email protected]>
---

arch/x86/kvm/i8259.c | 53 ++++++++++++++++++++++++++++++++++++--------------
arch/x86/kvm/irq.h | 1 +
2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 07771da..ca426bd 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -32,6 +32,29 @@
 #include <linux/kvm_host.h>
 #include "trace.h"

+static void pic_lock(struct kvm_pic *s)
+	__acquires(&s->lock)
+{
+	raw_spin_lock(&s->lock);
+}
+
+static void pic_unlock(struct kvm_pic *s)
+	__releases(&s->lock)
+{
+	bool wakeup = s->wakeup_needed;
+	struct kvm_vcpu *vcpu;
+
+	s->wakeup_needed = false;
+
+	raw_spin_unlock(&s->lock);
+
+	if (wakeup) {
+		vcpu = s->kvm->bsp_vcpu;
+		if (vcpu)
+			kvm_vcpu_kick(vcpu);
+	}
+}
+
 static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 {
 	s->isr &= ~(1 << irq);
@@ -44,19 +67,19 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 	 * Other interrupt may be delivered to PIC while lock is dropped but
 	 * it should be safe since PIC state is already updated at this stage.
 	 */
-	raw_spin_unlock(&s->pics_state->lock);
+	pic_unlock(s->pics_state);
 	kvm_notify_acked_irq(s->pics_state->kvm, SELECT_PIC(irq), irq);
-	raw_spin_lock(&s->pics_state->lock);
+	pic_lock(s->pics_state);
 }

 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
 	struct kvm_pic *s = pic_irqchip(kvm);

-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	s->pics[0].isr_ack = 0xff;
 	s->pics[1].isr_ack = 0xff;
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);
 }

 /*
@@ -157,9 +180,9 @@ static void pic_update_irq(struct kvm_pic *s)

 void kvm_pic_update_irq(struct kvm_pic *s)
 {
-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	pic_update_irq(s);
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);
 }

 int kvm_pic_set_irq(void *opaque, int irq, int level)
@@ -167,14 +190,14 @@ int kvm_pic_set_irq(void *opaque, int irq, int level)
 	struct kvm_pic *s = opaque;
 	int ret = -1;

-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	if (irq >= 0 && irq < PIC_NUM_PINS) {
 		ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, level);
 		pic_update_irq(s);
 		trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
 				      s->pics[irq >> 3].imr, ret == 0);
 	}
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);

 	return ret;
 }
@@ -204,7 +227,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
 	int irq, irq2, intno;
 	struct kvm_pic *s = pic_irqchip(kvm);

-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	irq = pic_get_irq(&s->pics[0]);
 	if (irq >= 0) {
 		pic_intack(&s->pics[0], irq);
@@ -229,7 +252,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
 		intno = s->pics[0].irq_base + irq;
 	}
 	pic_update_irq(s);
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);

 	return intno;
 }
@@ -443,7 +466,7 @@ static int picdev_write(struct kvm_io_device *this,
 		printk(KERN_ERR "PIC: non byte write\n");
 		return 0;
 	}
-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	switch (addr) {
 	case 0x20:
 	case 0x21:
@@ -456,7 +479,7 @@ static int picdev_write(struct kvm_io_device *this,
 		elcr_ioport_write(&s->pics[addr & 1], addr, data);
 		break;
 	}
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);
 	return 0;
 }

@@ -473,7 +496,7 @@ static int picdev_read(struct kvm_io_device *this,
 		printk(KERN_ERR "PIC: non byte read\n");
 		return 0;
 	}
-	raw_spin_lock(&s->lock);
+	pic_lock(s);
 	switch (addr) {
 	case 0x20:
 	case 0x21:
@@ -487,7 +510,7 @@ static int picdev_read(struct kvm_io_device *this,
 		break;
 	}
 	*(unsigned char *)val = data;
-	raw_spin_unlock(&s->lock);
+	pic_unlock(s);
 	return 0;
 }

@@ -504,7 +527,7 @@ static void pic_irq_request(void *opaque, int level)
 	s->output = level;
 	if (vcpu && level && (s->pics[0].isr_ack & (1 << irq))) {
 		s->pics[0].isr_ack &= ~(1 << irq);
-		kvm_vcpu_kick(vcpu);
+		s->wakeup_needed = true;
 	}
 }

diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 34b1591..cd1f362 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -63,6 +63,7 @@ struct kvm_kpic_state {

 struct kvm_pic {
 	raw_spinlock_t lock;
+	bool wakeup_needed;
 	unsigned pending_acks;
 	struct kvm *kvm;
 	struct kvm_kpic_state pics[2]; /* 0 is master pic, 1 is slave pic */
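The deferral that pic_lock()/pic_unlock() introduce can be mimicked in userspace to check the ordering. This is a hypothetical analogue, not KVM code: a pthread mutex takes the place of the raw spinlock, a callback takes the place of kvm_vcpu_kick(), and the isr_ack condition of pic_irq_request() is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace analogue of pic_lock()/pic_unlock(); all names hypothetical.
 * Code under the lock only records that a kick is needed; the kick runs
 * once the lock has been dropped. */
struct toy_pic {
	pthread_mutex_t lock;
	bool wakeup_needed;
	int output;
	void (*kick)(struct toy_pic *s); /* stands in for kvm_vcpu_kick() */
};

static void toy_pic_lock(struct toy_pic *s)
{
	pthread_mutex_lock(&s->lock);
}

static void toy_pic_unlock(struct toy_pic *s)
{
	/* Sample and clear the flag while the lock is still held ... */
	bool wakeup = s->wakeup_needed;

	s->wakeup_needed = false;
	pthread_mutex_unlock(&s->lock);

	/* ... and perform the (possibly sleeping) kick outside of it. */
	if (wakeup && s->kick)
		s->kick(s);
}

/* Counterpart of pic_irq_request(), called with the lock held. */
static void toy_irq_request(struct toy_pic *s, int level)
{
	s->output = level;
	if (level)
		s->wakeup_needed = true;
}

static void toy_set_irq(struct toy_pic *s, int level)
{
	toy_pic_lock(s);
	toy_irq_request(s, level);
	toy_pic_unlock(s);
}

/* Test hook: counts kicks and checks the lock was free at kick time. */
static int kicks, kicks_with_lock_free;

static void record_kick(struct toy_pic *s)
{
	kicks++;
	if (pthread_mutex_trylock(&s->lock) == 0) {
		kicks_with_lock_free++;
		pthread_mutex_unlock(&s->lock);
	}
}
```

With kick pointing at record_kick, toy_set_irq(&pic, 1) produces exactly one kick, and that kick finds the mutex unlocked. Clearing wakeup_needed before the unlock means a flag set by a later critical section is handled by that section's own unlock.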

2010-02-24 09:48:14

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

On 02/24/2010 11:41 AM, Jan Kiszka wrote:
> Thomas Gleixner wrote:
>
>> On Tue, 23 Feb 2010, Jan Kiszka wrote:
>>
>>> Thomas Gleixner wrote:
>>>
>>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>>> them to raw_spinlock. No change for !RT kernels.
>>>>
>>> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
>>> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
>>> while atomic.
>>>
>> Hmm, a wakeup itself is fine. Is that code waking a wake queue?
>>
> Yes, it's a wake queue.
>

So what's the core issue? Is the spinlock_t in the wait_queue a sleeping mutex?

> This restores the deferred VCPU kicking that was in place before
> 956f97cf. We need this over -rt because wake_up* requires a non-atomic
> context in that configuration.
>

Seems sane, will apply once I understand why the current code fails.

--
error compiling committee.c: too many arguments to function

2010-02-24 09:55:42

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Avi Kivity wrote:
> On 02/24/2010 11:41 AM, Jan Kiszka wrote:
>> Thomas Gleixner wrote:
>>
>>> On Tue, 23 Feb 2010, Jan Kiszka wrote:
>>>
>>>> Thomas Gleixner wrote:
>>>>
>>>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>>>> them to raw_spinlock. No change for !RT kernels.
>>>>>
>>>> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
>>>> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
>>>> while atomic.
>>>>
>>> Hmm, a wakeup itself is fine. Is that code waking a wake queue?
>>>
>> Yes, it's a wake queue.
>>
>
> So what's the core issue? Is the spinlock_t in the wait_queue a sleeping mutex?

Yep.

Jan


2010-02-24 10:04:42

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

On 02/24/2010 11:54 AM, Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 02/24/2010 11:41 AM, Jan Kiszka wrote:
>>> Thomas Gleixner wrote:
>>>> On Tue, 23 Feb 2010, Jan Kiszka wrote:
>>>>> Thomas Gleixner wrote:
>>>>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>>>>> them to raw_spinlock. No change for !RT kernels.
>>>>>>
>>>>> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
>>>>> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
>>>>> while atomic.
>>>>>
>>>> Hmm, a wakeup itself is fine. Is that code waking a wake queue?
>>>>
>>> Yes, it's a wake queue.
>>>
>> So what's the core issue? Is the spinlock_t in the wait_queue a sleeping mutex?
>>
> Yep.
>

I see. Won't we hit the same issue when we call pic functions from
atomic context during the guest entry sequence?


2010-02-24 10:14:11

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Avi Kivity wrote:
> On 02/24/2010 11:54 AM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>> On 02/24/2010 11:41 AM, Jan Kiszka wrote:
>>>> Thomas Gleixner wrote:
>>>>> On Tue, 23 Feb 2010, Jan Kiszka wrote:
>>>>>> Thomas Gleixner wrote:
>>>>>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>>>>>> them to raw_spinlock. No change for !RT kernels.
>>>>>>>
>>>>>> Doesn't fly for -rt anymore: pic_update_irq runs under this raw lock and
>>>>>> calls kvm_vcpu_kick which tries to wake_up some thread -> scheduling
>>>>>> while atomic.
>>>>>>
>>>>> Hmm, a wakeup itself is fine. Is that code waking a wake queue?
>>>>>
>>>> Yes, it's a wake queue.
>>>>
>>> So what's the core issue? Is the spinlock_t in the wait_queue a sleeping mutex?
>>>
>> Yep.
>>
>
> I see. Won't we hit the same issue when we call pic functions from
> atomic context during the guest entry sequence?
>

If there are such call paths, for sure. What concrete path(s) do you
have in mind?

Jan


2010-02-24 10:17:40

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>
>> I see. Won't we hit the same issue when we call pic functions from
>> atomic context during the guest entry sequence?
>>
> If there are such call paths, for sure. What concrete path(s) do you
> have in mind?
>

vcpu_enter_guest() -> inject_pending_event() ->
kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.


2010-02-24 10:23:07

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Avi Kivity wrote:
> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>
>>> I see. Won't we hit the same issue when we call pic functions from
>>> atomic context during the guest entry sequence?
>>>
>> If there are such call paths, for sure. What concrete path(s) do you
>> have in mind?
>>
>
> vcpu_enter_guest() -> inject_pending_event() ->
> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.

But do they kick anyone or just check/pull information? Never saw any
warnings during my tests last year (granted: with older -rt and kvm
versions).

Jan


2010-02-24 10:28:11

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

On 02/24/2010 12:22 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>>> I see. Won't we hit the same issue when we call pic functions from
>>>> atomic context during the guest entry sequence?
>>>>
>>> If there are such call paths, for sure. What concrete path(s) do you
>>> have in mind?
>>>
>> vcpu_enter_guest() -> inject_pending_event() ->
>> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.
>>
> But do they kick anyone or just check/pull information?

Probably not, kicking should be a side effect (or rather the main
effect) of queueing an interrupt, not dequeuing it.

> Never saw any
> warnings during my tests last year (granted: with older -rt and kvm
> versions).
>

Well, most guests kill the pic early on. Will apply the patch.


2010-02-24 10:29:10

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>>
>>>> I see. Won't we hit the same issue when we call pic functions from
>>>> atomic context during the guest entry sequence?
>>>>
>>> If there are such call paths, for sure. What concrete path(s) do you
>>> have in mind?
>>>
>> vcpu_enter_guest() -> inject_pending_event() ->
>> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.
>
> But do they kick anyone or just check/pull information? Never saw any
> warnings during my tests last year (granted: with older -rt and kvm
> versions).

Mmh, they could if more than one IRQ is pending. I guess this just never
triggered in practice because guests typically use the APIC, and the IRQ
load was low during the PIC phase of my tests.

Jan


2010-02-24 10:31:31

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Avi Kivity wrote:
> On 02/24/2010 12:22 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>>>> I see. Won't we hit the same issue when we call pic functions from
>>>>> atomic context during the guest entry sequence?
>>>>>
>>>> If there are such call paths, for sure. What concrete path(s) do you
>>>> have in mind?
>>>>
>>> vcpu_enter_guest() -> inject_pending_event() ->
>>> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.
>>>
>> But do they kick anyone or just check/pull information?
>
> Probably not, kicking should be a side effect (or rather the main
> effect) of queueing an interrupt, not dequeuing it.
>
>> Never saw any
>> warnings during my tests last year (granted: with older -rt and kvm
>> versions).
>>
>
> Well, most guests kill the pic early on. Will apply the patch.
>

I think it needs some extension: pic_irq_request should only schedule a
wakeup on a rising edge of the PIC output.
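A minimal sketch of the suggested rising-edge check, assuming the PIC output is tracked as a 0/1 level as in pic_irq_request(). The struct and function names are hypothetical. It also shows why the dequeue path from vcpu_enter_guest() would stop requesting wakeups: when acking one of several pending IRQs re-evaluates the output, the line stays high and no new edge is seen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: request a VCPU wakeup only on a low-to-high
 * transition of the PIC output, not whenever the output is re-evaluated
 * while it is already high. */
struct toy_pic_output {
	int output;         /* current level of the PIC's INT line (0 or 1) */
	bool wakeup_needed; /* consumed later by the unlock path */
};

static void toy_irq_request_edge(struct toy_pic_output *s, int level)
{
	if (level && !s->output)
		s->wakeup_needed = true; /* rising edge: new work for the VCPU */
	s->output = level;
}
```

Raising the line from 0 to 1 sets wakeup_needed; calling toy_irq_request_edge(&s, 1) again while the output is still high leaves it clear.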

Jan


2010-02-24 10:41:52

by Avi Kivity

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

On 02/24/2010 12:28 PM, Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Avi Kivity wrote:
>>> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>>>> I see. Won't we hit the same issue when we call pic functions from
>>>>> atomic context during the guest entry sequence?
>>>>>
>>>> If there are such call paths, for sure. What concrete path(s) do you
>>>> have in mind?
>>>>
>>> vcpu_enter_guest() -> inject_pending_event() ->
>>> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.
>>>
>> But do they kick anyone or just check/pull information? Never saw any
>> warnings during my tests last year (granted: with older -rt and kvm
>> versions).
>>
> Mmh, they could if more than one IRQ is pending. I guess this just never
> triggered in practice because guests typically use the APIC, and the IRQ
> load was low during the PIC phase of my tests.
>

We could just ignore the wakeup in this path. It's called in vcpu
context, so obviously the vcpu is awake and kicking it will only hurt
your feet.

Longer term, we should clear up this mess. One possible path is to move
the pic/lapic/injection stuff out of the critical section, and use
vcpu->requests to close the race between querying the pic/lapic and
entering the guest.
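One way to read the vcpu->requests suggestion is as a store/load handshake between whoever raises an interrupt and the VCPU entry path. The following is a hypothetical sketch using C11 atomics; the names, the single request bit and the mode field are assumptions for illustration and do not describe KVM's actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request-bit handshake in the spirit of vcpu->requests;
 * names and layout are illustrative, not KVM's implementation. */
enum { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };
#define TOY_REQ_EVENT (1u << 0)

struct toy_vcpu {
	atomic_uint requests;
	atomic_int mode;
	atomic_int kicks; /* counts simulated kick IPIs */
};

static void toy_vcpu_init(struct toy_vcpu *v)
{
	atomic_init(&v->requests, 0);
	atomic_init(&v->mode, TOY_OUTSIDE_GUEST_MODE);
	atomic_init(&v->kicks, 0);
}

/* Producer (e.g. the PIC raising its output): publish the request first,
 * then kick only if the VCPU may already be running guest code. */
static void toy_post_event(struct toy_vcpu *v)
{
	atomic_fetch_or(&v->requests, TOY_REQ_EVENT);
	if (atomic_load(&v->mode) == TOY_IN_GUEST_MODE)
		atomic_fetch_add(&v->kicks, 1);
}

/* Consumer, at the very end of the entry path: announce guest mode, then
 * re-check requests. With sequentially consistent atomics, either this
 * load sees the request or the producer sees TOY_IN_GUEST_MODE and kicks. */
static bool toy_try_enter_guest(struct toy_vcpu *v)
{
	atomic_store(&v->mode, TOY_IN_GUEST_MODE);
	if (atomic_load(&v->requests) != 0) {
		atomic_store(&v->mode, TOY_OUTSIDE_GUEST_MODE);
		return false; /* abort entry; handle requests, then retry */
	}
	return true; /* the guest would be entered here */
}

/* Requests are consumed outside the critical entry section. */
static unsigned toy_take_requests(struct toy_vcpu *v)
{
	return atomic_exchange(&v->requests, 0);
}
```

The point of the ordering is that the PIC/LAPIC can be queried before the critical section: any event posted after that query is caught either by the re-check in toy_try_enter_guest() or by a kick.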


2010-02-24 11:43:13

by Jan Kiszka

Subject: Re: [PATCH] KVM: x86: Kick VCPU outside PIC lock again

Avi Kivity wrote:
> On 02/24/2010 12:28 PM, Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Avi Kivity wrote:
>>>> On 02/24/2010 12:13 PM, Jan Kiszka wrote:
>>>>>> I see. Won't we hit the same issue when we call pic functions from
>>>>>> atomic context during the guest entry sequence?
>>>>>>
>>>>> If there are such call paths, for sure. What concrete path(s) do you
>>>>> have in mind?
>>>>>
>>>> vcpu_enter_guest() -> inject_pending_event() ->
>>>> kvm_cpu_{has,get}_interrupt() -> various pic functions if you're unlucky.
>>>>
>>> But do they kick anyone or just check/pull information? Never saw any
>>> warnings during my tests last year (granted: with older -rt and kvm
>>> versions).
>>>
>> Mmh, they could if more than one IRQ is pending. I guess this just never
>> triggered in practice because guests typically use the APIC, and the IRQ
>> load was low during the PIC phase of my tests.
>>
>
> We could just ignore the wakeup in this path. It's called in vcpu
> context, so obviously the vcpu is awake and kicking it will only hurt
> your feet.

Looking at kvm_vcpu_kick, this already happens: the wake queue is
checked for pending waiters (i.e. there are none if we are waking
ourselves), and no IPI is sent if we run on the same CPU as the VCPU.
That explains why this path is safe in practice.
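The two checks described here can be captured in a reduced model. It is a hypothetical sketch with made-up field names; the real kvm_vcpu_kick() has further conditions that are not modeled. A VCPU kicking itself from its own context has no waiters on its wait queue and is on the current CPU, so neither branch fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced model of the two checks; names are hypothetical, not KVM's. */
struct toy_kick_vcpu {
	int waiters; /* tasks sleeping on the VCPU's wait queue */
	int cpu;     /* CPU the VCPU is (or was last) running on */
};

struct toy_kick_result {
	bool woke_waiter;
	bool sent_ipi;
};

static struct toy_kick_result toy_vcpu_kick(const struct toy_kick_vcpu *v,
					    int this_cpu)
{
	struct toy_kick_result r = { false, false };

	/* The wait queue (and, on -rt, its sleeping lock) is only touched
	 * when somebody is actually waiting on it. */
	if (v->waiters > 0)
		r.woke_waiter = true;

	/* An IPI is only needed when the VCPU runs on another CPU. */
	if (v->cpu != this_cpu)
		r.sent_ipi = true;

	return r;
}
```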

>
> Longer term, we should clear up this mess. One possible path is to move
> the pic/lapic/injection stuff out of the critical section, and use
> vcpu->requests to close the race between querying the pic/lapic and
> entering the guest.
>

Sounds worthwhile as well.

Jan
