2017-09-06 03:40:08

by Suthikulpanit, Suravee

Subject: [PATCH 0/3] KVM: SVM: Fix guest not booting w/ AVIC enabled

Certain QEMU options fail to boot a VM guest w/ SVM AVIC enabled
(e.g. modprobe kvm_amd avic=1). Investigation shows that this is mainly
because the AVIC hardware does not trap into the hypervisor when the guest OS
writes to the APIC_EOI register.

The boot hang is caused by missing timer interrupts when using the in-kernel
PIT model (e.g. launching qemu w/ the '-no-hpet' option), since it requires an
irq acknowledgment before injecting another interrupt when
irq re-injection is enabled (normally the default).
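
The gating described above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: the toy_pit names and fields
are hypothetical and are not the in-kernel PIT code. It shows how a timer
source that waits for an acknowledgment delivers exactly one tick when the
guest's EOI never reaches the hypervisor.

```c
#include <stdbool.h>

/* Hypothetical toy model of a timer source that gates injection on an ack. */
struct toy_pit {
	bool reinject;	/* re-injection policy enabled */
	bool irq_ack;	/* previous interrupt has been acknowledged */
	int pending;	/* ticks waiting to be delivered */
	int injected;	/* ticks actually delivered to the guest */
};

static void toy_pit_init(struct toy_pit *pit, bool reinject)
{
	pit->reinject = reinject;
	pit->irq_ack = true;	/* the very first tick may be injected */
	pit->pending = 0;
	pit->injected = 0;
}

/* Deliver one tick if the policy allows it. */
static void toy_pit_try_inject(struct toy_pit *pit)
{
	if (pit->reinject) {
		/* Wait for the ack of the previous tick before injecting. */
		if (!pit->irq_ack || pit->pending == 0)
			return;
		pit->irq_ack = false;
		pit->pending--;
	}
	pit->injected++;
}

/* Timer expiry: queue the tick (when re-injecting) and try to deliver. */
static void toy_pit_tick(struct toy_pit *pit)
{
	if (pit->reinject)
		pit->pending++;
	toy_pit_try_inject(pit);
}

/* Called only if the guest's EOI write is visible to the hypervisor. */
static void toy_pit_ack(struct toy_pit *pit)
{
	pit->irq_ack = true;
	toy_pit_try_inject(pit);
}

/* Run nticks timer expiries and return how many ticks reached the guest. */
static int toy_pit_run(int nticks, bool reinject, bool eoi_traps)
{
	struct toy_pit pit;

	toy_pit_init(&pit, reinject);
	for (int i = 0; i < nticks; i++) {
		toy_pit_tick(&pit);
		if (eoi_traps)
			toy_pit_ack(&pit);
	}
	return pit.injected;
}
```

With re-injection enabled and no EOI trap, toy_pit_run(5, true, false)
delivers a single tick, which corresponds to the hang seen at boot. With the
EOI visible, or with re-injection disabled, all five ticks are delivered.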

Suravee Suthikulpanit (3):
KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
KVM: SVM: Add irqchip_split() checks before enabling AVIC

arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm.c | 43 ++++++++++++++++++++++++++++-------------
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
4 files changed, 33 insertions(+), 16 deletions(-)

--
1.8.3.1


2017-09-06 03:40:14

by Suthikulpanit, Suravee

Subject: [PATCH 1/3] KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()

Preparing the base code for subsequent changes. This does not change
existing logic.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af256b7..316edbf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1587,6 +1587,23 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
}

+static int avic_init_vcpu(struct vcpu_svm *svm)
+{
+ int ret;
+
+ if (!avic)
+ return 0;
+
+ ret = avic_init_backing_page(&svm->vcpu);
+ if (ret)
+ return ret;
+
+ INIT_LIST_HEAD(&svm->ir_list);
+ spin_lock_init(&svm->ir_list_lock);
+
+ return ret;
+}
+
static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
{
struct vcpu_svm *svm;
@@ -1623,14 +1640,9 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
if (!hsave_page)
goto free_page3;

- if (avic) {
- err = avic_init_backing_page(&svm->vcpu);
- if (err)
- goto free_page4;
-
- INIT_LIST_HEAD(&svm->ir_list);
- spin_lock_init(&svm->ir_list_lock);
- }
+ err = avic_init_vcpu(svm);
+ if (err)
+ goto free_page4;

/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
--
1.8.3.1

2017-09-06 03:40:18

by Suthikulpanit, Suravee

Subject: [PATCH 2/3] KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()

Modify struct kvm_x86_ops.get_enable_apicv() to take a struct kvm_vcpu
pointer as a parameter in preparation for subsequent changes.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4d120a..3cdde44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -969,7 +969,7 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
- bool (*get_enable_apicv)(void);
+ bool (*get_enable_apicv)(struct kvm_vcpu *vcpu);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 316edbf..d1b3eb4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4386,7 +4386,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
}

-static bool svm_get_enable_apicv(void)
+static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
{
return avic;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6ef294..8a7ab16 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4983,7 +4983,7 @@ static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_activ
}
}

-static bool vmx_get_enable_apicv(void)
+static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
{
return enable_apicv;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a5e57..b8c40ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7946,7 +7946,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;

- vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv();
+ vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.emulate_ctxt.ops = &emulate_ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
--
1.8.3.1

2017-09-06 03:40:57

by Suthikulpanit, Suravee

Subject: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

SVM AVIC hardware accelerates guest writes to the APIC_EOI register
(for edge-triggered interrupts), which means they do not trap to KVM.

So, enable SVM AVIC only in split irqchip mode
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1b3eb4..7ce191b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1176,7 +1176,6 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
- svm->vcpu.arch.apicv_active = true;
}

static void init_vmcb(struct vcpu_svm *svm)
@@ -1292,7 +1291,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_PAUSE);
}

- if (avic)
+ if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);

/*
@@ -1594,6 +1593,12 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
if (!avic)
return 0;

+ if (!irqchip_split(svm->vcpu.kvm)) {
+ pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
+ __func__);
+ return 0;
+ }
+
ret = avic_init_backing_page(&svm->vcpu);
if (ret)
return ret;
@@ -4388,7 +4393,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)

static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
{
- return avic;
+ return avic && irqchip_split(vcpu->kvm);
}

static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
@@ -4405,7 +4410,7 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;

- if (!avic)
+ if (!avic || !irqchip_split(svm->vcpu.kvm))
return;

vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
--
1.8.3.1

2017-09-08 15:53:30

by Radim Krčmář

Subject: Re: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

2017-09-05 22:39-0500, Suravee Suthikulpanit:
> SVM AVIC hardware accelerates guest writes to the APIC_EOI register
> (for edge-triggered interrupts), which means they do not trap to KVM.
>
> So, enable SVM AVIC only in split irqchip mode
> (e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Yeah, hacking TMR to get the VM exit could result in future bugs.
We have to push split irqchip as the default in userspaces with this
change.

> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> arch/x86/kvm/svm.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d1b3eb4..7ce191b 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1292,7 +1291,7 @@ static void init_vmcb(struct vcpu_svm *svm)
> set_intercept(svm, INTERCEPT_PAUSE);
> }
>
> - if (avic)
> + if (kvm_vcpu_apicv_active(&svm->vcpu))
> avic_init_vmcb(svm);
> /*
> @@ -1594,6 +1593,12 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
> if (!avic)
> return 0;
>
> + if (!irqchip_split(svm->vcpu.kvm)) {

The other place used kvm_vcpu_apicv_active() instead of checking
irqchip_split() directly, so I think it would be better to be consistent
and do it here too.
I'd also like it if this workaround used !irqchip_split() exactly once.
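
One way to read this suggestion, as a userspace sketch only: the irqchip mode
is checked in a single place (the get_enable_apicv callback), the result is
cached per vcpu, and every other site asks kvm_vcpu_apicv_active(). The
structs below are mock stand-ins with only the fields needed here, the
mock_ helpers are hypothetical, and this is not the actual v2 patch.

```c
#include <stdbool.h>

/* Mock stand-ins for the kernel types; only the fields used here exist. */
struct kvm {
	bool split_irqchip;
};

struct kvm_vcpu {
	struct kvm *kvm;
	bool apicv_active;
};

/* Stand-in for the kvm_amd "avic" module parameter. */
static bool avic = true;

static bool irqchip_split(const struct kvm *kvm)
{
	return kvm->split_irqchip;
}

/* The only place that looks at the irqchip mode. */
static bool svm_get_enable_apicv(const struct kvm_vcpu *vcpu)
{
	return avic && irqchip_split(vcpu->kvm);
}

/* Caches the callback result, as kvm_arch_vcpu_init() does in patch 2/3. */
static void mock_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm)
{
	vcpu->kvm = kvm;
	vcpu->apicv_active = svm_get_enable_apicv(vcpu);
}

static bool kvm_vcpu_apicv_active(const struct kvm_vcpu *vcpu)
{
	return vcpu->apicv_active;
}

/* Returns true when the per-vcpu AVIC setup would run. */
static bool mock_avic_init_vcpu(const struct kvm_vcpu *vcpu)
{
	if (!kvm_vcpu_apicv_active(vcpu))
		return false;
	/* Backing page and ir_list setup would go here. */
	return true;
}

/* Hypothetical helper: would AVIC be set up for a VM in this irqchip mode? */
static bool mock_avic_used(bool split)
{
	struct kvm kvm = { .split_irqchip = split };
	struct kvm_vcpu vcpu;

	mock_vcpu_init(&vcpu, &kvm);
	return mock_avic_init_vcpu(&vcpu);
}
```

In this model mock_avic_used(true) is true and mock_avic_used(false) is
false, with irqchip_split() called from exactly one function.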

> + pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
> + __func__);

There are going to be too many of those. pr_debug_once() would be a
better notification. We can also report it in svm_get_enable_apicv().

> + return 0;
> + }
> +
> ret = avic_init_backing_page(&svm->vcpu);
> if (ret)
> return ret;
> @@ -4388,7 +4393,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>
> static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
> {
> - return avic;
> + return avic && irqchip_split(vcpu->kvm);
> }
>
> static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> @@ -4405,7 +4410,7 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb;
>
> - if (!avic)
> + if (!avic || !irqchip_split(svm->vcpu.kvm))

Seems like we want !kvm_vcpu_apicv_active() here too.

> return;
>
> vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;

(separate bug: refresh should be able to enable avic as well.)
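
A minimal sketch of what a refresh that can both enable and disable might
compute, assuming the enable bit simply follows the per-vcpu apicv state.
MOCK_AVIC_ENABLE_MASK and mock_refresh_int_ctl() are hypothetical names, and
the bit position is illustrative rather than taken from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for AVIC_ENABLE_MASK; bit position is an assumption. */
#define MOCK_AVIC_ENABLE_MASK (UINT32_C(1) << 31)

/* Return the new int_ctl value so that the enable bit tracks apicv_active. */
static uint32_t mock_refresh_int_ctl(uint32_t int_ctl, bool apicv_active)
{
	if (apicv_active)
		return int_ctl | MOCK_AVIC_ENABLE_MASK;
	return int_ctl & ~MOCK_AVIC_ENABLE_MASK;
}
```

All other bits of int_ctl are left untouched in either direction.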

thanks.

2017-09-12 09:00:03

by Suthikulpanit, Suravee

Subject: Re: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

Hi Radim,

On 9/8/17 08:53, Radim Krčmář wrote:
> 2017-09-05 22:39-0500, Suravee Suthikulpanit:
>> SVM AVIC hardware accelerates guest writes to the APIC_EOI register
>> (for edge-triggered interrupts), which means they do not trap to KVM.
>>
>> So, enable SVM AVIC only in split irqchip mode
>> (e.g. launching qemu w/ option '-machine kernel_irqchip=split').
>
> Yeah, hacking TMR to get the VM exit could result in future bugs.
> We have to push split irqchip as the default in userspaces with this
> change.

Actually, I'm not quite sure about the advantages/disadvantages of split
irqchip, how it would affect other cases, or why it is not currently used as
the default.

>> Suggested-by: Paolo Bonzini <[email protected]>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> arch/x86/kvm/svm.c | 13 +++++++++----
>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index d1b3eb4..7ce191b 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1292,7 +1291,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>> set_intercept(svm, INTERCEPT_PAUSE);
>> }
>>
>> - if (avic)
>> + if (kvm_vcpu_apicv_active(&svm->vcpu))
>> avic_init_vmcb(svm);
>> /*
>> @@ -1594,6 +1593,12 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
>> if (!avic)
>> return 0;
>>
>> + if (!irqchip_split(svm->vcpu.kvm)) {
>
> The other place used kvm_vcpu_apicv_active() instead of checking
> irqchip_split() directly, so I think it would be better to be consistent
> and do it here too.
> I'd also like it if this workaround used !irqchip_split() exactly once.

Ok, I'll clean up in V2.

>> + pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
>> + __func__);
>
> There are going to be too many of those. pr_debug_once() would be a
> better notification. We can also report it in svm_get_enable_apicv().

pr_debug_once does not use dynamic debug APIs. I think I can call pr_debug only
when vcpu_id == 0.

>> + return 0;
>> + }
>> +
>> ret = avic_init_backing_page(&svm->vcpu);
>> if (ret)
>> return ret;
>> @@ -4388,7 +4393,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>>
>> static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
>> {
>> - return avic;
>> + return avic && irqchip_split(vcpu->kvm);
>> }
>>
>> static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
>> @@ -4405,7 +4410,7 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>> struct vcpu_svm *svm = to_svm(vcpu);
>> struct vmcb *vmcb = svm->vmcb;
>>
>> - if (!avic)
>> + if (!avic || !irqchip_split(svm->vcpu.kvm))
>
> Seems like we want !kvm_vcpu_apicv_active() here too.

Right.

>
>> return;
>>
>> vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
>
> (separate bug: refresh should be able to enable avic as well.)
>
> thanks.
>

Thanks,
Suravee

2017-09-12 13:17:40

by Radim Krčmář

Subject: Re: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

2017-09-12 01:59-0700, Suravee Suthikulpanit:
> On 9/8/17 08:53, Radim Krčmář wrote:
>> 2017-09-05 22:39-0500, Suravee Suthikulpanit:
>> > SVM AVIC hardware accelerates guest writes to the APIC_EOI register
>> > (for edge-triggered interrupts), which means they do not trap to KVM.
>> >
>> > So, enable SVM AVIC only in split irqchip mode
>> > (e.g. launching qemu w/ option '-machine kernel_irqchip=split').
>>
>> Yeah, hacking TMR to get the VM exit could result in future bugs.
>> We have to push split irqchip as the default in userspaces with this
>> change.
>
> Actually, I'm not quite sure about the advantages/disadvantages with split
> irqchip, and how it would affect other cases, and why it was not used as
> default currently.

The main advantage of split irqchip is that we're moving code out of the
kernel, and QEMU's irqchip currently has more features too.

I think it is not the default as the support for split irqchip is recent
(v4.3) and has lower performance, so it is only used in cases that need
the extra features.

> > > + pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
> > > + __func__);
> >
> > There are going to be too many of those. pr_debug_once() would be a
> > better notification. We can also report it in svm_get_enable_apicv().
>
> pr_debug_once does not use dynamic debug APIs. I think I can call pr_debug
> only when vcpu_id == 0.

I see, the rest uses dynamic debug. It does not print by default, so
v1 is ok. (I'd rather remove the line than add a condition.)

Thanks.

2017-09-12 14:22:56

by Paolo Bonzini

Subject: Re: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

On 12/09/2017 15:17, Radim Krčmář wrote:
>>> Yeah, hacking TMR to get the VM exit could result in future bugs.
>>> We have to push split irqchip as the default in userspaces with this
>>> change.
>> Actually, I'm not quite sure about the advantages/disadvantages with split
>> irqchip, and how it would affect other cases, and why it was not used as
>> default currently.
> The main advantage of split irqchip is that we're moving code out of the
> kernel, and QEMU's irqchip currently has more features too.
>
> I think it is not the default as the support for split irqchip is recent
> (v4.3) and has lower performance, so it is only used in cases that need
> the extra features.

One other difference is that in-kernel PIT is not supported with
split-irqchip, and the QEMU PIT lacks support for reinjecting lost
ticks. But this should only be needed for very old guests at this point.

Paolo

2017-09-12 15:07:51

by Suthikulpanit, Suravee

Subject: Re: [PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC



On 9/12/17 07:22, Paolo Bonzini wrote:
> On 12/09/2017 15:17, Radim Krčmář wrote:
>>>> Yeah, hacking TMR to get the VM exit could result in future bugs.
>>>> We have to push split irqchip as the default in userspaces with this
>>>> change.
>>> Actually, I'm not quite sure about the advantages/disadvantages with split
>>> irqchip, and how it would affect other cases, and why it was not used as
>>> default currently.
>> The main advantage of split irqchip is that we're moving code out of the
>> kernel, and QEMU's irqchip currently has more features too.
>>
>> I think it is not the default as the support for split irqchip is recent
>> (v4.3) and has lower performance, so it is only used in cases that need
>> the extra features.
>
> One other difference is that in-kernel PIT is not supported with
> split-irqchip, and the QEMU PIT lacks support for reinjecting lost
> ticks. But this should only be needed for very old guests at this point.
>
> Paolo
>

Thanks. I'll look into a patch for changing the default in QEMU and follow up
with that separately then.

Suravee