2014-07-17 04:57:23

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331,
after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is
some progress and the L2 can boot up, however, slowly. The original idea of this
fix vid injection patch is from "Zhang, Yang Z" <[email protected]>.

Interrupt which delivered by vid should be injected to L1 by L0 if current is in
L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't
have set VM_EXIT_ACK_INTR_ON_EXIT. The current logic doen't consider these cases.
This patch fix it by vid intr to L1 if current is L1 or L2 through old injection
way if L1 doen't have VM_EXIT_ACK_INTR_ON_EXIT set.

Signed-off-by: Wanpeng Li <[email protected]>
Signed-off-by: "Zhang, Yang Z" <[email protected]>
---
arch/x86/kvm/vmx.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 021d84a..ad36646 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
{
if (max_irr == -1)
return;
-
- vmx_set_rvi(max_irr);
+ if (!is_guest_mode(vcpu)) {
+ vmx_set_rvi(max_irr);
+ } else if (is_guest_mode(vcpu) && !nested_exit_on_intr(vcpu)) {
+ /*
+ * Fall back to old way to inject the interrupt since there
+ * is no vAPIC-v for L2.
+ */
+ if (vcpu->arch.exception.pending ||
+ vcpu->arch.nmi_injected ||
+ vcpu->arch.interrupt.pending)
+ return;
+ else if (vmx_interrupt_allowed(vcpu)) {
+ kvm_queue_interrupt(vcpu, max_irr, false);
+ vmx_inject_irq(vcpu);
+ }
+ }
}

static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
--
1.9.1


2014-07-17 04:57:36

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719 nested_vmx_vmexit+0xa4/0x233 [kvm_intel]()
Modules linked in: tun nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd
sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4 8021q ipv6 uinput joydev microcode
pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio i2c_i801 i2c_core
tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas button dm_mirror dm_region_hash
dm_log dm_mod
CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: G W 3.16.0-rc1 #2
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
000000000000220f ffff880ffd107bf8 ffffffff81493563 000000000000220f
0000000000000000 ffff880ffd107c38 ffffffff8103f0eb ffff880ffd107c48
ffffffffa059709a ffff881ffc9e0040 ffff8800b74b8000 00000000ffffffff
Call Trace:
[<ffffffff81493563>] dump_stack+0x49/0x5e
[<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96
[<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17
[<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel]
[<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel]
[<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm]
[<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel]
[<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm]
[<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm]

After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1
asks us to), "Acknowledge interrupt on exit" behavior can be emulated. Current
logic will ask for intr vector if it is nested vmexit and VM_EXIT_ACK_INTR_ON_EXIT
is set by L1. However, intr vector for posted intr can't be got by generic read
pending interrupt vector and intack routine, there is a requirement to sync from
pir to irr. This patch fix it by ask the intr vector after sync pir to irr.

Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/lapic.c | 1 +
arch/x86/kvm/vmx.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0069118..b7d45dc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
apic_clear_irr(vector, apic);
return vector;
}
+EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt);

void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ae5ad8..31f1479 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8697,6 +8697,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
&& nested_exit_intr_ack_set(vcpu)) {
int irq = kvm_cpu_get_interrupt(vcpu);
+
+ if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
+ irq = kvm_get_apic_interrupt(vcpu);
WARN_ON(irq < 0);
vmcs12->vm_exit_intr_info = irq |
INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;
--
1.9.1

2014-07-17 04:57:45

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH 3/3] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down

This bug can be trigger by L1 goes down directly w/ enable_shadow_vmcs.

[ 6413.158950] kvm: vmptrld (null)/780000000000 failed
[ 6413.158954] vmwrite error: reg 401e value 4 (err 1)
[ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted: G OE 3.16.0kvm+ #2
[ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[ 6413.158959] 0000000000000003 ffff880210c9fb58 ffffffff81741de9 ffff8800d7433f80
[ 6413.158960] ffff880210c9fb68 ffffffffa059fa08 ffff880210c9fb78 ffffffffa05938bf
[ 6413.158962] ffff880210c9fba8 ffffffffa059a97f ffff8800d7433f80 0000000000000003
[ 6413.158963] Call Trace:
[ 6413.158968] [<ffffffff81741de9>] dump_stack+0x45/0x56
[ 6413.158972] [<ffffffffa059fa08>] vmwrite_error+0x2c/0x2e [kvm_intel]
[ 6413.158974] [<ffffffffa05938bf>] vmcs_writel+0x1f/0x30 [kvm_intel]
[ 6413.158976] [<ffffffffa059a97f>] free_nested.part.73+0x5f/0x170 [kvm_intel]
[ 6413.158978] [<ffffffffa059ab13>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[ 6413.158991] [<ffffffffa0360324>] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[ 6413.158998] [<ffffffffa0360f92>] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]

Commit 26a865 (KVM: VMX: fix use after free of vmx->loaded_vmcs) fix the use
after free bug by move free_loaded_vmcs() before free_nested(), however, this
lead to free loaded_vmcs->vmcs premature and vmptrld load a NULL pointer during
sync shadow vmcs to vmcs12. In addition, vmwrite which used to disable shadow
vmcs and reset VMCS_LINK_POINTER failed since there is no valid current-VMCS.
This patch fix it by skipping sync shadow vmcs and reset vmcs field for L1
destroy since they will be reinitialized after L1 recreate.

Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fbce89e..2b28da7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6113,9 +6113,9 @@ static void free_nested(struct vcpu_vmx *vmx)
return;
vmx->nested.vmxon = false;
if (vmx->nested.current_vmptr != -1ull) {
- nested_release_vmcs12(vmx);
vmx->nested.current_vmptr = -1ull;
vmx->nested.current_vmcs12 = NULL;
+ nested_release_vmcs12(vmx);
}
if (enable_shadow_vmcs)
free_vmcs(vmx->nested.current_shadow_vmcs);
--
1.9.1

2014-07-17 05:15:44

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

Wanpeng Li wrote on 2014-07-17:
> WARNING: CPU: 9 PID: 7251 at arch/x86/kvm/vmx.c:8719
> nested_vmx_vmexit+0xa4/0x233 [kvm_intel]() Modules linked in: tun
> nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache
> lockd sunrpc pci_stub netconsole kvm_intel kvm bridge stp llc autofs4
> 8021q ipv6 uinput joydev microcode pcspkr igb i2c_algo_bit ehci_pci
> ehci_hcd e1000e ixgbe ptp pps_core hwmon mdio i2c_i801 i2c_core
> tpm_tis tpm ipmi_si ipmi_msghandler isci libsas scsi_transport_sas
> button dm_mirror dm_region_hash dm_log dm_mod
> CPU: 9 PID: 7251 Comm: qemu-system-x86 Tainted: G W
> 3.16.0-rc1 #2
> Hardware name: Intel Corporation S2600CP/S2600CP, BIOS
> RMLSDP.86I.00.29.D696.1311111329 11/11/2013 000000000000220f
> ffff880ffd107bf8 ffffffff81493563 000000000000220f
> 0000000000000000 ffff880ffd107c38 ffffffff8103f0eb ffff880ffd107c48
> ffffffffa059709a ffff881ffc9e0040 ffff8800b74b8000 00000000ffffffff
> Call Trace: [<ffffffff81493563>] dump_stack+0x49/0x5e
> [<ffffffff8103f0eb>]
> warn_slowpath_common+0x7c/0x96 [<ffffffffa059709a>] ?
> nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffff8103f11a>]
> warn_slowpath_null+0x15/0x17 [<ffffffffa059709a>]
> nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffffa0594295>] ?
> nested_vmx_exit_handled+0x6a/0x39e [kvm_intel] [<ffffffffa0537931>] ?
> kvm_apic_has_interrupt+0x80/0xd5 [kvm] [<ffffffffa05972ec>]
> vmx_check_nested_events+0xc3/0xd3 [kvm_intel] [<ffffffffa051ebe9>]
> inject_pending_event+0xd0/0x16e [kvm] [<ffffffffa051efa0>]
> vcpu_enter_guest+0x319/0x704 [kvm]
>
> After commit 77b0f5d (KVM: nVMX: Ack and write vector info to
> intr_info if L1 asks us to), "Acknowledge interrupt on exit" behavior
> can be emulated. Current logic will ask for intr vector if it is
> nested vmexit and VM_EXIT_ACK_INTR_ON_EXIT is set by L1. However, intr
> vector for posted intr can't be got by generic read pending interrupt
> vector and intack routine, there is a requirement to sync from pir to
> irr. This patch fix it by ask the intr vector after sync pir to irr.
>
> Signed-off-by: Wanpeng Li <[email protected]>

Reviewed-by: Yang Zhang <[email protected]>

> ---
> arch/x86/kvm/lapic.c | 1 +
> arch/x86/kvm/vmx.c | 3 +++
> 2 files changed, 4 insertions(+)
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> 0069118..b7d45dc 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1637,6 +1637,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
> apic_clear_irr(vector, apic);
> return vector;
> }
> +EXPORT_SYMBOL_GPL(kvm_get_apic_interrupt);
>
> void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu,
> struct kvm_lapic_state *s)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> 4ae5ad8..31f1479 100644 --- a/arch/x86/kvm/vmx.c +++
> b/arch/x86/kvm/vmx.c @@ -8697,6 +8697,9 @@ static void
> nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
> if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> && nested_exit_intr_ack_set(vcpu)) {
> int irq = kvm_cpu_get_interrupt(vcpu);
> +
> + if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
> + irq = kvm_get_apic_interrupt(vcpu);
> WARN_ON(irq < 0);
> vmcs12->vm_exit_intr_info = irq |
> INTR_INFO_VALID_MASK | INTR_TYPE_EXT_INTR;


Best regards,
Yang

2014-07-17 08:56:25

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down

Il 17/07/2014 06:56, Wanpeng Li ha scritto:
> This bug can be trigger by L1 goes down directly w/ enable_shadow_vmcs.
>
> [ 6413.158950] kvm: vmptrld (null)/780000000000 failed
> [ 6413.158954] vmwrite error: reg 401e value 4 (err 1)
> [ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted: G OE 3.16.0kvm+ #2
> [ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
> [ 6413.158959] 0000000000000003 ffff880210c9fb58 ffffffff81741de9 ffff8800d7433f80
> [ 6413.158960] ffff880210c9fb68 ffffffffa059fa08 ffff880210c9fb78 ffffffffa05938bf
> [ 6413.158962] ffff880210c9fba8 ffffffffa059a97f ffff8800d7433f80 0000000000000003
> [ 6413.158963] Call Trace:
> [ 6413.158968] [<ffffffff81741de9>] dump_stack+0x45/0x56
> [ 6413.158972] [<ffffffffa059fa08>] vmwrite_error+0x2c/0x2e [kvm_intel]
> [ 6413.158974] [<ffffffffa05938bf>] vmcs_writel+0x1f/0x30 [kvm_intel]
> [ 6413.158976] [<ffffffffa059a97f>] free_nested.part.73+0x5f/0x170 [kvm_intel]
> [ 6413.158978] [<ffffffffa059ab13>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
> [ 6413.158991] [<ffffffffa0360324>] kvm_arch_vcpu_free+0x44/0x50 [kvm]
> [ 6413.158998] [<ffffffffa0360f92>] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]
>
> Commit 26a865 (KVM: VMX: fix use after free of vmx->loaded_vmcs) fix the use
> after free bug by move free_loaded_vmcs() before free_nested(), however, this
> lead to free loaded_vmcs->vmcs premature and vmptrld load a NULL pointer during
> sync shadow vmcs to vmcs12. In addition, vmwrite which used to disable shadow
> vmcs and reset VMCS_LINK_POINTER failed since there is no valid current-VMCS.
> This patch fix it by skipping sync shadow vmcs and reset vmcs field for L1
> destroy since they will be reinitialized after L1 recreate.
>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index fbce89e..2b28da7 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6113,9 +6113,9 @@ static void free_nested(struct vcpu_vmx *vmx)
> return;
> vmx->nested.vmxon = false;
> if (vmx->nested.current_vmptr != -1ull) {
> - nested_release_vmcs12(vmx);
> vmx->nested.current_vmptr = -1ull;
> vmx->nested.current_vmcs12 = NULL;
> + nested_release_vmcs12(vmx);
> }
> if (enable_shadow_vmcs)
> free_vmcs(vmx->nested.current_shadow_vmcs);
>

This looks good, I'll apply it to kvm/master.

Paolo

2014-07-17 09:00:47

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

Il 17/07/2014 06:56, Wanpeng Li ha scritto:
> This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331,
> after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is
> some progress and the L2 can boot up, however, slowly. The original idea of this
> fix vid injection patch is from "Zhang, Yang Z" <[email protected]>.
>
> Interrupt which delivered by vid should be injected to L1 by L0 if current is in
> L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't
> have set VM_EXIT_ACK_INTR_ON_EXIT. The current logic doen't consider these cases.
> This patch fix it by vid intr to L1 if current is L1 or L2 through old injection
> way if L1 doen't have VM_EXIT_ACK_INTR_ON_EXIT set.
>
> Signed-off-by: Wanpeng Li <[email protected]>
> Signed-off-by: "Zhang, Yang Z" <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 021d84a..ad36646 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
> {
> if (max_irr == -1)
> return;
> -
> - vmx_set_rvi(max_irr);
> + if (!is_guest_mode(vcpu)) {
> + vmx_set_rvi(max_irr);
> + } else if (is_guest_mode(vcpu) && !nested_exit_on_intr(vcpu)) {
> + /*
> + * Fall back to old way to inject the interrupt since there
> + * is no vAPIC-v for L2.
> + */
> + if (vcpu->arch.exception.pending ||
> + vcpu->arch.nmi_injected ||
> + vcpu->arch.interrupt.pending)
> + return;
> + else if (vmx_interrupt_allowed(vcpu)) {
> + kvm_queue_interrupt(vcpu, max_irr, false);
> + vmx_inject_irq(vcpu);
> + }
> + }
> }
>
> static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>

What hypervisor did you test with? nested_exit_on_intr(vcpu) will return
true for both Xen and KVM (nested_exit_on_intr is not the same thing as
ACK_INTR_ON_EXIT).

Paolo

2014-07-17 09:03:49

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

Paolo Bonzini wrote on 2014-07-17:
> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>> This patch fix bug reported in
>> https://bugzilla.kernel.org/show_bug.cgi?id=73331, after the patch
>> http://www.spinics.net/lists/kvm/msg105230.html applied, there is some
>> progress and the L2 can boot up, however, slowly. The original idea of
>> this fix vid injection patch is from "Zhang, Yang Z"
>> <[email protected]>.
>>
>> Interrupt which delivered by vid should be injected to L1 by L0 if
>> current is in L1, or should be injected to L2 by L0 through the old
>> injection way if L1 doesn't have set VM_EXIT_ACK_INTR_ON_EXIT. The
>> current logic doen't consider these cases. This patch fix it by vid
>> intr to L1 if current is L1 or L2 through old injection way if L1
>> doen't have VM_EXIT_ACK_INTR_ON_EXIT set.
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> Signed-off-by: "Zhang, Yang Z" <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
>> 021d84a..ad36646 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct
>> kvm_vcpu *vcpu, int max_irr) {
>> if (max_irr == -1)
>> return;
>> -
>> - vmx_set_rvi(max_irr);
>> + if (!is_guest_mode(vcpu)) {
>> + vmx_set_rvi(max_irr);
>> + } else if (is_guest_mode(vcpu) && !nested_exit_on_intr(vcpu)) {
>> + /*
>> + * Fall back to old way to inject the interrupt since there
>> + * is no vAPIC-v for L2.
>> + */
>> + if (vcpu->arch.exception.pending ||
>> + vcpu->arch.nmi_injected ||
>> + vcpu->arch.interrupt.pending)
>> + return;
>> + else if (vmx_interrupt_allowed(vcpu)) {
>> + kvm_queue_interrupt(vcpu, max_irr, false);
>> + vmx_inject_irq(vcpu);
>> + }
>> + }
>> }
>>
>> static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
>> *eoi_exit_bitmap)
>>
>
> What hypervisor did you test with? nested_exit_on_intr(vcpu) will

Jailhouse will clear External-interrupt exiting bit. Am I right? Jan.

> return true for both Xen and KVM (nested_exit_on_intr is not the same
> thing as ACK_INTR_ON_EXIT).

I guess he want to say External-interrupt exiting bit not ACK_INTR_ON_EXIT.

>
> Paolo


Best regards,
Yang

2014-07-17 09:04:28

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down

Il 17/07/2014 10:56, Paolo Bonzini ha scritto:
> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>> This bug can be trigger by L1 goes down directly w/ enable_shadow_vmcs.
>>
>> [ 6413.158950] kvm: vmptrld (null)/780000000000 failed
>> [ 6413.158954] vmwrite error: reg 401e value 4 (err 1)
>> [ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted:
>> G OE 3.16.0kvm+ #2
>> [ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05
>> 12/05/2013
>> [ 6413.158959] 0000000000000003 ffff880210c9fb58 ffffffff81741de9
>> ffff8800d7433f80
>> [ 6413.158960] ffff880210c9fb68 ffffffffa059fa08 ffff880210c9fb78
>> ffffffffa05938bf
>> [ 6413.158962] ffff880210c9fba8 ffffffffa059a97f ffff8800d7433f80
>> 0000000000000003
>> [ 6413.158963] Call Trace:
>> [ 6413.158968] [<ffffffff81741de9>] dump_stack+0x45/0x56
>> [ 6413.158972] [<ffffffffa059fa08>] vmwrite_error+0x2c/0x2e [kvm_intel]
>> [ 6413.158974] [<ffffffffa05938bf>] vmcs_writel+0x1f/0x30 [kvm_intel]
>> [ 6413.158976] [<ffffffffa059a97f>] free_nested.part.73+0x5f/0x170
>> [kvm_intel]
>> [ 6413.158978] [<ffffffffa059ab13>] vmx_free_vcpu+0x33/0x70 [kvm_intel]
>> [ 6413.158991] [<ffffffffa0360324>] kvm_arch_vcpu_free+0x44/0x50 [kvm]
>> [ 6413.158998] [<ffffffffa0360f92>] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]
>>
>> Commit 26a865 (KVM: VMX: fix use after free of vmx->loaded_vmcs) fix
>> the use
>> after free bug by move free_loaded_vmcs() before free_nested(),
>> however, this
>> lead to free loaded_vmcs->vmcs premature and vmptrld load a NULL
>> pointer during
>> sync shadow vmcs to vmcs12. In addition, vmwrite which used to disable
>> shadow
>> vmcs and reset VMCS_LINK_POINTER failed since there is no valid
>> current-VMCS.
>> This patch fix it by skipping sync shadow vmcs and reset vmcs field
>> for L1
>> destroy since they will be reinitialized after L1 recreate.
>>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index fbce89e..2b28da7 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6113,9 +6113,9 @@ static void free_nested(struct vcpu_vmx *vmx)
>> return;
>> vmx->nested.vmxon = false;
>> if (vmx->nested.current_vmptr != -1ull) {
>> - nested_release_vmcs12(vmx);
>> vmx->nested.current_vmptr = -1ull;
>> vmx->nested.current_vmcs12 = NULL;
>> + nested_release_vmcs12(vmx);
>> }
>> if (enable_shadow_vmcs)
>> free_vmcs(vmx->nested.current_shadow_vmcs);
>>
>
> This looks good, I'll apply it to kvm/master.

Hmm, on second thought the lifetimes of the VMCSes are a total mess.
Let me look more at this.

Paolo

2014-07-17 09:10:50

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

On Thu, Jul 17, 2014 at 09:03:01AM +0000, Zhang, Yang Z wrote:
>Paolo Bonzini wrote on 2014-07-17:
>> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>>> This patch fix bug reported in
>>> https://bugzilla.kernel.org/show_bug.cgi?id=73331, after the patch
>>> http://www.spinics.net/lists/kvm/msg105230.html applied, there is some
>>> progress and the L2 can boot up, however, slowly. The original idea of
>>> this fix vid injection patch is from "Zhang, Yang Z"
>>> <[email protected]>.
>>>
>>> Interrupt which delivered by vid should be injected to L1 by L0 if
>>> current is in L1, or should be injected to L2 by L0 through the old
>>> injection way if L1 doesn't have set VM_EXIT_ACK_INTR_ON_EXIT. The
>>> current logic doen't consider these cases. This patch fix it by vid
>>> intr to L1 if current is L1 or L2 through old injection way if L1
>>> doen't have VM_EXIT_ACK_INTR_ON_EXIT set.
>>>
>>> Signed-off-by: Wanpeng Li <[email protected]>
>>> Signed-off-by: "Zhang, Yang Z" <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx.c | 18 ++++++++++++++++--
>>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
>>> 021d84a..ad36646 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -7112,8 +7112,22 @@ static void vmx_hwapic_irr_update(struct
>>> kvm_vcpu *vcpu, int max_irr) {
>>> if (max_irr == -1)
>>> return;
>>> -
>>> - vmx_set_rvi(max_irr);
>>> + if (!is_guest_mode(vcpu)) {
>>> + vmx_set_rvi(max_irr);
>>> + } else if (is_guest_mode(vcpu) && !nested_exit_on_intr(vcpu)) {
>>> + /*
>>> + * Fall back to old way to inject the interrupt since there
>>> + * is no vAPIC-v for L2.
>>> + */
>>> + if (vcpu->arch.exception.pending ||
>>> + vcpu->arch.nmi_injected ||
>>> + vcpu->arch.interrupt.pending)
>>> + return;
>>> + else if (vmx_interrupt_allowed(vcpu)) {
>>> + kvm_queue_interrupt(vcpu, max_irr, false);
>>> + vmx_inject_irq(vcpu);
>>> + }
>>> + }
>>> }
>>>
>>> static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64
>>> *eoi_exit_bitmap)
>>>
>>
>> What hypervisor did you test with? nested_exit_on_intr(vcpu) will
>
>Jailhouse will clear External-interrupt exiting bit. Am I right? Jan.
>
>> return true for both Xen and KVM (nested_exit_on_intr is not the same
>> thing as ACK_INTR_ON_EXIT).
>
>I guess he want to say External-interrupt exiting bit not ACK_INTR_ON_EXIT.
>

Ah yes, a typo here.

Regards,
Wanpeng Li

>>
>> Paolo
>
>
>Best regards,
>Yang
>

2014-07-17 09:14:05

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

Paolo Bonzini wrote on 2014-07-17:
> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>> && nested_exit_intr_ack_set(vcpu)) {
>> int irq = kvm_cpu_get_interrupt(vcpu);
>> +
>> + if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
>> + irq = kvm_get_apic_interrupt(vcpu);
>
> There's something weird in this patch. If you "inline"
> kvm_cpu_get_interrupt, what you get is this:
>
> int irq;
> /* Beginning of kvm_cpu_get_interrupt... */
> if (!irqchip_in_kernel(v->kvm))
> irq = v->arch.interrupt.nr;
> else {
> irq = kvm_cpu_get_extint(v); /* PIC */
> if (!kvm_apic_vid_enabled(v->kvm) && irq == -1)
> irq = kvm_get_apic_interrupt(v); /* APIC */
> }
>
> /* kvm_cpu_get_interrupt done. */
> if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
> irq = kvm_get_apic_interrupt(vcpu);
>
> There are just two callers of kvm_cpu_get_interrupt, and the other is
> protected by kvm_cpu_has_injectable_intr so it won't be executed if
> virtual interrupt delivery is enabled. So you patch is effectively the same as this:
>
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index
> bd0da43..a1ec6a5 100644 --- a/arch/x86/kvm/irq.c +++
> b/arch/x86/kvm/irq.c @@ -108,7 +108,7 @@ int
> kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>
> vector = kvm_cpu_get_extint(v);
> - if (kvm_apic_vid_enabled(v->kvm) || vector != -1)
> + if (vector != -1)
> return vector; /* PIC */
>
> return kvm_get_apic_interrupt(v); /* APIC */
> But in kvm_get_apic_interrupt I have just added this comment:
>
> /* Note that we never get here with APIC virtualization
> * enabled. */
>
> because kvm_get_apic_interrupt calls apic_set_isr, and apic_set_isr
> must never be called with APIC virtualization enabled either. With
> APIC virtualization enabled, isr_count is always 1, and
> highest_isr_cache is always -1, and apic_set_isr breaks both of these invariants.
>

You are right. kvm_lapic_find_highest_irr should be the right one.

> Paolo


Best regards,
Yang

2014-07-17 09:46:37

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

Il 17/07/2014 06:56, Wanpeng Li ha scritto:
> && nested_exit_intr_ack_set(vcpu)) {
> int irq = kvm_cpu_get_interrupt(vcpu);
> +
> + if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
> + irq = kvm_get_apic_interrupt(vcpu);

There's something weird in this patch. If you "inline"
kvm_cpu_get_interrupt, what you get is this:

int irq;

/* Beginning of kvm_cpu_get_interrupt... */
if (!irqchip_in_kernel(v->kvm))
irq = v->arch.interrupt.nr;
else {
irq = kvm_cpu_get_extint(v); /* PIC */
if (!kvm_apic_vid_enabled(v->kvm) && irq == -1)
irq = kvm_get_apic_interrupt(v); /* APIC */
}

/* kvm_cpu_get_interrupt done. */
if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
irq = kvm_get_apic_interrupt(vcpu);

There are just two callers of kvm_cpu_get_interrupt, and the other is
protected by kvm_cpu_has_injectable_intr so it won't be executed if
virtual interrupt delivery is enabled. So you patch is effectively the
same as this:

diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index bd0da43..a1ec6a5 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -108,7 +108,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)

vector = kvm_cpu_get_extint(v);

- if (kvm_apic_vid_enabled(v->kvm) || vector != -1)
+ if (vector != -1)
return vector; /* PIC */

return kvm_get_apic_interrupt(v); /* APIC */

But in kvm_get_apic_interrupt I have just added this comment:

/* Note that we never get here with APIC virtualization
* enabled. */

because kvm_get_apic_interrupt calls apic_set_isr, and apic_set_isr must
never be called with APIC virtualization enabled either. With APIC
virtualization enabled, isr_count is always 1, and highest_isr_cache is
always -1, and apic_set_isr breaks both of these invariants.

Paolo

2014-07-17 10:00:26

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

On Thu, Jul 17, 2014 at 09:13:56AM +0000, Zhang, Yang Z wrote:
>Paolo Bonzini wrote on 2014-07-17:
>> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>>> && nested_exit_intr_ack_set(vcpu)) {
>>> int irq = kvm_cpu_get_interrupt(vcpu);
>>> +
>>> + if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
>>> + irq = kvm_get_apic_interrupt(vcpu);
>>
>> There's something weird in this patch. If you "inline"
>> kvm_cpu_get_interrupt, what you get is this:
>>
>> int irq;
>> /* Beginning of kvm_cpu_get_interrupt... */
>> if (!irqchip_in_kernel(v->kvm))
>> irq = v->arch.interrupt.nr;
>> else {
>> irq = kvm_cpu_get_extint(v); /* PIC */
>> if (!kvm_apic_vid_enabled(v->kvm) && irq == -1)
>> irq = kvm_get_apic_interrupt(v); /* APIC */
>> }
>>
>> /* kvm_cpu_get_interrupt done. */
>> if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
>> irq = kvm_get_apic_interrupt(vcpu);
>>
>> There are just two callers of kvm_cpu_get_interrupt, and the other is
>> protected by kvm_cpu_has_injectable_intr so it won't be executed if
>> virtual interrupt delivery is enabled. So you patch is effectively the same as this:
>>
>> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index
>> bd0da43..a1ec6a5 100644 --- a/arch/x86/kvm/irq.c +++
>> b/arch/x86/kvm/irq.c @@ -108,7 +108,7 @@ int
>> kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>>
>> vector = kvm_cpu_get_extint(v);
>> - if (kvm_apic_vid_enabled(v->kvm) || vector != -1)
>> + if (vector != -1)
>> return vector; /* PIC */
>>
>> return kvm_get_apic_interrupt(v); /* APIC */
>> But in kvm_get_apic_interrupt I have just added this comment:
>>
>> /* Note that we never get here with APIC virtualization
>> * enabled. */
>>
>> because kvm_get_apic_interrupt calls apic_set_isr, and apic_set_isr
>> must never be called with APIC virtualization enabled either. With
>> APIC virtualization enabled, isr_count is always 1, and
>> highest_isr_cache is always -1, and apic_set_isr breaks both of these invariants.
>>
>
>You are right. kvm_lapic_find_highest_irr should be the right one.

That is my original proposal solution of this bug. However, what I concern
after more think is since kvm_lapic_find_highest_irr will not clear
irr, if the intr will be injected by kvm_86_ops->hwapic_irr_update(vcpu,
kvm_lapic_find_highest_irr(vcpu)) which called by vcpu_enter_guest() again?

Any idea, Paolo?

Regards,
Wanpeng Li

>
>> Paolo
>
>
>Best regards,
>Yang
>


Attachments:
(No filename) (2.43 kB)
0001-fix-warning.patch (3.35 kB)
Download all attachments

2014-07-17 10:03:16

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

On Thu, Jul 17, 2014 at 09:13:56AM +0000, Zhang, Yang Z wrote:
>Paolo Bonzini wrote on 2014-07-17:
>> Il 17/07/2014 06:56, Wanpeng Li ha scritto:
>>> && nested_exit_intr_ack_set(vcpu)) {
>>> int irq = kvm_cpu_get_interrupt(vcpu);
>>> +
>>> + if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
>>> + irq = kvm_get_apic_interrupt(vcpu);
>>
>> There's something weird in this patch. If you "inline"
>> kvm_cpu_get_interrupt, what you get is this:
>>
>> int irq;
>> /* Beginning of kvm_cpu_get_interrupt... */
>> if (!irqchip_in_kernel(v->kvm))
>> irq = v->arch.interrupt.nr;
>> else {
>> irq = kvm_cpu_get_extint(v); /* PIC */
>> if (!kvm_apic_vid_enabled(v->kvm) && irq == -1)
>> irq = kvm_get_apic_interrupt(v); /* APIC */
>> }
>>
>> /* kvm_cpu_get_interrupt done. */
>> if (irq < 0 && kvm_apic_vid_enabled(vcpu->kvm))
>> irq = kvm_get_apic_interrupt(vcpu);
>>
>> There are just two callers of kvm_cpu_get_interrupt, and the other is
>> protected by kvm_cpu_has_injectable_intr so it won't be executed if
>> virtual interrupt delivery is enabled. So you patch is effectively the same as this:
>>
>> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index
>> bd0da43..a1ec6a5 100644 --- a/arch/x86/kvm/irq.c +++
>> b/arch/x86/kvm/irq.c @@ -108,7 +108,7 @@ int
>> kvm_cpu_get_interrupt(struct kvm_vcpu *v)
>>
>> vector = kvm_cpu_get_extint(v);
>> - if (kvm_apic_vid_enabled(v->kvm) || vector != -1)
>> + if (vector != -1)
>> return vector; /* PIC */
>>
>> return kvm_get_apic_interrupt(v); /* APIC */
>> But in kvm_get_apic_interrupt I have just added this comment:
>>
>> /* Note that we never get here with APIC virtualization
>> * enabled. */
>>
>> because kvm_get_apic_interrupt calls apic_set_isr, and apic_set_isr
>> must never be called with APIC virtualization enabled either. With
>> APIC virtualization enabled, isr_count is always 1, and
>> highest_isr_cache is always -1, and apic_set_isr breaks both of these invariants.
>>
>
>You are right. kvm_lapic_find_highest_irr should be the right one.
>

That is my original proposal solution of this bug. However, what I concern
after more think is since kvm_lapic_find_highest_irr will not clear irr,
if the intr will be injected by kvm_86_ops->hwapic_irr_update(vcpu,
kvm_lapic_find_highest_irr(vcpu)) which called by vcpu_enter_guest()
again?

Any idea, Paolo?

Regards,
Wanpeng Li


>> Paolo
>
>
>Best regards,
>Yang
>


Attachments:
(No filename) (2.43 kB)
0001-fix-warning.patch (3.35 kB)
Download all attachments

2014-07-17 10:43:43

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: nVMX: Fix fail to get nested ack intr's vector during nested vmexit

Il 17/07/2014 12:01, Wanpeng Li ha scritto:
> That is my original proposal solution of this bug. However, what I concern
> after more think is since kvm_lapic_find_highest_irr will not clear
> irr, if the intr will be injected by kvm_86_ops->hwapic_irr_update(vcpu,
> kvm_lapic_find_highest_irr(vcpu)) which called by vcpu_enter_guest() again?
>
> Any idea, Paolo?

The processor should do that when it does the virtual interrupt
delivery. It will do (29.2.2):

Vector := RVI;
VISR[Vector] := 1;
SVI := Vector;
VIRR[Vector] := 0;
If VIRR not empty
then RVI := highest index of bit set in VIRR
else RVI := 0
Fi;
deliver interrupt with Vector through IDT;

Please post a patch, so we can reason on it better.

Paolo

2014-07-17 10:44:15

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

Il 17/07/2014 11:11, Wanpeng Li ha scritto:
>>> >> What hypervisor did you test with? nested_exit_on_intr(vcpu) will
>> >
>> >Jailhouse will clear External-interrupt exiting bit. Am I right? Jan.
>> >
>>> >> return true for both Xen and KVM (nested_exit_on_intr is not the same
>>> >> thing as ACK_INTR_ON_EXIT).
>> >
>> >I guess he want to say External-interrupt exiting bit not ACK_INTR_ON_EXIT.
>> >
> Ah yes, a typo here.

Ok, please repost this patch together with your version of patch 2.

Leave aside patch 3 for now, as I think the original use-after-free
patch was wrong.

Paolo

2014-07-17 11:07:28

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 1/3] KVM: nVMX: Fix virtual interrupt delivery injection

On Thu, Jul 17, 2014 at 12:43:58PM +0200, Paolo Bonzini wrote:
>Il 17/07/2014 11:11, Wanpeng Li ha scritto:
>>>>>> What hypervisor did you test with? nested_exit_on_intr(vcpu) will
>>>>
>>>>Jailhouse will clear External-interrupt exiting bit. Am I right? Jan.
>>>>
>>>>>> return true for both Xen and KVM (nested_exit_on_intr is not the same
>>>>>> thing as ACK_INTR_ON_EXIT).
>>>>
>>>>I guess he want to say External-interrupt exiting bit not ACK_INTR_ON_EXIT.
>>>>
>>Ah yes, a typo here.
>
>Ok, please repost this patch together with your version of patch 2.

Just send out the version two of 1/3 and 2/3.

>
>Leave aside patch 3 for now, as I think the original use-after-free
>patch was wrong.

Any proposal is appreciated. ;-)

Regards,
Wanpeng Li

>
>Paolo