With commit b6b8a1451fc40412c57d1, which introduced
vmx_check_nested_events, checks for injectable interrupts happen
at different points in time for L1 and L2, which could potentially
cause a race. The regression occurs because KVM_REQ_EVENT is always
set when nested_run_pending is set, even if there's no pending interrupt.
Consequently, there could be a small window in which check_nested_events
returns without exiting to L1, but an interrupt arrives soon
after and incorrectly gets injected into L2 by inject_pending_event.
Fix this by also calling check_nested_events when the check
for an injectable interrupt returns true.

Signed-off-by: Bandan Das <[email protected]>
---
arch/x86/kvm/x86.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 73537ec..56327a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5907,6 +5907,19 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_x86_ops->set_nmi(vcpu);
 		}
 	} else if (kvm_cpu_has_injectable_intr(vcpu)) {
+		/*
+		 * TODO/FIXME: We are calling check_nested_events again
+		 * here to avoid a race condition. We should really be
+		 * setting KVM_REQ_EVENT only on certain events
+		 * and not unconditionally.
+		 * See https://lkml.org/lkml/2014/7/2/60 for discussion
+		 * about this proposal and current concerns
+		 */
+		if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
+			r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
+			if (r != 0)
+				return r;
+		}
 		if (kvm_x86_ops->interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
 					    false);
--
1.8.3.1
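
For readers who want to see the window described in the commit message, here is a toy user-space model of it. It is only a sketch and not kernel code: every `toy_*` name is hypothetical, the model assumes L1 intercepts external interrupts, and it assumes the nested check returns 0 once it has switched to L1 and nonzero when the exit has to wait.

```c
/* Toy model of the race; not kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum toy_outcome {
	TOY_NOTHING,
	TOY_DEFERRED,
	TOY_DELIVERED_TO_L1,
	TOY_DELIVERED_TO_L2
};

struct toy_vcpu {
	bool guest_mode;  /* true while L2 is the active guest */
	bool run_pending; /* stands in for nested_run_pending */
	bool irq_pending; /* set asynchronously in real KVM */
};

/*
 * If an interrupt is pending while L2 runs, switch to L1 (emulated exit)
 * and return 0. Return -1 if the exit is needed but has to wait.
 */
static int toy_check_nested_events(struct toy_vcpu *v)
{
	assert(v);
	if (v->guest_mode && v->irq_pending) {
		if (v->run_pending)
			return -1;
		v->guest_mode = false;
	}
	return 0;
}

static enum toy_outcome toy_inject_pending_event(struct toy_vcpu *v,
						 bool recheck)
{
	if (!v->irq_pending)
		return TOY_NOTHING;
	/* The patch's addition: look at nested events again before injecting. */
	if (recheck && v->guest_mode && toy_check_nested_events(v) != 0)
		return TOY_DEFERRED;
	v->irq_pending = false;
	return v->guest_mode ? TOY_DELIVERED_TO_L2 : TOY_DELIVERED_TO_L1;
}

/* Replays the window: early check, interrupt arrives, then injection. */
static enum toy_outcome toy_replay(bool recheck)
{
	struct toy_vcpu v = { .guest_mode = true };

	toy_check_nested_events(&v); /* nothing pending yet, stays in L2 */
	v.irq_pending = true;        /* interrupt arrives inside the window */
	return toy_inject_pending_event(&v, recheck);
}
```

In this model, `toy_replay(false)` delivers the interrupt to L2, which is the misbehaviour the patch targets, while `toy_replay(true)` switches to L1 first and delivers it there.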
On 08/07/2014 06:30, Bandan Das wrote:
> kvm_x86_ops->set_nmi(vcpu);
> }
> } else if (kvm_cpu_has_injectable_intr(vcpu)) {
> + /*
> + * TODO/FIXME: We are calling check_nested_events again
> + * here to avoid a race condition. We should really be
> + * setting KVM_REQ_EVENT only on certain events
> + * and not unconditionally.
> + * See https://lkml.org/lkml/2014/7/2/60 for discussion
> + * about this proposal and current concerns
> + */
> + if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events) {
> + r = kvm_x86_ops->check_nested_events(vcpu, req_int_win);
> + if (r != 0)
> + return r;
> + }
> if (kvm_x86_ops->interrupt_allowed(vcpu)) {
> kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
> false);
>
I think this should be done for NMI as well.
Jan, what do you think? Can you run Jailhouse through this patch?
Paolo
On 2014-07-08 07:50, Paolo Bonzini wrote:
> I think this should be done for NMI as well.
I don't think arch.nmi_pending can flip asynchronously, only in the
context of the VCPU thread - in contrast to pending IRQ states.
>
> Jan, what do you think? Can you run Jailhouse through this patch?
Jailhouse seems fine with it, and it resolves the lockup of nested KVM
here as well.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
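
The distinction Jan draws can be illustrated with a small user-space sketch using C11 atomics. It is not the kernel's implementation: the `toy_*` names and the `limit` parameter are hypothetical. It only shows the pattern in which other threads touch an atomic "queued" counter while the "pending" count is updated solely by the vCPU thread.

```c
/* Sketch of a queued/pending split; all toy_* names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>

struct toy_vcpu_nmi {
	atomic_uint nmi_queued;   /* may be incremented from any thread */
	unsigned int nmi_pending; /* touched only by the vCPU thread */
};

/* Any thread: request an NMI without touching nmi_pending. */
static void toy_queue_nmi(struct toy_vcpu_nmi *v)
{
	assert(v);
	atomic_fetch_add(&v->nmi_queued, 1);
}

/*
 * vCPU thread only: fold queued NMIs into the pending count and cap it.
 * Because this is the only writer, nmi_pending cannot change between a
 * check and a later use within the same vCPU-thread code path.
 */
static unsigned int toy_process_nmi(struct toy_vcpu_nmi *v, unsigned int limit)
{
	assert(v);
	v->nmi_pending += atomic_exchange(&v->nmi_queued, 0);
	if (v->nmi_pending > limit)
		v->nmi_pending = limit;
	return v->nmi_pending;
}
```

Under this split, a re-check like the one the patch adds for interrupts would not be needed for the pending count, since nothing outside the vCPU thread can change it between two reads.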
On 08/07/2014 08:56, Jan Kiszka wrote:
> I don't think arch.nmi_pending can flip asynchronously, only in the
> context of the VCPU thread - in contrast to pending IRQ states.
Right, only nmi_queued is changed from other threads. /me should really
look at the code instead of going from memory.
>> Jan, what do you think? Can you run Jailhouse through this patch?
>
> Jailhouse seems fine with it, and it resolves the lockup of nested KVM
> here as well.
Thinking more about it, I think this is the right fix. Not setting
KVM_REQ_EVENT in some cases can be an optimization, but it's not
necessary. Definitely there are other cases in which KVM_REQ_EVENT is
set even though no event is pending---most notably during emulation of
invalid guest state.
Paolo
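
Paolo's point that a request may be set with no event pending can be sketched as follows. This is an illustrative user-space model, not KVM code: `TOY_REQ_EVENT` and the `toy_*` functions are hypothetical stand-ins. The request bit is treated as a hint to re-evaluate state, so the handler has to tolerate finding nothing to do.

```c
/* Sketch of a request bit used as a hint; all names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_REQ_EVENT (1u << 0) /* stands in for KVM_REQ_EVENT */

struct toy_vcpu_req {
	atomic_uint requests;  /* request bits, settable from any thread */
	bool irq_pending;      /* the actual event state */
	unsigned int injected; /* number of events injected so far */
};

static void toy_make_request(struct toy_vcpu_req *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, req);
}

/* Atomically test and clear one request bit. */
static bool toy_check_request(struct toy_vcpu_req *v, unsigned int req)
{
	return (atomic_fetch_and(&v->requests, ~req) & req) != 0;
}

/* Entry path: returns true only if an event was actually injected. */
static bool toy_enter_guest(struct toy_vcpu_req *v)
{
	assert(v);
	if (!toy_check_request(v, TOY_REQ_EVENT))
		return false;
	if (!v->irq_pending)
		return false; /* spurious request: harmless, just extra work */
	v->irq_pending = false;
	v->injected++;
	return true;
}
```

A spurious request costs one extra evaluation and injects nothing, which is why skipping it is an optimization and not a correctness requirement in this model.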
On Tue, Jul 08, 2014 at 10:00:35AM +0200, Paolo Bonzini wrote:
>On 08/07/2014 08:56, Jan Kiszka wrote:
>>I don't think arch.nmi_pending can flip asynchronously, only in the
>>context of the VCPU thread - in contrast to pending IRQ states.
>
>Right, only nmi_queued is changed from other threads. /me should
>really look at the code instead of going from memory.
>
>>>Jan, what do you think? Can you run Jailhouse through this patch?
>>
>>Jailhouse seems fine with it, and it resolves the lockup of nested KVM
>>here as well.
>
>Thinking more about it, I think this is the right fix. Not setting
>KVM_REQ_EVENT in some cases can be an optimization, but it's not
>necessary. Definitely there are other cases in which KVM_REQ_EVENT
>is set even though no event is pending---most notably during
>emulation of invalid guest state.
Anyway,
Reviewed-by: Wanpeng Li <[email protected]>
>
>Paolo