From: Wanpeng Li
Date: Tue, 20 Jun 2017 05:47:47 +0800
Subject: Re: [PATCH v2 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
To: Radim Krčmář
Cc: linux-kernel@vger.kernel.org, kvm, Paolo Bonzini, Wanpeng Li
In-Reply-To: <20170619145144.GA10325@potion>
References: <1497493615-18512-1-git-send-email-wanpeng.li@hotmail.com> <1497493615-18512-4-git-send-email-wanpeng.li@hotmail.com> <20170616133702.GA6360@potion> <20170616153832.GA5980@potion> <20170619145144.GA10325@potion>

2017-06-19 22:51 GMT+08:00 Radim Krčmář:
> 2017-06-17 13:52+0800, Wanpeng Li:
>> 2017-06-16 23:38 GMT+08:00 Radim Krčmář:
>> > 2017-06-16 22:24+0800, Wanpeng Li:
>> >> 2017-06-16 21:37 GMT+08:00 Radim Krčmář:
>> >> > 2017-06-14 19:26-0700, Wanpeng Li:
>> >> >> From: Wanpeng Li
>> >> >>
>> >> >> Add an async_page_fault field to vcpu->arch.exception to identify an async
>> >> >> page fault, and construct the expected VM-exit information fields. Force
>> >> >> a nested VM exit from nested_vmx_check_exception() if the injected #PF
>> >> >> is an async page fault.
>> >> >>
>> >> >> Cc: Paolo Bonzini
>> >> >> Cc: Radim Krčmář
>> >> >> Signed-off-by: Wanpeng Li
>> >> >> ---
>> >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
>> >> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
>> >> >>  {
>> >> >>  	++vcpu->stat.pf_guest;
>> >> >> -	vcpu->arch.cr2 = fault->address;
>> >> >> +	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
>> >> >
>> >> > I think we need to act as if arch.exception.async_page_fault was not
>> >> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events(). Otherwise, if we
>> >> > migrate with a pending async_page_fault exception, we'd inject it as a
>> >> > normal #PF, which could confuse/kill the nested guest.
>> >> >
>> >> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clear the flag for
>> >> > sanity as well.
>> >>
>> >> Do you mean we should add a field like async_page_fault to
>> >> kvm_vcpu_events::exception, save arch.exception.async_page_fault
>> >> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS, and
>> >> restore events->exception.async_page_fault to
>> >> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?
>> >
>> > No, I thought we could get away with a disgusting hack of hiding the
>> > exception from userspace, which would work for migration, but not if
>> > local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...
>> >
>> > Extending the userspace interface would work, but I'd do it as a last
>> > resort, after all conservative solutions have failed.
>> > async_pf migration is very crude, so exposing the exception is just an
>> > ugly workaround for the local case.
>> > Adding the flag would also require
>> > userspace configuration of async_pf features for the guest to keep
>> > compatibility.
>> >
>> > I see two options that might be simpler than adding the userspace flag:
>> >
>> > 1) do the nested VM exit sooner, at the place where we now queue #PF,
>> > 2) queue the #PF later, save the async_pf in some intermediate
>> >    structure and consume it at the place where you proposed the nested
>> >    VM exit.
>>
>> How about something like this, so that we do not report the exception
>> event when "is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
>> vcpu->arch.exception.async_page_fault"? Losing a reschedule optimization
>> is not that important in L1.
>>
>> @@ -3072,13 +3074,16 @@ static void
>> kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>> 					     struct kvm_vcpu_events *events)
>>  {
>>  	process_nmi(vcpu);
>> -	events->exception.injected =
>> -		vcpu->arch.exception.pending &&
>> -		!kvm_exception_is_soft(vcpu->arch.exception.nr);
>> -	events->exception.nr = vcpu->arch.exception.nr;
>> -	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
>> -	events->exception.pad = 0;
>> -	events->exception.error_code = vcpu->arch.exception.error_code;
>> +	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
>> +		vcpu->arch.exception.async_page_fault)) {
>> +		events->exception.injected =
>> +			vcpu->arch.exception.pending &&
>> +			!kvm_exception_is_soft(vcpu->arch.exception.nr);
>> +		events->exception.nr = vcpu->arch.exception.nr;
>> +		events->exception.has_error_code = vcpu->arch.exception.has_error_code;
>> +		events->exception.pad = 0;
>> +		events->exception.error_code = vcpu->arch.exception.error_code;
>> +	}
>
> This adds a bug when userspace does KVM_GET_VCPU_EVENTS and
> KVM_SET_VCPU_EVENTS without migration -- KVM would drop the async_pf and
> an L1 process would get stuck as a result.
>
> We'd need to add a similar condition to
> kvm_vcpu_ioctl_x86_set_vcpu_events(), so a userspace SET doesn't drop it,
> but that is far beyond the realm of acceptable code.

Do you mean the current status of the patchset v2 can be accepted? If not,
what should be done next?

Regards,
Wanpeng Li
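
For illustration, here is a minimal sketch of the complementary guard Radim
mentions for kvm_vcpu_ioctl_x86_set_vcpu_events(). It is only a sketch, not
the actual patch: the check mirrors the GET-side hunk quoted above, and the
SET-side assignments it wraps are assumed for the example rather than taken
from the tree.

	/*
	 * Hypothetical sketch: mirror the GET-side check so that a plain
	 * KVM_GET_VCPU_EVENTS / KVM_SET_VCPU_EVENTS round trip does not
	 * clobber an async_pf #PF that was hidden from userspace.
	 */
	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
	      vcpu->arch.exception.async_page_fault)) {
		vcpu->arch.exception.pending = events->exception.injected;
		vcpu->arch.exception.nr = events->exception.nr;
		vcpu->arch.exception.has_error_code = events->exception.has_error_code;
		vcpu->arch.exception.error_code = events->exception.error_code;
	}

Even with this mirror check, the async_pf special case leaks into both
userspace-facing ioctls, which is why Radim considers it beyond the realm of
acceptable code and points to options 1) and 2) above instead.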