On Tue, Jan 12, 2021, Wei Huang wrote:
> +/* Emulate SVM VM execution instructions */
> +static int svm_emulate_vm_instr(struct kvm_vcpu *vcpu, u8 modrm)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + switch (modrm) {
> + case 0xd8: /* VMRUN */
> + return vmrun_interception(svm);
> + case 0xda: /* VMLOAD */
> + return vmload_interception(svm);
> + case 0xdb: /* VMSAVE */
> + return vmsave_interception(svm);
> + default:
> + /* inject a #GP for all other cases */
> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> + return 1;
> + }
> +}
> +
> static int gp_interception(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> u32 error_code = svm->vmcb->control.exit_info_1;
> -
> - WARN_ON_ONCE(!enable_vmware_backdoor);
> + int rc;
>
> /*
> - * VMware backdoor emulation on #GP interception only handles IN{S},
> - * OUT{S}, and RDPMC, none of which generate a non-zero error code.
> + * Only VMware backdoor and SVM VME errata are handled. Neither of
> + * them has non-zero error codes.
> */
> if (error_code) {
> kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
> return 1;
> }
> - return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
> +
> + rc = kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
> + if (rc > 1)
> + rc = svm_emulate_vm_instr(vcpu, rc);
> + return rc;
> }
...
> +static int is_vm_instr_opcode(struct x86_emulate_ctxt *ctxt)
> +{
> + unsigned long rax;
> +
> + if (ctxt->b != 0x1)
> + return 0;
> +
> + switch (ctxt->modrm) {
> + case 0xd8: /* VMRUN */
> + case 0xda: /* VMLOAD */
> + case 0xdb: /* VMSAVE */
> + rax = kvm_register_read(emul_to_vcpu(ctxt), VCPU_REGS_RAX);
> + if (!kvm_is_host_reserved_region(rax))
> + return 0;
> + break;
> + default:
> + return 0;
> + }
> +
> + return ctxt->modrm;
> +}
> +
> static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
> {
> switch (ctxt->opcode_len) {
> @@ -7305,6 +7327,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> bool writeback = true;
> bool write_fault_to_spt;
> + int vminstr;
>
> if (unlikely(!kvm_x86_ops.can_emulate_instruction(vcpu, insn, insn_len)))
> return 1;
> @@ -7367,10 +7390,14 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> }
> }
>
> - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
> - !is_vmware_backdoor_opcode(ctxt)) {
> - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> - return 1;
> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> + vminstr = is_vm_instr_opcode(ctxt);
> + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> + return 1;
> + }
> + if (vminstr)
> + return vminstr;
I'm pretty sure this doesn't correctly handle a VM-instr in L2 that hits a bad
L0 GPA and that L1 wants to intercept. The intercept bitmap isn't checked until
x86_emulate_insn(), and the vm*_interception() helpers expect nested VM-Exits to
be handled further up the stack.
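A rough sketch of the kind of check being alluded to here, inside the #GP
emulation path where vcpu/svm are in scope (illustrative only; it assumes the
svm->nested.ctl control area and the vmcb_is_intercept() helper in this tree,
and uses VMLOAD as the example):

        /* Does L1's vmcb12 ask to intercept a VMLOAD executed by L2? */
        if (is_guest_mode(vcpu) &&
            vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_VMLOAD)) {
                /* Reflect to L1 as a nested VM-Exit instead of emulating in L0. */
                svm->vmcb->control.exit_code = SVM_EXIT_VMLOAD;
                svm->vmcb->control.exit_info_1 = 0;
                svm->vmcb->control.exit_info_2 = 0;
                return nested_svm_vmexit(svm);
        }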
> }
>
> /*
> --
> 2.27.0
>
Sean Christopherson <[email protected]> writes:
...
>> - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
>> - !is_vmware_backdoor_opcode(ctxt)) {
>> - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> - return 1;
>> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
>> + vminstr = is_vm_instr_opcode(ctxt);
>> + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
>> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> + return 1;
>> + }
>> + if (vminstr)
>> + return vminstr;
>
> I'm pretty sure this doesn't correctly handle a VM-instr in L2 that hits a bad
> L0 GPA and that L1 wants to intercept. The intercept bitmap isn't checked until
> x86_emulate_insn(), and the vm*_interception() helpers expect nested VM-Exits to
> be handled further up the stack.
>
So, the scenario is that L2 executes a vmload that #GPs on a reserved address and exits to L0,
and L0 then goes on to emulate the vmload and return back to L2 without checking whether
L1 has asked for the instruction to be intercepted?
>> }
>>
>> /*
>> --
>> 2.27.0
>>
On Tue, 2021-01-12 at 15:00 -0500, Bandan Das wrote:
> Sean Christopherson <[email protected]> writes:
> ...
> > > - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
> > > - !is_vmware_backdoor_opcode(ctxt)) {
> > > - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> > > - return 1;
> > > + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> > > + vminstr = is_vm_instr_opcode(ctxt);
> > > + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
> > > + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> > > + return 1;
> > > + }
> > > + if (vminstr)
> > > + return vminstr;
> >
> > I'm pretty sure this doesn't correctly handle a VM-instr in L2 that hits a bad
> > L0 GPA and that L1 wants to intercept. The intercept bitmap isn't checked until
> > x86_emulate_insn(), and the vm*_interception() helpers expect nested VM-Exits to
> > be handled further up the stack.
Actually IMHO this is exactly what we want. We want L0 to always intercept
these #GPs and hide them from the guest.

What we do need to do (and I prepared and attached a patch for that) is that
if we are running a nested guest, we inject the corresponding vmexit (like
SVM_EXIT_VMRUN) instead of emulating the instruction.
The attached patch does this, and it made my kvm unit test pass
even when the test was run in a VM (with an unpatched kernel).

This, together with setting the X86_FEATURE_SVME_ADDR_CHK bit for
the guest, will allow us to hide the errata from the guest completely,
which is a very good thing
(for example, for guests that we can't modify).
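For illustration, a minimal sketch of that direction (based only on the
description above, not on the attached patch; it assumes is_guest_mode(),
nested_svm_vmexit() and the SVM_EXIT_* exit codes, and reuses the
svm_emulate_vm_instr() shape from the patch under review):

static int svm_emulate_vm_instr(struct kvm_vcpu *vcpu, u8 modrm)
{
        struct vcpu_svm *svm = to_svm(vcpu);

        /*
         * If the faulting VM instruction was executed by a nested guest,
         * reflect it to L1 as the corresponding nested VM-Exit instead of
         * emulating it in L0.
         */
        if (is_guest_mode(vcpu)) {
                switch (modrm) {
                case 0xd8: /* VMRUN */
                        svm->vmcb->control.exit_code = SVM_EXIT_VMRUN;
                        break;
                case 0xda: /* VMLOAD */
                        svm->vmcb->control.exit_code = SVM_EXIT_VMLOAD;
                        break;
                case 0xdb: /* VMSAVE */
                        svm->vmcb->control.exit_code = SVM_EXIT_VMSAVE;
                        break;
                default:
                        kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
                        return 1;
                }
                svm->vmcb->control.exit_info_1 = 0;
                svm->vmcb->control.exit_info_2 = 0;
                return nested_svm_vmexit(svm);
        }

        /* Not nested: emulate the instruction on behalf of L1 as before. */
        switch (modrm) {
        case 0xd8: /* VMRUN */
                return vmrun_interception(svm);
        case 0xda: /* VMLOAD */
                return vmload_interception(svm);
        case 0xdb: /* VMSAVE */
                return vmsave_interception(svm);
        default:
                kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
                return 1;
        }
}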
Best regards,
Maxim Levitsky
> >
> So, the scenario is that L2 executes a vmload that #GPs on a reserved address and exits to L0,
> and L0 then goes on to emulate the vmload and return back to L2 without checking whether
> L1 has asked for the instruction to be intercepted?
>
> > > }
> > >
> > > /*
> > > --
> > > 2.27.0
> > >
On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> On Tue, 2021-01-12 at 15:00 -0500, Bandan Das wrote:
> > Sean Christopherson <[email protected]> writes:
> > ...
> > > > - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
> > > > - !is_vmware_backdoor_opcode(ctxt)) {
> > > > - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> > > > - return 1;
> > > > + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> > > > + vminstr = is_vm_instr_opcode(ctxt);
> > > > + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
> > > > + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> > > > + return 1;
> > > > + }
> > > > + if (vminstr)
> > > > + return vminstr;
> > >
> > > I'm pretty sure this doesn't correctly handle a VM-instr in L2 that hits a bad
> > > L0 GPA and that L1 wants to intercept. The intercept bitmap isn't checked until
> > > x86_emulate_insn(), and the vm*_interception() helpers expect nested VM-Exits to
> > > be handled further up the stack.
>
> Actually IMHO this is exactly what we want. We want L0 to always intercept
> these #GPs and hide them from the guest.
>
> What we do need to do (and I prepared and attached a patch for that) is that
> if we are running a nested guest, we inject the corresponding vmexit (like
> SVM_EXIT_VMRUN) instead of emulating the instruction.
Yes, lack of forwarding to L1 as a nested exit is what I meant by "doesn't
correctly handle".