Message-ID: <04ac8bcd-54f7-acd1-0764-11f6925c2c94@maciej.szmigiero.name>
Date: Thu, 7 Apr 2022 17:32:01 +0200
To: Sean Christopherson
Cc: Maxim Levitsky, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
 Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
References: <7caee33a-da0f-00be-3195-82c3d1cd4cb4@maciej.szmigiero.name>
 <5135b502-ce2e-babb-7812-4d4c431a5252@maciej.szmigiero.name>
 <7e8f558d-c00a-7170-f671-bd10c0a56557@maciej.szmigiero.name>
From: "Maciej S. Szmigiero"
Subject: Re: [PATCH 5/8] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction
X-Mailing-List: linux-kernel@vger.kernel.org

On 7.04.2022 01:03, Sean Christopherson wrote:
> On Thu, Apr 07, 2022, Maciej S. Szmigiero wrote:
>> On 6.04.2022 22:52, Sean Christopherson wrote:
>>> On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
>>>> Another option for saving and restoring a VM would be to add it to
>>>> KVM_{GET,SET}_NESTED_STATE somewhere (maybe as a part of the saved VMCB12
>>>> control area?).
>>>
>>> Ooh. What if we keep nested_run_pending=true until the injection completes?
>>> Then we don't even need an extra flag because nested_run_pending effectively
>>> says that any and all injected events are for L1=>L2. In KVM_GET_NESTED_STATE,
>>> shove the to-be-injected event into the normal vmc*12 injection field, and
>>> ignore all to-be-injected events in KVM_GET_VCPU_EVENTS if
>>> nested_run_pending=true.
>>>
>>> That should work even for migrating to an older KVM, as keeping
>>> nested_run_pending will cause the target to reprocess the event injection as
>>> if it were from nested VM-Enter, which it technically is.
>>
>> I guess here by "ignore all to-be-injected events in KVM_GET_VCPU_EVENTS" you
>> mean *moving* back the L1 -> L2 event to be injected from KVM internal data
>> structures like arch.nmi_injected (and so on) to the
>> KVM_GET_NESTED_STATE-returned VMCB12 EVENTINJ field (or its VMX equivalent).
>>
>> But then the VMM will need to first call KVM_GET_NESTED_STATE (which will do
>> the moving), and only then KVM_GET_VCPU_EVENTS (which will then no longer show
>> these events as pending).
>> And it will need to call their setters in the opposite order when restoring
>> the VM.
>
> I wasn't thinking of actually moving things in the source VM, only ignoring
> events in KVM_GET_VCPU_EVENTS. Getting state shouldn't be destructive, e.g. the
> source VM should still be able to continue running.

Right, getters should not change state.

> Ahahahaha, and actually looking at the code, there's this gem in
> KVM_GET_VCPU_EVENTS
>
> 	/*
> 	 * The API doesn't provide the instruction length for software
> 	 * exceptions, so don't report them. As long as the guest RIP
> 	 * isn't advanced, we should expect to encounter the exception
> 	 * again.
> 	 */
> 	if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
> 		events->exception.injected = 0;
> 		events->exception.pending = 0;
> 	}
>
> and again for soft interrupts
>
> 	events->interrupt.injected =
> 		vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
>
> so through KVM's own incompetency, it's already doing half the work.

So KVM_GET_VCPU_EVENTS was basically already explicitly broken for this case
(where RIP does not point to an INT3/INTO/INT x instruction).

> This is roughly what I had in mind. It will "require" moving nested_run_pending
> to kvm_vcpu_arch, but I've been itching for an excuse to do that anyways.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eb71727acecb..62c48f6a0815 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4846,6 +4846,8 @@ static int kvm_vcpu_ioctl_x86_set_mce(struct kvm_vcpu *vcpu,
>  static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  					       struct kvm_vcpu_events *events)
>  {
> +	bool drop_injected_events = vcpu->arch.nested_run_pending;
> +
>  	process_nmi(vcpu);
>
>  	if (kvm_check_request(KVM_REQ_SMI, vcpu))
> @@ -4872,7 +4874,8 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  	 * isn't advanced, we should expect to encounter the exception
>  	 * again.
>  	 */
> -	if (kvm_exception_is_soft(vcpu->arch.exception.nr)) {
> +	if (drop_injected_events ||
> +	    kvm_exception_is_soft(vcpu->arch.exception.nr)) {
>  		events->exception.injected = 0;
>  		events->exception.pending = 0;
>  	} else {
> @@ -4893,13 +4896,14 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  	events->exception_has_payload = vcpu->arch.exception.has_payload;
>  	events->exception_payload = vcpu->arch.exception.payload;
>
> -	events->interrupt.injected =
> -		vcpu->arch.interrupt.injected && !vcpu->arch.interrupt.soft;
> +	events->interrupt.injected = vcpu->arch.interrupt.injected &&
> +				     !vcpu->arch.interrupt.soft &&
> +				     !drop_injected_events;
>  	events->interrupt.nr = vcpu->arch.interrupt.nr;
>  	events->interrupt.soft = 0;
>  	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
>
> -	events->nmi.injected = vcpu->arch.nmi_injected;
> +	events->nmi.injected = vcpu->arch.nmi_injected && !drop_injected_events;
>  	events->nmi.pending = vcpu->arch.nmi_pending != 0;
>  	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
>  	events->nmi.pad = 0;
>

So the VMM will get a VMCB12 with its EVENTINJ field filled with the event to
re-inject from KVM_GET_NESTED_STATE, and events.{exception,interrupt,nmi}.injected
unset from KVM_GET_VCPU_EVENTS.
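To make the intended KVM_GET_VCPU_EVENTS behavior concrete, here is a standalone
model of the patched getter logic (the structs below are simplified stand-ins
for kvm_vcpu_arch / kvm_vcpu_events, not the real kernel types): with
nested_run_pending set, every injected-event bit collapses to zero, while soft
exceptions (#BP/#OF) stay hidden exactly as before the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BP_VECTOR 3	/* #BP, raised by INT3 */
#define OF_VECTOR 4	/* #OF, raised by INTO */

/* Stand-in for the relevant slice of vcpu state. */
struct vcpu_model {
	bool nested_run_pending;
	bool exception_injected;
	unsigned int exception_nr;
	bool interrupt_injected;
	bool interrupt_soft;
	bool nmi_injected;
};

/* Stand-in for the injected bits reported to userspace. */
struct events_model {
	bool exception_injected;
	bool interrupt_injected;
	bool nmi_injected;
};

/* Mirrors the logic of the proposed kvm_vcpu_ioctl_x86_get_vcpu_events()
 * change: drop all injected events while nested_run_pending is set, and
 * keep hiding software exceptions (whose instruction length the API
 * cannot convey) as current KVM already does. */
static void get_events(const struct vcpu_model *v, struct events_model *e)
{
	bool drop = v->nested_run_pending;
	bool soft_exc = v->exception_nr == BP_VECTOR ||
			v->exception_nr == OF_VECTOR;

	e->exception_injected = v->exception_injected && !soft_exc && !drop;
	e->interrupt_injected = v->interrupt_injected &&
				!v->interrupt_soft && !drop;
	e->nmi_injected = v->nmi_injected && !drop;
}
```

With nested_run_pending=true every injected bit reads back as zero regardless
of the pending event type, which is the "ignore all to-be-injected events"
behavior discussed above.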
Let's say now that the VMM uses this data to restore a VM: it restores nested
state via KVM_SET_NESTED_STATE and then events via KVM_SET_VCPU_EVENTS.

So it ends up with the VMCB12 EVENTINJ field filled, but with the KVM injection
structures (arch.{exception,interrupt,nmi}.injected) zeroed by that later
KVM_SET_VCPU_EVENTS call.

Assuming that L1 -> L2 event injection is always based on the KVM injection
structures (like we discussed earlier), rather than on a direct copy of the
EVENTINJ field as it is now, KVM will need to re-parse the VMCB12 EVENTINJ
field into arch.{exception,interrupt,nmi}.injected before doing the first VM
entry to make it work properly.

Thanks,
Maciej
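P.S. For concreteness, such re-parsing amounts to decoding the EVENTINJ layout
(per the AMD APM: bits 7:0 vector, bits 10:8 type, bit 31 valid) back into the
injection bookkeeping. A rough sketch follows; the struct and function names
here are made up for illustration and the error-code bits are ignored, so this
is a model of the idea, not KVM's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EVENTINJ layout per the AMD APM: bits 7:0 vector, bits 10:8 type
 * (0 = external interrupt, 2 = NMI, 3 = exception, 4 = software
 * interrupt), bit 11 error-code valid, bit 31 valid, bits 63:32
 * error code (not modeled here). */
#define EVENTINJ_VALID		(1ull << 31)
#define EVENTINJ_TYPE_SHIFT	8
#define EVENTINJ_TYPE_MASK	(7ull << EVENTINJ_TYPE_SHIFT)
#define EVENTINJ_TYPE_INTR	(0ull << EVENTINJ_TYPE_SHIFT)
#define EVENTINJ_TYPE_NMI	(2ull << EVENTINJ_TYPE_SHIFT)
#define EVENTINJ_TYPE_EXEPT	(3ull << EVENTINJ_TYPE_SHIFT)
#define EVENTINJ_TYPE_SOFT	(4ull << EVENTINJ_TYPE_SHIFT)

/* Hypothetical stand-in for arch.{exception,interrupt,nmi} injection state. */
struct injection_state {
	bool exception_injected;
	bool interrupt_injected;
	bool interrupt_soft;
	bool nmi_injected;
	uint8_t vector;
};

/* Sketch of re-parsing a VMCB12 EVENTINJ value into KVM-style injection
 * bookkeeping before the first nested VM entry. */
static void reparse_eventinj(uint64_t eventinj, struct injection_state *s)
{
	*s = (struct injection_state){ 0 };

	if (!(eventinj & EVENTINJ_VALID))
		return;

	s->vector = eventinj & 0xff;
	switch (eventinj & EVENTINJ_TYPE_MASK) {
	case EVENTINJ_TYPE_INTR:
		s->interrupt_injected = true;
		break;
	case EVENTINJ_TYPE_NMI:
		s->nmi_injected = true;
		break;
	case EVENTINJ_TYPE_EXEPT:
		s->exception_injected = true;
		break;
	case EVENTINJ_TYPE_SOFT:
		s->interrupt_injected = true;
		s->interrupt_soft = true;
		break;
	}
}
```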