Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp3452691pxb; Mon, 4 Apr 2022 17:26:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwvqal2ARlEij5BygWv9uRNd36+ioCrN+RWxRxn66mkVmjjM1y6ltq8lsg80voURZ0gYs9x X-Received: by 2002:a17:90b:33d0:b0:1ca:ba71:88ed with SMTP id lk16-20020a17090b33d000b001caba7188edmr916824pjb.229.1649118413735; Mon, 04 Apr 2022 17:26:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649118413; cv=none; d=google.com; s=arc-20160816; b=atDn0hv3vethprvE7Ixr1/jDmLmFvMAYYfGIA3TlAUwp1oKc5SjwI2zvF+8P9hO5fv M5N9wLC/Vtq3Qw6ygHD9ysFskioHx1lHT8uJZFFkaMIiFwWhz7ZwL1krsZUGxILeoQ6/ +mVp7pnV95JVSnfHiJxWSL6Mi0XdcGvPvejWCWf6V2xB9URFcYoOA2M7ipfBl/FAPnAu tJ2eYxLrJl1d1z7weomGCAL4dh+DBLYT/DBqxJD7VGi0V3ZpkbSM/I1TYbfUq+F3ID1u /8zzTQwbk8v2a6XA1I2AttygXwGiIni+02XJlBrhVbM2HzpZ7TTRIsgtOK0VJObxF+1U oj0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id; bh=mAA+qzpa0yEnN+3U4hl75wERs5iW6MK4OMep/sHBgy0=; b=saOgZz/P+2qaOMXERE25+w2GJMmDZkr+O41pgB+R3E6Zxvf0MWER8OHPY1xRBQxqS9 ZpmXt553gB/GekeCoZ75uf/JMHga4y3h50estxZnR8q9HfKv1CSyKc9MTHxoMliB3CIZ athfd2/nHcfjcpW5P8xtl2tldx5mLQVXOcmn0TqNRyxzEaYkkv6SiLPTOgRpPn+82mei ICvP177ezzxxxXta+nOpHpNYggD7IXmi6Dl3Aln+j2gT86lVcXRgmQY2P5JcfhevH5c7 TOzosMrAWX1NkAdxcY9hsmkc4OuOMTJ45/KBZkqAg5MofvCtUOt6FR0+lsD3+7W65aOo CmFg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id j7-20020a17090a31c700b001cac1e802d5si764555pjf.82.2022.04.04.17.26.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Apr 2022 17:26:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 88C19811B1; Mon, 4 Apr 2022 16:49:09 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1380055AbiDDV7P (ORCPT + 99 others); Mon, 4 Apr 2022 17:59:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1380707AbiDDVIV (ORCPT ); Mon, 4 Apr 2022 17:08:21 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BC6112F397; Mon, 4 Apr 2022 14:06:23 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1nbTtd-0003gN-O5; Mon, 04 Apr 2022 23:05:57 +0200 Message-ID: <7d67bc6f-00ac-7c07-f6c2-c41b2f0d35a1@maciej.szmigiero.name> Date: Mon, 4 Apr 2022 23:05:52 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Content-Language: en-US To: Maxim Levitsky Cc: Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , Tom Lendacky , Brijesh Singh , Jon Grimm , David Kaplan , Boris Ostrovsky , Liam Merwick , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini References: <4f5234ac2c6d91d90b1c85ccb3081a91a6d3be2a.camel@redhat.com> From: "Maciej S. Szmigiero" Subject: Re: [PATCH 3/5] KVM: nSVM: Don't forget about L1-injected events In-Reply-To: <4f5234ac2c6d91d90b1c85ccb3081a91a6d3be2a.camel@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.5 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4.04.2022 11:53, Maxim Levitsky wrote: > On Thu, 2022-03-10 at 22:38 +0100, Maciej S. Szmigiero wrote: >> From: "Maciej S. Szmigiero" >> >> In SVM synthetic software interrupts or INT3 or INTO exception that L1 >> wants to inject into its L2 guest are forgotten if there is an intervening >> L0 VMEXIT during their delivery. >> >> They are re-injected correctly with VMX, however. >> >> This is because there is an assumption in SVM that such exceptions will be >> re-delivered by simply re-executing the current instruction. >> Which might not be true if this is a synthetic exception injected by L1, >> since in this case the re-executed instruction will be one already in L2, >> not the VMRUN instruction in L1 that attempted the injection. >> >> Leave the pending L1 -> L2 event in svm->nested.ctl.event_inj{,err} until >> it is either re-injected successfully or returned to L1 upon a nested >> VMEXIT. >> Make sure to always re-queue such event if returned in EXITINTINFO. >> >> The handling of L0 -> {L1, L2} event re-injection is left as-is to avoid >> unforeseen regressions. > > Some time ago I noticed this too, but haven't dug into this too much. > I rememeber I even had some half-baked patch for this I never posted, > because I didn't think about the possibility of this syntetic injection. > > Just to be clear that I understand this correctly: > > 1. What is happening is that L1 is injecting INTn/INTO/INT3 but L2 code > doesn't actualy contain an INTn/INTO/INT3 instruction. > This is wierd but legal thing to do. > Again, if L2 actually contained the instruction, it would have worked? I think so (haven't tested it though). > > 2. When actual INTn/INT0/INT3 are intercepted on SVM, then > save.RIP points to address of the instruction, and control.next_rip > points to address of next instruction after (as expected) Yes. > 3. When EVENTINJ is used with '(TYPE = 3) with vectors 3 or 4' > or 'TYPE=4', then next_rip is pushed on the stack, while save.RIP is > pretty much ignored, and exectution jumps to the handler in the IDT. Yes. > also at least for INT3/INTO, PRM states that IDT's DPL field is checked > before dispatch, meaning that we can get legit #GP during delivery. > (this looks like another legit reason to fix exception merging in KVM) > That's right. > Best regards, > Maxim Levitsky > > >> >> Signed-off-by: Maciej S. Szmigiero >> --- >> arch/x86/kvm/svm/nested.c | 65 +++++++++++++++++++++++++++++++++++++-- >> arch/x86/kvm/svm/svm.c | 17 ++++++++-- >> arch/x86/kvm/svm/svm.h | 47 ++++++++++++++++++++++++++++ >> 3 files changed, 125 insertions(+), 4 deletions(-) >> >> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c >> index 9656f0d6815c..75017bf77955 100644 >> --- a/arch/x86/kvm/svm/nested.c >> +++ b/arch/x86/kvm/svm/nested.c >> @@ -420,8 +420,17 @@ void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm, >> void nested_sync_control_from_vmcb02(struct vcpu_svm *svm) >> { >> u32 mask; >> - svm->nested.ctl.event_inj = svm->vmcb->control.event_inj; >> - svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err; >> + >> + /* >> + * Leave the pending L1 -> L2 event in svm->nested.ctl.event_inj{,err} >> + * if its re-injection is needed >> + */ >> + if (!exit_during_event_injection(svm, svm->nested.ctl.event_inj, >> + svm->nested.ctl.event_inj_err)) { >> + WARN_ON_ONCE(svm->vmcb->control.event_inj & SVM_EVTINJ_VALID); >> + svm->nested.ctl.event_inj = svm->vmcb->control.event_inj; >> + svm->nested.ctl.event_inj_err = svm->vmcb->control.event_inj_err; >> + } > > Beware that this could backfire in regard to nested migration. > > I once chased a nasty bug related to this. > > The bug was: > > - L1 does VMRUN with injection (EVENTINJ set) > > - VMRUN exit handler is 'executed' by KVM, > This copies EVENINJ fields from VMCB12 to VMCB02 > > - Once VMRUN exit handler is done executing, we exit to userspace to start migration > (basically static_call(kvm_x86_handle_exit)(...) handles the SVM_EXIT_VMRUN, > and that is all, vcpu_enter_guest isn't called again, so injection is not canceled) > > - migration happens and it migrates the control area of vmcb02 with EVENTINJ fields set. > > - on migration target, we inject another interrupt to the guest via EVENTINJ > because svm_check_nested_events checks nested_run_pending to avoid doing this > but nested_run_pending was not migrated correctly, > and overwrites the EVENTINJ - injection is lost. > > Paolo back then proposed to me that instead of doing direct copy from VMCB12 to VMCB02 > we should instead go through 'vcpu->arch.interrupt' and such. > I had a prototype of this but never gotten to clean it up to be accepted upstream, > knowing that current way also works. > This sounds like a valid, but different, bug - to be honest, it would look much cleaner to me, too, if EVENTINJ was parsed from VMCB12 into relevant KVM injection structures on a nested VMRUN rather than following a hybrid approach: 1) Copy the field from VMCB12 to VMCB02 directly on a nested VMRUN, 2) Parse the EXITINTINFO into KVM injection structures when re-injecting. >> >> /* Only a few fields of int_ctl are written by the processor. */ >> mask = V_IRQ_MASK | V_TPR_MASK; >> @@ -669,6 +678,54 @@ static void nested_svm_copy_common_state(struct vmcb *from_vmcb, struct vmcb *to >> to_vmcb->save.spec_ctrl = from_vmcb->save.spec_ctrl; >> } >> >> +void nested_svm_maybe_reinject(struct kvm_vcpu *vcpu) > > A personal taste note: I don't like the 'maybe' for some reason. > But I won't fight over this. What's you proposed name then? >> +{ >> + struct vcpu_svm *svm = to_svm(vcpu); >> + unsigned int vector, type; >> + u32 exitintinfo = svm->vmcb->control.exit_int_info; >> + >> + if (WARN_ON_ONCE(!is_guest_mode(vcpu))) >> + return; >> + >> + /* >> + * No L1 -> L2 event to re-inject? >> + * >> + * In this case event_inj will be cleared by >> + * nested_sync_control_from_vmcb02(). >> + */ >> + if (!(svm->nested.ctl.event_inj & SVM_EVTINJ_VALID)) >> + return; >> + >> + /* If the last event injection was successful there shouldn't be any pending event */ >> + if (WARN_ON_ONCE(!(exitintinfo & SVM_EXITINTINFO_VALID))) >> + return; >> + >> + kvm_make_request(KVM_REQ_EVENT, vcpu); >> + >> + vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK; >> + type = exitintinfo & SVM_EXITINTINFO_TYPE_MASK; >> + >> + switch (type) { >> + case SVM_EXITINTINFO_TYPE_NMI: >> + vcpu->arch.nmi_injected = true; >> + break; >> + case SVM_EXITINTINFO_TYPE_EXEPT: >> + if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) >> + kvm_requeue_exception_e(vcpu, vector, >> + svm->vmcb->control.exit_int_info_err); >> + else >> + kvm_requeue_exception(vcpu, vector); >> + break; >> + case SVM_EXITINTINFO_TYPE_SOFT: > > Note that AFAIK, SVM_EXITINTINFO_TYPE_SOFT is only for INTn instructions, > while INT3 and INTO are considered normal exceptions but EVENTINJ > handling has special case for them. > That's right. > On VMX it is a bit cleaner: > It has: > > 3 - normal stock exception caused by CPU itself, like #PF and such > > 4 - INTn > * does DPL check and uses VM_EXIT_INSTRUCTION_LEN > > 5 - ICEBP/INT1, which SVM doesnt support to re-inject > * doesn't do DPL check, but uses VM_EXIT_INSTRUCTION_LEN I think > > 6 - software exception (INT3/INTO) > * does DPL check and uses VM_EXIT_INSTRUCTION_LEN as well > > I don't know if there is any difference between 4 and 6. > > > > (..) > > > I will also review Sean's take on this, let see which one is simplier. Since Sean's patch set is supposed to supersede this one let's continue the discussion there. > Best regards, > Maxim Levitsky Thanks, Maciej