Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp1034078iob; Thu, 28 Apr 2022 17:12:45 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzPdWe0YRCmK6PbUsNrZNTMM7snf2YLREdbA6X/u0K0l9oYcmsA134SGnBy3+c3yeTZ/FLO X-Received: by 2002:a62:e518:0:b0:4fa:9333:ddbd with SMTP id n24-20020a62e518000000b004fa9333ddbdmr37739866pff.11.1651191165161; Thu, 28 Apr 2022 17:12:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651191165; cv=none; d=google.com; s=arc-20160816; b=BM2XXWgKWmzKhcLw9STeAdsiJlEn6WyawBiUanO9bP4ZAchTeQgmGaLGgiLKPNePK0 EdLxNfcL7Sc89D6Wm7xEawL0ANaIBlmcEAT5fsJbWJlbJh5wiIjHpoDUyr7GAgF8/f7y XtvbhMZoQGhux9rdtrLAFtAQGOb8UhCW9NYdd3pf3r0XQkoexmPrnM5VBxU1d4OMl/zU zIuHkkrQU5gk87vF5K1tLIXEJVHzbUbA97EzFOQ075dj4/TJeS2owMzL6MHymLVx4981 +BMSNWaujWMXMw73BJ7h7BROcG4fU9bhFH5SVAeU9vKEyB2SbXJWMVX27VAVHESRBlOZ 8gBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id; bh=CjfHca4wNYK65FBuoftQU0cHhlwmcoJv4LKot9nTgPY=; b=E98XLQ7hx+cIxpSDQoU0zR+pIsp2kN/SEUjz9r1ubmbncb82j6mZbR93YpFewmRcqG 6C4moMAAuRxqv6p1y6f5gZlHVy8JbV6VEPJOXEz5ZHWb0/vNnkIsebRf7ld2FIrxkAzE ucHRbM9Ma4yJXI8WM8fO3e34kvkEjXw/LI4wkNsTrKeSLTTvgGYK//9OcQ9p0P6JDWZY 6YT/XchpeXBjw6N9FXHtdkIcxXrlLp+4PkN6jZ1k8+kzvZUbSNM54l/X4Sm4HVafeApt e+npJ6BA5q01TMVao3frZLhy7Xw8gaD7Ium76PqTgBCMoqC1Lu5gPv8RnAVy4xrV7GZi q7Vg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y15-20020a056a00190f00b004fb806fbeb4si6176357pfi.300.2022.04.28.17.12.18; Thu, 28 Apr 2022 17:12:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235746AbiD1Nk0 (ORCPT + 99 others); Thu, 28 Apr 2022 09:40:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235714AbiD1NkZ (ORCPT ); Thu, 28 Apr 2022 09:40:25 -0400 Received: from vps-vb.mhejs.net (vps-vb.mhejs.net [37.28.154.113]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 964F095A3B; Thu, 28 Apr 2022 06:37:09 -0700 (PDT) Received: from MUA by vps-vb.mhejs.net with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1nk4KJ-0004TN-Vj; Thu, 28 Apr 2022 15:37:00 +0200 Message-ID: <9553b164-67a6-3634-34c5-f7319ce2dc60@maciej.szmigiero.name> Date: Thu, 28 Apr 2022 15:36:53 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Content-Language: en-US To: Maxim Levitsky Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Sean Christopherson , Paolo Bonzini References: <20220423021411.784383-1-seanjc@google.com> <20220423021411.784383-6-seanjc@google.com> <051f508121bcf47d8cbc79ee2c0817aafbe5af48.camel@redhat.com> From: "Maciej S. Szmigiero" Subject: Re: [PATCH v2 05/11] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction In-Reply-To: <051f508121bcf47d8cbc79ee2c0817aafbe5af48.camel@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 28.04.2022 11:37, Maxim Levitsky wrote: > On Sat, 2022-04-23 at 02:14 +0000, Sean Christopherson wrote: >> Re-inject INT3/INTO instead of retrying the instruction if the CPU >> encountered an intercepted exception while vectoring the software >> exception, e.g. if vectoring INT3 encounters a #PF and KVM is using >> shadow paging. Retrying the instruction is architecturally wrong, e.g. >> will result in a spurious #DB if there's a code breakpoint on the INT3/O, >> and lack of re-injection also breaks nested virtualization, e.g. if L1 >> injects a software exception and vectoring the injected exception >> encounters an exception that is intercepted by L0 but not L1. >> >> Due to, ahem, deficiencies in the SVM architecture, acquiring the next >> RIP may require flowing through the emulator even if NRIPS is supported, >> as the CPU clears next_rip if the VM-Exit is due to an exception other >> than "exceptions caused by the INT3, INTO, and BOUND instructions". To >> deal with this, "skip" the instruction to calculate next_rip (if it's >> not already known), and then unwind the RIP write and any side effects >> (RFLAGS updates). >> >> Save the computed next_rip and use it to re-stuff next_rip if injection >> doesn't complete. This allows KVM to do the right thing if next_rip was >> known prior to injection, e.g. if L1 injects a soft event into L2, and >> there is no backing INTn instruction, e.g. if L1 is injecting an >> arbitrary event. >> >> Note, it's impossible to guarantee architectural correctness given SVM's >> architectural flaws. E.g. if the guest executes INTn (no KVM injection), >> an exit occurs while vectoring the INTn, and the guest modifies the code >> stream while the exit is being handled, KVM will compute the incorrect >> next_rip due to "skipping" the wrong instruction. A future enhancement >> to make this less awful would be for KVM to detect that the decoded >> instruction is not the correct INTn and drop the to-be-injected soft >> event (retrying is a lesser evil compared to shoving the wrong RIP on the >> exception stack). >> >> Reported-by: Maciej S. Szmigiero >> Signed-off-by: Sean Christopherson >> --- >> arch/x86/kvm/svm/nested.c | 28 +++++++- >> arch/x86/kvm/svm/svm.c | 140 +++++++++++++++++++++++++++----------- >> arch/x86/kvm/svm/svm.h | 6 +- >> 3 files changed, 130 insertions(+), 44 deletions(-) >> >> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c >> index 461c5f247801..0163238aa198 100644 >> --- a/arch/x86/kvm/svm/nested.c >> +++ b/arch/x86/kvm/svm/nested.c >> @@ -609,6 +609,21 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12 >> } >> } >> >> +static inline bool is_evtinj_soft(u32 evtinj) >> +{ >> + u32 type = evtinj & SVM_EVTINJ_TYPE_MASK; >> + u8 vector = evtinj & SVM_EVTINJ_VEC_MASK; >> + >> + if (!(evtinj & SVM_EVTINJ_VALID)) >> + return false; >> + >> + /* >> + * Intentionally return false for SOFT events, SVM doesn't yet support >> + * re-injecting soft interrupts. >> + */ >> + return type == SVM_EVTINJ_TYPE_EXEPT && kvm_exception_is_soft(vector); >> +} >> + >> static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, >> unsigned long vmcb12_rip) >> { >> @@ -677,6 +692,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, >> else if (boot_cpu_has(X86_FEATURE_NRIPS)) >> vmcb02->control.next_rip = vmcb12_rip; >> >> + if (is_evtinj_soft(vmcb02->control.event_inj)) { >> + svm->soft_int_injected = true; >> + svm->soft_int_csbase = svm->vmcb->save.cs.base; >> + svm->soft_int_old_rip = vmcb12_rip; >> + if (svm->nrips_enabled) >> + svm->soft_int_next_rip = svm->nested.ctl.next_rip; >> + else >> + svm->soft_int_next_rip = vmcb12_rip; >> + } >> + >> vmcb02->control.virt_ext = vmcb01->control.virt_ext & >> LBR_CTL_ENABLE_MASK; >> if (svm->lbrv_enabled) >> @@ -849,6 +874,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) >> >> out_exit_err: >> svm->nested.nested_run_pending = 0; >> + svm->soft_int_injected = false; >> >> svm->vmcb->control.exit_code = SVM_EXIT_ERR; >> svm->vmcb->control.exit_code_hi = 0; >> @@ -1618,7 +1644,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, >> nested_copy_vmcb_control_to_cache(svm, ctl); >> >> svm_switch_vmcb(svm, &svm->nested.vmcb02); >> - nested_vmcb02_prepare_control(svm, save->rip); >> + nested_vmcb02_prepare_control(svm, svm->vmcb->save.rip); > > Is this change intentional? It looks to me the final code is correct since "svm->vmcb->save" contains L2 register save, while "save" has L1 register save. It was the patch 1 from this series that was incorrect in using "save->rip" here instead. Thanks, Maciej