Reply-To: Sean Christopherson
Date: Sat, 2 Apr 2022 01:09:00 +0000
In-Reply-To: <20220402010903.727604-1-seanjc@google.com>
Message-Id: <20220402010903.727604-6-seanjc@google.com>
References: <20220402010903.727604-1-seanjc@google.com>
X-Mailer: git-send-email 2.35.1.1094.g7c7d902a7c-goog
Subject: [PATCH 5/8] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
    Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    "Maciej S. Szmigiero"
X-Mailing-List: linux-kernel@vger.kernel.org

Re-inject INT3/INTO instead of retrying the instruction if the CPU
encountered an intercepted exception while vectoring the software
exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
shadow paging.  Retrying the instruction is architecturally wrong, e.g.
will result in a spurious #DB if there's a code breakpoint on the
INT3/INTO, and lack of re-injection also breaks nested virtualization,
e.g. if L1 injects a software exception and vectoring the injected
exception encounters an exception that is intercepted by L0 but not L1.

Due to, ahem, deficiencies in the SVM architecture, acquiring the next
RIP may require flowing through the emulator even if NRIPS is supported,
as the CPU clears next_rip if the VM-Exit is due to an exception other
than "exceptions caused by the INT3, INTO, and BOUND instructions".  To
deal with this, "skip" the instruction to calculate next_rip, and then
unwind the RIP write and any side effects (RFLAGS updates).

Reported-by: Maciej S. Szmigiero
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/svm/svm.c | 111 ++++++++++++++++++++++++++++-------------
 arch/x86/kvm/svm/svm.h |   4 +-
 2 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6ea8f16e39ac..ecc828d6921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -341,9 +341,11 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask)
 
 }
 
-static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu,
+					   bool commit_side_effects)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned long old_rflags;
 
 	/*
 	 * SEV-ES does not expose the next RIP. The RIP update is controlled by
@@ -358,18 +360,71 @@ static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	}
 
 	if (!svm->next_rip) {
+		if (unlikely(!commit_side_effects))
+			old_rflags = svm->vmcb->save.rflags;
+
 		if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
 			return 0;
+
+		if (unlikely(!commit_side_effects))
+			svm->vmcb->save.rflags = old_rflags;
 	} else {
 		kvm_rip_write(vcpu, svm->next_rip);
 	}
 
 done:
-	svm_set_interrupt_shadow(vcpu, 0);
+	if (likely(commit_side_effects))
+		svm_set_interrupt_shadow(vcpu, 0);
 
 	return 1;
 }
 
+static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
+{
+	return __svm_skip_emulated_instruction(vcpu, true);
+}
+
+static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
+{
+	unsigned long rip, old_rip = kvm_rip_read(vcpu);
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/*
+	 * Due to architectural shortcomings, the CPU doesn't always provide
+	 * NextRIP, e.g. if KVM intercepted an exception that occurred while
+	 * the CPU was vectoring an INTO/INT3 in the guest. Temporarily skip
+	 * the instruction even if NextRIP is supported to acquire the next
+	 * RIP so that it can be shoved into the NextRIP field, otherwise
+	 * hardware will fail to advance guest RIP during event injection.
+	 * Drop the exception/interrupt if emulation fails and effectively
+	 * retry the instruction, it's the least awful option. If NRIPS is
+	 * in use, the skip must not commit any side effects such as clearing
+	 * the interrupt shadow or RFLAGS.RF.
+	 */
+	if (!__svm_skip_emulated_instruction(vcpu, !nrips))
+		return -EIO;
+
+	rip = kvm_rip_read(vcpu);
+
+	/*
+	 * If NextRIP is supported, rewind RIP and update NextRip. If NextRip
+	 * isn't supported, keep the result of the skip as the CPU obviously
+	 * won't advance RIP, but stash away the injection information so that
+	 * RIP can be unwound if injection fails.
+	 */
+	if (nrips) {
+		kvm_rip_write(vcpu, old_rip);
+		svm->vmcb->control.next_rip = rip;
+	} else {
+		if (boot_cpu_has(X86_FEATURE_NRIPS))
+			svm->vmcb->control.next_rip = rip;
+
+		svm->soft_int_linear_rip = rip + svm->vmcb->save.cs.base;
+		svm->soft_int_injected = rip - old_rip;
+	}
+	return 0;
+}
+
 static void svm_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -379,25 +434,9 @@ static void svm_queue_exception(struct kvm_vcpu *vcpu)
 
 	kvm_deliver_exception_payload(vcpu);
 
-	if (nr == BP_VECTOR && !nrips) {
-		unsigned long rip, old_rip = kvm_rip_read(vcpu);
-
-		/*
-		 * For guest debugging where we have to reinject #BP if some
-		 * INT3 is guest-owned:
-		 * Emulate nRIP by moving RIP forward. Will fail if injection
-		 * raises a fault that is not intercepted. Still better than
-		 * failing in all cases.
-		 */
-		(void)svm_skip_emulated_instruction(vcpu);
-		rip = kvm_rip_read(vcpu);
-
-		if (boot_cpu_has(X86_FEATURE_NRIPS))
-			svm->vmcb->control.next_rip = rip;
-
-		svm->int3_rip = rip + svm->vmcb->save.cs.base;
-		svm->int3_injected = rip - old_rip;
-	}
+	if (kvm_exception_is_soft(nr) &&
+	    svm_update_soft_interrupt_rip(vcpu))
+		return;
 
 	svm->vmcb->control.event_inj = nr
 		| SVM_EVTINJ_VALID
@@ -3676,9 +3715,9 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 	u8 vector;
 	int type;
 	u32 exitintinfo = svm->vmcb->control.exit_int_info;
-	unsigned int3_injected = svm->int3_injected;
+	unsigned soft_int_injected = svm->soft_int_injected;
 
-	svm->int3_injected = 0;
+	svm->soft_int_injected = 0;
 
 	/*
 	 * If we've made progress since setting HF_IRET_MASK, we've
@@ -3698,6 +3737,18 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
 		return;
 
+	/*
+	 * If NextRIP isn't enabled, KVM must manually advance RIP prior to
+	 * injecting the soft exception/interrupt. That advancement needs to
+	 * be unwound if vectoring didn't complete. Note, the _new_ event may
+	 * not be the injected event, e.g. if KVM injected an INTn, the INTn
+	 * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will
+	 * be the reported vectored event, but RIP still needs to be unwound.
+	 */
+	if (soft_int_injected &&
+	    kvm_is_linear_rip(vcpu, to_svm(vcpu)->soft_int_linear_rip))
+		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - soft_int_injected);
+
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 
 	vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
@@ -3711,9 +3762,9 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 	 * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will
 	 * be the reported vectored event, but RIP still needs to be unwound.
 	 */
-	if (int3_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
-	    kvm_is_linear_rip(vcpu, svm->int3_rip))
-		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - int3_injected);
+	if (soft_int_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
+	    kvm_is_linear_rip(vcpu, svm->soft_int_linear_rip))
+		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - soft_int_injected);
 
 	switch (type) {
 	case SVM_EXITINTINFO_TYPE_NMI:
@@ -3726,14 +3777,6 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 		if (vector == X86_TRAP_VC)
 			break;
 
-		/*
-		 * In case of software exceptions, do not reinject the vector,
-		 * but re-execute the instruction instead. Rewind RIP first
-		 * if we emulated INT3 before.
-		 */
-		if (kvm_exception_is_soft(vector))
-			break;
-
 		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
 			u32 err = svm->vmcb->control.exit_int_info_err;
 			kvm_requeue_exception_e(vcpu, vector, err);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 47e7427d0395..a770a1c7ddd2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -230,8 +230,8 @@ struct vcpu_svm {
 	bool nmi_singlestep;
 	u64 nmi_singlestep_guest_rflags;
 
-	unsigned int3_injected;
-	unsigned long int3_rip;
+	unsigned soft_int_injected;
+	unsigned long soft_int_linear_rip;
 
 	/* optional nested SVM features that are enabled for this guest */
 	bool nrips_enabled : 1;
-- 
2.35.1.1094.g7c7d902a7c-goog
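
Not part of the patch: for readers who want to poke at the "skip, then unwind" idea outside the kernel tree, here is a rough userspace sketch of the two paths described in the changelog. Everything prefixed toy_/TOY_ is hypothetical and stands in for the real KVM structures and helpers. The "skip" is modeled as simply adding an instruction length to RIP and clearing RFLAGS.RF, which is an assumption made for illustration and not how the emulator actually works.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_RFLAGS_RF (1UL << 16)	/* RFLAGS.RF is bit 16 */

/* Hypothetical stand-in for the vCPU/VMCB fields the patch touches. */
struct toy_vcpu {
	unsigned long rip;
	unsigned long rflags;
	unsigned long cs_base;
	unsigned long next_rip;			/* models vmcb->control.next_rip */
	unsigned long soft_int_linear_rip;
	unsigned int soft_int_injected;		/* bytes RIP was advanced, 0 if none */
	bool int_shadow;
};

/* Toy "skip": advance RIP, optionally discarding the side effects. */
static int toy_skip(struct toy_vcpu *v, unsigned int insn_len,
		    bool commit_side_effects)
{
	unsigned long old_rflags = v->rflags;

	v->rip += insn_len;
	v->rflags &= ~TOY_RFLAGS_RF;		/* side effect of the skip */

	if (commit_side_effects)
		v->int_shadow = false;
	else
		v->rflags = old_rflags;		/* unwind the RFLAGS update */
	return 1;
}

/* Mirrors the shape of svm_update_soft_interrupt_rip() in the patch. */
static int toy_update_soft_interrupt_rip(struct toy_vcpu *v,
					 unsigned int insn_len, bool nrips)
{
	unsigned long rip, old_rip = v->rip;

	if (!toy_skip(v, insn_len, !nrips))
		return -1;

	rip = v->rip;
	if (nrips) {
		/* Hardware advances RIP itself: rewind and hand it NextRIP. */
		v->rip = old_rip;
		v->next_rip = rip;
	} else {
		/* Keep the advanced RIP, remember how to undo it. */
		v->soft_int_linear_rip = rip + v->cs_base;
		v->soft_int_injected = rip - old_rip;
	}
	return 0;
}

/* Models the unwind done when vectoring did not complete. */
static void toy_complete_interrupts(struct toy_vcpu *v, bool vectoring_failed)
{
	unsigned int injected = v->soft_int_injected;

	v->soft_int_injected = 0;
	if (vectoring_failed && injected &&
	    v->rip + v->cs_base == v->soft_int_linear_rip)
		v->rip -= injected;
}

/* Small self-check covering both paths; returns 0 if the model behaves. */
int toy_demo(void)
{
	struct toy_vcpu v = { .rip = 0x1000, .rflags = TOY_RFLAGS_RF,
			      .int_shadow = true };

	assert(toy_update_soft_interrupt_rip(&v, 1, false) == 0);
	toy_complete_interrupts(&v, true);
	return v.rip == 0x1000 ? 0 : 1;
}
```

With nrips set, RIP, RFLAGS.RF and the interrupt shadow should come out of toy_update_soft_interrupt_rip() untouched and only next_rip changes. Without it, RIP stays advanced until toy_complete_interrupts() rewinds it.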