Date: Mon, 4 Apr 2022 16:49:06 +0000
From: Sean Christopherson
To: Maxim Levitsky
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Maciej S. Szmigiero"
Subject: Re: [PATCH 5/8] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction
References: <20220402010903.727604-1-seanjc@google.com> <20220402010903.727604-6-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 04, 2022, Maxim Levitsky wrote:
> On Sat, 2022-04-02 at 01:09 +0000, Sean Christopherson wrote:
> > Re-inject INT3/INTO instead of retrying the instruction if the CPU
> > encountered an intercepted exception while vectoring the software
> > exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
> > shadow paging. Retrying the instruction is architecturally wrong, e.g.
> > will result in a spurious #DB if there's a code breakpoint on the INT3/O,
> > and lack of re-injection also breaks nested virtualization, e.g. if L1
> > injects a software exception and vectoring the injected exception
> > encounters an exception that is intercepted by L0 but not L1.
> >
> > Due to, ahem, deficiencies in the SVM architecture, acquiring the next
> > RIP may require flowing through the emulator even if NRIPS is supported,
> > as the CPU clears next_rip if the VM-Exit is due to an exception other
> > than "exceptions caused by the INT3, INTO, and BOUND instructions". To
> > deal with this, "skip" the instruction to calculate next_rip, and then
> > unwind the RIP write and any side effects (RFLAGS updates).

...

> > @@ -3698,6 +3737,18 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
> >  	if (!(exitintinfo & SVM_EXITINTINFO_VALID))
> >  		return;
> >
> > +	/*
> > +	 * If NextRIP isn't enabled, KVM must manually advance RIP prior to
> > +	 * injecting the soft exception/interrupt. That advancement needs to
> > +	 * be unwound if vectoring didn't complete. Note, the _new_ event may
> > +	 * not be the injected event, e.g. if KVM injected an INTn, the INTn
> > +	 * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will
> > +	 * be the reported vectored event, but RIP still needs to be unwound.
> > +	 */
> > +	if (soft_int_injected &&
> > +	    kvm_is_linear_rip(vcpu, to_svm(vcpu)->soft_int_linear_rip))
> > +		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - soft_int_injected);

Doh, I botched my last minute rebase. This is duplicate code and needs to be
dropped.

> > +
> >  	kvm_make_request(KVM_REQ_EVENT, vcpu);
> >
> >  	vector = exitintinfo & SVM_EXITINTINFO_VEC_MASK;
> > @@ -3711,9 +3762,9 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
> >  	 * hit a #NP in the guest, and the #NP encountered a #PF, the #NP will
> >  	 * be the reported vectored event, but RIP still needs to be unwound.
> >  	 */
> > -	if (int3_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
> > -	    kvm_is_linear_rip(vcpu, svm->int3_rip))
> > -		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - int3_injected);
> > +	if (soft_int_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
> > +	    kvm_is_linear_rip(vcpu, svm->soft_int_linear_rip))
> > +		kvm_rip_write(vcpu, kvm_rip_read(vcpu) - soft_int_injected);
> >
> >  	switch (type) {
> >  	case SVM_EXITINTINFO_TYPE_NMI:
> > @@ -3726,14 +3777,6 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
> >  		if (vector == X86_TRAP_VC)
> >  			break;
> >
> > -		/*
> > -		 * In case of software exceptions, do not reinject the vector,
> > -		 * but re-execute the instruction instead. Rewind RIP first
> > -		 * if we emulated INT3 before.
> > -		 */
> > -		if (kvm_exception_is_soft(vector))
> > -			break;
> > -
> >  		if (exitintinfo & SVM_EXITINTINFO_VALID_ERR) {
> >  			u32 err = svm->vmcb->control.exit_int_info_err;
> >  			kvm_requeue_exception_e(vcpu, vector, err);
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 47e7427d0395..a770a1c7ddd2 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -230,8 +230,8 @@ struct vcpu_svm {
> >  	bool nmi_singlestep;
> >  	u64 nmi_singlestep_guest_rflags;
> >
> > -	unsigned int3_injected;
> > -	unsigned long int3_rip;
> > +	unsigned soft_int_injected;
> > +	unsigned long soft_int_linear_rip;
> >
> >  	/* optional nested SVM features that are enabled for this guest */
> >  	bool nrips_enabled : 1;
> >
>
> I mostly agree with this patch, but think that it doesn't address the
> original issue that Maciej wanted to address:
>
> Suppose that there is *no* instruction in L2 code which caused the software
> exception, but rather L1 set arbitrary next_rip, and set EVENTINJ to software
> exception with some vector, and that injection got interrupted.
>
> I don't think that this code will support this.

Argh, you're right. Maciej's selftest injects without an instruction, but it
doesn't configure the scenario where that injection fails due to an
exception+VM-Exit that isn't intercepted by L1 and is handled by L0. The
event_inj test gets the coverage for the latter, but always has a backing
instruction.

> I think that svm_complete_interrupts should store next_rip in some field
> like VMX does (vcpu->arch.event_exit_inst_len).

Yeah. The ugly part is that because next_rip is guaranteed to be cleared on
exit (the exit is guaranteed to be due to a fault-like exception), KVM has to
snapshot next_rip during the "original" injection and use the linear_rip
matching heuristic to detect this scenario.

> That field also should be migrated, or we must prove that it works anyway.
> E.g., what happens when we tried to inject event,
> injection was interrupted by other exception, and then we migrate?

Ya, should Just Work if control.next_rip is used to cache the next rip.

Handling this doesn't seem to be too awful (haven't tested yet), it's largely
the same logic as the existing !nrips code.

In svm_update_soft_interrupt_rip(), snapshot all information regardless of
whether or not nrips is enabled:

	svm->soft_int_injected = true;
	svm->soft_int_csbase = svm->vmcb->save.cs.base;
	svm->soft_int_old_rip = old_rip;
	svm->soft_int_next_rip = rip;

	if (nrips)
		kvm_rip_write(vcpu, old_rip);

	if (static_cpu_has(X86_FEATURE_NRIPS))
		svm->vmcb->control.next_rip = rip;

and then in svm_complete_interrupts(), change the linear RIP matching code to
look for the old rip in the nrips case and stuff svm->vmcb->control.next_rip
on match.

	bool soft_int_injected = svm->soft_int_injected;
	unsigned long soft_int_rip;

	svm->soft_int_injected = false;

	if (soft_int_injected) {
		if (nrips)
			soft_int_rip = svm->soft_int_old_rip;
		else
			soft_int_rip = svm->soft_int_next_rip;
	}

	...

	if (soft_int_injected && type == SVM_EXITINTINFO_TYPE_EXEPT &&
	    kvm_is_linear_rip(vcpu, soft_int_rip + svm->soft_int_csbase)) {
		if (nrips)
			svm->vmcb->control.next_rip = svm->soft_int_next_rip;
		else
			kvm_rip_write(vcpu, svm->soft_int_old_rip);
	}
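
For anyone who wants to poke at the snapshot/unwind flow outside the kernel,
here is a minimal user-space C sketch of the idea. It is not KVM code and has
not been checked against real hardware behavior. Everything in it is an
assumption made for illustration: struct model_vcpu, the model_* helpers, the
single nrips flag (which folds the guest's nrips setting and
X86_FEATURE_NRIPS into one bit), the cs_base value, and the way the
fault-like exit is simulated by zeroing next_rip by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-ins for the vcpu_svm/VMCB fields. */
struct model_vcpu {
	uint64_t rip;		/* current guest RIP */
	uint64_t cs_base;	/* models vmcb->save.cs.base */
	uint64_t next_rip;	/* models vmcb->control.next_rip */
	bool nrips;		/* simplification: one flag for "NextRIP usable" */

	/* Snapshot taken at injection time. */
	bool soft_int_injected;
	uint64_t soft_int_csbase;
	uint64_t soft_int_old_rip;
	uint64_t soft_int_next_rip;
};

/* Models the linear RIP check: compare CS.base + RIP to a linear address. */
static bool model_is_linear_rip(const struct model_vcpu *v, uint64_t linear_rip)
{
	return v->cs_base + v->rip == linear_rip;
}

/*
 * Injection side. The caller has already "skipped" the instruction, so
 * v->rip holds the next RIP and old_rip is where the instruction started.
 */
static void model_update_soft_interrupt_rip(struct model_vcpu *v, uint64_t old_rip)
{
	uint64_t rip = v->rip;

	/* Snapshot everything, regardless of the nrips setting. */
	v->soft_int_injected = true;
	v->soft_int_csbase = v->cs_base;
	v->soft_int_old_rip = old_rip;
	v->soft_int_next_rip = rip;

	if (v->nrips) {
		/* Undo the skip and hand the next RIP to the (modeled) VMCB. */
		v->rip = old_rip;
		v->next_rip = rip;
	}
}

/*
 * Completion side. exception_vectoring is true when the exit reported an
 * exception that was still being vectored.
 */
static void model_complete_interrupts(struct model_vcpu *v, bool exception_vectoring)
{
	bool injected = v->soft_int_injected;
	uint64_t soft_int_rip;

	v->soft_int_injected = false;
	if (!injected || !exception_vectoring)
		return;

	/* With nrips, RIP was unwound at injection, so match the old RIP. */
	soft_int_rip = v->nrips ? v->soft_int_old_rip : v->soft_int_next_rip;
	if (!model_is_linear_rip(v, soft_int_rip + v->soft_int_csbase))
		return;

	if (v->nrips)
		v->next_rip = v->soft_int_next_rip;	/* re-stuff cleared value */
	else
		v->rip = v->soft_int_old_rip;		/* unwind manual advance */
}

/* Runs one inject -> faulting exit -> complete cycle and returns the state. */
static struct model_vcpu model_failed_injection(bool nrips, uint64_t old_rip,
						uint64_t insn_len)
{
	struct model_vcpu v = {
		.rip = old_rip + insn_len,	/* state after the "skip" */
		.cs_base = 0x1000,		/* arbitrary illustrative value */
		.nrips = nrips,
	};

	model_update_soft_interrupt_rip(&v, old_rip);
	v.next_rip = 0;	/* simulate the CPU clearing next_rip on the exit */
	model_complete_interrupts(&v, true);
	return v;
}
```

Calling model_failed_injection() with nrips set and with nrips clear should
leave RIP pointing at the original instruction in both cases; only the nrips
case ends up with a non-zero next_rip, restored from the snapshot.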