Date: Wed, 6 Apr 2022 17:10:15 +0000
From: Sean Christopherson
To: "Maciej S. Szmigiero"
Cc: Maxim Levitsky, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
    Jim Mattson, Joerg Roedel, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/8] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction

On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
> On 6.04.2022 03:48, Sean Christopherson wrote:
> > On Mon, Apr 04, 2022, Maciej S. Szmigiero wrote:
> (..)
> > > Also, I'm not sure that even the proposed updated code above will
> > > actually restore the L1-requested next_rip correctly on L1 -> L2
> > > re-injection (will review once the full version is available).
> >
> > Spoiler alert, it doesn't.  Save yourself the review time.  :-)
> >
> > The missing piece is stashing away the injected event on nested VMRUN.  Those
> > events don't get routed through the normal interrupt/exception injection code and
> > so the next_rip info is lost on the subsequent #NPF.
> >
> > Treating soft interrupts/exceptions like they were injected by KVM (which they
> > are, technically) works and doesn't seem too gross.  E.g.
> > when prepping vmcb02
> >
> >         if (svm->nrips_enabled)
> >                 vmcb02->control.next_rip = svm->nested.ctl.next_rip;
> >         else if (boot_cpu_has(X86_FEATURE_NRIPS))
> >                 vmcb02->control.next_rip = vmcb12_rip;
> >
> >         if (is_evtinj_soft(vmcb02->control.event_inj)) {
> >                 svm->soft_int_injected = true;
> >                 svm->soft_int_csbase = svm->vmcb->save.cs.base;
> >                 svm->soft_int_old_rip = vmcb12_rip;
> >                 if (svm->nrips_enabled)
> >                         svm->soft_int_next_rip = svm->nested.ctl.next_rip;
> >                 else
> >                         svm->soft_int_next_rip = vmcb12_rip;
> >         }
> >
> > And then the VMRUN error path just needs to clear soft_int_injected.
>
> I am also a fan of parsing EVENTINJ from VMCB12 into relevant KVM
> injection structures (much like EXITINTINFO is parsed), as I said to
> Maxim two days ago [1].
> Not only for software {interrupts,exceptions} but for all incoming
> events (again, just like EXITINTINFO).

Ahh, I saw that fly by, but somehow I managed to misread what you intended.

I like the idea of populating vcpu->arch.interrupt/exception as "injected" events.
KVM prioritizes "injected" over other nested events, so in theory it should work
without too much fuss.  I've run through a variety of edge cases in my head and
haven't found anything that would be fundamentally broken.  I think even live
migration would work.

I think I'd prefer to do that in a follow-up series so that nVMX can be converted
at the same time?  It's a bit absurd to add the above soft int code knowing that,
at least in theory, simply populating the right software structs would automagically
fix the bug.  But manually handling the soft int case first would be safer in the
sense that we'd still have a fix for the soft int case if it turns out that
populating vcpu->arch.interrupt/exception isn't as straightforward as it seems.

> However, there is another issue related to L1 -> L2 event re-injection
> using standard KVM event injection mechanism: it mixes the L1 injection
> state with the L2 one.
>
> Specifically for SVM:
> * When re-injecting a NMI into L2 NMI-blocking is enabled in
> vcpu->arch.hflags (shared between L1 and L2) and IRET intercept is
> enabled.
>
> This is incorrect, since it is L1 that is responsible for enforcing NMI
> blocking for NMIs that it injects into its L2.

Ah, I see what you're saying.  I think :-)  IIUC, we can fix this bug without any
new flags, just skip the side effects if the NMI is being injected into L2.

@@ -3420,6 +3424,10 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
        struct vcpu_svm *svm = to_svm(vcpu);

        svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
+
+       if (is_guest_mode(vcpu))
+               return;
+
        vcpu->arch.hflags |= HF_NMI_MASK;
        if (!sev_es_guest(vcpu->kvm))
                svm_set_intercept(svm, INTERCEPT_IRET);

and for nVMX:

@@ -4598,6 +4598,9 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
 {
        struct vcpu_vmx *vmx = to_vmx(vcpu);

+       if (is_guest_mode(vcpu))
+               goto inject_nmi;
+
        if (!enable_vnmi) {
                /*
                 * Tracking the NMI-blocked state in software is built upon
@@ -4619,6 +4622,7 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
                return;
        }

+inject_nmi:
        vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
                        INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);

> Also, *L2* being the target of such injection definitely should not block
> further NMIs for *L1*.

Actually, it should block NMIs for L1.  From L1's perspective, the injection is
part of VM-Entry.  That's a single gigantic instruction, thus there is no NMI
window until VM-Entry completes from L1's perspective.  Any exit that occurs on
vectoring an injected event and is handled by L0 should not be visible to L1,
because from L1's perspective it's all part of VMRUN/VMLAUNCH/VMRESUME.  So
blocking new events because an NMI (or any event) needs to be reinjected for L2
is correct.
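
To make the intent of the svm_inject_nmi() hunk above easier to poke at, here is a
rough user-space model of it.  This is only a sketch, not kernel code: the
model_* names and the bit values are made up for illustration, the struct is a
stand-in for the relevant vcpu/VMCB state, and the SEV-ES check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; they stand in for the kernel's definitions. */
#define MODEL_EVTINJ_VALID    (1u << 31)
#define MODEL_EVTINJ_TYPE_NMI (2u << 8)
#define MODEL_HF_NMI_MASK     (1u << 0)

/* Hypothetical stand-in for the vcpu state touched by NMI injection. */
struct model_vcpu {
    bool guest_mode;        /* true while L2 is active */
    uint32_t hflags;        /* shared between L1 and L2 */
    uint32_t event_inj;     /* event queued for the next VMRUN */
    bool iret_intercepted;  /* models INTERCEPT_IRET being set */
};

static void model_inject_nmi(struct model_vcpu *v)
{
    /* The NMI is always queued for injection. */
    v->event_inj = MODEL_EVTINJ_VALID | MODEL_EVTINJ_TYPE_NMI;

    /*
     * When L2 is active the NMI came from L1, so L1 owns NMI blocking.
     * Skip the L0 side effects.
     */
    if (v->guest_mode)
        return;

    v->hflags |= MODEL_HF_NMI_MASK;
    v->iret_intercepted = true;
}
```

Calling model_inject_nmi() on a zeroed struct sets the mask and the IRET
intercept; calling it with guest_mode set only queues the event.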

> * When re-injecting a *hardware* IRQ into L2 GIF is checked (previously
> even on the BUG_ON() level), while L1 should be able to inject even when
> L2 GIF is off,

Isn't that just a matter of tweaking the assertion to ignore GIF if L2 is active?
Hmm, or deleting the assertion altogether, it's likely doing more harm than good
at this point.

> With the code in my previous patch set I planned to use
> exit_during_event_injection() to detect such case, but if we implement
> VMCB12 EVENTINJ parsing we can simply add a flag that the relevant event
> comes from L1, so its normal injection side-effects should be skipped.

Do we still need a flag based on the above?  Honest question... I've been staring
at all this for the better part of an hour and may have lost track of things.

> By the way, the relevant VMX code also looks rather suspicious,
> especially for the !enable_vnmi case.

I think it's safe to say we can ignore edge cases for !enable_vnmi.  It might even
be worth trying to remove that support again (Paolo tried years ago), IIRC the
only Intel CPUs that don't support virtual NMIs are some funky Yonah SKUs.

> Thanks,
> Maciej
>
> [1]: https://lore.kernel.org/kvm/7d67bc6f-00ac-7c07-f6c2-c41b2f0d35a1@maciej.szmigiero.name/
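
For reference, the soft interrupt stashing from the quoted vmcb02 snippet near the
top of this mail can be modeled in user space roughly as below.  Again a sketch
under assumptions: the model_* names are hypothetical, the struct only mirrors
the soft_int_* fields from that snippet, and the caller is assumed to have
already decided whether EVENTINJ describes a soft event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the soft_int_* fields in the quoted snippet. */
struct model_soft_int {
    bool injected;
    uint64_t csbase;
    uint64_t old_rip;
    uint64_t next_rip;
};

/* Stash the info needed to re-inject a soft event after a nested VMRUN. */
static void model_stash_soft_int(struct model_soft_int *s, bool evtinj_soft,
                                 bool nrips_enabled, uint64_t cs_base,
                                 uint64_t vmcb12_rip, uint64_t vmcb12_next_rip)
{
    if (!evtinj_soft)
        return;

    s->injected = true;
    s->csbase = cs_base;
    s->old_rip = vmcb12_rip;
    /* Without NRIPS exposed to L1, fall back to the current RIP. */
    s->next_rip = nrips_enabled ? vmcb12_next_rip : vmcb12_rip;
}

/* The VMRUN error path only needs to drop the "injected" marker. */
static void model_vmrun_failed(struct model_soft_int *s)
{
    s->injected = false;
}
```

The fallback to vmcb12_rip when NRIPS is not enabled for L1 follows the else
branch in the quoted code; everything else here is scaffolding.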