Date: Wed, 8 Feb 2023 16:06:12 +0000
From: Sean Christopherson
To: Santosh Shukla
Cc: Maxim Levitsky, kvm@vger.kernel.org, Sandipan Das, Paolo Bonzini, Jim Mattson,
    Peter Zijlstra, Dave Hansen, Borislav Petkov, Pawan Gupta, Thomas Gleixner,
    Ingo Molnar, Josh Poimboeuf, Daniel Sneddon, Jiaxi Chen, Babu Moger,
    linux-kernel@vger.kernel.org, Jing Liu, Wyes Karny, x86@kernel.org,
    "H. Peter Anvin"
Peter Anvin" Subject: Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface Message-ID: References: <20221129193717.513824-1-mlevitsk@redhat.com> <20221129193717.513824-8-mlevitsk@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 08, 2023, Santosh Shukla wrote: > On 2/1/2023 3:58 AM, Sean Christopherson wrote: > > On Tue, Nov 29, 2022, Maxim Levitsky wrote: > >> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu, > >> > >> vcpu->arch.nmi_injected = events->nmi.injected; > >> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) > >> - vcpu->arch.nmi_pending = events->nmi.pending; > >> + atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued); > >> + > >> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked); > >> > >> + process_nmi(vcpu); > > > > Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's > > ABI that's ugly). E.g. if we collapse this down, it becomes: > > > > process_nmi(vcpu); > > > > if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) { > > > > } > > static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked); > > > > process_nmi(vcpu); > > > > And the second mess is that V_NMI needs to be cleared. > > > > Can you please elaborate on "V_NMI cleared" scenario? Are you mentioning > about V_NMI_MASK or svm->nmi_masked? V_NMI_MASK. KVM needs to purge any pending virtual NMIs when userspace sets vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set. > > The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep > > nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set. I think we can just > > replace that with an set of nmi_queued, i.e. > > > > if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) { > > vcpu->arch-nmi_pending = 0; > > atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending); > > process_nmi(); > > > You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right? > I'll try with above proposal. Yep, if that works. Actually, that might be a requirement. There's a static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked); lurking below this. Invoking process_nmi() before NMI blocking is updated could result in KVM incorrectly dropping/keeping NMIs. I don't think it would be a problem in practice since KVM save only one NMI, but userspace could stuff NMIs. Huh. The the existing code is buggy. events->nmi.pending is a u8, and arch.nmi_pending is an unsigned int. KVM doesn't cap the incoming value, so userspace could set up to 255 pending NMIs. The extra weird part is that the extra NMIs will get dropped the next time KVM stumbles through process_nmi(). Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM may drop an NMI. events->nmi.pending = vcpu->arch.nmi_pending != 0; The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix simultaneous NMIs"). The only thing I can figure is that KVM_GET_VCPU_EVENTS was somewhat blindly updated without much thought about what should actually happen. So, can you slide the below in early in the series? Then in this series, convert to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch? 
From: Sean Christopherson
Date: Wed, 8 Feb 2023 07:44:16 -0800
Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are pending

Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
pending NMIs with respect to NMI blocking.

KVM allows multiple NMIs to be pending in order to faithfully emulate bare
metal handling of simultaneous NMIs (on bare metal, truly simultaneous NMIs
are impossible, i.e. one will always arrive first and be consumed).  Support
for simultaneous NMIs botched the save/restore though.  KVM only saves one
pending NMI, but allows userspace to restore 255 pending NMIs as
kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored in
an unsigned int.

Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..e9339acbf82a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
 	events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
 
 	events->nmi.injected = vcpu->arch.nmi_injected;
-	events->nmi.pending = vcpu->arch.nmi_pending != 0;
+	events->nmi.pending = vcpu->arch.nmi_pending;
 	events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
 
 	/* events->sipi_vector is never valid when reporting to user space */
@@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
 					     events->interrupt.shadow);
 
 	vcpu->arch.nmi_injected = events->nmi.injected;
-	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
+	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
 		vcpu->arch.nmi_pending = events->nmi.pending;
+		if (vcpu->arch.nmi_pending)
+			kvm_make_request(KVM_REQ_NMI, vcpu);
+	}
 	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
 	if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&

base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e
--
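
For reference, a rough sketch of the follow-up conversion discussed above (zero
nmi_pending, stuff nmi_queued, and defer to KVM_REQ_NMI instead of calling
process_nmi() directly) could look like the snippet below inside
kvm_vcpu_ioctl_x86_set_vcpu_events().  This is only an illustration of the
intended ordering under the assumptions in the thread, not a tested patch:

	/* Sketch only, untested: restore NMIs via nmi_queued + KVM_REQ_NMI. */
	vcpu->arch.nmi_injected = events->nmi.injected;
	if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
		/* Discard stale pending NMIs and queue the userspace count. */
		vcpu->arch.nmi_pending = 0;
		atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
		kvm_make_request(KVM_REQ_NMI, vcpu);
	}

	/*
	 * Update NMI blocking before the request is serviced so that
	 * process_nmi() sees the new masked state and doesn't drop or keep
	 * NMIs based on stale blocking information.
	 */
	static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

Purging any vendor-side virtual NMI state (e.g. V_NMI_MASK on SVM) would still
need to be handled separately, as noted earlier in the thread.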