Subject: Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry
From: Lai Jiangshan
Date: Wed, 5 May 2021 23:44:49 +0800
To: Thomas Gleixner, Paolo Bonzini, Sean Christopherson
Cc: Andy Lutomirski, Maxim Levitsky, Lai Jiangshan,
 linux-kernel@vger.kernel.org, Steven Rostedt, Andi Kleen, Andy Lutomirski,
 Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, Josh Poimboeuf,
 Uros Bizjak, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin",
 Peter Zijlstra, Alexandre Chartre, Juergen Gross, Joerg Roedel, Jian Cai
Message-ID: <91013efa-da53-2a3a-0e65-1ddb4318cb70@linux.alibaba.com>
In-Reply-To: <87im3yhwxh.ffs@nanos.tec.linutronix.de>
References: <38B9D60F-F24F-4910-B2DF-2A57F1060452@amacapital.net>
 <625057c7-ea40-4f37-8bea-cddecfe1b855@redhat.com>
 <5d7ca301-a0b2-d389-3bc2-feb304c9f5b5@redhat.com>
 <87im3yhwxh.ffs@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/5/5 08:00, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
>> On 04/05/21 23:51, Sean Christopherson wrote:
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson wrote:
>>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>>
>>>> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
>>>> the VM exit completes".
>>>>
>>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>>> Looks like the easiest way out indeed.
>>>
>>> I've no objection to reverting to intn, but what does reverting versus handling
>>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>>> I'm struggling mightily to connect the dots.
>>
>> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
>> directly, rather it calls the IDT entrypoint which *will* do an IRET and
>> unmask NMIs. I trusted Andy too much on this one. :)
>>
>> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
>> instead of IST entry") looks good.
>
> Well, looks good is one thing.
>
> It would be more helpful if someone would actually review and/or test it.
>
> Thanks,
>
> tglx
>

I tested it with the testing patch below applied; the results show that the
problem is fixed.

The single line that the testing patch adds to vmenter.S emulates the
situation where "uninitialized" garbage on the kernel stack happens to be 1
and happens to sit at the same location as the RSP-located "NMI executing"
variable.

First round:
# apply the testing patch
# perf record events of a VM which does kbuild inside
# dmesg shows the same number of "kvm nmi" and "kvm nmi miss" messages

This shows that the problem exists: the NMI handler is not invoked for these
NMIs.

Second round:
# apply the fix from tglx
# apply the testing patch
# perf record events of a VM which does kbuild inside
# dmesg shows some "kvm nmi" messages but no "kvm nmi miss"

This shows that the problem is fixed.

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
 #endif
 	pushf
 	push $__KERNEL_CS
+	movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
 	CALL_NOSPEC _ASM_ARG1
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8586eca349a9..eefd22d22fce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6439,8 +6439,17 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 
 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
-	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
+	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) {
+		unsigned long count = this_cpu_read(irq_stat.__nmi_count);
+
 		handle_exception_nmi_irqoff(vmx);
+
+		if (is_nmi(vmx_get_intr_info(&vmx->vcpu))) {
+			pr_info("kvm nmi\n");
+			if (count == this_cpu_read(irq_stat.__nmi_count))
+				pr_info("kvm nmi miss\n");
+		}
+	}
 }
 
 /*
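
For readers who want the idea behind the testing patch without booting a kernel, the check can be modelled in user space. This is only a rough sketch, not kernel code: `struct model_cpu`, `model_ist_entry`, `model_noist_entry` and `model_nmi_missed` are all hypothetical names, and the "nested" path is reduced to a plain early return, which is much simpler than what the real entry code does. The model assumes the non-IST entry never consults the "NMI executing" slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_cpu {
	unsigned long nmi_count;     /* stands in for irq_stat.__nmi_count */
	unsigned long nmi_executing; /* stands in for the "NMI executing" stack slot */
};

/*
 * Simplified model of the IST entry: a slot value of 1 is read as
 * "an NMI is already running", so the handler body is skipped.
 */
static void model_ist_entry(struct model_cpu *cpu)
{
	if (cpu->nmi_executing == 1)
		return; /* treated as nested, nothing is counted */
	cpu->nmi_count++;
}

/* Assumed model of the non-IST entry: the slot is never looked at. */
static void model_noist_entry(struct model_cpu *cpu)
{
	cpu->nmi_count++;
}

/*
 * Same check as in the testing patch: sample the counter, invoke the
 * entry, and report a miss when the counter did not move.
 */
static bool model_nmi_missed(struct model_cpu *cpu,
			     void (*entry)(struct model_cpu *))
{
	unsigned long before;

	assert(entry != NULL);
	before = cpu->nmi_count;

	cpu->nmi_executing = 1; /* what the added movq forces onto the stack */
	entry(cpu);

	if (before == cpu->nmi_count) {
		printf("kvm nmi miss\n");
		return true;
	}
	return false;
}
```

Calling `model_nmi_missed()` with `model_ist_entry` corresponds to the first round (every NMI is reported as missed), and calling it with `model_noist_entry` corresponds to the second round (no miss is reported).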