Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp2220254rwb; Thu, 17 Nov 2022 08:02:00 -0800 (PST) X-Google-Smtp-Source: AA0mqf4eIJdxQoRS5lJDf3/fL1ARUN2CzVV7u3CaMaiNVjNd96xTfiIKCAhptTDRYxC3A31q3Ota X-Received: by 2002:a17:90a:eb05:b0:218:6aaf:f076 with SMTP id j5-20020a17090aeb0500b002186aaff076mr4464723pjz.129.1668700920326; Thu, 17 Nov 2022 08:02:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668700920; cv=none; d=google.com; s=arc-20160816; b=pMtjfvymcSrCPPTFy6Ik6603sWkWWHIgu8KCFwidnYiZzwugf+/PQEwFyrhIiP3bxo UhcWWSW4P1vx/9OlecAx15LM+W7NN3OgT2Z4f674vIOcEQpwZpOmZXxCc4jcDPf0LSe8 OZmDMn0lWpJRXhHBlJgMrRgfEnfA8do1sWTnEyohh6ExFO1XXg8yxvxeeTSg9kUNhnJI DXC6OulTZihYxRCkhmxEyQW5ej6EPR9cQkdCAaekIkTtYrai4KoZGJeJmn0Kjj6/gf3d u00CHOdNc360ykKdXaPqXcBpr+ABJ8HJV9MYmFf7AgMN2EH8U2gcZ5VT9qZTiRNAKtL/ klqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=FyuGzGxEZRHvEr/JeItK3VvnrH3Vjqzh4TosRHWmsqQ=; b=SbkL5isYU/gmiAR3FZ5fwyPTBJMPXqHlN7ff82T+zNT84CnKlHIvVD+0ORe7y15/FS ifEqmmTkFoVnM/niefuveKBSzYOYgX/fSvecH4AeGhWpZI1uKAgltBWuwunKM4JcanOo JI4ujm3nnePM2YYMEJ+rGgghpik58SNpq0Gl5cP+X0kIseHBMfqltHxOGeOCHyEQ8qwC sUERKrXNu/X96GZQVfNt6gLfrI1ox+pMR2UoCibZmVhDDLbGNAh5wlGMQne1PXLmv4rl Y465wTbsjSZuAnhPlkAqcNUyVwG9pKvn0dcVOie93/Q2n/0U5SM8bq1rOwo2CpxXVam0 Q6pw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=R1HkdKDD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m9-20020a635809000000b0046e9da89cebsi1190148pgb.532.2022.11.17.08.01.21; Thu, 17 Nov 2022 08:02:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=R1HkdKDD; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234914AbiKQPvq (ORCPT + 92 others); Thu, 17 Nov 2022 10:51:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233057AbiKQPvl (ORCPT ); Thu, 17 Nov 2022 10:51:41 -0500 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3BF87769EB for ; Thu, 17 Nov 2022 07:51:41 -0800 (PST) Received: by mail-pj1-x1029.google.com with SMTP id t17so1324803pjo.3 for ; Thu, 17 Nov 2022 07:51:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=FyuGzGxEZRHvEr/JeItK3VvnrH3Vjqzh4TosRHWmsqQ=; b=R1HkdKDDMrNBJhdZpT5wuNeVm7ykr/NYLcK4lYns8I72luzpoJnd9hOVUIJ37xXzpF Y9R1C+K2PAGUXcUWln5cbPjqIPmgeqBLB8XvSWAkW5dfMc3/44T/vapbL2o7yHWGZMZS BAbYjVGK7csYo1tAbSgDbOiGiiMUaP+xUVQP5s1TdW+x1SauV9jA8mFMRXIopQnyY8SG EqANano7h0k0HxMKq6lwz7WLlCStKn017xkNzAWs2WVFoQD2pIJzANNt16M5tNSVwgyN wPRTcgN+CTw2/kD038EMM3AWbYdVqHcanplRBcqNhUS55tRvTXJrPqPZhseKlgXeCpbC WtVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=FyuGzGxEZRHvEr/JeItK3VvnrH3Vjqzh4TosRHWmsqQ=; b=1jjnx7j1PVAq9lRRctcXs2jxfUqo2F9AKCDBOOGL5VqmGFOcUtv075a9SE2jcH3aLj YG5UrWat2wL+ZZwOE9pr+9KIRFM2hJWmk25oslEsFR+l915BuoGhnFrElFwo0XGYWN+q TS65OSzOfE412J/GO6t/BX/46WN7lvEaSOftg7Z3ALpGBGon7GdUOWqD576ywcflOnn1 zfIaW99zveeR1CfJVYcgl/6vniV6N6LePfEDyKIPs5cG8qjtLNNYGOQe3u3jLou6s/5C bJXVDcPfdz+uTQEeerryr2ms7CuB/ZOxMmYtYAmBJTApYSYS2tc4ARl/OS2topIWzJpA Lw1A== X-Gm-Message-State: ANoB5pnIyJAggOrvUwXBTafGhiY5irqh8WpIH4Jjm5xNO98u1jbg0Iky Dzsd3tep8YKrPjuQd+NjSOnt6nBrT+T0gQ== X-Received: by 2002:a17:902:e414:b0:188:b8cf:85b with SMTP id m20-20020a170902e41400b00188b8cf085bmr3238147ple.126.1668700300567; Thu, 17 Nov 2022 07:51:40 -0800 (PST) Received: from google.com (7.104.168.34.bc.googleusercontent.com. [34.168.104.7]) by smtp.gmail.com with ESMTPSA id q13-20020a170902a3cd00b0017f778d4543sm1482631plb.241.2022.11.17.07.51.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Nov 2022 07:51:40 -0800 (PST) Date: Thu, 17 Nov 2022 15:51:36 +0000 From: Sean Christopherson To: "Li, Xin3" Cc: Peter Zijlstra , Paolo Bonzini , "linux-kernel@vger.kernel.org" , "x86@kernel.org" , "kvm@vger.kernel.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "bp@alien8.de" , "dave.hansen@linux.intel.com" , "hpa@zytor.com" , "Tian, Kevin" Subject: Re: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq() for NMI/IRQ reinjection Message-ID: References: <6fd26a70-3774-6ae7-73ea-4653aee106f0@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 17, 2022, Li, Xin3 wrote: > > > > > > > But what about NMIs, afaict this is all horribly broken for NMIs. > > > > > > So the whole VMX thing latches the NMI (which stops NMI > > > > > > recursion), right? > > > > > > > > > > > > But then you drop out of noinstr code, which means any random > > > > > > exception can happen (kprobes #BP, hw_breakpoint #DB, or even > > > > > > #PF due to random nonsense like *SAN). This exception will do > > > > > > IRET and clear the NMI latch, all before you get to run any of the > > > > > > NMI code. > > > > > > > > > > What you said here implies that we have this problem in the existing code. > > > > > Because a fake iret stack is created to call the NMI handler in > > > > > the IDT NMI descriptor, which lastly executes the IRET instruction. > > > > > > > > I can't follow; of course the IDT handler terminates with IRET, it has to no? > > > > > > > > And yes, the current code appears to suffer the same defect. That defect isn't going to be fixed simply by changing how KVM forwards NMIs though. IIUC, _everything_ between VM-Exit and the invocation of the NMI handler needs to be noinstr. On VM-Exit due to NMI, NMIs are blocked. If a #BP/#DB/#PF occurs before KVM gets to kvm_x86_handle_exit_irqoff(), the subsequent IRET will unblock NMIs before the original NMI is serviced, i.e. a second NMI could come in at anytime regardless of how KVM forwards the NMI to the kernel. Is there any way to solve this without tagging everything noinstr? There is a metric shit ton of code between VM-Exit and the handling of NMIs, and much of that code is common helpers. It might be possible to hoist NMI handler much earlier, though we'd need to do a super thorough audit to ensure all necessary host state is restored. > > > With FRED, ERETS/ERETU replace IRET, and use bit 28 of the popped CS > > > field to control whether to unblock NMI. If bit 28 of the field (above > > > the selector) is 1, ERETS/ERETU unblocks NMIs. Side topic, there's a bug in the ISE docs. Section "9.4.2 NMI Blocking" states that bit 16 holds the "unblock NMI" magic, which I'm guessing is a holdover from an earlier revision of FRED. As specified in Section 6.1.3 and Section 6.2.3, ERETS and ERETU each unblocks NMIs if bit 16 of the popped CS field is 1. The following items detail how this behavior may be changed in VMX non-root operation, depending on the settings of certain VM-execution controls: > > Yes, I know that. It is one of the many improvements FRED brings. > > Ideally the IBT WAIT-FOR-ENDBR state also gets squirreled away in the > > hardware exception frame, but that's still up in the air I believe :/ > > > > Anyway.. given there is interrupt priority and NMI is pretty much on top of > > everything else the reinject crap *should* run NMI first. That way NMI runs > > with the latch disabled and whatever other pending interrupts will run later. > > > > But that all is still broken because afaict the current code also leaves noinstr -- > > and once you leave noinstr (or use a static_key, static_call or anything else that > > *can* change at runtime) you can't guarantee nothing. > > For NMI, HPA asked me to use "int $2", as it switches to the NMI IST stack to > execute the NMI handler, essentially like how HW deals with a NMI in host. and > I tested it with NMI watchdog, it looks working fine. Heh, well yeah, because that's how KVM used to handle NMIs back before I reworked NMI handling to use the direct call method. Ironically, that original change was done in part to try and make it _easier_ to deal with FRED (back before FRED was publicly disclosed). If KVM reverts to INTn, the fix to route KVM=>NMI through the non-IST entry can be reverted too. a217a6593cec ("KVM/VMX: Invoke NMI non-IST entry instead of IST entry") 1a5488ef0dcf ("KVM: VMX: Invoke NMI handler via indirect call instead of INTn")