Message-ID: <289c2dd941ecbc3c32514fc0603148972524b22d.camel@redhat.com>
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
From: Maxim Levitsky
To: Jim Mattson
Cc: kvm@vger.kernel.org, Sean Christopherson, x86@kernel.org, Kees Cook, Dave Hansen, linux-kernel@vger.kernel.org, "H. Peter Anvin", Borislav Petkov, Joerg Roedel, Ingo Molnar, Paolo Bonzini, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li
Date: Tue, 05 Jul 2022 16:38:42 +0300
References: <20220621150902.46126-1-mlevitsk@redhat.com> <20220621150902.46126-12-mlevitsk@redhat.com> <42da1631c8cdd282e5d9cfd0698b6df7deed2daf.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky wrote:
> >
> > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky wrote:
> > > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > > due to sti or mov ss.
> > > >
> > > > It is not mandatory in Intel/AMD prm to have the #SMI
> > > > blocked during the shadow, and on top of
> > > > that, since neither SVM nor VMX has true support for SMI
> > > > window, waiting for one instruction would mean single stepping
> > > > the guest.
> > > >
> > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > window and stash its value in SMRAM to restore it on exit
> > > > from SMM.
> > > >
> > > > This fixes rare failures seen mostly on Windows guests on VMX,
> > > > when #SMI falls on the sti instruction, which manifests as a
> > > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > > window still being set in the VMCS.
> > >
> > > I think you're just making stuff up!
> > > See Note #5 at
> > > https://sandpile.org/x86/inter.htm.
> > >
> > > Can you reference the vendors' documentation that supports this change?
> > >
> >
> > First of all, just to note that the actual issue here was that
> > we don't clear the shadow bits in the guest interruptibility field
> > in the vmcb on SMM entry, which triggered a consistency check because
> > we do clear EFLAGS.IF.
> > Preserving the interrupt shadow is just nice to have.
> >
> >
> > That's what Intel's spec says for 'STI':
> >
> > "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable
> > interrupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> > following an execution of STI that begins with IF = 0."
> >
> > Thus it is likely that #SMIs are just blocked when in shadow, but it is easier to implement
> > it this way (avoids single stepping the guest) and without any user-visible difference,
> > which I noted in the patch description. I noted that there are two ways to solve this,
> > and preserving the int shadow in SMRAM is just the simpler way.
>
> It's not true that there is no user-visible difference. In your
> implementation, the SMI handler can see that the interrupt was
> delivered in the interrupt shadow.

Most of the SMI save state area is reserved, and the handler has no way of knowing what the CPU stored there; it can only rely on the fields that are defined in the spec.

Yes, if the SMI handler really insists, it can see that the saved RIP points to an instruction that follows the STI, but does that really matter? It is explicitly allowed by the spec anyway.

Plus, our SMI layout (at least for 32 bit) doesn't conform to the x86 spec anyway; as I found out, we flat out write over fields that have other meanings in the x86 spec.

Also, I proposed to preserve the int shadow in internal KVM state and migrate it in the upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
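For illustration only, here is a rough sketch of how the alternative in the previous paragraph (keeping the interrupt shadow in internal KVM state and migrating it in the upper 4 bits of the 'shadow' field) might pack the two values. This is a hypothetical sketch, not the proposed patch: only the KVM_X86_SHADOW_INT_MOV_SS / KVM_X86_SHADOW_INT_STI values come from the KVM UAPI headers. The nibble layout and the demo_* helper names are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the KVM UAPI for kvm_vcpu_events.interrupt.shadow. */
#ifndef KVM_X86_SHADOW_INT_MOV_SS
#define KVM_X86_SHADOW_INT_MOV_SS 0x01
#endif
#ifndef KVM_X86_SHADOW_INT_STI
#define KVM_X86_SHADOW_INT_STI    0x02
#endif

/*
 * Hypothetical layout (assumption, not KVM ABI):
 *   bits 0-3: current interrupt shadow
 *   bits 4-7: interrupt shadow saved at SMM entry, restored on RSM
 */
#define DEMO_SHADOW_MASK      0x0f
#define DEMO_SMM_SHADOW_SHIFT 4

static inline uint8_t demo_pack_shadow(uint8_t cur, uint8_t smm_saved)
{
	/* Both values must fit in one nibble. */
	assert((cur & ~DEMO_SHADOW_MASK) == 0);
	assert((smm_saved & ~DEMO_SHADOW_MASK) == 0);
	return (uint8_t)(cur | (smm_saved << DEMO_SMM_SHADOW_SHIFT));
}

static inline uint8_t demo_unpack_cur_shadow(uint8_t field)
{
	return (uint8_t)(field & DEMO_SHADOW_MASK);
}

static inline uint8_t demo_unpack_smm_shadow(uint8_t field)
{
	return (uint8_t)(field >> DEMO_SMM_SHADOW_SHIFT);
}
```

Userspace that only understands the low bits would see the current shadow unchanged. That backward-compatibility property is one reason to reuse the upper bits of an existing field instead of adding a new one.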
Both Paolo and Sean proposed to store the int shadow in SMRAM instead, and you didn't object to this, and now, after I refactored and implemented the whole thing, you suddenly do.

BTW, just FYI, I found out that QEMU doesn't migrate the 'shadow' field; this needs to be fixed (not related to this issue, just FYI).

>
> The right fix for this problem is to block SMI in an interrupt shadow,
> as is likely the case for all modern CPUs.

Yes, I agree that this is the most correct fix.

However, AMD just recently posted a VNMI patch series to avoid single stepping the CPU when NMI is blocked for the same reason, because single stepping is fragile.

Do you really want KVM to single step the guest in this case, to deliver the #SMI? I can do it, but it is bound to cause a lot of trouble.

Note that I would have to do it on both Intel and AMD, as neither has support for an SMI window, unless I were to use MTF. MTF is broken on nested virt, as you know, so a nested hypervisor running a guest with SMI would then have to cope with broken MTF.

Note that I can't use the VIRQ hack we use for the interrupt window, because there is no guarantee that the guest's EFLAGS.IF is on.

Best regards,
	Maxim Levitsky

>
> >
> > As for CPUs that neither block SMI nor preserve the int shadow, in theory they can exist, but that would
> > break things, as noted in this mail
> >
> > https://lore.kernel.org/lkml/1284913699-14986-1-git-send-email-avi@redhat.com/
> >
> > It is possible though that a real CPU supports the HLT restart flag, which makes this a non-issue,
> > still. I can't rule out that a real CPU doesn't preserve the interrupt shadow on SMI, but
> > I don't see why we can't do this to make things more robust.
>
> Because, as I said, I think you're just making stuff up...unless, of
> course, you have documentation to back this up.
>
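A minimal, hypothetical C model of the approach in the patch description quoted above. On SMI, it stashes the interrupt shadow in SMRAM and clears it along with EFLAGS.IF; on RSM, it restores both. It also models the VMX VM-entry check behind the original failure: blocking by STI must be 0 when RFLAGS.IF is 0. The demo_* structs, the SMRAM slot and the function names are invented for illustration. They do not reflect KVM's real SMRAM layout or emulator code, and real SMM entry changes far more state than IF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EFLAGS_IF  (1u << 9)  /* EFLAGS.IF is bit 9 */
#define DEMO_SHADOW_STI 0x02u      /* mirrors KVM_X86_SHADOW_INT_STI */

/* Hypothetical, heavily simplified vCPU state. */
struct demo_vcpu {
	uint32_t eflags;
	uint8_t int_shadow;
	bool in_smm;
};

/* Hypothetical SMRAM save area with a slot for the interrupt shadow. */
struct demo_smram {
	uint32_t eflags;
	uint8_t int_shadow;
};

/* Models the VM-entry consistency check: STI blocking requires IF = 1. */
static inline bool demo_entry_check_ok(const struct demo_vcpu *v)
{
	return !((v->int_shadow & DEMO_SHADOW_STI) && !(v->eflags & DEMO_EFLAGS_IF));
}

static inline void demo_enter_smm(struct demo_vcpu *v, struct demo_smram *s)
{
	s->eflags = v->eflags;
	s->int_shadow = v->int_shadow;  /* stash the shadow in SMRAM */
	v->eflags &= ~DEMO_EFLAGS_IF;   /* SMM entry clears IF (among other state) */
	v->int_shadow = 0;              /* reset the shadow so the entry check passes */
	v->in_smm = true;
}

static inline void demo_rsm(struct demo_vcpu *v, const struct demo_smram *s)
{
	assert(v->in_smm);
	v->eflags = s->eflags;
	v->int_shadow = s->int_shadow;  /* restore the shadow on exit from SMM */
	v->in_smm = false;
}
```

Clearing EFLAGS.IF without also clearing the shadow reproduces the failing state (STI blocking set, IF clear). Saving and restoring the shadow around SMM keeps the guest's view unchanged after RSM.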