Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
From: Maxim Levitsky
To: Jim Mattson
Cc: kvm@vger.kernel.org, Sean Christopherson, x86@kernel.org, Kees Cook,
    Dave Hansen, linux-kernel@vger.kernel.org, "H. Peter Anvin",
    Borislav Petkov, Joerg Roedel, Ingo Molnar, Paolo Bonzini,
    Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li
Date: Tue, 05 Jul 2022 16:51:32 +0300
In-Reply-To: <646d2f04bbc7530339ca02dcf5eeb3b5e298a1c2.camel@redhat.com>
References: <20220621150902.46126-1-mlevitsk@redhat.com>
        <20220621150902.46126-12-mlevitsk@redhat.com>
        <42da1631c8cdd282e5d9cfd0698b6df7deed2daf.camel@redhat.com>
        <289c2dd941ecbc3c32514fc0603148972524b22d.camel@redhat.com>
        <646d2f04bbc7530339ca02dcf5eeb3b5e298a1c2.camel@redhat.com>

On Tue, 2022-07-05 at 16:40 +0300, Maxim Levitsky wrote:
> On Tue, 2022-07-05 at 16:38 +0300, Maxim Levitsky wrote:
> > On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> > > On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky wrote:
> > > > 
> > > > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky wrote:
> > > > > > When #SMI is asserted, the CPU can be in an interrupt shadow
> > > > > > due to sti or mov ss.
> > > > > > 
> > > > > > The Intel/AMD PRMs do not mandate that #SMI be blocked during
> > > > > > the shadow, and on top of that, since neither SVM nor VMX has
> > > > > > true support for an SMI window, waiting for one instruction
> > > > > > would mean single-stepping the guest.
> > > > > > 
> > > > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > > > shadow and stash its value in SMRAM to restore it on exit
> > > > > > from SMM.
> > > > > > 
> > > > > > This fixes rare failures seen mostly on Windows guests on VMX,
> > > > > > when #SMI falls on the sti instruction, which manifests as a
> > > > > > VM entry failure because EFLAGS.IF is not set while the STI
> > > > > > interrupt shadow is still set in the VMCS.
> > > > > 
> > > > > I think you're just making stuff up! See Note #5 at
> > > > > https://sandpile.org/x86/inter.htm.
> > > > > 
> > > > > Can you reference the vendors' documentation that supports this
> > > > > change?
> > > > 
> > > > First of all, just to note that the actual issue here was that we
> > > > don't clear the shadow bits in the guest interruptibility field in
> > > > the VMCB on SMM entry, which triggered a consistency check because
> > > > we do clear EFLAGS.IF.
> > > > Preserving the interrupt shadow is just nice to have.
> > > > 
> > > > This is what Intel's spec says for 'STI':
> > > > 
> > > > "The IF flag and the STI and CLI instructions do not prohibit the
> > > > generation of exceptions and nonmaskable interrupts (NMIs).
> > > > However, NMIs (and system-management interrupts) may be inhibited
> > > > on the instruction boundary following an execution of STI that
> > > > begins with IF = 0."
> > > > 
> > > > Thus it is likely that #SMIs are simply blocked while in the
> > > > shadow, but it is easier to implement it this way (it avoids
> > > > single-stepping the guest) and without any user-visible
> > > > difference. In the patch description I noted that there are two
> > > > ways to solve this, and preserving the int shadow in SMRAM is
> > > > just the simpler way.
> > > 
> > > It's not true that there is no user-visible difference. In your
> > > implementation, the SMI handler can see that the interrupt was
> > > delivered in the interrupt shadow.
> > 
> > Most of the SMI save state area is reserved, and the handler has no
> > way of knowing what the CPU stored there; it can only access the
> > fields that are reserved in the spec.
> 
> I mean fields that are not reserved in the spec.
> 
> Best regards,
>         Maxim Levitsky
> 
> > Yes, if the SMI handler really insists, it can see that the saved RIP
> > points to an instruction that follows the STI, but does that really
> > matter? It is allowed by the spec explicitly anyway.
> > 
> > Plus, our SMI save state layout (at least for 32 bit) doesn't conform
> > to the x86 spec anyway; as I found out, we flat out write over fields
> > that have other meanings in the x86 spec.
> > 
> > Also, I proposed to preserve the int shadow in internal KVM state and
> > migrate it in the upper 4 bits of the 'shadow' field of struct
> > kvm_vcpu_events.
> > Both Paolo and Sean proposed to store the int shadow in SMRAM instead,
> > and you didn't object to this, and now, after I refactored and
> > implemented the whole thing, you suddenly do.
> > 
> > BTW, just FYI, I found out that qemu doesn't migrate the 'shadow'
> > field; this needs to be fixed (not related to this issue, just FYI).
> > 
> > 
> > > The right fix for this problem is to block SMI in an interrupt
> > > shadow, as is likely the case for all modern CPUs.
> > 
> > Yes, I agree that this is the most correct fix.
> > 
> > However, AMD just recently posted a VNMI patch series to avoid
> > single-stepping the CPU when NMI is blocked, for the same reason:
> > it is fragile.
> > 
> > Do you really want KVM to single step the guest in this case, to
> > deliver the #SMI? I can do it, but it is bound to cause a lot of
> > trouble.
> > 
> > Note that I would have to do it on both Intel and AMD, as neither has
> > support for an SMI window, unless I were to use MTF, which is broken
> > on nested virt as you know, so a nested hypervisor running a guest
> > with SMI would now have to cope with broken MTF.
> > 
> > Note that I can't use the VIRQ hack we use for the interrupt window,
> > because there is no guarantee that the guest's EFLAGS.IF is on.
> > 
> > Best regards,
> >         Maxim Levitsky
> > 
> > 
> > > > As for CPUs that neither block SMI nor preserve the int shadow:
> > > > in theory they can exist, but that would break things, as noted
> > > > in this mail:
> > > > 
> > > > https://lore.kernel.org/lkml/1284913699-14986-1-git-send-email-avi@redhat.com/
> > > > 
> > > > It is possible, though, that a real CPU supports the HLT restart
> > > > flag, which makes this a non-issue. Still, I can't rule out that
> > > > a real CPU doesn't preserve the interrupt shadow on SMI, but I
> > > > don't see why we can't do this to make things more robust.
> > > 
> > > Because, as I said, I think you're just making stuff up...unless, of
> > > course, you have documentation to back this up.

Again, I clearly explained that I chose to implement it this way because
it is more robust, _and_ it was approved by both Sean and Paolo. It is
not called making stuff up - I never claimed that a real CPU does it
this way.

Best regards,
        Maxim Levitsky
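
(Editor's note for readers following the thread: below is a minimal,
self-contained sketch of the mechanism being debated - stash the
interrupt shadow in a spare SMRAM slot on SMM entry, clear it together
with EFLAGS.IF so the "IF=0 but STI blocking set" VM entry consistency
check cannot trip, and restore it on RSM. All structure, field, and
offset names are illustrative stand-ins, not the actual KVM code or the
architectural SMM save state layout.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins only; not the real KVM structures or the
 * architectural SMM save state layout. */
#define SMRAM_SIZE        512
#define OFFSET_INT_SHADOW 0x1f0  /* arbitrary spare slot for the sketch */

struct vcpu_sketch {
	bool int_shadow;            /* STI/MOV SS interrupt shadow active */
	bool eflags_if;             /* guest EFLAGS.IF                    */
	uint8_t smram[SMRAM_SIZE];  /* stand-in for the SMM save state    */
};

/* On #SMI: stash the shadow in SMRAM, then clear it along with
 * EFLAGS.IF, so an inconsistent combination is never presented to the
 * VM entry checks. */
static void enter_smm(struct vcpu_sketch *v)
{
	v->smram[OFFSET_INT_SHADOW] = v->int_shadow ? 1 : 0;
	v->int_shadow = false;
	v->eflags_if = false;
}

/* On RSM: restore the stashed shadow; the rest of the register state
 * (including EFLAGS) would come back from the architectural save area,
 * which is elided here. */
static void leave_smm(struct vcpu_sketch *v)
{
	v->int_shadow = v->smram[OFFSET_INT_SHADOW] != 0;
	v->eflags_if = true;        /* placeholder for the restored EFLAGS */
}

int main(void)
{
	/* #SMI hits right after STI: IF already set, shadow still active */
	struct vcpu_sketch v = { .int_shadow = true, .eflags_if = true };

	enter_smm(&v);
	printf("in SMM:    shadow=%d IF=%d\n", v.int_shadow, v.eflags_if);
	leave_smm(&v);
	printf("after RSM: shadow=%d IF=%d\n", v.int_shadow, v.eflags_if);
	return 0;
}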