Message-ID: <5fc502b70a89e18034716166abc65caec192c19b.camel@redhat.com>
Subject: Re: [PATCH RFC] KVM: nSVM: Fix L1 state corruption upon return from SMM
From: Maxim Levitsky
To: Vitaly Kuznetsov
Cc: Sean Christopherson, Wanpeng Li, Jim Mattson, Cathy Avery,
    Emanuele Giuseppe Esposito, linux-kernel@vger.kernel.org,
    Paolo Bonzini, kvm@vger.kernel.org
Date: Wed, 23 Jun 2021 17:41:13 +0300
In-Reply-To: <87pmwc4sh4.fsf@vitty.brq.redhat.com>
References: <20210623074427.152266-1-vkuznets@redhat.com>
	 <53a9f893cb895f4b52e16c374cbe988607925cdf.camel@redhat.com>
	 <87pmwc4sh4.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2021-06-23 at 15:32 +0200, Vitaly Kuznetsov wrote:
> Maxim Levitsky writes:
> 
> > On Wed, 2021-06-23 at 16:01 +0300, Maxim Levitsky wrote:
> > > On Wed, 2021-06-23 at 11:39 +0200, Paolo Bonzini wrote:
> > > > On 23/06/21 09:44, Vitaly Kuznetsov wrote:
> > > > > - RFC: I'm not 100% sure my 'smart' idea to use the currently-unused
> > > > >   HSAVE area is that smart. Also, we don't even seem to check that L1
> > > > >   set it up upon nested VMRUN, so hypervisors which don't do that may
> > > > >   remain broken. A very much needed selftest is also missing.
> > > > 
> > > > It's certainly a bit weird, but I guess it counts as smart too. It
> > > > needs a few more comments, but I think it's a good solution.
> > > > 
> > > > One could delay the backwards memcpy until vmexit time, but that would
> > > > require a new flag, so it's not worth it for what is a pretty rare and
> > > > already expensive case.
> > > > 
> > > > Paolo
> > > 
> > > Hi!
> > > 
> > > I did some homework on this and I would like to share a few thoughts:
> > > 
> > > First of all, my attention was caught by the way we intercept #SMI
> > > (this isn't 100% related to the bug, but still worth talking about IMHO).
> > > 
> > > A. Bare metal: SVM allows intercepting SMI with SVM_EXIT_SMI, with the
> > >    intention of then entering the BIOS SMM handler manually using the
> > >    SMM_CTL MSR. On bare metal we do set INTERCEPT_SMI, but we emulate
> > >    the exit as a nop. I guess on bare metal there are some undocumented
> > >    bits that the BIOS sets which make the CPU ignore that SMI intercept
> > >    and still take the #SMI handler normally, but I wonder if we could
> > >    still break some motherboard code because of that.
> > > 
> > > B. Nested: if #SMI is intercepted, it causes a nested VMEXIT. Since KVM
> > >    does enable the SMI intercept, when it runs nested all SMIs that the
> > >    nested KVM gets are emulated as a nop and L1's SMI handler is not run.
> > > 
> > > About the issue that is fixed in this patch, let me try to understand
> > > how it would work on bare metal:
> > > 
> > > 1. A guest is entered. Host state is saved to the VM_HSAVE_PA area
> > >    (or stashed somewhere in the CPU).
> > > 
> > > 2. #SMI (without intercept) happens.
> > > 
> > > 3. The CPU has to exit SVM and start running the host SMI handler: it
> > >    loads the SMM state without touching VM_HSAVE_PA, runs the SMI
> > >    handler, and once it RSMs, it restores the guest state from the SMM
> > >    area and continues the guest.
> > > 
> > > 4. Once a normal VMexit happens, the host state is restored from
> > >    VM_HSAVE_PA.
> > > 
> > > So the host state indeed can't be saved to vmcb01.
> > > 
> > > To be honest, I think I would prefer not to use L1's HSAVE area but
> > > rather add back our own 'hsave' in KVM and always store the L1 host
> > > state there on nested entry.
> > > This way we would avoid touching vmcb01 at all, and we would both solve
> > > the issue and reduce code complexity. (The copying of L1 host state to
> > > what basically is the L1 guest state area and back even has a comment
> > > explaining why it was possible to do so, before you discovered that this
> > > doesn't work with SMM.)
> > 
> > I need more coffee today. The comment is somewhat wrong actually.
> > When L1 switches to L2, its HSAVE area holds L1 guest state, but
> > L1 is a "host" vs L2, so it is host state.
> > The copying is more between KVM's register cache and the vmcb.
> > 
> > So maybe backing it up as this patch does is the best solution yet.
> > I will take a more in-depth look at this soon.
> 
> We can resurrect 'hsave' and keep it internally indeed, but to make this
> migratable, we'd have to add it to the nested state acquired through
> svm_get_nested_state(). Using L1's HSAVE area (pointed to by
> MSR_VM_HSAVE_PA) avoids that, as we have everything in L1's memory. And,

Hi!

I think I would prefer to avoid touching guest memory as much as possible,
to avoid the shenanigans of accessing it: for example, on nested state read
we are not allowed to write guest memory since at that point it is already
migrated, and on nested state set we are not allowed to even read guest
memory since the memory map might not be up to date yet. A malicious guest
can also change its memory at any time, which can cause issues as well.

Since this didn't work before and both sides of a migration need the fix
anyway, adding a new flag and adding the hsave area to the nested state
seems like a very good thing to me.

I think, though, that I would use that SMM hsave area just like you did in
the patch, only without saving it to guest memory, and migrate it as new
state instead. I would call it something like smm_l1_hsave_area, with a
comment explaining why it is needed. This way we still avoid the overhead
of copying the hsave area on each nested entry.

What do you think?

Best regards,
	Maxim Levitsky

> as far as I understand, we comply with the spec as 1) L1 has to set it
> up and 2) L1 is not supposed to expect any particular format there, it's
> completely volatile.
> 
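[Editor's note: for illustration only, a rough sketch of the KVM-internal
alternative Maxim describes above. The field smm_l1_hsave_area and both
helpers are hypothetical names taken from the suggestion, not from any real
patch; only struct vcpu_svm, struct vmcb_save_area and vmcb01 are existing
KVM concepts.]

	struct vcpu_svm {
		/* ... existing fields ... */

		/*
		 * Hypothetical: L1 host state stashed when an SMI arrives
		 * while L2 is running, restored on RSM.  To be migratable
		 * it would also have to be exposed via a new flag and blob
		 * in svm_get_nested_state()/svm_set_nested_state().
		 */
		struct vmcb_save_area smm_l1_hsave_area;
	};

	/* Hypothetical helper for the SMM entry path, when in guest mode. */
	static void svm_smm_stash_l1_host_state(struct vcpu_svm *svm)
	{
		svm->smm_l1_hsave_area = svm->vmcb01.ptr->save;
	}

	/* Hypothetical helper for the RSM path, before re-entering L2. */
	static void svm_smm_restore_l1_host_state(struct vcpu_svm *svm)
	{
		svm->vmcb01.ptr->save = svm->smm_l1_hsave_area;
	}

[Keeping the backup in the kernel, as argued in the mail, means the SMM
entry/exit paths never have to read or write guest memory, at the cost of
one extra blob (plus a flag) in the migratable nested state.]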