Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp5800269pxj; Wed, 23 Jun 2021 09:11:50 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxpfmL/jhyX/xhqtGo32CMjAB9UPfXmrLVeCqRYa3b4Xfr+ZxCik6BI8N/xIBHXR8pu8jrw X-Received: by 2002:a05:6e02:1563:: with SMTP id k3mr152087ilu.0.1624464710107; Wed, 23 Jun 2021 09:11:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1624464710; cv=none; d=google.com; s=arc-20160816; b=1FGxoi0W/KSHiJErY7+p48hOSv+NDgjHeHy9/LJqYX5hwuW1Lqdp7n1irRT6j6Hv+4 XCTkhGHpGpSScftKArD8BeICNv4+k5FlvDmnp7ggoR4XoiCNtA/IKdFflZn/hTBKQPdX ptlXymKeiI6ZprllWcRW02V0OxLbviVphgKgX9Jy4AmT2qU2xI1YVm25lnFc4SXrNF9v 8rog+MbneCjzwQMs5CkWCCVSAqZsm/it7KaGuyZzyS/qmjrKUsS9sC3zllXz5vRciQ5H ELdYtHe+/tyfnUACqD5KoSBr/FPjxXgW0quP9efTWh3fSXile23bKXBJWh2d0EpD4XUx D/Lw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=wZgM83tgKsDvA+EzpxBvpaFXJwLabTWXJfE+ZcCN5s8=; b=C5YKLL7Q9JTWLYYAQHNqmK+j5q+SWzKU8MQ9P4IivBNKElGPwZbG9Cf6o2A9R3/YXq tolA4XHhd/HO6lZN4JsIByr0fPUjAHYK06VxgqJS8lfs+BsX8LdoQrWQAMf/u5TaJ8RW JoNbEg3kSp0cjzvstFzFUeq9Z4PI2Q0HS5D0XrMGeHoupX0/LxXK3QiP1WW7hB16SA4f aktgJ9p4lBH5QZuyjL+eN4Bn74fvvRKGCV6Q5fyWzwuiWzXBOEhCb57WCMfVZlRFYKPh ehYZyAr+nyH+U2T8ei8yLQjJrozqw0eRi3/4Rx9uqxWSqmtbW58kq2E/gXAtWfjyvU1D sUNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cVL0lCZo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y5si356625ioq.102.2021.06.23.09.11.36; Wed, 23 Jun 2021 09:11:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cVL0lCZo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230101AbhFWQMy (ORCPT + 99 others); Wed, 23 Jun 2021 12:12:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54304 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230046AbhFWQMx (ORCPT ); Wed, 23 Jun 2021 12:12:53 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8DA17C061574 for ; Wed, 23 Jun 2021 09:10:35 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id g4so1800454pjk.0 for ; Wed, 23 Jun 2021 09:10:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=wZgM83tgKsDvA+EzpxBvpaFXJwLabTWXJfE+ZcCN5s8=; b=cVL0lCZo16K9PnRVPjyprlZBeN8p0FkwdyajEMZEEzSNDsRicM6p1HRZHIq81JxYWy komdU3Z4oM7iSxLg1a5YuHpHLwxQRrreL/ixsc6nP2kaPUEO3+ttgAkQTIjYQqRenh8K eLnCpHHzmMcoyNRmZrHq9YMb5VnRh/08lDnsuIKwvU3QX1HtvuC7G7XIDLl/rNrk5FcN Qot68s2SeqfHFRfXZyUAZSBZgLGLfUlG031HIpzZIZ9LyCJ2Rh+j2dbkb91l4sgCaKfF f1rihXDarkAjjKnx4lHhHHTChdpboQPIDWA6hRx9TgCiSAHsdvvhY7J47S3fztEFki4j eFug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=wZgM83tgKsDvA+EzpxBvpaFXJwLabTWXJfE+ZcCN5s8=; b=pdAX+EFtVfyzIKqMXlccy73cGXKSyfkb+tEIwJTtW9nPLXpgSkhtTS2RIEJgvTg0EO QwcUb/yQ9DAGOXjbOmD6V8Mbs+WUsjUUzgZEf6F7z+gt31WFJ4fuX+TvC3Cn4p0wZeps YK+lAL9Da5nqV2w3UlPJ7Ddi6kFkwL/h8MBE0XJTZK2kVYS5nPvb+YJ6lenVQu9mAmZt N4XMsuJV3SI/EdaXo+7tSM4hagu5t8zAyN3h4Y9vC8XvuPfZHaPp1lqgCgREaKY2GsGZ ufMvNwt/VeuuuRyKax4/NyKoXszRMCGPmbP8Yro5c+6TJYn0ErRdqbzVG24mNOwPwzZw Ce1g== X-Gm-Message-State: AOAM531GdXSfGYyPfOCLa8+KsMjfDPUN6M3xl8IorQO9IFwNUAAByKLm PGJkhBPFK3WPam3OCPem48Ceng== X-Received: by 2002:a17:90a:2a41:: with SMTP id d1mr10107191pjg.77.1624464634777; Wed, 23 Jun 2021 09:10:34 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id g16sm323008pgl.22.2021.06.23.09.10.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Jun 2021 09:10:34 -0700 (PDT) Date: Wed, 23 Jun 2021 16:10:30 +0000 From: Sean Christopherson To: Maxim Levitsky Cc: Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Cathy Avery , Emanuele Giuseppe Esposito , linux-kernel@vger.kernel.org, Paolo Bonzini , kvm@vger.kernel.org Subject: Re: [PATCH RFC] KVM: nSVM: Fix L1 state corruption upon return from SMM Message-ID: References: <20210623074427.152266-1-vkuznets@redhat.com> <53a9f893cb895f4b52e16c374cbe988607925cdf.camel@redhat.com> <87pmwc4sh4.fsf@vitty.brq.redhat.com> <5fc502b70a89e18034716166abc65caec192c19b.camel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <5fc502b70a89e18034716166abc65caec192c19b.camel@redhat.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 23, 2021, Maxim Levitsky wrote: > On Wed, 2021-06-23 at 15:32 +0200, Vitaly Kuznetsov wrote: > > Maxim Levitsky writes: > > > > > On Wed, 2021-06-23 at 16:01 +0300, Maxim Levitsky wrote: > > > > On Wed, 2021-06-23 at 11:39 +0200, Paolo Bonzini wrote: > > > > > On 23/06/21 09:44, Vitaly Kuznetsov wrote: > > > > > > - RFC: I'm not 100% sure my 'smart' idea to use currently- unused > > > > > > HSAVE area is that smart. Also, we don't even seem to check that L1 > > > > > > set it up upon nested VMRUN so hypervisors which don't do that may > > > > > > remain broken. Ya, per the APM, nested_svm_vmrun() needs to verify the MSR is non-zero, and svm_set_msr() needs to verify the incoming value is a legal, page-aligned address. Both conditions are #GP. > > > > > >A very much needed selftest is also missing. > > > > > > > > > > It's certainly a bit weird, but I guess it counts as smart too.? It > > > > > needs a few more comments, but I think it's a good solution. > > > > > > > > > > One could delay the backwards memcpy until vmexit time, but that > > > > > would require a new flag so it's not worth it for what is a pretty > > > > > rare and already expensive case. And it's _almost_ architecturally legal, but the APM does clearly state that the HSAVE area is used on VRMUN and #VMEXIT. We'd definitely be taking a few liberties by accessing the area at SMI/RSM. > > We can resurrect 'hsave' and keep it internally indeed but to make this > > migratable, we'd have to add it to the nested state acquired through > > svm_get_nested_state(). Using L1's HSAVE area (ponted to by > > MSR_VM_HSAVE_PA) avoids that as we have everything in L1's memory. > I think I would prefer to avoid touching guest memory as much > as possible to avoid the shenanigans of accessing it: > > For example on nested state read we are not allowed to write guest > memory since at the point it is already migrated, and for setting > nested state we are not allowed to even read the guest memory since > the memory map might not be up to date. But that shouldn't conflict with using hsave, because hsave won't be accessed in this case until KVM_RUN, correct? > Then a malicious guest can always change its memory which also can cause > issues. The APM very clearly states that touching hsave is not allowed: Software must not attempt to read or write the host save-state area directly. > Since it didn't work before and both sides of migration need a fix, > adding a new flag and adding hsave area to nested state seems like a > very good thing. ... > This way we still avoid overhead of copying the hsave area > on each nested entry. > > What do you think? Hmm, I don't like allocating an extra page for every vCPU. And I believe this hackery is necessary only because nested_svm_vmexit() isn't following the architcture in the first place. I.e. using vmcb01 to restore host state is flat out wrong. If this the restore code (below) is instead converted to use the hsave area, then this bug is fixed _and_ we become (more) compliant with the spec. And the save/restore to HSAVE could be selective, i.e. only save/restore state that is actually consumed by nested_svm_vmexit(). It would require saving/restoring select segment _selectors_, but that is also a bug fix since the APM pseudocode clearly states that segments are reloading from the descriptor table (APM says "GDT", though I assume LDT is also legal). E.g. while software can't touch HSAVE, L2 can _legally_ change L1's GDT, so it's technically legal to have host segment registers magically change across VMRUN. There might also be side effects we're missing by not reloading segment registers, e.g. accessed bit updates? /* * Restore processor state that had been saved in vmcb01 <-- bad KVM */ kvm_set_rflags(vcpu, svm->vmcb->save.rflags); svm_set_efer(vcpu, svm->vmcb->save.efer); svm_set_cr0(vcpu, svm->vmcb->save.cr0 | X86_CR0_PE); svm_set_cr4(vcpu, svm->vmcb->save.cr4); kvm_rax_write(vcpu, svm->vmcb->save.rax); kvm_rsp_write(vcpu, svm->vmcb->save.rsp); kvm_rip_write(vcpu, svm->vmcb->save.rip); svm->vcpu.arch.dr7 = DR7_FIXED_1; kvm_update_dr7(&svm->vcpu); ... kvm_vcpu_unmap(vcpu, &map, true); nested_svm_transition_tlb_flush(vcpu); nested_svm_uninit_mmu_context(vcpu); rc = nested_svm_load_cr3(vcpu, svm->vmcb->save.cr3, false, true);