Received: by 2002:a05:7412:bbc7:b0:fc:a2b0:25d7 with SMTP id kh7csp1126068rdb; Fri, 2 Feb 2024 14:55:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IGac/Tk6WnApcdynKlWJK4UG9UrXawcjPUwk2+6BALQ44BYEfIComhraW74eTbqFrm3GV/5 X-Received: by 2002:ac8:5c46:0:b0:42c:83a:c5e7 with SMTP id j6-20020ac85c46000000b0042c083ac5e7mr2831707qtj.50.1706914538327; Fri, 02 Feb 2024 14:55:38 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706914538; cv=pass; d=google.com; s=arc-20160816; b=uURP17qIFaHH7uOOG5KjGfnMQwPcf9EEKFypUl6vfe3maalQAyOS2BCZwkumZuk3fS OfinDDkDWWOPntc1vP7GHyhRICrhS+OzhYO83OIK4YAylPia42hic/PrGoLPbVLthKsI FvCSCw3iQhxhTNOJQeaR9EaFC4PBGneoCsb3lVgyfFfSAqs1kBcWHHgIVqxtmFtXWfU3 j9D+UJ5SNYVPsGPFVhKqMtHpN7HPnUPRnPBpucpZlS1A7HsHRSYtlcUR+jgZQpuN3b0A l60evyoEK38Sfza6BH7YsxeeZCvIjnwlaQS3EOoRgotJY87ASAOraBH0Z+SgaYBKx0eB YbhQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:references:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:in-reply-to:date :dkim-signature; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; fh=d0lMdqKXPxNuJDQKBOWhkiJ9wxEjKY67LNo+OqbNlq4=; b=Qf2IFQ28zO/xi6GtNicUFxFFGYCGqxLSyIPdJ+JD5iyYhYT43C4CE8BPz750TGBmbf 8bySMB3evZCRQ6UAAKzZpwOPEQlo7iIUOlwc1BwKRHAtY0x+6NmH+U+PWVwGuRcTwUE3 FTVYFzd76XC223yG8QOxjgQfJ8PBFyPVMM/vNmhvpbYKHoabwteozlolaM/6L6zUSimz 7vRQySV/6qc2PIDb4Mm1f++FWa6Zsd44izniNwNN6Ulpui1+Lk5liWy5IXFpYYKdR9LY jM5OaMT/Fn1ARcSLVP56c5KGL3pOZtKHUg8+PwEMuLzmaXc2pwIJ2Wiyfy0Ef39Ys1d1 Dh0g==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=Y1k3j84f; arc=pass (i=1 spf=pass spfdomain=flex--seanjc.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-50687-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50687-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com X-Forwarded-Encrypted: i=1; AJvYcCVW3yfYCNiOD73sp+qIZenSd4Bl53JlhCF7zqOLvt32lkTH3l+6p8a0T5NPczOClqWy60kgZCosl9mIhRe4orRuMvVwrU/9ziFjbVmxYA== Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id f21-20020ac87f15000000b0042c0db4de52si505830qtk.407.2024.02.02.14.55.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 02 Feb 2024 14:55:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-50687-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=Y1k3j84f; arc=pass (i=1 spf=pass spfdomain=flex--seanjc.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-50687-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-50687-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 525721C23EF2 for ; Fri, 2 Feb 2024 22:55:16 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 45E1412C801; Fri, 2 Feb 2024 22:55:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Y1k3j84f" Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FDF9126F32 for ; Fri, 2 Feb 2024 22:54:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706914500; cv=none; b=RPFJ+7cH4abyRl4Ed3wSxj+8j/4XNcuXqfRCj3Qljp49RAu+4zdMV5PesO7zZXA6P75c5XJMMJnHy9p5OsOCOJQv2OPEM9qsE6UyYH3X9Y3eqyEltDKbEk2YkVwtRXIEhwTk5gT6qa+58GQY8M9exB+pblmQFSLTSAKceLXZ75E= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706914500; c=relaxed/simple; bh=xLLdVjR78iCftIzXlIU+GT2WV5TXzmzQM7wql3RHWCw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=GlWSIemXYza5LcQrEsLiq6px6CRqjfdoXv+0B7voSErv/9ROBrwayG+/Xxkd0Q2BrDMyQC3kf7ckQreJwe8r7fxPwpy5cBIXlhjYvAPB5S6GEG+M6almP7d1yDwtQthZRIDf3b0DgY7qUebVQhINC0uoyJ01QmMFRA6+XZVt1gw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Y1k3j84f; arc=none smtp.client-ip=209.85.128.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-603c0e020a6so38661567b3.0 for ; Fri, 02 Feb 2024 14:54:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706914497; x=1707519297; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; b=Y1k3j84fCd633LqyR0hVxmIdmrKm6oZGatKJ3KeIOCfhS/I0SKtsGPx/ZEUdu7N37k xIXcx654xNekPT8N14ASSbiM0a7BM7XJpaUN8brqbDwAGPJl4YSxgXaBEqB3U32aMTzj l8jn8BIsnzxQG0Ir6cB0GgtdDNrZ1+wvi9eXAyqYJHfxc3PZbK+1AVRc/xMFtt6DOGa3 GQ3+cOYOMQDhfRrY9yzEe9clvycBacHjisoQeNNrSZEZmM9PDpk4mwGFQxFBWW6MfRX0 dfWY1QqNqLl1dl7/7GykG5Ga6Y7B/ScfU5dXBowt0l7IMfc3H2M0GqkYcn20lXwtvf5z cS6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706914497; x=1707519297; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=yST2IhC4x7xwue5FLDSVPscgeSgsqzoau2hWHl8C2YI=; b=qe1VpTAaIy0ElJWizLhsYHGD1OyNmC8QqdGsIj/l1e90xdd+4yu/CP5bc4u6fD5IUc fl2YhikvewUvSB/aoRcRlO27k+6637WinIvNtGnshuLOdm4M7kZamrfM03GsbiHEXceD Dz6cchYuJSAZhW4VEutYY4gJvQQrvseGLtaxikg7n2VGYQ8NMu0FJclc+c2JzFinflWh 81G5eJTVhIO0sjiiQwXn9wbYS2AId3gkYxsVMsawWLNbJ2fV1BlfmjmfFg/ZlX1O+10H 0CyPbhvDGny7OjzwRq/pLw9Mkk1oVmScYj5GR5jccsyht9m2h8LZq3Jm8vZpO238iRMh dnmg== X-Gm-Message-State: AOJu0YxUVl7zcw+207cC0RSoV2OHJ/unW2qBfrV2bq7G0ZVHoXGx0Ddf nGzSbz8OJTdSJva/W2IaLdBONwt1pue8T7qqOQOdbZ75SIK+z3azZ5q6piFTxyoek1BplHz3/a/ Kjw== X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6902:993:b0:dc2:25fd:eff1 with SMTP id bv19-20020a056902099300b00dc225fdeff1mr234713ybb.4.1706914497040; Fri, 02 Feb 2024 14:54:57 -0800 (PST) Date: Fri, 2 Feb 2024 14:54:55 -0800 In-Reply-To: <20240116041457.wver7acnwthjaflr@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231230172351.574091-1-michael.roth@amd.com> <20231230172351.574091-19-michael.roth@amd.com> <20240116041457.wver7acnwthjaflr@amd.com> Message-ID: Subject: Re: [PATCH v11 18/35] KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command From: Sean Christopherson To: Michael Roth Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-crypto@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, jroedel@suse.de, thomas.lendacky@amd.com, hpa@zytor.com, ardb@kernel.org, pbonzini@redhat.com, vkuznets@redhat.com, jmattson@google.com, luto@kernel.org, dave.hansen@linux.intel.com, slp@redhat.com, pgonda@google.com, peterz@infradead.org, srinivas.pandruvada@linux.intel.com, rientjes@google.com, dovmurik@linux.ibm.com, tobin@ibm.com, bp@alien8.de, vbabka@suse.cz, kirill@shutemov.name, ak@linux.intel.com, tony.luck@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, alpergun@google.com, jarkko@kernel.org, ashish.kalra@amd.com, nikunj.dadhania@amd.com, pankaj.gupta@amd.com, liam.merwick@oracle.com, zhi.a.wang@intel.com, Brijesh Singh Content-Type: text/plain; charset="us-ascii" On Mon, Jan 15, 2024, Michael Roth wrote: > On Wed, Jan 10, 2024 at 07:45:36AM -0800, Sean Christopherson wrote: > > On Sat, Dec 30, 2023, Michael Roth wrote: > > > diff --git a/Documentation/virt/kvm/x86/amd-memory-encryption.rst b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > index b1beb2fe8766..d4325b26724c 100644 > > > --- a/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > +++ b/Documentation/virt/kvm/x86/amd-memory-encryption.rst > > > @@ -485,6 +485,34 @@ Returns: 0 on success, -negative on error > > > > > > See the SEV-SNP specification for further detail on the launch input. > > > > > > +20. KVM_SNP_LAUNCH_UPDATE > > > +------------------------- > > > + > > > +The KVM_SNP_LAUNCH_UPDATE is used for encrypting a memory region. It also > > > +calculates a measurement of the memory contents. The measurement is a signature > > > +of the memory contents that can be sent to the guest owner as an attestation > > > +that the memory was encrypted correctly by the firmware. > > > + > > > +Parameters (in): struct kvm_snp_launch_update > > > + > > > +Returns: 0 on success, -negative on error > > > + > > > +:: > > > + > > > + struct kvm_sev_snp_launch_update { > > > + __u64 start_gfn; /* Guest page number to start from. */ > > > + __u64 uaddr; /* userspace address need to be encrypted */ > > > > Huh? Why is KVM taking a userspace address? IIUC, the address unconditionally > > gets translated into a gfn, so why not pass a gfn? > > > > And speaking of gfns, AFAICT start_gfn is never used. > > I think having both the uaddr and start_gfn parameters makes sense. I > think it's only awkward because how I'm using the memslot to translate > the uaddr to a GFN in the current implementation, Yes. > > Oof, reading more of the code, this *requires* an effective in-place copy-and-convert > > of guest memory. > > Yes, I'm having some trouble locating the various threads, but initially > there were some discussions about having a userspace API that allows for > UPM/gmem pages to be pre-populated before they are in-place encrypted, but > we'd all eventually decided that having KVM handle this internally was > the simplest approach. > > So that's how it's done here, KVM_SNP_LAUNCH_UPDATE copies the pages into > gmem, then passes those pages on to firmware for encryption. Then the > VMM is expected to mark the GFN range as private via > KVM_SET_MEMORY_ATTRIBUTES, since the VMM understands what constitutes > initial private/encrypted payload. I should document that better in > KVM_SNP_LAUNCH_UPDATE docs however. That's fine. As above, my complaints are that the unencrypted source *must* be covered by a memslot. That's beyond ugly. > > What's "the IMI"? Initial Measurement Image? I assume this is essentially just > > a flag that communicates whether or not the page should be measured? > > This is actually for loading a measured migration agent into the target > system so that it can handle receiving the encrypted guest data from the > source. There's still a good deal of planning around how live migration > will be handled however so if you think it's premature to expose this > via KVM I can remove the related fields. Yes, please. Though FWIW, I honestly hope KVM_SEV_SNP_LAUNCH_UPDATE goes away and we end up with a common uAPI for populating guest memory: https://lore.kernel.org/all/Zbrj5WKVgMsUFDtb@google.com > > > + __u8 page_type; /* page type */ > > > + __u8 vmpl3_perms; /* VMPL3 permission mask */ > > > + __u8 vmpl2_perms; /* VMPL2 permission mask */ > > > + __u8 vmpl1_perms; /* VMPL1 permission mask */ > > > > Why? KVM doesn't support VMPLs. > > It does actually get used by the SVSM. > I can remove these but then we'd need some capability bit or something to > know when they are available if they get re-introduced. _If_. We don't merge dead code, and we _definitely_ don't merge dead code that creates ABI. > > > + kvaddr = pfn_to_kaddr(pfns[i]); > > > + if (!virt_addr_valid(kvaddr)) { > > > > I really, really don't like that this assume guest_memfd is backed by struct page. > > There are similar enforcements in the SEV/SEV-ES code. There was some > initial discussion about relaxing this for SNP so we could support > things like /dev/mem-mapped guest memory, but then guest_memfd came > along and made that to be an unlikely use-case in the near-term given > that it relies on alloc_pages() currently and explicitly guards against > mmap()'ing pages in userspace. > > I think it makes to keep the current tightened restrictions in-place > until such a use-case comes along, since otherwise we are relaxing a > bunch of currently-useful sanity checks that span all throughout the code > to support some nebulous potential use-case that might never come along. > I think it makes more sense to cross that bridge when we get there. I disagree. You say "sanity checks", while I see a bunch of arbitrary assumptions that don't need to exist. Yes, today guest_memfd always uses struct page memory, but that should have _zero_ impact on SNP. Arbitrary assumptions often cause a lot of confusion for future readers, e.g. a few years from now, if the above code still exists, someone might wonder what is so special about struct page memory, and then waste a bunch of time trying to figure out that there's no technical reason SNP "requires" struct page memory. This is partly why I was pushing for guest_memfd to handle some of this; the gmem code _knows_ what backing type it's using, it _knows_ if the direct map is valid, etc. > > > + pr_err("Invalid HVA 0x%llx for GFN 0x%llx\n", (uint64_t)kvaddr, gfn); > > > + ret = -EINVAL; > > > + goto e_release; > > > + } > > > + > > > + ret = kvm_read_guest_page(kvm, gfn, kvaddr, 0, PAGE_SIZE); > > > > Good gravy. If I'm reading this correctly, KVM: > > > > 1. Translates an HVA into a GFN. > > 2. Gets the PFN for that GFN from guest_memfd > > 3. Verifies the PFN is not assigned to the guest > > 4. Copies memory from the shared memslot page to the guest_memfd page > > 5. Converts the page to private and asks the PSP to encrypt it > > > > (a) As above, why is #1 a thing? > > Yah, it's probably best to avoid this, as proposed above. > > > (b) Why are KVM's memory attributes never consulted? > > It doesn't really matter if the attributes are set before or after > KVM_SNP_LAUNCH_UPDATE, only that by the time the guest actually launches > they pages get set to private so they get faulted in from gmem. We could > document our expectations and enforce them here if that's preferable > however. Maybe requiring KVM_SET_MEMORY_ATTRIBUTES(private) in advance > would make it easier to enforce that userspace does the right thing. > I'll see how that looks if there are no objections. Userspace owns whether a page is PRIVATE or SHARED, full stop. If KVM can't honor that, then we need to come up with better uAPI. > > (c) What prevents TOCTOU issues with respect to the RMP? > > Time-of-use will be when the guest faults the gmem page in with C-bit > set. If it is not in the expected Guest-owned/pre-validated state that > SEV_CMD_SNP_LAUNCH_UPDATE expected/set, then the guest will get an RMP > fault or #VC exception for unvalidated page access. It will also fail > attestation. Not sure if that covers the scenarios you had in mind. I don't think this covers what I was asking, but I suspect my concern will go away once the new APIs come along, so let's table this for now. > > > (d) Why is *KVM* copying memory into guest_memfd? > > As mentioned above, there were various discussions of ways of allowing > userspace to pwrite() to the guest_memfd in advance of > "sealing"/"binding" it and then encrypting it in place. I think this was > one of the related threads: > > https://lore.kernel.org/linux-mm/YkyKywkQYbr9U0CA@google.com/ > > My read at the time was that the requirements between pKVM/TDX/SNP were all > so unique that we'd spin forever trying to come up with a userspace ABI > that worked for everyone. At the time you'd suggested for pKVM to handle > their specific requirements internally in pKVM to avoid unecessary churn on > TDX/SNP side, and I took the same approach with SNP in implementing it > internally in SNP's KVM interfaces since it seemed unlikely there would > be much common ground with how TDX handles it via KVM_TDX_INIT_MEM_REGION. Yeah, the whole "intra-memslot copy" thing threw me. > > (e) What guarantees the direct map is valid for guest_memfd? > > Are you suggesting this may change in the near-term? I asking because, when I asked, I was unaware that the plan was to shatter the direct to address the 2MiB vs. 4KiB erratum (as opposed to unmapping guest_memfd pfns). > > > + if (ret) { > > > + pr_err("SEV-SNP launch update failed, ret: 0x%x, fw_error: 0x%x\n", > > > + ret, *error); > > > + snp_page_reclaim(pfns[i]); > > > + > > > + /* > > > + * When invalid CPUID function entries are detected, the firmware > > > + * corrects these entries for debugging purpose and leaves the > > > + * page unencrypted so it can be provided users for debugging > > > + * and error-reporting. > > > + * > > > + * Copy the corrected CPUID page back to shared memory so > > > + * userpsace can retrieve this information. > > > > Why? IIUC, this is basically backdooring reads/writes into guest_memfd to avoid > > having to add proper mmap() support. > > The CPUID page is private/encrypted, so it needs to be a gmem page. > SNP firmware is doing the backdooring when it writes the unencrypted, > expected contents into the page during failure. What's wrong with copying > the contents back into the source page so userspace can be use of it? > If we implement the above-mentioned changes then the source page is just > a userspace buffer that isn't necessarily associated with a memslot, so > it becomes even more straightforward. > > Would that be acceptable? Yes, I am specifically complaining about writing guest memory on failure, which is all kinds of weird. > > > + kvfree(pfns); > > > > Saving PFNs from guest_memfd, which is fully owned by KVM, is so unnecessarily > > complex. Add a guest_memfd API (or three) to do this safely, e.g. to lock the > > pages, do (and track) the RMP conversion, etc... > > Is adding 3 gmem APIs and tracking RMP states inside gmem really less > complex than what's going on here? I think we covered this in PUCK? Holler if you still have questions here.