From: Steve Rutherford
Date: Mon, 12 Apr 2021 19:25:03 -0700
Subject: Re: [PATCH v12 13/13] x86/kvm: Add kexec support for SEV Live Migration.
To: Ashish Kalra
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Joerg Roedel, Borislav Petkov, Tom Lendacky, X86 ML, KVM list, LKML, Sean Christopherson, Venu Busireddy, Brijesh Singh, kexec@lists.infradead.org
In-Reply-To: <20210413014821.GA3276@ashkalra_ubuntu_server>

On Mon, Apr 12, 2021 at 6:48 PM Ashish Kalra wrote:
>
> On Mon, Apr 12, 2021 at 06:23:32PM -0700, Steve Rutherford wrote:
> > On Mon, Apr 12, 2021 at 5:22 PM Steve Rutherford wrote:
> > >
> > > On Mon, Apr 12, 2021 at 12:48 PM Ashish Kalra wrote:
> > > >
> > > > From: Ashish Kalra
> > > >
> > > > Reset the host's shared pages list related to kernel
> > > > specific page encryption status settings before we load a
> > > > new kernel by kexec. We cannot reset the complete
> > > > shared pages list here as we need to retain the
> > > > UEFI/OVMF firmware specific settings.
> > > >
> > > > The host's shared pages list is maintained for the
> > > > guest to keep track of all unencrypted guest memory regions,
> > > > therefore we need to explicitly mark all shared pages as
> > > > encrypted again before rebooting into the new guest kernel.
> > > >
> > > > Signed-off-by: Ashish Kalra
> > > > ---
> > > >  arch/x86/kernel/kvm.c | 24 ++++++++++++++++++++++++
> > > >  1 file changed, 24 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > index bcc82e0c9779..4ad3ed547ff1 100644
> > > > --- a/arch/x86/kernel/kvm.c
> > > > +++ b/arch/x86/kernel/kvm.c
> > > > @@ -39,6 +39,7 @@
> > > >  #include
> > > >  #include
> > > >  #include
> > > > +#include
> > > >
> > > >  DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> > > >
> > > > @@ -384,6 +385,29 @@ static void kvm_pv_guest_cpu_reboot(void *unused)
> > > >          */
> > > >         if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> > > >                 wrmsrl(MSR_KVM_PV_EOI_EN, 0);
> > > > +       /*
> > > > +        * Reset the host's shared pages list related to kernel
> > > > +        * specific page encryption status settings before we load a
> > > > +        * new kernel by kexec. NOTE: We cannot reset the complete
> > > > +        * shared pages list here as we need to retain the
> > > > +        * UEFI/OVMF firmware specific settings.
> > > > +        */
> > > > +       if (sev_live_migration_enabled & (smp_processor_id() == 0)) {
> > > What happens if the reboot of CPU0 races with another CPU servicing a
> > > device request (while the reboot is pending for that CPU)?
> > > Seems like you could run into a scenario where you have hypercalls racing.
> > >
> > > Calling this on every core isn't free, but it is an easy way to avoid this race.
> > > You could also count cores, and have only the last core do the job, but
> > > that seems more complicated.
> > On second thought, I think this may be insufficient as a fix, since my
> > read of kernel/reboot.c seems to imply that devices aren't shut down
> > until after these notifiers occur.
> > As such, a single thread might be able to race with itself. I could
> > be wrong here though.
> >
> > The heavy hammer would be to disable migration through the MSR (which
> > the subsequent boot will re-enable).
> >
> > I'm curious if there is a less "blocking" way of handling kexecs (that
> > strategy would block LM while the guest booted).
> >
> > One option that comes to mind would be for the guest to "mute" the
> > encryption status hypercall after the call to reset the encryption
> > status. The problem would be that the encryption status for pages
> > would be very temporarily inaccurate in the window between that call
> > and the start of the next boot. That isn't ideal, but, on the other
> > hand, the VM was about to reboot anyway, so a corrupted shared page
> > for device communication probably isn't super important. Still, I'm
> > not really a fan of that. This would avoid corrupting the next boot,
> > which is clearly an improvement.
> >
> > Each time the kernel boots it could also choose something like a
> > generation ID, and pass that down each time it calls the hypercall.
> > This would then let userspace identify which requests were coming from
> > the subsequent boot.
> >
> > Everything here (except, perhaps, disabling migration through the MSR)
> > seems kind of complicated. I somewhat hope my interpretation of
> > kernel/reboot.c is wrong and this race just is not possible in the
> > first place.
> >
>
> Disabling migration through the MSR after resetting the page encryption
> status is a reasonable approach. There is a similar window existing for
> normal VM boot during which LM is disabled, from the point where OVMF
> checks and adds support for SEV LM and the kernel boot checks for the
> same and enables LM using the MSR.

I'm not totally confident that disabling LM through the MSR is
sufficient. I also think the newly booted kernel needs to reset the
state itself, since nothing stops the hypercalls after the disable
goes through.
The host won't know the difference between early boot (pre-enablement)
hypercalls and racy just-before-restart hypercalls. You might disable
migration through the hypercall, get a late status change hypercall,
reboot, then re-enable migration, but still have stale state.

I _believe_ that the kernel doesn't mark its RAM as private on boot as
an optimization (might be wrong about this), since it would have been
expensive to mark all of RAM as encrypted previously. I believe that is
no longer a limitation given the KVM_EXIT, so we can reset this during
early boot instead of just before the kexec.

Thanks,
Steve

>
> Thanks,
> Ashish
>
> > > > +               int i;
> > > > +               unsigned long nr_pages;
> > > > +
> > > > +               for (i = 0; i < e820_table->nr_entries; i++) {
> > > > +                       struct e820_entry *entry = &e820_table->entries[i];
> > > > +
> > > > +                       if (entry->type != E820_TYPE_RAM)
> > > > +                               continue;
> > > > +
> > > > +                       nr_pages = DIV_ROUND_UP(entry->size, PAGE_SIZE);
> > > > +
> > > > +                       kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS,
> > > > +                                          entry->addr, nr_pages, 1);
> > > > +               }
> > > > +       }
> > > >         kvm_pv_disable_apf();
> > > >         kvm_disable_steal_time();
> > > >  }
> > > > --
> > > > 2.17.1
> > > >