Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp4644718iob; Sun, 8 May 2022 20:25:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxif9ydMoOTMVuYz/0r0htAE3huzPuCfd6wVkAcRz4Dlvf97kY/2gp0OU/koJdCE7EzpwEt X-Received: by 2002:a63:da13:0:b0:3c6:4c0:e2f9 with SMTP id c19-20020a63da13000000b003c604c0e2f9mr11476119pgh.493.1652066700453; Sun, 08 May 2022 20:25:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652066700; cv=none; d=google.com; s=arc-20160816; b=r1t2QHNAzht03Vd5Y7frOKPDT0tK09hFFYNlXV0hCMsIJUW5Fr3NJhu5HxHdUFItxF TJdSUUZPV2HqR0Mcvj974mR2Rn+dEmrLjvVXrQkQyNzspqgpp9//ozRrKW5FuuMBJ5Gs L8I56O/NCN3gHLYcLoHcE5GhKfAFT2HFJRl3SA3DXfsztKMZYkSBS5HmPGeRjDzoCc0C ukr4BCOC3gO/qLTOnJL1JOJhEF8/eft8UgnZKOowY5ONrrQmu+MBzHdE22YRRW0mL+J/ FdDVStDq7mwXh3jmU8D6FYw9OeH22vdTyRECy7znEHAawxxr0p2BcyM4dHt+wfnG1bdn ek5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=UBat/q3qX5a8p2zPni7WNGuj54O+b/1/pkMmkxN0SQE=; b=PUhaIeKiaEbrFdn4sdS52/lhj6a8aOtmD5KmjF0Jr/N0v/pzMjOgHsZb+kyRpXfn8c KuxAIu46lr9v8uYko1bz2QZz45LmDscC5qGsagzMQ9fZH225wON3zhfUicsBM3hAuup8 sMDWzlhjg3xOTFM6+uZmyWbKsJQkiQ1LixcwIQF4SLzmnC6zdBBKELN0jtzLgVoO9EKz 0FV//HWUK0AwiDB334DF8ixU3u9yEj4SLNzyqTNB9iFnnLN4a1YB+00JGqVShAULeI1c RTTEHVf9ckpV1Ecf7nc82qwp/+CdjHKN7F883KB1gB946xdpfcNUtJMxfFT0fwc1K169 MF8A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=bMcy2QUb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id h13-20020a63384d000000b003ab8efcc84asi12273033pgn.603.2022.05.08.20.25.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 08 May 2022 20:25:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=bMcy2QUb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7A9969E9F8; Sun, 8 May 2022 20:24:48 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1383264AbiEESZI (ORCPT + 99 others); Thu, 5 May 2022 14:25:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1383276AbiEESTo (ORCPT ); Thu, 5 May 2022 14:19:44 -0400 Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEF645DA65; Thu, 5 May 2022 11:15:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1651774558; x=1683310558; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6GH2JR5FupkC9izGlie6FMvLYsxUPH1D0aZ4F5r5lMs=; b=bMcy2QUbM2eg2oiAcPU1pw2AjOOdrF17eLRrB+AO38S4vXichZg9hFsE tK/jTtbmdBntOeiHNTVbEdg2bbDvrkwfYrCupo++HUNP5o793r7PgUHx4 UfpR7t8czY7rFOkf9UdAj78S1e6FUHmiymzBNocX3d6t1BZmsrZbqy40z jTDUhLiR8xP9BZ19syJqKNUJ0mD8EBt2cxjY/BiaXZZ9PlqT0oqyoc7Ll Kf6kAWmHGeFpKjXK/1QG7TOlTOP/y4psF4rQ1DqtbjUlqm2yUMa1Lpg8p PRT9VfluSvUB31w0H7OZZz8auK0On7InjsF3AzewwdhjKe7GlxI8AvODJ g==; X-IronPort-AV: E=McAfee;i="6400,9594,10338"; a="268354852" X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="268354852" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 X-IronPort-AV: E=Sophos;i="5.91,202,1647327600"; d="scan'208";a="665083366" Received: from ls.sc.intel.com (HELO localhost) ([143.183.96.54]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 May 2022 11:15:49 -0700 From: isaku.yamahata@intel.com To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: isaku.yamahata@intel.com, isaku.yamahata@gmail.com, Paolo Bonzini , erdemaktas@google.com, Sean Christopherson , Sagi Shahar Subject: [RFC PATCH v6 061/104] KVM: TDX: Create initial guest memory Date: Thu, 5 May 2022 11:14:55 -0700 Message-Id: <70ed041fd47c1f7571aa259450b3f9244edda48d.1651774250.git.isaku.yamahata@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Isaku Yamahata Because the guest memory is protected in TDX, the creation of the initial guest memory requires a dedicated TDX module API, tdh_mem_page_add, instead of directly copying the memory contents into the guest memory in the case of the default VM type. KVM MMU page fault handler callback, private_page_add, handles it. Define new subcommand, KVM_TDX_INIT_MEM_REGION, of VM-scoped KVM_MEMORY_ENCRYPT_OP. It assigns the guest page, copies the initial memory contents into the guest memory, encrypts the guest memory. At the same time, optionally it extends memory measurement of the TDX guest. It calls the KVM MMU page fault(EPT-violation) handler to trigger the callbacks for it. Signed-off-by: Isaku Yamahata --- arch/x86/include/uapi/asm/kvm.h | 9 ++ arch/x86/kvm/mmu/mmu.c | 1 + arch/x86/kvm/vmx/tdx.c | 135 +++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.h | 2 + tools/arch/x86/include/uapi/asm/kvm.h | 9 ++ 5 files changed, 155 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 6b1c3e0e9a3c..3e919b3c7d7b 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -534,6 +534,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, KVM_TDX_CMD_NR_MAX, }; @@ -611,4 +612,12 @@ struct kvm_tdx_init_vm { }; }; +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 643b33c75ae9..899dc8466a93 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5283,6 +5283,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) out: return r; } +EXPORT_SYMBOL(kvm_mmu_load); void kvm_mmu_unload(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7f1eb75bb79d..3981cd509686 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -555,6 +555,21 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa & PAGE_MASK); } +static void tdx_measure_page(struct kvm_tdx *kvm_tdx, hpa_t gpa) +{ + struct tdx_module_output out; + u64 err; + int i; + + for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { + err = tdh_mr_extend(kvm_tdx->tdr.pa, gpa + i, &out); + if (KVM_BUG_ON(err, &kvm_tdx->kvm)) { + pr_tdx_error(TDH_MR_EXTEND, err, &out); + break; + } + } +} + static void tdx_unpin_pfn(struct kvm *kvm, kvm_pfn_t pfn) { struct page *page = pfn_to_page(pfn); @@ -571,6 +586,7 @@ static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, hpa_t hpa = pfn_to_hpa(pfn); gpa_t gpa = gfn_to_gpa(gfn); struct tdx_module_output out; + hpa_t source_pa; u64 err; if (WARN_ON_ONCE(is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))) @@ -585,12 +601,38 @@ static void __tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, * See kvm_faultin_pfn_private() and kvm_mmu_release_fault(). */ + /* Build-time faults are induced and handled via TDH_MEM_PAGE_ADD. */ if (likely(is_td_finalized(kvm_tdx))) { err = tdh_mem_page_aug(kvm_tdx->tdr.pa, gpa, hpa, &out); - if (KVM_BUG_ON(err, kvm)) + if (KVM_BUG_ON(err, kvm)) { pr_tdx_error(TDH_MEM_PAGE_AUG, err, &out); + tdx_unpin_pfn(kvm, pfn); + } return; } + + /* + * In case of TDP MMU, fault handler can run concurrently. Note + * 'source_pa' is a TD scope variable, meaning if there are multiple + * threads reaching here with all needing to access 'source_pa', it + * will break. However fortunately this won't happen, because below + * TDH_MEM_PAGE_ADD code path is only used when VM is being created + * before it is running, using KVM_TDX_INIT_MEM_REGION ioctl (which + * always uses vcpu 0's page table and protected by vcpu->mutex). + */ + if (KVM_BUG_ON(kvm_tdx->source_pa == INVALID_PAGE, kvm)) + return; + + source_pa = kvm_tdx->source_pa & ~KVM_TDX_MEASURE_MEMORY_REGION; + + err = tdh_mem_page_add(kvm_tdx->tdr.pa, gpa, hpa, source_pa, &out); + if (KVM_BUG_ON(err, kvm)) { + pr_tdx_error(TDH_MEM_PAGE_ADD, err, &out); + tdx_unpin_pfn(kvm, pfn); + } else if ((kvm_tdx->source_pa & KVM_TDX_MEASURE_MEMORY_REGION)) + tdx_measure_page(kvm_tdx, gpa); + + kvm_tdx->source_pa = INVALID_PAGE; } static void tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, @@ -1098,6 +1140,94 @@ void tdx_flush_tlb(struct kvm_vcpu *vcpu) cpu_relax(); } +#define TDX_SEPT_PFERR PFERR_WRITE_MASK + +static int tdx_init_mem_region(struct kvm *kvm, struct kvm_tdx_cmd *cmd) +{ + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); + struct kvm_tdx_init_mem_region region; + struct kvm_vcpu *vcpu; + struct page *page; + kvm_pfn_t pfn; + int idx, ret = 0; + + /* The BSP vCPU must be created before initializing memory regions. */ + if (!atomic_read(&kvm->online_vcpus)) + return -EINVAL; + + if (cmd->flags & ~KVM_TDX_MEASURE_MEMORY_REGION) + return -EINVAL; + + if (copy_from_user(®ion, (void __user *)cmd->data, sizeof(region))) + return -EFAULT; + + /* Sanity check */ + if (!IS_ALIGNED(region.source_addr, PAGE_SIZE) || + !IS_ALIGNED(region.gpa, PAGE_SIZE) || + !region.nr_pages || + region.gpa + (region.nr_pages << PAGE_SHIFT) <= region.gpa || + !kvm_is_private_gpa(kvm, region.gpa) || + !kvm_is_private_gpa(kvm, region.gpa + (region.nr_pages << PAGE_SHIFT))) + return -EINVAL; + + vcpu = kvm_get_vcpu(kvm, 0); + if (mutex_lock_killable(&vcpu->mutex)) + return -EINTR; + + vcpu_load(vcpu); + idx = srcu_read_lock(&kvm->srcu); + + kvm_mmu_reload(vcpu); + + while (region.nr_pages) { + if (signal_pending(current)) { + ret = -ERESTARTSYS; + break; + } + + if (need_resched()) + cond_resched(); + + + /* Pin the source page. */ + ret = get_user_pages_fast(region.source_addr, 1, 0, &page); + if (ret < 0) + break; + if (ret != 1) { + ret = -ENOMEM; + break; + } + + kvm_tdx->source_pa = pfn_to_hpa(page_to_pfn(page)) | + (cmd->flags & KVM_TDX_MEASURE_MEMORY_REGION); + + pfn = kvm_mmu_map_tdp_page(vcpu, region.gpa, TDX_SEPT_PFERR, + PG_LEVEL_4K); + if (is_error_noslot_pfn(pfn) || kvm->vm_bugged) + ret = -EFAULT; + else + ret = 0; + + put_page(page); + if (ret) + break; + + region.source_addr += PAGE_SIZE; + region.gpa += PAGE_SIZE; + region.nr_pages--; + } + + srcu_read_unlock(&kvm->srcu, idx); + vcpu_put(vcpu); + + mutex_unlock(&vcpu->mutex); + + if (copy_to_user((void __user *)cmd->data, ®ion, sizeof(region))) + ret = -EFAULT; + + return ret; +} + int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) { struct kvm_tdx_cmd tdx_cmd; @@ -1114,6 +1244,9 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) case KVM_TDX_INIT_VM: r = tdx_td_init(kvm, &tdx_cmd); break; + case KVM_TDX_INIT_MEM_REGION: + r = tdx_init_mem_region(kvm, &tdx_cmd); + break; default: r = -EINVAL; goto out; diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index d8dcbedd690b..29e7accee733 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -25,6 +25,8 @@ struct kvm_tdx { u64 xfam; int hkid; + hpa_t source_pa; + bool finalized; atomic_t tdh_mem_track; diff --git a/tools/arch/x86/include/uapi/asm/kvm.h b/tools/arch/x86/include/uapi/asm/kvm.h index 60a79f9ef174..af39f3adc179 100644 --- a/tools/arch/x86/include/uapi/asm/kvm.h +++ b/tools/arch/x86/include/uapi/asm/kvm.h @@ -533,6 +533,7 @@ enum kvm_tdx_cmd_id { KVM_TDX_CAPABILITIES = 0, KVM_TDX_INIT_VM, KVM_TDX_INIT_VCPU, + KVM_TDX_INIT_MEM_REGION, KVM_TDX_CMD_NR_MAX, }; @@ -610,4 +611,12 @@ struct kvm_tdx_init_vm { }; }; +#define KVM_TDX_MEASURE_MEMORY_REGION (1UL << 0) + +struct kvm_tdx_init_mem_region { + __u64 source_addr; + __u64 gpa; + __u64 nr_pages; +}; + #endif /* _ASM_X86_KVM_H */ -- 2.25.1