Received: by 2002:a05:7412:a9a2:b0:e2:908c:2ebd with SMTP id o34csp1349652rdh; Fri, 27 Oct 2023 11:25:01 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFI6ikw9Wt74J0zRVT8ycqqOKHc1dWHPI9vasoGWPoqkoqfq0n3IlqdqnHQewa1//CGz17Y X-Received: by 2002:a81:450c:0:b0:59f:7650:8fdb with SMTP id s12-20020a81450c000000b0059f76508fdbmr3347651ywa.33.1698431101450; Fri, 27 Oct 2023 11:25:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698431101; cv=none; d=google.com; s=arc-20160816; b=cAxPoXhDK726mZaIzsfU1pdVcWGpj0Bxvs3GhBx5aONh4G2fI9jkMsuOdZFTliR6ca Y1n/jWPbM36vhLrpMKPBIYfQJt4rVmlWjZ7XSjq55l1Q3e39DKTN7FtDfM71efc7pfU1 e8mrK+4F8qhp1fJiyOZPu5LqEeWmqL76QuCrKJb4tWDokj7O6QMteK51qcSQnpFltsCW GAoXyK9tLgW9BjKdmA4EOzyuZLxuRjaIL3QFbRqPi6+CUr8jc4jYlt6ITolgpaLG5cXa LakOZONctT6OecaslhyMc8HnDFnZ8UCkJQpl5eidsTj2NwPOT60GMe734Nqx7eNcisMM 3k6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=Xj30xbFl07atOtCQSMtQg8dB6rp8Wt3n2Y10jzFlxa0=; fh=lhteFENhZrfxRoH7K7/E/bqXvDWa/XLvUszFia9mLtM=; b=Td4FA/MaZ1K42ZxmH3KIRp5mrIwzOVdTMfUMdyum29JAXQCm8yUDz07rQcQey0n0CA UPdMEyST/RdvGXwzW8HCMjtUAYVbccMOFVLBTmb0MbizKCqhhwyPGxrgwkwTAaTKcEWX QJ/pxWY79R1luk8GdmSSTAko8U8qfScGDoKGtDHgnl/YbTPOcQYkNwevojk0SYLn/Tqi Y75VJdsRNs/thLP9WbEOt1RGILs6eePLM00HqA5v9cyBrgzmiDKKj1midoV/MGyYbDdB TDYeJA/NZ5GoJ7otL8TU+WPatlMZY63JUenT8qB7382cezNSRgJHhnk5yPSJRBsFUsph 26JA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=qMqn5pXu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id d141-20020a814f93000000b005652078563csi3375319ywb.153.2023.10.27.11.25.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 27 Oct 2023 11:25:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=qMqn5pXu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 186D880A30F0; Fri, 27 Oct 2023 11:25:00 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346546AbjJ0SYp (ORCPT + 99 others); Fri, 27 Oct 2023 14:24:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46410 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346584AbjJ0SX4 (ORCPT ); Fri, 27 Oct 2023 14:23:56 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A5BE91BE1 for ; Fri, 27 Oct 2023 11:23:09 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-d9b9aeb4962so2099960276.3 for ; Fri, 27 Oct 2023 11:23:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1698430985; x=1699035785; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=Xj30xbFl07atOtCQSMtQg8dB6rp8Wt3n2Y10jzFlxa0=; b=qMqn5pXuqyurBO5qwuBwbiHZNZXCybzVAhRgQ8QLFNTn3BwWKtxkQK+h/y6Wxbf0Iu thfqQW0CwID1pIzQsfktx0O0RoyQ/U07HiFzGh1bAcdTpUQtfC7ekXDDwmbWya85mXuv LFlR2xd2aSqUQK6ik9eAgQ1/UkV7zoGsvkH+w1dky9uXQV43//fr+tuBs4yUwP+QYlPz uOjxu0syZiN/BKbyFSXKA/6WFUQUKGdLooDcYJtQVenFYTn1uMUckjEidaFw7azInlJJ b3rviQf+MKAuWbhBKtbO5QiRIRh2jTOV37x4iLTSxTACuqaSdCRNHhuCjR/1n43CGUy5 2lFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698430985; x=1699035785; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Xj30xbFl07atOtCQSMtQg8dB6rp8Wt3n2Y10jzFlxa0=; b=cC0GJeaA41TCjjNVm0X4i9/3AnKHZbIvU2V2JYqkHcQnT7kh5Dqk/JUrE658ZZHvYx ooUgGsTe+TT9yQ3hXSexvRDyHZkauWqaI/PVYxh6lOo90LCiYFjd4eqO034j8MmmdH4Z QTwoCvhQ1l8XzGaAzw9IxIPFFENa7/91spJJDONYz+Qgb343OQQ5RODRl/C4Q0vBQzzc I1zmtwGfayBJUBfoSO03+kTIL8ISl/aErhIDwYRt9DQznQRL85hEZT1C2tO2B2np1pOI IXVoka8GEG7s8OF4qNvVRTRBKQ+g9ckEkW+9ENlmcxUteKU9ppN/ZHp74zk8MCpFitN+ /HtA== X-Gm-Message-State: AOJu0YybC8Tij0xTNRGrikuBNyBvttjqLhvxDLyjSZre/XZFLUEHZnr0 hgJTXdqNnaMWKRdzoDOHIIo3Af3enko= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a25:5f4c:0:b0:da0:3e20:658d with SMTP id h12-20020a255f4c000000b00da03e20658dmr63345ybm.10.1698430985424; Fri, 27 Oct 2023 11:23:05 -0700 (PDT) Reply-To: Sean Christopherson Date: Fri, 27 Oct 2023 11:22:02 -0700 In-Reply-To: <20231027182217.3615211-1-seanjc@google.com> Mime-Version: 1.0 References: <20231027182217.3615211-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.820.g83a721a137-goog Message-ID: <20231027182217.3615211-21-seanjc@google.com> Subject: [PATCH v13 20/35] KVM: x86/mmu: Handle page fault for private memory From: Sean Christopherson To: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Sean Christopherson , Alexander Viro , Christian Brauner , "Matthew Wilcox (Oracle)" , Andrew Morton Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li , Xu Yilun , Chao Peng , Fuad Tabba , Jarkko Sakkinen , Anish Moorthy , David Matlack , Yu Zhang , Isaku Yamahata , "=?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?=" , Vlastimil Babka , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 27 Oct 2023 11:25:00 -0700 (PDT) From: Chao Peng Add support for resolving page faults on guest private memory for VMs that differentiate between "shared" and "private" memory. For such VMs, KVM_MEM_PRIVATE memslots can include both fd-based private memory and hva-based shared memory, and KVM needs to map in the "correct" variant, i.e. KVM needs to map the gfn shared/private as appropriate based on the current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag. For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request shared vs. private via a bit in the guest page tables, i.e. what the guest wants may conflict with the current memory attributes. To support such "implicit" conversion requests, exit to user with KVM_EXIT_MEMORY_FAULT to forward the request to userspace. Add a new flag for memory faults, KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to map memory as shared vs. private. Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace needs such information, e.g. a likely user of KVM_EXIT_MEMORY_FAULT is to exit on missing mappings when handling guest page fault VM-Exits. In that case, userspace will want to know RWX information in order to correctly/precisely resolve the fault. Note, private memory *must* be backed by guest_memfd, i.e. shared mappings always come from the host userspace page tables, and private mappings always come from a guest_memfd instance. Co-developed-by: Yu Zhang Signed-off-by: Yu Zhang Signed-off-by: Chao Peng Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- Documentation/virt/kvm/api.rst | 8 ++- arch/x86/kvm/mmu/mmu.c | 101 ++++++++++++++++++++++++++++++-- arch/x86/kvm/mmu/mmu_internal.h | 1 + include/linux/kvm_host.h | 8 ++- include/uapi/linux/kvm.h | 1 + 5 files changed, 110 insertions(+), 9 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 7f00c310c24a..38dc1fda4f45 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6837,6 +6837,7 @@ spec refer, https://github.com/riscv/riscv-sbi-doc. /* KVM_EXIT_MEMORY_FAULT */ struct { + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3) __u64 flags; __u64 gpa; __u64 size; @@ -6845,8 +6846,11 @@ spec refer, https://github.com/riscv/riscv-sbi-doc. KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the guest physical address range [gpa, gpa + size) of the fault. The 'flags' field -describes properties of the faulting access that are likely pertinent. -Currently, no flags are defined. +describes properties of the faulting access that are likely pertinent: + + - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred + on a private memory access. When clear, indicates the fault occurred on a + shared access. Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it accompanies a return code of '-1', not '0'! errno will always be set to EFAULT diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 4167d557c577..c4e758f0aebb 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3147,9 +3147,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, return level; } -int kvm_mmu_max_mapping_level(struct kvm *kvm, - const struct kvm_memory_slot *slot, gfn_t gfn, - int max_level) +static int __kvm_mmu_max_mapping_level(struct kvm *kvm, + const struct kvm_memory_slot *slot, + gfn_t gfn, int max_level, bool is_private) { struct kvm_lpage_info *linfo; int host_level; @@ -3161,6 +3161,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, break; } + if (is_private) + return max_level; + if (max_level == PG_LEVEL_4K) return PG_LEVEL_4K; @@ -3168,6 +3171,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, return min(host_level, max_level); } +int kvm_mmu_max_mapping_level(struct kvm *kvm, + const struct kvm_memory_slot *slot, gfn_t gfn, + int max_level) +{ + bool is_private = kvm_slot_can_be_private(slot) && + kvm_mem_is_private(kvm, gfn); + + return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private); +} + void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; @@ -3188,8 +3201,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault * Enforce the iTLB multihit workaround after capturing the requested * level, which will be used to do precise, accurate accounting. */ - fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, - fault->gfn, fault->max_level); + fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot, + fault->gfn, fault->max_level, + fault->is_private); if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed) return; @@ -4261,6 +4275,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL); } +static inline u8 kvm_max_level_for_order(int order) +{ + BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G); + + KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) && + order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) && + order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K)); + + if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G)) + return PG_LEVEL_1G; + + if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M)) + return PG_LEVEL_2M; + + return PG_LEVEL_4K; +} + +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT, + PAGE_SIZE, fault->write, fault->exec, + fault->is_private); +} + +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu, + struct kvm_page_fault *fault) +{ + int max_order, r; + + if (!kvm_slot_can_be_private(fault->slot)) { + kvm_mmu_prepare_memory_fault_exit(vcpu, fault); + return -EFAULT; + } + + r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn, + &max_order); + if (r) { + kvm_mmu_prepare_memory_fault_exit(vcpu, fault); + return r; + } + + fault->max_level = min(kvm_max_level_for_order(max_order), + fault->max_level); + fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY); + + return RET_PF_CONTINUE; +} + static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { struct kvm_memory_slot *slot = fault->slot; @@ -4293,6 +4356,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault return RET_PF_EMULATE; } + if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) { + kvm_mmu_prepare_memory_fault_exit(vcpu, fault); + return -EFAULT; + } + + if (fault->is_private) + return kvm_faultin_pfn_private(vcpu, fault); + async = false; fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async, fault->write, &fault->map_writable, @@ -7173,6 +7244,26 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) } #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES +bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm, + struct kvm_gfn_range *range) +{ + /* + * Zap SPTEs even if the slot can't be mapped PRIVATE. KVM x86 only + * supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM + * can simply ignore such slots. But if userspace is making memory + * PRIVATE, then KVM must prevent the guest from accessing the memory + * as shared. And if userspace is making memory SHARED and this point + * is reached, then at least one page within the range was previously + * PRIVATE, i.e. the slot's possible hugepage ranges are changing. + * Zapping SPTEs in this case ensures KVM will reassess whether or not + * a hugepage can be used for affected ranges. + */ + if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm))) + return false; + + return kvm_unmap_gfn_range(kvm, range); +} + static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn, int level) { diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index decc1f153669..86c7cb692786 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -201,6 +201,7 @@ struct kvm_page_fault { /* Derived from mmu and global state. */ const bool is_tdp; + const bool is_private; const bool nx_huge_page_workaround_enabled; /* diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7de93858054d..e3223cafd7db 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2358,14 +2358,18 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr) #define KVM_DIRTY_RING_MAX_ENTRIES 65536 static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu, - gpa_t gpa, gpa_t size) + gpa_t gpa, gpa_t size, + bool is_write, bool is_exec, + bool is_private) { vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT; vcpu->run->memory_fault.gpa = gpa; vcpu->run->memory_fault.size = size; - /* Flags are not (yet) defined or communicated to userspace. */ + /* RWX flags are not (yet) defined or communicated to userspace. */ vcpu->run->memory_fault.flags = 0; + if (is_private) + vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE; } #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 33d542de0a61..29e9eb51dec9 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -527,6 +527,7 @@ struct kvm_run { } notify; /* KVM_EXIT_MEMORY_FAULT */ struct { +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3) __u64 flags; __u64 gpa; __u64 size; -- 2.42.0.820.g83a721a137-goog