Date: Wed, 10 Mar 2021 17:57:45 +0000
From: Quentin Perret <qperret@google.com>
Subject: [PATCH v4 28/34] KVM: arm64: Use page-table to track page ownership
To: catalin.marinas@arm.com, will@kernel.org, maz@kernel.org,
    james.morse@arm.com, julien.thierry.kdev@gmail.com, suzuki.poulose@arm.com
Cc: android-kvm@google.com, linux-kernel@vger.kernel.org,
    kernel-team@android.com, kvmarm@lists.cs.columbia.edu,
    linux-arm-kernel@lists.infradead.org, tabba@google.com,
    mark.rutland@arm.com, dbrazdil@google.com, mate.toth-pal@arm.com,
    seanjc@google.com, qperret@google.com, robh+dt@kernel.org, ardb@kernel.org
Message-Id: <20210310175751.3320106-29-qperret@google.com>
In-Reply-To: <20210310175751.3320106-1-qperret@google.com>
References: <20210310175751.3320106-1-qperret@google.com>
X-Mailer: git-send-email 2.30.1.766.gb4fecdf3b7-goog
X-Mailing-List: linux-kernel@vger.kernel.org

As the host stage 2 will be identity mapped, all the .hyp memory regions
and/or memory pages donated to protected guests will have to be marked
invalid in the host stage 2 page-table. At the same time, the hypervisor
will need a way to track the ownership of each physical page to ensure
that memory sharing or donation between entities (host, guests,
hypervisor) is legal.

In order to enable this tracking at EL2, let's use the host stage 2
page-table itself. The idea is to use the top bits of invalid mappings to
store the unique identifier of the page owner. The page-table owner (the
host) gets identifier 0 such that, at boot time, it owns the entire IPA
space as the pgd starts zeroed.

Provide kvm_pgtable_stage2_set_owner(), which allows the ownership of
pages in the host stage 2 to be modified. It re-uses most of the map()
logic, but ends up creating invalid mappings instead. This impacts how we
do refcounting, as we now need to count invalid mappings when they are
used for ownership tracking.
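To make the encoding concrete, here is a minimal, self-contained sketch of
the idea described above (plain C with standard integer types rather than
the kernel's kvm_pte_t; the bits-63:32 owner field mirrors the
KVM_INVALID_PTE_OWNER_MASK introduced below, while the helper names and the
owner id value used here are purely illustrative and not part of the patch):

    #include <assert.h>
    #include <stdint.h>

    /* The owner identifier lives in bits 63:32 of an *invalid* stage-2 PTE. */
    #define INVALID_PTE_OWNER_SHIFT 32
    #define INVALID_PTE_OWNER_MASK  (0xffffffffULL << INVALID_PTE_OWNER_SHIFT)

    /* Build an invalid PTE annotated with its owner; bit 0 (valid) stays clear. */
    static uint64_t invalid_pte_set_owner(uint32_t owner_id)
    {
            return ((uint64_t)owner_id << INVALID_PTE_OWNER_SHIFT) &
                   INVALID_PTE_OWNER_MASK;
    }

    /* Recover the owner identifier from an invalid PTE. */
    static uint32_t invalid_pte_owner(uint64_t pte)
    {
            return (uint32_t)(pte >> INVALID_PTE_OWNER_SHIFT);
    }

    int main(void)
    {
            /*
             * Owner 0 (the host) encodes to an all-zero PTE, so a zeroed pgd
             * already describes an entirely host-owned IPA space at boot.
             */
            assert(invalid_pte_set_owner(0) == 0);

            /*
             * A page assigned to another entity (illustrative id 1) stays
             * invalid for the CPU but still records who owns it.
             */
            uint64_t pte = invalid_pte_set_owner(1);
            assert((pte & 1) == 0);
            assert(invalid_pte_owner(pte) == 1);
            return 0;
    }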
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 21 +++++++
 arch/arm64/kvm/hyp/pgtable.c         | 92 ++++++++++++++++++++++++----
 2 files changed, 101 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4ae19247837b..b09af4612656 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -238,6 +238,27 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
                            u64 phys, enum kvm_pgtable_prot prot,
                            void *mc);
 
+/**
+ * kvm_pgtable_stage2_set_owner() - Annotate invalid mappings with metadata
+ *                                  encoding the ownership of a page in the
+ *                                  IPA space.
+ * @pgt:       Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:      Intermediate physical address at which to place the annotation.
+ * @size:      Size of the IPA range to annotate.
+ * @mc:        Cache of pre-allocated and zeroed memory from which to allocate
+ *             page-table pages.
+ * @owner_id:  Unique identifier for the owner of the page.
+ *
+ * The page-table owner has identifier 0. This function can be used to mark
+ * portions of the IPA space as owned by other entities. When a stage 2 is used
+ * with identity-mappings, these annotations allow to use the page-table data
+ * structure as a simple rmap.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+                                 void *mc, u32 owner_id);
+
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
  * @pgt:       Page-table structure initialised by kvm_pgtable_stage2_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f37b4179b880..e4670b639726 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -48,6 +48,8 @@
                                          KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
                                          KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
+#define KVM_INVALID_PTE_OWNER_MASK      GENMASK(63, 32)
+
 struct kvm_pgtable_walk_data {
         struct kvm_pgtable              *pgt;
         struct kvm_pgtable_walker       *walker;
@@ -186,6 +188,11 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
         return pte;
 }
 
+static kvm_pte_t kvm_init_invalid_leaf_owner(u32 owner_id)
+{
+        return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
                                   u32 level, kvm_pte_t *ptep,
                                   enum kvm_pgtable_walk_flags flag)
@@ -440,6 +447,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
         u64                             phys;
         kvm_pte_t                       attr;
+        u32                             owner_id;
 
         kvm_pte_t                       *anchor;
         kvm_pte_t                       *childp;
@@ -506,6 +514,24 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
         return 0;
 }
 
+static bool stage2_is_permission_change(kvm_pte_t old, kvm_pte_t new)
+{
+        if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
+                return false;
+
+        return !((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS));
+}
+
+static bool stage2_pte_is_counted(kvm_pte_t pte)
+{
+        /*
+         * The refcount tracks valid entries as well as invalid entries if they
+         * encode ownership of a page to another entity than the page-table
+         * owner, whose id is 0.
+         */
+        return !!pte;
+}
+
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
                                       kvm_pte_t *ptep,
                                       struct stage2_map_data *data)
@@ -517,28 +543,36 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
         if (!kvm_block_mapping_supported(addr, end, phys, level))
                 return -E2BIG;
 
-        new = kvm_init_valid_leaf_pte(phys, data->attr, level);
-        if (kvm_pte_valid(old)) {
+        if (kvm_pte_valid(data->attr))
+                new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+        else
+                new = kvm_init_invalid_leaf_owner(data->owner_id);
+
+        if (stage2_pte_is_counted(old)) {
                 /*
                  * Skip updating the PTE if we are trying to recreate the exact
                  * same mapping or only change the access permissions. Instead,
                  * the vCPU will exit one more time from guest if still needed
                  * and then go through the path of relaxing permissions.
                  */
-                if (!((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS)))
+                if (stage2_is_permission_change(old, new))
                         return -EAGAIN;
 
                 /*
-                 * There's an existing different valid leaf entry, so perform
-                 * break-before-make.
+                 * Clear the existing PTE, and perform break-before-make with
+                 * TLB maintenance if it was valid.
                  */
                 kvm_clear_pte(ptep);
-                kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+                if (kvm_pte_valid(old)) {
+                        kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr,
+                                     level);
+                }
                 mm_ops->put_page(ptep);
         }
 
         smp_store_release(ptep, new);
-        mm_ops->get_page(ptep);
+        if (stage2_pte_is_counted(new))
+                mm_ops->get_page(ptep);
         data->phys += granule;
         return 0;
 }
@@ -574,7 +608,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
         int ret;
 
         if (data->anchor) {
-                if (kvm_pte_valid(pte))
+                if (stage2_pte_is_counted(pte))
                         mm_ops->put_page(ptep);
 
                 return 0;
@@ -599,9 +633,10 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
          * a table. Accesses beyond 'end' that fall within the new table
          * will be mapped lazily.
          */
-        if (kvm_pte_valid(pte)) {
+        if (stage2_pte_is_counted(pte)) {
                 kvm_clear_pte(ptep);
-                kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+                if (kvm_pte_valid(pte))
+                        kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
                 mm_ops->put_page(ptep);
         }
 
@@ -683,6 +718,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
                 .mmu            = pgt->mmu,
                 .memcache       = mc,
                 .mm_ops         = pgt->mm_ops,
+                .owner_id       = 0,
         };
         struct kvm_pgtable_walker walker = {
                 .cb             = stage2_map_walker,
@@ -696,6 +732,33 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
         if (ret)
                 return ret;
 
+        /* Set the valid flag to distinguish with the set_owner() path. */
+        map_data.attr |= KVM_PTE_VALID;
+
+        ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+        dsb(ishst);
+        return ret;
+}
+
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+                                 void *mc, u32 owner_id)
+{
+        int ret;
+        struct stage2_map_data map_data = {
+                .mmu            = pgt->mmu,
+                .memcache       = mc,
+                .mm_ops         = pgt->mm_ops,
+                .owner_id       = owner_id,
+                .attr           = 0,
+        };
+        struct kvm_pgtable_walker walker = {
+                .cb             = stage2_map_walker,
+                .flags          = KVM_PGTABLE_WALK_TABLE_PRE |
+                                  KVM_PGTABLE_WALK_LEAF |
+                                  KVM_PGTABLE_WALK_TABLE_POST,
+                .arg            = &map_data,
+        };
+
         ret = kvm_pgtable_walk(pgt, addr, size, &walker);
         dsb(ishst);
         return ret;
@@ -725,8 +788,13 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
         kvm_pte_t pte = *ptep, *childp = NULL;
         bool need_flush = false;
 
-        if (!kvm_pte_valid(pte))
+        if (!kvm_pte_valid(pte)) {
+                if (stage2_pte_is_counted(pte)) {
+                        kvm_clear_pte(ptep);
+                        mm_ops->put_page(ptep);
+                }
                 return 0;
+        }
 
         if (kvm_pte_table(pte, level)) {
                 childp = kvm_pte_follow(pte, mm_ops);
@@ -948,7 +1016,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
         struct kvm_pgtable_mm_ops *mm_ops = arg;
         kvm_pte_t pte = *ptep;
 
-        if (!kvm_pte_valid(pte))
+        if (!stage2_pte_is_counted(pte))
                 return 0;
 
         mm_ops->put_page(ptep);
-- 
2.30.1.766.gb4fecdf3b7-goog
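
As a closing note on usage: a hedged sketch of how a hypervisor-side caller
might invoke the new API. Only the kvm_pgtable_stage2_set_owner() prototype
is taken from the patch above; the wrapper name, the host_pgt/memcache
parameters and the owner id are placeholders assumed for illustration, and
the snippet presupposes the kernel's asm/kvm_pgtable.h rather than being
standalone.

    #include <asm/kvm_pgtable.h>

    /*
     * Illustrative wrapper only: with an identity-mapped host stage 2, the
     * IPA range equals the physical range, so a physical region can be
     * passed straight through. The entries become invalid in the host's
     * view while recording new_owner in their top bits, which turns the
     * page-table into a simple rmap.
     */
    static int host_stage2_annotate_range(struct kvm_pgtable *host_pgt,
                                          void *memcache, u64 start,
                                          u64 size, u32 new_owner)
    {
            return kvm_pgtable_stage2_set_owner(host_pgt, start, size,
                                                memcache, new_owner);
    }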