Reply-To: Sean Christopherson
Date: Wed, 13 Sep 2023 18:55:15 -0700
In-Reply-To: <20230914015531.1419405-1-seanjc@google.com>
References: <20230914015531.1419405-1-seanjc@google.com>
Message-ID: <20230914015531.1419405-18-seanjc@google.com>
Subject: [RFC PATCH v12 17/33] KVM: x86: Disallow hugepages when memory attributes are mixed
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
	Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Sean Christopherson, "Matthew Wilcox (Oracle)", Andrew Morton,
	Paul Moore, James Morris, "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba,
	Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata,
	Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng,
	Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth,
	Wang, Liam Merwick, Isaku Yamahata, "Kirill A . Shutemov"

From: Chao Peng

Disallow creating hugepages with mixed memory attributes, e.g. shared
versus private, as mapping a hugepage in this case would allow the guest
to access memory with the wrong attributes, e.g. overlaying private
memory with a shared hugepage.

Track whether or not attributes are mixed via the existing
disallow_lpage field, but use the most significant bit in
'disallow_lpage' to indicate that a hugepage has mixed attributes
instead of using the normal refcounting.  Whether or not attributes are
mixed is binary; either they are or they aren't.  Attempting to squeeze
that info into the refcount is unnecessarily complex as it would require
knowing the previous state of the mixed count when updating attributes.
Using a flag means KVM just needs to ensure the current status is
reflected in the memslots.
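As a rough, self-contained sketch of the encoding described above (the
low bits of a single word refcount the ordinary reasons a hugepage is
disallowed, while the top bit is a pure yes/no flag for mixed
attributes), consider the demo below.  All names here (MIXED_FLAG,
struct lpage_info_demo, the demo_* helpers) are simplified stand-ins
invented for illustration; the real fields and helpers are in the diff
that follows.

/* Illustration only: simplified stand-ins, not the actual KVM code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MIXED_FLAG	(1u << 31)	/* plays the role of KVM_LPAGE_MIXED_FLAG */

struct lpage_info_demo {
	uint32_t disallow_lpage;	/* low 31 bits: refcount; bit 31: "mixed" flag */
};

static bool demo_test_mixed(const struct lpage_info_demo *info)
{
	return info->disallow_lpage & MIXED_FLAG;
}

static void demo_set_mixed(struct lpage_info_demo *info, bool mixed)
{
	if (mixed)
		info->disallow_lpage |= MIXED_FLAG;
	else
		info->disallow_lpage &= ~MIXED_FLAG;
}

static void demo_adjust_refcount(struct lpage_info_demo *info, int count)
{
	uint32_t old = info->disallow_lpage;

	info->disallow_lpage += count;
	/* Refcount updates must never spill into (or borrow from) the flag bit. */
	assert(!((old ^ info->disallow_lpage) & MIXED_FLAG));
}

int main(void)
{
	struct lpage_info_demo info = { 0 };

	demo_adjust_refcount(&info, 1);		/* e.g. a shadow page disallows the hugepage */
	demo_set_mixed(&info, true);		/* attributes became mixed: a binary state, not a count */
	demo_adjust_refcount(&info, -1);	/* the refcount drops; the flag is untouched */

	printf("disallow=%#x mixed=%d\n", (unsigned)info.disallow_lpage,
	       demo_test_mixed(&info));
	return 0;
}

Built with any C compiler, this prints disallow=0x80000000 mixed=1:
dropping the refcount back to zero leaves the mixed flag intact, which
is the property the WARN_ON_ONCE() in the patch guards.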
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/mmu/mmu.c          | 152 +++++++++++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |   4 +
 3 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3a2b53483524..91a28ddf7cfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1838,6 +1838,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu);
 int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
+void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
+					    struct kvm_memory_slot *slot);
+
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0f0231d2b74f..a079f36a8bf5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -795,16 +795,26 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 	return &slot->arch.lpage_info[level - 2][idx];
 }
 
+/*
+ * The most significant bit in disallow_lpage tracks whether or not memory
+ * attributes are mixed, i.e. not identical for all gfns at the current level.
+ * The lower order bits are used to refcount other cases where a hugepage is
+ * disallowed, e.g. if KVM has shadowed a page table at the gfn.
+ */
+#define KVM_LPAGE_MIXED_FLAG	BIT(31)
+
 static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
 					    gfn_t gfn, int count)
 {
 	struct kvm_lpage_info *linfo;
-	int i;
+	int old, i;
 
 	for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
 		linfo = lpage_info_slot(gfn, slot, i);
+
+		old = linfo->disallow_lpage;
 		linfo->disallow_lpage += count;
-		WARN_ON_ONCE(linfo->disallow_lpage < 0);
+		WARN_ON_ONCE((old ^ linfo->disallow_lpage) & KVM_LPAGE_MIXED_FLAG);
 	}
 }
 
@@ -7172,3 +7182,141 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 	if (kvm->arch.nx_huge_page_recovery_thread)
 		kthread_stop(kvm->arch.nx_huge_page_recovery_thread);
 }
+
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+				int level)
+{
+	return lpage_info_slot(gfn, slot, level)->disallow_lpage & KVM_LPAGE_MIXED_FLAG;
+}
+
+static void hugepage_clear_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+				 int level)
+{
+	lpage_info_slot(gfn, slot, level)->disallow_lpage &= ~KVM_LPAGE_MIXED_FLAG;
+}
+
+static void hugepage_set_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
+			       int level)
+{
+	lpage_info_slot(gfn, slot, level)->disallow_lpage |= KVM_LPAGE_MIXED_FLAG;
+}
+
+static bool hugepage_has_attrs(struct kvm *kvm, struct kvm_memory_slot *slot,
+			       gfn_t gfn, int level, unsigned long attrs)
+{
+	const unsigned long start = gfn;
+	const unsigned long end = start + KVM_PAGES_PER_HPAGE(level);
+
+	if (level == PG_LEVEL_2M)
+		return kvm_range_has_memory_attributes(kvm, start, end, attrs);
+
+	for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
+		if (hugepage_test_mixed(slot, gfn, level - 1) ||
+		    attrs != kvm_get_memory_attributes(kvm, gfn))
+			return false;
+	}
+	return true;
+}
+
+bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+					 struct kvm_gfn_range *range)
+{
+	unsigned long attrs = range->arg.attributes;
+	struct kvm_memory_slot *slot = range->slot;
+	int level;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+	lockdep_assert_held(&kvm->slots_lock);
+
+	/*
+	 * KVM x86 currently only supports KVM_MEMORY_ATTRIBUTE_PRIVATE, so
+	 * skip the slot if it will never consume the PRIVATE attribute.
+	 */
+	if (!kvm_slot_can_be_private(slot))
+		return false;
+
+	/*
+	 * The sequence matters here: upper levels consume the result of lower
+	 * level's scanning.
+	 */
+	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
+		gfn_t nr_pages = KVM_PAGES_PER_HPAGE(level);
+		gfn_t gfn = gfn_round_for_level(range->start, level);
+
+		/* Process the head page if it straddles the range. */
+		if (gfn != range->start || gfn + nr_pages > range->end) {
+			/*
+			 * Skip mixed tracking if the aligned gfn isn't covered
+			 * by the memslot, KVM can't use a hugepage due to the
+			 * misaligned address regardless of memory attributes.
+			 */
+			if (gfn >= slot->base_gfn) {
+				if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+					hugepage_clear_mixed(slot, gfn, level);
+				else
+					hugepage_set_mixed(slot, gfn, level);
+			}
+			gfn += nr_pages;
+		}
+
+		/*
+		 * Pages entirely covered by the range are guaranteed to have
+		 * only the attributes which were just set.
+		 */
+		for ( ; gfn + nr_pages <= range->end; gfn += nr_pages)
+			hugepage_clear_mixed(slot, gfn, level);
+
+		/*
+		 * Process the last tail page if it straddles the range and is
+		 * contained by the memslot.  Like the head page, KVM can't
+		 * create a hugepage if the slot size is misaligned.
+		 */
+		if (gfn < range->end &&
+		    (gfn + nr_pages) <= (slot->base_gfn + slot->npages)) {
+			if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+				hugepage_clear_mixed(slot, gfn, level);
+			else
+				hugepage_set_mixed(slot, gfn, level);
+		}
+	}
+	return false;
+}
+
+void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
+					    struct kvm_memory_slot *slot)
+{
+	int level;
+
+	if (!kvm_slot_can_be_private(slot))
+		return;
+
+	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
+		/*
+		 * Don't bother tracking mixed attributes for pages that can't
+		 * be huge due to alignment, i.e. process only pages that are
+		 * entirely contained by the memslot.
+		 */
+		gfn_t end = gfn_round_for_level(slot->base_gfn + slot->npages, level);
+		gfn_t start = gfn_round_for_level(slot->base_gfn, level);
+		gfn_t nr_pages = KVM_PAGES_PER_HPAGE(level);
+		gfn_t gfn;
+
+		if (start < slot->base_gfn)
+			start += nr_pages;
+
+		/*
+		 * Unlike setting attributes, every potential hugepage needs to
+		 * be manually checked as the attributes may already be mixed.
+		 */
+		for (gfn = start; gfn < end; gfn += nr_pages) {
+			unsigned long attrs = kvm_get_memory_attributes(kvm, gfn);
+
+			if (hugepage_has_attrs(kvm, slot, gfn, level, attrs))
+				hugepage_clear_mixed(slot, gfn, level);
+			else
+				hugepage_set_mixed(slot, gfn, level);
+		}
+	}
+}
+#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8d21b7b09bb5..ac36a5b7b5a3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12598,6 +12598,10 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 		}
 	}
 
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+	kvm_mmu_init_memslot_memory_attributes(kvm, slot);
+#endif
+
 	if (kvm_page_track_create_memslot(kvm, slot, npages))
 		goto out_free;
 
-- 
2.42.0.283.g2d96d420d3-goog
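For reviewers who want to see the head/body/tail split in
kvm_arch_post_set_memory_attributes() in isolation, below is a small
standalone sketch of the per-level walk.  The page size and gfn range
are invented demo values, and the memslot bounds checks from the real
code are omitted for brevity.

/*
 * Illustration only, not part of the patch: the per-level walk from
 * kvm_arch_post_set_memory_attributes() reduced to one level, with invented
 * demo values (16 gfns per "hugepage") and no memslot bounds checks.
 */
#include <stdio.h>

typedef unsigned long gfn_t;

int main(void)
{
	const gfn_t nr_pages = 16;		/* demo stand-in for KVM_PAGES_PER_HPAGE() */
	const gfn_t start = 0x103, end = 0x158;	/* arbitrary, deliberately unaligned range */
	gfn_t gfn = start & ~(nr_pages - 1);	/* analogue of gfn_round_for_level() */

	/* Head page: straddles the start of the range, so it must be re-scanned. */
	if (gfn != start || gfn + nr_pages > end) {
		printf("re-scan head [%#lx, %#lx)\n", gfn, gfn + nr_pages);
		gfn += nr_pages;
	}

	/* Pages fully inside the range can only have the freshly set attributes. */
	for ( ; gfn + nr_pages <= end; gfn += nr_pages)
		printf("clear mixed  [%#lx, %#lx)\n", gfn, gfn + nr_pages);

	/* Tail page: straddles the end of the range, so it must be re-scanned too. */
	if (gfn < end)
		printf("re-scan tail [%#lx, %#lx)\n", gfn, gfn + nr_pages);

	return 0;
}

Run as-is, this prints one re-scan line for the straddling head page,
four clear lines for the fully covered middle pages, and one re-scan
line for the straddling tail page, mirroring the three phases of the
loop in the patch.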