From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Adam Borowski, Dan Williams, Sean Christopherson, Paolo Bonzini, David Hildenbrand
Subject: [PATCH 4.9 111/151] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Date: Wed, 27 Nov 2019 21:31:34 +0100
Message-Id: <20191127203042.970127158@linuxfoundation.org>
X-Mailer: git-send-email 2.24.0
In-Reply-To: <20191127203000.773542911@linuxfoundation.org>
References: <20191127203000.773542911@linuxfoundation.org>
User-Agent: quilt/0.66
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream.

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup().  But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()).  But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski
Analyzed-by: David Hildenbrand
Acked-by: Dan Williams
Cc: stable@vger.kernel.org
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
[sean: backport to 4.x; resolve conflict in mmu.c]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu.c       |    8 ++++----
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |   26 +++++++++++++++++++++++---
 3 files changed, 28 insertions(+), 7 deletions(-)

--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2934,7 +2934,7 @@ static void transparent_hugepage_adjust(
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
+	    !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompoundMap(pfn_to_page(pfn)) &&
 	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
@@ -4890,9 +4890,9 @@ restart:
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct &&
-			!kvm_is_reserved_pfn(pfn) &&
-			PageTransCompoundMap(pfn_to_page(pfn))) {
+		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
+		    !kvm_is_zone_device_pfn(pfn) &&
+		    PageTransCompoundMap(pfn_to_page(pfn))) {
 			drop_spte(kvm, sptep);
 			need_tlb_flush = 1;
 			goto restart;
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -843,6 +843,7 @@ int kvm_cpu_has_pending_timer(struct kvm
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn);
+bool kvm_is_zone_device_pfn(kvm_pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
 	struct hlist_node link;
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -131,10 +131,30 @@ __weak void kvm_arch_mmu_notifier_invali
 {
 }
 
+bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
+{
+	/*
+	 * The metadata used by is_zone_device_page() to determine whether or
+	 * not a page is ZONE_DEVICE is guaranteed to be valid if and only if
+	 * the device has been pinned, e.g. by get_user_pages().  WARN if the
+	 * page_count() is zero to help detect bad usage of this helper.
+	 */
+	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+		return false;
+
+	return is_zone_device_page(pfn_to_page(pfn));
+}
+
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
+	/*
+	 * ZONE_DEVICE pages currently set PG_reserved, but from a refcounting
+	 * perspective they are "normal" pages, albeit with slightly different
+	 * usage rules.
+	 */
 	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+		return PageReserved(pfn_to_page(pfn)) &&
+		       !kvm_is_zone_device_pfn(pfn);
 
 	return true;
 }
@@ -1758,7 +1778,7 @@ static void kvm_release_pfn_dirty(kvm_pf
 
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
+	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
 
 		if (!PageReserved(page))
@@ -1769,7 +1789,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
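
For readers following along outside the kernel tree, the refcount
effect of the two helpers can be sketched in plain userspace C.  This is
a rough model under stated assumptions, not kernel code: struct
model_page and every model_* function are hypothetical stand-ins for
struct page, pfn_valid(), PageReserved(), is_zone_device_page() and
page_count(), and the release path is assumed to skip the put for
"reserved" pages, as the changelog implies.

```c
/*
 * Userspace model only.  struct model_page and the model_* helpers are
 * hypothetical stand-ins for illustration; none of them are kernel APIs.
 */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the bits of struct page that the patch looks at. */
struct model_page {
	bool valid;		/* models pfn_valid() */
	bool reserved;		/* models PageReserved() */
	bool zone_device;	/* models is_zone_device_page() */
	int refcount;		/* models page_count() */
};

typedef bool (*reserved_fn)(const struct model_page *);

/* Mirrors kvm_is_zone_device_pfn(): only trust metadata of pinned pages. */
static bool model_is_zone_device(const struct model_page *p)
{
	if (!p->valid || p->refcount == 0)
		return false;

	return p->zone_device;
}

/* Pre-patch kvm_is_reserved_pfn(): PG_reserved alone decides. */
static bool model_is_reserved_old(const struct model_page *p)
{
	if (p->valid)
		return p->reserved;

	return true;
}

/* Post-patch kvm_is_reserved_pfn(): ZONE_DEVICE pages are exempted. */
static bool model_is_reserved_new(const struct model_page *p)
{
	if (p->valid)
		return p->reserved && !model_is_zone_device(p);

	return true;
}

/*
 * Models gup() followed by a release path that only drops the reference
 * when the page is not considered reserved (assumed caller behaviour).
 * Works on a copy of the page and returns the number of references
 * still held afterwards.
 */
static int model_pin_then_release(struct model_page p, reserved_fn is_reserved)
{
	int before = p.refcount;

	p.refcount++;			/* gup() takes a reference */
	if (!is_reserved(&p))
		p.refcount--;		/* put_page() */

	return p.refcount - before;
}
```

In this model, a page with valid, reserved and zone_device all set
leaves one reference behind when run through model_pin_then_release()
with model_is_reserved_old, and none with model_is_reserved_new.  That
leftover reference corresponds to the pin that keeps
dev_pagemap_cleanup() waiting in the report above.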