From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Adam Borowski, Dan Williams, Sean Christopherson, Paolo Bonzini, David Hildenbrand
Subject: [PATCH 4.19 257/306] KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved
Date: Wed, 27 Nov 2019 21:31:47 +0100
Message-Id: <20191127203133.629360119@linuxfoundation.org>
X-Mailer: git-send-email 2.24.0
In-Reply-To: <20191127203114.766709977@linuxfoundation.org>
References: <20191127203114.766709977@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream.

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and
instead manually handle ZONE_DEVICE on a case-by-case basis.  For things
like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal
pages, e.g. put pages grabbed via gup().  But for flows such as setting
A/D bits or shifting refcounts for transparent huge pages, KVM needs to
avoid processing ZONE_DEVICE pages as the flows in question lack the
underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup()
when running a KVM guest backed with /dev/dax memory, as KVM straight up
doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page()
on ZONE_DEVICE pages immediately after gup() in order to simplify the
auditing needed to ensure is_zone_device_page() is called if and only if
the backing device is pinned (via gup()).  But that approach would break
kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til
unmap() when accessing guest memory, unlike KVM's secondary MMU, which
coordinates with mmu_notifier invalidations to avoid creating stale
page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski
Analyzed-by: David Hildenbrand
Acked-by: Dan Williams
Cc: stable@vger.kernel.org
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson
Signed-off-by: Paolo Bonzini
[sean: backport to 4.x; resolve conflict in mmu.c]
Signed-off-by: Sean Christopherson
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kvm/mmu.c       |    8 ++++----
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |   26 +++++++++++++++++++++++---
 3 files changed, 28 insertions(+), 7 deletions(-)

--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3261,7 +3261,7 @@ static void transparent_hugepage_adjust(
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
-	    level == PT_PAGE_TABLE_LEVEL &&
+	    !kvm_is_zone_device_pfn(pfn) && level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompoundMap(pfn_to_page(pfn)) &&
 	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
 		unsigned long mask;
@@ -5709,9 +5709,9 @@ restart:
 		 * the guest, and the guest page table is using 4K page size
 		 * mapping if the indirect sp has level = 1.
 		 */
-		if (sp->role.direct &&
-			!kvm_is_reserved_pfn(pfn) &&
-			PageTransCompoundMap(pfn_to_page(pfn))) {
+		if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
+		    !kvm_is_zone_device_pfn(pfn) &&
+		    PageTransCompoundMap(pfn_to_page(pfn))) {
 			drop_spte(kvm, sptep);
 			need_tlb_flush = 1;
 			goto restart;
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -911,6 +911,7 @@ int kvm_cpu_has_pending_timer(struct kvm
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn);
+bool kvm_is_zone_device_pfn(kvm_pfn_t pfn);
 
 struct kvm_irq_ack_notifier {
 	struct hlist_node link;
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -147,10 +147,30 @@ __weak int kvm_arch_mmu_notifier_invalid
 	return 0;
 }
 
+bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
+{
+	/*
+	 * The metadata used by is_zone_device_page() to determine whether or
+	 * not a page is ZONE_DEVICE is guaranteed to be valid if and only if
+	 * the device has been pinned, e.g. by get_user_pages().  WARN if the
+	 * page_count() is zero to help detect bad usage of this helper.
+	 */
+	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
+		return false;
+
+	return is_zone_device_page(pfn_to_page(pfn));
+}
+
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
+	/*
+	 * ZONE_DEVICE pages currently set PG_reserved, but from a refcounting
+	 * perspective they are "normal" pages, albeit with slightly different
+	 * usage rules.
+	 */
 	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+		return PageReserved(pfn_to_page(pfn)) &&
+		       !kvm_is_zone_device_pfn(pfn);
 
 	return true;
 }
@@ -1727,7 +1747,7 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty)
 
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
+	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
 
 		if (!PageReserved(page))
@@ -1738,7 +1758,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
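
For readers who want to check the resulting predicate logic outside the
kernel, the following is a minimal userspace sketch, not kernel code.
struct model_page and every model_*() name are hypothetical stand-ins
invented here for pfn_valid(), PageReserved(), is_zone_device_page() and
page_count(). The sketch assumes that the release path puts a reference
exactly when the pfn is not reserved, which is how the changelog
describes the intended refcount handling. The WARN_ON_ONCE() is modelled
as a plain "return false".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-page state the patch consults. */
struct model_page {
	bool valid;       /* models pfn_valid() */
	bool reserved;    /* models PageReserved() */
	bool zone_device; /* models is_zone_device_page() */
	int refcount;     /* models page_count() */
};

/* Mirrors kvm_is_zone_device_pfn(); the kernel WARNs on a zero refcount. */
bool model_is_zone_device(const struct model_page *p)
{
	if (!p->valid || p->refcount == 0)
		return false;

	return p->zone_device;
}

/* Mirrors the patched kvm_is_reserved_pfn(): ZONE_DEVICE is exempted. */
bool model_is_reserved(const struct model_page *p)
{
	if (p->valid)
		return p->reserved && !model_is_zone_device(p);

	return true;
}

/* Assumption: a reference is dropped exactly when the pfn is not reserved. */
bool model_must_put_ref(const struct model_page *p)
{
	return !model_is_reserved(p);
}

/* Mirrors the guard in kvm_set_pfn_accessed() and kvm_set_pfn_dirty(). */
bool model_may_mark_accessed(const struct model_page *p)
{
	return !model_is_reserved(p) && !model_is_zone_device(p);
}

/* Walks the four interesting cases; aborts via assert() on a mismatch. */
void model_self_check(void)
{
	struct model_page ram  = { true,  false, false, 1 };
	struct model_page mmio = { true,  true,  false, 1 };
	struct model_page dax  = { true,  true,  true,  1 };
	struct model_page hole = { false, false, false, 0 };

	/* Ordinary RAM: refcounted and eligible for A/D updates. */
	assert(model_must_put_ref(&ram) && model_may_mark_accessed(&ram));

	/* Reserved, non-ZONE_DEVICE: neither refcounted nor marked. */
	assert(!model_must_put_ref(&mmio) && !model_may_mark_accessed(&mmio));

	/* Pinned ZONE_DEVICE: reference is put, but A/D updates are skipped. */
	assert(model_must_put_ref(&dax) && !model_may_mark_accessed(&dax));

	/* Invalid pfn: still treated as reserved. */
	assert(model_is_reserved(&hole) && !model_is_zone_device(&hole));
}
```

Calling model_self_check() from a test harness exercises all four
cases. The pinned /dev/dax case is the one whose behaviour changes with
this patch: it is no longer reported as reserved, so its reference gets
dropped, while the accessed/dirty paths continue to skip it.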