Subject: Re: [PATCH RFC 0/3] Fix KVM misinterpreting Reserved page as an MMIO page
From: Alexander Duyck
To: Yi Zhang
Cc: dan.j.williams@intel.com, pbonzini@redhat.com, brho@google.com,
    kvm@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, dave.jiang@intel.com,
    yu.c.zhang@intel.com, pagupta@redhat.com, david@redhat.com, jack@suse.cz,
    hch@lst.de, rkrcmar@redhat.com, jglisse@redhat.com
Date: Tue, 04 Dec 2018 10:45:08 -0800
In-Reply-To: <20181204065914.GB73736@tiger-server>
References: <154386493754.27193.1300965403157243427.stgit@ahduyck-desk1.amr.corp.intel.com>
    <20181204065914.GB73736@tiger-server>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2018-12-04 at 14:59 +0800, Yi Zhang wrote:
> On 2018-12-03 at 11:25:20 -0800, Alexander Duyck wrote:
> > I have loosely based this patch series off of the following patch series
> > from Zhang Yi:
> > https://lore.kernel.org/lkml/cover.1536342881.git.yi.z.zhang@linux.intel.com
> >
> > The original set had attempted to address the fact that DAX pages were
> > treated like MMIO pages which had resulted in reduced performance. It
> > attempted to address this by ignoring the PageReserved flag if the page
> > was either a DEV_DAX or FS_DAX page.
> >
> > I am proposing this as an alternative to that set. The main reason for this
> > is because I believe there are a few issues that were overlooked with that
> > original set. Specifically KVM seems to have two different uses for the
> > PageReserved flag. One being whether or not we can pin the memory, the other
> > being if we should be marking the pages as dirty or accessed. I believe
> > only the pinning really applies so I have split the uses of
> > kvm_is_reserved_pfn and updated the function uses to determine support for
> > page pinning to include a check of the pgmap to see if it supports pinning.
> >
> kvm is not the only user of the dax page.

Yes, but KVM and virtualization in general seem to be the place where
the code carrying the assumption that PageReserved == MMIO exists.

> A similar user of PageReserved to look at is:
> drivers/vfio/vfio_iommu_type1.c:is_invalid_reserved_pfn(
> vfio also wants to know whether the page is capable of pinning.

I would lump vfio in with virtualization as I said above.
A quick search also shows that there is also
arch/x86/kvm/mmu.c:kvm_is_mmio_pfn() which had a similar assumption but
is already carrying workarounds.

> I thought that you had removed the reserved flag on the dax page
> in https://patchwork.kernel.org/patch/10707267/
>
> Is there something I am missing here?

That patch wasn't about DAX memory. That patch was about the fact that
the reserved flag was expensive as a __set_bit operation. I was leaving
the bit set for DAX and all other hot-plug memory and not setting it for
deferred memory init.

The reserved bit is essentially meant to flag everything that is not a
standard system RAM page. Historically speaking most of that was MMIO;
now that isn't necessarily the case with the introduction of ZONE_DEVICE
pages. The issue is DAX isn't necessarily system RAM either. So if we
don't set the reserved bit for DAX then we have to go through and start
adding exception cases to the paths that handle system RAM to split it
off from DAX. Dan had pointed out one such example in
kernel/power/snapshot.c:saveable_page() as I recall.

> >
> > ---
> >
> > Alexander Duyck (3):
> >       kvm: Split use cases for kvm_is_reserved_pfn to kvm_is_refcounted_pfn
> >       mm: Add support for exposing if dev_pagemap supports refcount pinning
> >       kvm: Add additional check to determine if a page is refcounted
> >
> >
> >  arch/x86/kvm/mmu.c        |    6 +++---
> >  drivers/nvdimm/pfn_devs.c |    2 ++
> >  include/linux/kvm_host.h  |    2 +-
> >  include/linux/memremap.h  |    5 ++++-
> >  include/linux/mm.h        |   11 +++++++++++
> >  virt/kvm/kvm_main.c       |   34 +++++++++++++++++++++++++---------
> >  6 files changed, 46 insertions(+), 14 deletions(-)
> >
> > --
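
For readers following the thread, the split described in the quoted cover
letter can be illustrated with a small userspace model in C. This is only a
sketch under stated assumptions, not code from the patches: every name below
(model_page, model_can_pin, model_tracks_dirty, pgmap_allows_pin) is
hypothetical, and the behaviour reflects one reading of the cover letter, in
which a reserved DAX page may be pinned when its pagemap allows it while
dirty/accessed handling still keys off the reserved bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_page_kind {
    MODEL_SYSTEM_RAM, /* ordinary RAM: reserved bit clear */
    MODEL_MMIO,       /* device registers: reserved, never pinned */
    MODEL_DAX,        /* ZONE_DEVICE-style memory: reserved, may be pinnable */
};

struct model_page {
    enum model_page_kind kind;
    bool reserved;         /* stands in for the PageReserved flag */
    bool pgmap_allows_pin; /* assumed per-pagemap "supports pinning" flag */
};

/* The single question the old check asked: is the reserved bit set? */
static bool model_is_reserved(const struct model_page *p)
{
    assert(p != NULL);
    return p->reserved;
}

/* Use 1: may a reference be taken on (i.e. may we pin) this page? */
static bool model_can_pin(const struct model_page *p)
{
    assert(p != NULL);
    if (!p->reserved)
        return true; /* standard system RAM */
    /* Reserved pages are pinnable only if they are DAX and the pagemap agrees. */
    return p->kind == MODEL_DAX && p->pgmap_allows_pin;
}

/* Use 2: should the page be marked dirty/accessed? Still keyed on reserved. */
static bool model_tracks_dirty(const struct model_page *p)
{
    return !model_is_reserved(p);
}
```

In this model an MMIO page and a DAX page both report reserved, but only the
DAX page whose pagemap allows pinning passes model_can_pin(), which is the
distinction a single PageReserved == MMIO test cannot express.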