From: Dan Williams
Date: Mon, 3 Dec 2018 13:05:37 -0800
Subject: Re: [PATCH RFC 2/3] mm: Add support for exposing if dev_pagemap supports refcount pinning
To: alexander.h.duyck@linux.intel.com
Cc: Paolo Bonzini, Zhang Yi, Barret Rhoden, KVM list, linux-nvdimm, Linux Kernel Mailing List, Linux MM, Dave Jiang, "Zhang, Yu C", Pankaj Gupta, David Hildenbrand, Jan Kara, Christoph Hellwig, rkrcmar@redhat.com, Jérôme Glisse

On Mon, Dec 3, 2018 at 12:53 PM Alexander Duyck wrote:
>
> On Mon, 2018-12-03 at 12:31 -0800, Dan Williams wrote:
> > On Mon, Dec 3, 2018 at 12:21 PM Alexander Duyck
> > wrote:
> > >
> > > On Mon, 2018-12-03 at 11:47 -0800, Dan Williams wrote:
> > > > On Mon, Dec 3, 2018 at 11:25 AM Alexander Duyck
> > > > wrote:
> > > > >
> > > > > Add a means of exposing if a pagemap supports refcount pinning. I am doing
> > > > > this to expose if a given pagemap has backing struct pages that will allow
> > > > > for the reference count of the page to be incremented to lock the page
> > > > > into place.
> > > > >
> > > > > The KVM code already has several spots where it was trying to use a
> > > > > pfn_valid check combined with a PageReserved check to determine if it could
> > > > > take a reference on the page. I am adding this check so that, in the case of
> > > > > the page having the reserved flag set, we can check the pagemap for the page
> > > > > to determine if we might fall into the special DAX case.
> > > > >
> > > > > Signed-off-by: Alexander Duyck
> > > > > ---
> > > > >  drivers/nvdimm/pfn_devs.c |    2 ++
> > > > >  include/linux/memremap.h  |    5 ++++-
> > > > >  include/linux/mm.h        |   11 +++++++++++
> > > > >  3 files changed, 17 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
> > > > > index 6f22272e8d80..7a4a85bcf7f4 100644
> > > > > --- a/drivers/nvdimm/pfn_devs.c
> > > > > +++ b/drivers/nvdimm/pfn_devs.c
> > > > > @@ -640,6 +640,8 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
> > > > >         } else
> > > > >                 return -ENXIO;
> > > > >
> > > > > +       pgmap->support_refcount_pinning = true;
> > > > > +
> > > >
> > > > There should be no dev_pagemap instance where this isn't
> > > > true, so I'm missing why this is needed?
> > >
> > > I thought in the case of HMM there were instances where you couldn't
> > > pin the page, aren't there? Specifically I am thinking of the definition
> > > of MEMORY_DEVICE_PUBLIC:
> > >   Device memory that is cache coherent from device and CPU point of
> > >   view. This is used on platforms that have an advanced system bus (like
> > >   CAPI or CCIX). A driver can hotplug the device memory using
> > >   ZONE_DEVICE and with that memory type. Any page of a process can be
> > >   migrated to such memory.
> > >   However, no one should be allowed to pin such
> > >   memory so that it can always be evicted.
> > >
> > > It sounds like MEMORY_DEVICE_PUBLIC and MMIO would want to fall into
> > > the same category here in order to allow a hot-plug event to remove the
> > > device and take the memory with it, or is my understanding on this not
> > > correct?
> >
> > I don't understand how HMM expects to enforce no pinning, but in any
> > event it should always be the expectation that an elevated reference
> > count on a page prevents that page from disappearing. Anything else is
> > broken.
>
> I don't think that is true for device MMIO though.
>
> In the case of MMIO you have the memory region backed by a device; if
> that device is hot-plugged or fails in some way then that backing would
> go away and reads would return an all-1's response.

Until p2pdma there are no struct pages for device memory; is that what
you're referring to?

Otherwise any device driver that leaks "struct pages" into random code
paths in the kernel had better not expect to be able to surprise-remove
those pages from the system. Any dev_pagemap user should expect to do a
coordinated removal with the driver that waits for page references to
drop before the device can be physically removed.

> Holding a reference to the page doesn't guarantee that the backing
> device cannot go away.

Correct, there is no physical guarantee, but that's not the point. It
needs to be coordinated, otherwise all bets are off with respect to
system stability.

> I believe that is the origin of the original use
> of the PageReserved check in KVM when deciding whether it will try to
> use the get_page/put_page functions.

Is it? MMIO does not typically have a corresponding 'struct page'.

> I believe this is also why
> MEMORY_DEVICE_PUBLIC specifically calls out that you should not allow
> pinning such memory.
I don't think that callout was referencing device hotplug; I believe it
was the HMM expectation that it should be able to move an HMM page from
device to System-RAM at will.
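The coordinated removal described in this thread (stop handing out new page references, then wait for the existing ones to drop before the device is physically removed) can be modeled in a few lines of userspace C. This is a minimal single-threaded sketch, loosely patterned on the kernel's percpu_ref kill/release idea; `struct dev_pool`, `pool_tryget_live()`, `pool_put()`, `pool_kill()` and `on_release()` are hypothetical names for illustration and are not the kernel's dev_pagemap API.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of device-backed pages whose lifetime is
 * tied to a reference count. Not kernel API; single-threaded on purpose.
 */
struct dev_pool {
	long refs;                          /* base ref + outstanding page refs */
	bool dying;                         /* set once removal has started */
	bool released;                      /* true once refs dropped to zero */
	void (*release)(struct dev_pool *); /* called when the last ref drops */
};

static void pool_init(struct dev_pool *p, void (*release)(struct dev_pool *))
{
	p->refs = 1;        /* the driver holds the initial (base) reference */
	p->dying = false;
	p->released = false;
	p->release = release;
}

/* Drop one reference; the final drop signals that removal is safe. */
static void pool_put(struct dev_pool *p)
{
	if (--p->refs == 0) {
		p->released = true;
		if (p->release)
			p->release(p);
	}
}

/* Pin a page: only allowed while the pool has not started removal. */
static bool pool_tryget_live(struct dev_pool *p)
{
	if (p->dying)
		return false;
	p->refs++;
	return true;
}

/*
 * Begin coordinated removal: refuse new pins and drop the base reference.
 * The device may be physically removed only after release() has run,
 * i.e. after every previously taken page reference has been put.
 */
static void pool_kill(struct dev_pool *p)
{
	p->dying = true;
	pool_put(p);
}

/* Example release hook: in a real driver this would complete a waiter. */
static void on_release(struct dev_pool *p)
{
	(void)p;
	printf("all page references dropped; device may now be removed\n");
}
```

A caller would pin with pool_tryget_live(), unpin with pool_put(), and start teardown with pool_kill(); the release callback firing marks the point where physical removal becomes safe. A real implementation would also need atomic or locked counting for concurrent callers, which this sketch deliberately omits.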