From: ira.weiny@intel.com
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Dan Williams, Jan Kara, Jérôme Glisse, John Hubbard, Michal Hocko, Ira Weiny
Subject: [RFC PATCH 04/10] WIP: mm/gup: Ensure F_LONGTERM lease is held on GUP pages
Date: Sun, 28 Apr 2019 21:53:53 -0700
Message-Id: <20190429045359.8923-5-ira.weiny@intel.com>
In-Reply-To: <20190429045359.8923-1-ira.weiny@intel.com>
References: <20190429045359.8923-1-ira.weiny@intel.com>
From: Ira Weiny

Honestly, I think I should remove this patch. It is removed later in the
series, and ensuring the lease is there at GUP time does not guarantee that
the lease stays held: the user could still remove it afterwards.
Regardless, the code in GUP that takes the lease holds it even if the user
tries to remove it, and will take the lease back if the user races and
removes the lease before GUP gets a reference to it. So any way you slice
it, this patch is not needed.

FOLL_LONGTERM pins are currently disabled for GUP calls which map to FS DAX
files. As an alternative, allow FOLL_LONGTERM pins on these files if the
user has taken an F_LONGTERM lease on the file. The intention is that the
user is aware of the dangers of truncating or hole-punching a file that has
been mapped this way (as is done with RDMA), and takes this lease to
indicate that they will accept the resulting behavior if the filesystem
needs to take action.

Example user-space pseudocode for an RDMA user reacting to a lease break of
this type:

	lease_break() {
	...
		if (sigio.fd == rdma_fd) {
			ibv_dereg_mr(mr);
			close(rdma_fd);
		}
	}

	foo() {
		rdma_fd = open(...);
		sigaction(SIGIO, ... lease_break ...);
		fcntl(rdma_fd, F_SETLEASE, F_LONGTERM);
		ptr = mmap(rdma_fd, ...);
		mr = ibv_reg_mr(ptr, ...);
	}

Failing to process the SIGIO as above will result in a SIGBUS being sent to
the process. The SIGBUS is implemented in later patches.

This patch fails the FOLL_LONGTERM pin if the FL_LONGTERM lease is not held.
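A minimal, compilable sketch of the user-space side of the pseudocode above, under stated assumptions: F_RDLCK stands in for F_LONGTERM (an RFC-only lease type, not a mainline constant), the RDMA calls are left as comments, and take_lease() and handle_lease_break() are hypothetical helper names. Two details differ deliberately from the pseudocode. The handler only records the break, because ibv_dereg_mr() is not on the POSIX list of async-signal-safe functions; the teardown runs later from the main loop. And F_SETSIG is set to SIGIO explicitly, since fcntl(2) documents si_fd as identifying the leased file only when a nonzero signal has been set with F_SETSIG and the handler uses SA_SIGINFO.

```c
#define _GNU_SOURCE		/* F_SETSIG, F_SETLEASE */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Leased file descriptor (the pseudocode's rdma_fd); -1 when no lease is held. */
static volatile sig_atomic_t leased_fd = -1;
/* Set by the signal handler; the teardown itself runs outside signal context. */
static volatile sig_atomic_t lease_broken;

/*
 * SIGIO handler. With F_SETSIG set to a nonzero signal and SA_SIGINFO,
 * si_fd identifies the leased file. Only async-signal-safe work is done
 * here: the break is recorded for the main loop.
 */
static void lease_break(int sig, siginfo_t *si, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	if (si != NULL && si->si_fd == leased_fd)
		lease_broken = 1;
}

/*
 * Install the handler before taking the lease so an early break cannot
 * hit SIGIO's default action (terminate). F_RDLCK is an assumption that
 * stands in for the RFC's F_LONGTERM.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int take_lease(int fd)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = lease_break;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;
	if (fcntl(fd, F_SETSIG, SIGIO) < 0)	/* nonzero: si_fd is filled in */
		return -1;
	leased_fd = fd;
	if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
		leased_fd = -1;
		return -1;
	}
	return 0;
}

/* Poll from the main loop; returns true if it tore down after a break. */
static bool handle_lease_break(void)
{
	if (!lease_broken)
		return false;
	lease_broken = 0;
	assert(leased_fd >= 0);	/* a break implies we held a lease */
	/* Hypothetical: ibv_dereg_mr(mr) for the registered region goes here. */
	fcntl(leased_fd, F_SETLEASE, F_UNLCK);	/* release so the breaker proceeds */
	close(leased_fd);
	leased_fd = -1;
	return true;
}
```

Because the handler is installed without SA_RESTART, a blocking call in the main loop returns EINTR when the break arrives, which is a natural point to call handle_lease_break().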
---
 fs/locks.c         | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/mm.h |  2 ++
 mm/gup.c           | 13 +++++++++++++
 mm/huge_memory.c   | 20 ++++++++++++++++++++
 4 files changed, 82 insertions(+)

diff --git a/fs/locks.c b/fs/locks.c
index 8ea1c5713e6a..31c8b761a578 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2939,3 +2939,50 @@ static int __init filelock_init(void)
 	return 0;
 }
 core_initcall(filelock_init);
+
+// FIXME what about GUP calls to Device DAX???
+// I believe they will still return true for *_devmap
+//
+// return true if the page has a LONGTERM lease associated with its file.
+bool mapping_inode_has_longterm(struct page *page)
+{
+	bool ret = false;
+	struct inode *inode;
+	struct file_lock *fl;
+	struct file_lock_context *ctx;
+
+	/*
+	 * should never be here unless we are a "page cache" page without a
+	 * page cache.
+	 */
+	if (WARN_ON(!page))
+		return false;
+	if (WARN_ON(PageAnon(page)))
+		return false;
+	if (WARN_ON(!page->mapping))
+		return false;
+	if (WARN_ON(!page->mapping->host))
+		return false;
+
+	/* Ensure page->mapping isn't freed while we look at it */
+	/* FIXME mm lock is held here I think? so is this really needed? */
+	rcu_read_lock();
+	inode = page->mapping->host;
+
+	/* F_UNLCK: look up an existing context only, never allocate one */
+	ctx = locks_get_lock_context(inode, F_UNLCK);
+	if (!ctx)
+		goto out;
+	spin_lock(&ctx->flc_lock);
+	list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
+		if (fl->fl_flags & FL_LONGTERM) {
+			ret = true;
+			break;
+		}
+	}
+	spin_unlock(&ctx->flc_lock);
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mapping_inode_has_longterm);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e34ec5dfbe..cde359e71b7b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1572,6 +1572,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
 
+bool mapping_inode_has_longterm(struct page *page);
+
 /* Container for pinned pfns / pages */
 struct frame_vector {
 	unsigned int nr_allocated;	/* Number of frames we have space for */
diff --git a/mm/gup.c b/mm/gup.c
index a8ac75bc1452..5ae1dd31a58d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -292,6 +292,12 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			page = pte_page(pte);
 		else
 			goto no_page;
+
+		if (unlikely(flags & FOLL_LONGTERM) &&
+		    !mapping_inode_has_longterm(page)) {
+			page = ERR_PTR(-EINVAL);
+			goto out;
+		}
 	} else if (unlikely(!page)) {
 		if (flags & FOLL_DUMP) {
 			/* Avoid special (like zero) pages in core dumps */
@@ -1869,6 +1875,13 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
 		}
 		SetPageReferenced(page);
 		pages[*nr] = page;
+
+		if (unlikely(flags & FOLL_LONGTERM) &&
+		    !mapping_inode_has_longterm(page)) {
+			undo_dev_pagemap(nr, nr_start, pages);
+			return 0;
+		}
+
 		if (get_gup_pin_page(page)) {
 			undo_dev_pagemap(nr, nr_start, pages);
 			return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 404acdcd0455..8819624c740f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -910,6 +910,16 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
+
+	// Check for LONGTERM lease.
+	// FIXME combine logic
+	if (unlikely(flags & FOLL_LONGTERM)) {
+		WARN_ON_ONCE(PageAnon(page));
+		if (!mapping_inode_has_longterm(page)) {
+			return NULL;
+		}
+	}
+
 	get_page(page);
 
 	return page;
@@ -1050,6 +1060,16 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
 	if (!*pgmap)
 		return ERR_PTR(-EFAULT);
 	page = pfn_to_page(pfn);
+
+	// Check for LONGTERM lease.
+	// FIXME combine logic remove Warn
+	if (unlikely(flags & FOLL_LONGTERM)) {
+		WARN_ON_ONCE(PageAnon(page));
+		if (!mapping_inode_has_longterm(page)) {
+			return NULL;
+		}
+	}
+
 	get_page(page);
 
 	return page;
-- 
2.20.1
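The lease check added by this patch is a snapshot taken at pin time, which is the point of the note at the top of this message. A small user-space model makes that concrete. It is a sketch only: struct model_lease, struct model_inode, MODEL_FL_LONGTERM and the pins counter are hypothetical stand-ins for the inode's lease list, FL_LONGTERM and the page reference GUP takes, and no locking or RCU is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the FL_LONGTERM flag bit. */
#define MODEL_FL_LONGTERM 0x1u

/* Hypothetical stand-in for a struct file_lock on the lease list. */
struct model_lease {
	unsigned int flags;
	struct model_lease *next;
};

/* Hypothetical stand-in for an inode; NULL leases means no lock context. */
struct model_inode {
	struct model_lease *leases;
};

/* Mirrors the list walk in mapping_inode_has_longterm(). */
static bool model_has_longterm(const struct model_inode *inode)
{
	const struct model_lease *fl;

	assert(inode != NULL);
	for (fl = inode->leases; fl != NULL; fl = fl->next)
		if (fl->flags & MODEL_FL_LONGTERM)
			return true;
	return false;
}

/*
 * Mirrors the gate added to follow_page_pte(): a FOLL_LONGTERM pin fails
 * with -EINVAL unless a LONGTERM lease is present; otherwise take a pin.
 */
static int model_longterm_pin(const struct model_inode *inode,
			      bool foll_longterm, int *pins)
{
	if (foll_longterm && !model_has_longterm(inode))
		return -EINVAL;
	(*pins)++;
	return 0;
}
```

Once model_longterm_pin() has succeeded, unlinking the FL_LONGTERM lease from the list leaves the pin count unchanged, so the check alone cannot guarantee that the lease is held for the lifetime of the pin.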