2019-04-29 04:56:53

by Ira Weiny

Subject: [RFC PATCH 00/10] RDMA/FS DAX "LONGTERM" lease proposal

From: Ira Weiny <[email protected]>

In order to support RDMA to File system pages[*] without On Demand Paging a
number of things need to be done.

1) GUP "longterm" users need to inform the other subsystems that they have
taken a pin on a page which may remain pinned for a very "long time".[1]

2) Any page which is "controlled" by a file system needs to have special
handling. The details of the handling depend on whether the page is page
cache backed or not.

2a) A page cache backed page which has been pinned by GUP Longterm can use a
bounce buffer to allow the file system to write back snapshots of the page.
This is handled by the FS recognizing the GUP longterm pin and making a copy
of the page to be written back.
NOTE: this patch set does not address this path.

2b) A FS "controlled" page which is not page cache backed is either easier
to deal with or harder depending on the operation the filesystem is trying
to do.

2ba) [Hard case] If the FS operation _is_ a truncate or hole punch the
FS can no longer use the pages in question until the pin has been
removed. This patch set presents a solution to this by introducing
some reasonable restrictions on user space applications.

2bb) [Easy case] If the FS operation is _not_ a truncate or hole punch
then there is nothing which need be done. Data is read or written
directly to the page. This is an easy case which would currently work
if not for GUP longterm pins being disabled. Therefore this patch set
need not change access to the file data but does allow for GUP pins
after 2ba above is dealt with.


The architecture of this series is to introduce an F_LONGTERM file lease
mechanism which applications use in one of 2 ways.

1) Applications which may require hole punch or truncation operations on files
they intend to mmap and pin for long periods can take an F_LONGTERM lease
on the file. When a file system operation needs truncate access to this
file the lease is broken and the application gets a SIGIO.
Upon catching SIGIO the application can un-pin (note munmap is not required)
the memory associated with that file. At that point the truncating user can
proceed. Re-pinning the memory is entirely left up to the application. In
some cases a new mmap will be required (as with a truncation) or a SIGBUS
would be experienced anyway.

Failure to respond to a SIGIO lease break within the system configured
lease-break-time will result in a SIGBUS.

WIP: SIGBUS could be caught and ignored... what danger does this present...
should this be SIGKILL or should we wait another lease-break-time and then
send SIGKILL?

2) Applications which don't require hole punch or truncate operations can use
pinning without taking an F_LONGTERM lease. However, applications such as
this are expected to have considered the access to the files they are
mmapping and are expected to be controlling them in a way that other users on
a system can't truncate a file and cause a DoS on the application. These
applications will be sent a SIGBUS if someone attempts to truncate or hole
punch a file.

ALTERNATIVE WIP patch in series: If the F_LONGTERM lease is not taken,
fail the GUP.
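
A minimal user space sketch of the flow in 1) above. It is not taken from the
patches. It uses only the existing lease API (F_SETLEASE, SIGIO,
/proc/sys/fs/lease-break-time). How F_LONGTERM is passed to fcntl() is an
assumption (OR'd into the lease type) and is guarded so the code builds
without this series. count_unpin() is a hypothetical stand-in for whatever
the application does to drop its pins. Dropping the lease after un-pinning
mirrors how ordinary leases are released and is also an assumption here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* F_SETLEASE is a Linux extension */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

#ifndef F_SETLEASE
#define F_SETLEASE 1024		/* Linux value, in case the headers hid it */
#endif

/* Set from the SIGIO handler, consumed by the main loop. */
static volatile sig_atomic_t lease_broken;

static void on_lease_break(int sig)
{
	(void)sig;
	lease_broken = 1;
}

/* Route the default lease break signal (SIGIO) to the flag above. */
int install_lease_break_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_lease_break;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGIO, &sa, NULL);
}

/*
 * Take a read lease on fd.  ASSUMPTION: F_LONGTERM is OR'd into the lease
 * type; it only exists with this series applied, so it is guarded here.
 */
int take_longterm_lease(int fd)
{
	int arg = F_RDLCK;

#ifdef F_LONGTERM
	arg |= F_LONGTERM;
#endif
	return fcntl(fd, F_SETLEASE, arg);
}

/* Seconds the kernel waits after a lease break, or -1 if unreadable. */
int read_lease_break_time(void)
{
	FILE *f = fopen("/proc/sys/fs/lease-break-time", "r");
	int secs = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &secs) != 1)
		secs = -1;
	fclose(f);
	return secs;
}

/* Hypothetical stand-in for the real un-pin step (e.g. MR deregistration). */
void count_unpin(void *ctx)
{
	++*(int *)ctx;
}

/*
 * Call from the main loop.  If a break was signalled, un-pin via the
 * caller supplied callback and then drop the lease (assumed to work as for
 * ordinary leases).  Returns 1 if a break was handled, 0 otherwise.
 */
int service_lease_break(int fd, void (*unpin)(void *), void *ctx)
{
	assert(unpin != NULL);
	if (!lease_broken)
		return 0;
	lease_broken = 0;
	unpin(ctx);
	(void)fcntl(fd, F_SETLEASE, F_UNLCK);	/* best effort */
	return 1;
}
```

An application would call install_lease_break_handler() and
take_longterm_lease(fd) before mmap and pinning, then poll
service_lease_break() from its event loop. The un-pin is done outside the
signal handler because it is unlikely to be async-signal-safe, and it has to
finish within read_lease_break_time() seconds to avoid the SIGBUS.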

The patches compile and have been tested to a first degree.

NOTES:
Can we deal with the lease/pin at the VFS layer? or for all FSs?
LONGTERM seems like a bad name. Suggestions?

[1] The definition of "long time" is debatable but it has been established
that RDMA's use of pages minutes or hours after the pin is the extreme case
which makes this problem most severe.

[*] Not all file system pages are Page Cache pages. FS DAX bypasses the page
cache.


Ira Weiny (10):
fs/locks: Add trace_leases_conflict
fs/locks: Introduce FL_LONGTERM file lease
mm/gup: Pass flags down to __gup_device_huge* calls
WIP: mm/gup: Ensure F_LONGTERM lease is held on GUP pages
mm/gup: Take FL_LONGTERM lease if not set by user
fs/locks: Add longterm lease traces
fs/dax: Create function dax_mapping_is_dax()
mm/gup: fs: Send SIGBUS on truncate of active file
fs/locks: Add tracepoint for SIGBUS on LONGTERM expiration
mm/gup: Remove FOLL_LONGTERM DAX exclusion

fs/dax.c | 23 ++-
fs/ext4/inode.c | 4 +
fs/locks.c | 301 +++++++++++++++++++++++++++++--
fs/xfs/xfs_file.c | 4 +
include/linux/dax.h | 6 +
include/linux/fs.h | 18 ++
include/linux/mm.h | 2 +
include/trace/events/filelock.h | 74 +++++++-
include/uapi/asm-generic/fcntl.h | 2 +
mm/gup.c | 107 ++++-------
mm/huge_memory.c | 18 ++
11 files changed, 468 insertions(+), 91 deletions(-)

--
2.20.1


2019-04-29 04:57:05

by Ira Weiny

Subject: [RFC PATCH 10/10] mm/gup: Remove FOLL_LONGTERM DAX exclusion

From: Ira Weiny <[email protected]>

Now that there is a mechanism for users to safely take LONGTERM pins on
FS DAX pages, remove the FS DAX exclusion from GUP with FOLL_LONGTERM.

Special processing remains in effect for CONFIG_CMA.
---
mm/gup.c | 65 ++++++--------------------------------------------------
1 file changed, 6 insertions(+), 59 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1ee17f2339f7..cf6863422cb9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1324,26 +1324,6 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
}
EXPORT_SYMBOL(get_user_pages_remote);

-#if defined(CONFIG_FS_DAX) || defined (CONFIG_CMA)
-static bool check_dax_vmas(struct vm_area_struct **vmas, long nr_pages)
-{
- long i;
- struct vm_area_struct *vma_prev = NULL;
-
- for (i = 0; i < nr_pages; i++) {
- struct vm_area_struct *vma = vmas[i];
-
- if (vma == vma_prev)
- continue;
-
- vma_prev = vma;
-
- if (vma_is_fsdax(vma))
- return true;
- }
- return false;
-}
-
#ifdef CONFIG_CMA
static struct page *new_non_cma_page(struct page *page, unsigned long private)
{
@@ -1474,18 +1454,6 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,

return nr_pages;
}
-#else
-static long check_and_migrate_cma_pages(struct task_struct *tsk,
- struct mm_struct *mm,
- unsigned long start,
- unsigned long nr_pages,
- struct page **pages,
- struct vm_area_struct **vmas,
- unsigned int gup_flags)
-{
- return nr_pages;
-}
-#endif

/*
* __gup_longterm_locked() is a wrapper for __get_user_pages_locked which
@@ -1499,49 +1467,28 @@ static long __gup_longterm_locked(struct task_struct *tsk,
struct vm_area_struct **vmas,
unsigned int gup_flags)
{
- struct vm_area_struct **vmas_tmp = vmas;
unsigned long flags = 0;
- long rc, i;
+ long rc;

- if (gup_flags & FOLL_LONGTERM) {
- if (!pages)
- return -EINVAL;
-
- if (!vmas_tmp) {
- vmas_tmp = kcalloc(nr_pages,
- sizeof(struct vm_area_struct *),
- GFP_KERNEL);
- if (!vmas_tmp)
- return -ENOMEM;
- }
+ if (gup_flags & FOLL_LONGTERM)
flags = memalloc_nocma_save();
- }

rc = __get_user_pages_locked(tsk, mm, start, nr_pages, pages,
- vmas_tmp, NULL, gup_flags);
+ vmas, NULL, gup_flags);

if (gup_flags & FOLL_LONGTERM) {
memalloc_nocma_restore(flags);
if (rc < 0)
goto out;

- if (check_dax_vmas(vmas_tmp, rc)) {
- for (i = 0; i < rc; i++)
- put_page(pages[i]);
- rc = -EOPNOTSUPP;
- goto out;
- }
-
rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
- vmas_tmp, gup_flags);
+ vmas, gup_flags);
}

out:
- if (vmas_tmp != vmas)
- kfree(vmas_tmp);
return rc;
}
-#else /* !CONFIG_FS_DAX && !CONFIG_CMA */
+#else /* !CONFIG_CMA */
static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1553,7 +1500,7 @@ static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
NULL, flags);
}
-#endif /* CONFIG_FS_DAX || CONFIG_CMA */
+#endif /* CONFIG_CMA */

/*
* This is the same as get_user_pages_remote(), just with a
--
2.20.1

2019-04-29 04:57:47

by Ira Weiny

Subject: [RFC PATCH 06/10] fs/locks: Add longterm lease traces

From: Ira Weiny <[email protected]>

---
fs/locks.c | 5 +++++
include/trace/events/filelock.h | 37 ++++++++++++++++++++++++++++++++-
2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index ae508d192223..58c6d7a411b6 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2136,6 +2136,8 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
}
new->fa_fd = fd;

+ trace_take_longterm_lease(fl);
+
error = vfs_setlease(filp, arg, &fl, (void **)&new);
if (fl)
locks_free_lock(fl);
@@ -3118,6 +3120,8 @@ bool page_set_longterm_lease(struct page *page)
kref_get(&existing_fl->gup_ref);
}

+ trace_take_longterm_lease(existing_fl);
+
spin_unlock(&ctx->flc_lock);
percpu_up_read(&file_rwsem);

@@ -3153,6 +3157,7 @@ void page_remove_longterm_lease(struct page *page)
percpu_down_read(&file_rwsem);
spin_lock(&ctx->flc_lock);
found = find_longterm_lease(inode);
+ trace_release_longterm_lease(found);
if (found)
kref_put(&found->gup_ref, release_longterm_lease);
spin_unlock(&ctx->flc_lock);
diff --git a/include/trace/events/filelock.h b/include/trace/events/filelock.h
index 4b735923f2ff..c6f39f03cb8b 100644
--- a/include/trace/events/filelock.h
+++ b/include/trace/events/filelock.h
@@ -27,7 +27,8 @@
{ FL_SLEEP, "FL_SLEEP" }, \
{ FL_DOWNGRADE_PENDING, "FL_DOWNGRADE_PENDING" }, \
{ FL_UNLOCK_PENDING, "FL_UNLOCK_PENDING" }, \
- { FL_OFDLCK, "FL_OFDLCK" })
+ { FL_OFDLCK, "FL_OFDLCK" }, \
+ { FL_LONGTERM, "FL_LONGTERM" })

#define show_fl_type(val) \
__print_symbolic(val, \
@@ -238,6 +239,40 @@ TRACE_EVENT(leases_conflict,
show_fl_type(__entry->b_fl_type))
);

+DECLARE_EVENT_CLASS(longterm_lease,
+ TP_PROTO(struct file_lock *fl),
+
+ TP_ARGS(fl),
+
+ TP_STRUCT__entry(
+ __field(void *, fl)
+ __field(void *, owner)
+ __field(unsigned int, fl_flags)
+ __field(unsigned int, cnt)
+ __field(unsigned char, fl_type)
+ ),
+
+ TP_fast_assign(
+ __entry->fl = fl;
+ __entry->owner = fl ? fl->fl_owner : NULL;
+ __entry->fl_flags = fl ? fl->fl_flags : 0;
+ __entry->cnt = fl ? kref_read(&fl->gup_ref) : 0;
+ __entry->fl_type = fl ? fl->fl_type : 0;
+ ),
+
+ TP_printk("owner=0x%p fl=%p(%u) fl_flags=%s fl_type=%s",
+ __entry->owner, __entry->fl, __entry->cnt,
+ show_fl_flags(__entry->fl_flags),
+ show_fl_type(__entry->fl_type))
+);
+DEFINE_EVENT(longterm_lease, take_longterm_lease,
+ TP_PROTO(struct file_lock *fl),
+ TP_ARGS(fl));
+DEFINE_EVENT(longterm_lease, release_longterm_lease,
+ TP_PROTO(struct file_lock *fl),
+ TP_ARGS(fl));
+
+
#endif /* _TRACE_FILELOCK_H */

/* This part must be outside protection */
--
2.20.1