LinuxLists.cc - [PATCH v8 00/16] re-enable DAX PMD support

2016-10-19 19:34:19

Subject: [PATCH v8 00/16] re-enable DAX PMD support

DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking. This series allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled.

Changes since v7:
- Rebased on v4.9-rc1, dropping one ext4 patch that had already been merged.
- Added Reviewed-by tags from Jan Kara.

Here is a tree containing my changes:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_v8

Ross Zwisler (16):
ext4: tell DAX the size of allocation holes
dax: remove buffer_size_valid()
ext2: remove support for DAX PMD faults
dax: make 'wait_table' global variable static
dax: remove the last BUG_ON() from fs/dax.c
dax: consistent variable naming for DAX entries
dax: coordinate locking for offsets in PMD range
dax: remove dax_pmd_fault()
dax: correct dax iomap code namespace
dax: add dax_iomap_sector() helper function
dax: dax_iomap_fault() needs to call iomap_end()
dax: move RADIX_DAX_* defines to dax.h
dax: move put_(un)locked_mapping_entry() in dax.c
dax: add struct iomap based DAX PMD support
xfs: use struct iomap based DAX PMD fault path
dax: remove "depends on BROKEN" from FS_DAX_PMD

fs/Kconfig | 1 -
fs/dax.c | 826 +++++++++++++++++++++++++++++-----------------------
fs/ext2/file.c | 35 +--
fs/ext4/inode.c | 3 +
fs/xfs/xfs_aops.c | 26 +-
fs/xfs/xfs_aops.h | 3 -
fs/xfs/xfs_file.c | 10 +-
include/linux/dax.h | 58 +++-
mm/filemap.c | 5 +-
9 files changed, 537 insertions(+), 430 deletions(-)

--
2.7.4

2016-10-19 19:34:20

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 01/16] ext4: tell DAX the size of allocation holes

When DAX calls _ext4_get_block() and the file offset points to a hole we
currently don't set bh->b_size. This is current worked around via
buffer_size_valid() in fs/dax.c.

_ext4_get_block() has the hole size information from ext4_map_blocks(), so
populate bh->b_size so we can remove buffer_size_valid() in a later patch.

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9c06472..3d58b2b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -767,6 +767,9 @@ static int _ext4_get_block(struct inode *inode, sector_t iblock,
ext4_update_bh_state(bh, map.m_flags);
bh->b_size = inode->i_sb->s_blocksize * map.m_len;
ret = 0;
+ } else if (ret == 0) {
+ /* hole case, need to fill in bh->b_size */
+ bh->b_size = inode->i_sb->s_blocksize * map.m_len;
}
return ret;
}
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-19 19:34:24

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 05/16] dax: remove the last BUG_ON() from fs/dax.c

Don't take down the kernel if we get an invalid 'from' and 'length'
argument pair. Just warn once and return an error.

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dax.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index e52e754..219fa2b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1194,7 +1194,8 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
/* Block boundary? Nothing to do */
if (!length)
return 0;
- BUG_ON((offset + length) > PAGE_SIZE);
+ if (WARN_ON_ONCE((offset + length) > PAGE_SIZE))
+ return -EINVAL;

memset(&bh, 0, sizeof(bh));
bh.b_bdev = inode->i_sb->s_bdev;
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-19 19:34:25

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 06/16] dax: consistent variable naming for DAX entries

No functional change.

Consistently use the variable name 'entry' instead of 'ret' for DAX radix
tree entries. This was already happening in most of the code, so update
get_unlocked_mapping_entry(), grab_mapping_entry() and
dax_unlock_mapping_entry().

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dax.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 219fa2b..835e7f0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -357,7 +357,7 @@ static inline void *unlock_slot(struct address_space *mapping, void **slot)
static void *get_unlocked_mapping_entry(struct address_space *mapping,
pgoff_t index, void ***slotp)
{
- void *ret, **slot;
+ void *entry, **slot;
struct wait_exceptional_entry_queue ewait;
wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);

@@ -367,13 +367,13 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
ewait.key.index = index;

for (;;) {
- ret = __radix_tree_lookup(&mapping->page_tree, index, NULL,
+ entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
&slot);
- if (!ret || !radix_tree_exceptional_entry(ret) ||
+ if (!entry || !radix_tree_exceptional_entry(entry) ||
!slot_locked(mapping, slot)) {
if (slotp)
*slotp = slot;
- return ret;
+ return entry;
}
prepare_to_wait_exclusive(wq, &ewait.wait,
TASK_UNINTERRUPTIBLE);
@@ -396,13 +396,13 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
*/
static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
{
- void *ret, **slot;
+ void *entry, **slot;

restart:
spin_lock_irq(&mapping->tree_lock);
- ret = get_unlocked_mapping_entry(mapping, index, &slot);
+ entry = get_unlocked_mapping_entry(mapping, index, &slot);
/* No entry for given index? Make sure radix tree is big enough. */
- if (!ret) {
+ if (!entry) {
int err;

spin_unlock_irq(&mapping->tree_lock);
@@ -410,10 +410,10 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM);
if (err)
return ERR_PTR(err);
- ret = (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
+ entry = (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY |
RADIX_DAX_ENTRY_LOCK);
spin_lock_irq(&mapping->tree_lock);
- err = radix_tree_insert(&mapping->page_tree, index, ret);
+ err = radix_tree_insert(&mapping->page_tree, index, entry);
radix_tree_preload_end();
if (err) {
spin_unlock_irq(&mapping->tree_lock);
@@ -425,11 +425,11 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
/* Good, we have inserted empty locked entry into the tree. */
mapping->nrexceptional++;
spin_unlock_irq(&mapping->tree_lock);
- return ret;
+ return entry;
}
/* Normal page in radix tree? */
- if (!radix_tree_exceptional_entry(ret)) {
- struct page *page = ret;
+ if (!radix_tree_exceptional_entry(entry)) {
+ struct page *page = entry;

get_page(page);
spin_unlock_irq(&mapping->tree_lock);
@@ -442,9 +442,9 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
}
return page;
}
- ret = lock_slot(mapping, slot);
+ entry = lock_slot(mapping, slot);
spin_unlock_irq(&mapping->tree_lock);
- return ret;
+ return entry;
}

void dax_wake_mapping_entry_waiter(struct address_space *mapping,
@@ -469,11 +469,11 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,

void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
{
- void *ret, **slot;
+ void *entry, **slot;

spin_lock_irq(&mapping->tree_lock);
- ret = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
- if (WARN_ON_ONCE(!ret || !radix_tree_exceptional_entry(ret) ||
+ entry = __radix_tree_lookup(&mapping->page_tree, index, NULL, &slot);
+ if (WARN_ON_ONCE(!entry || !radix_tree_exceptional_entry(entry) ||
!slot_locked(mapping, slot))) {
spin_unlock_irq(&mapping->tree_lock);
return;
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-19 19:34:26

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 07/16] dax: coordinate locking for offsets in PMD range

DAX radix tree locking currently locks entries based on the unique
combination of the 'mapping' pointer and the pgoff_t 'index' for the entry.
This works for PTEs, but as we move to PMDs we will need to have all the
offsets within the range covered by the PMD to map to the same bit lock.
To accomplish this, for ranges covered by a PMD entry we will instead lock
based on the page offset of the beginning of the PMD entry. The 'mapping'
pointer is still used in the same way.

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dax.c | 65 +++++++++++++++++++++++++++++++++--------------------
include/linux/dax.h | 2 +-
mm/filemap.c | 2 +-
3 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 835e7f0..7238702 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -64,14 +64,6 @@ static int __init init_dax_wait_table(void)
}
fs_initcall(init_dax_wait_table);

-static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
- pgoff_t index)
-{
- unsigned long hash = hash_long((unsigned long)mapping ^ index,
- DAX_WAIT_TABLE_BITS);
- return wait_table + hash;
-}
-
static long dax_map_atomic(struct block_device *bdev, struct blk_dax_ctl *dax)
{
struct request_queue *q = bdev->bd_queue;
@@ -285,7 +277,7 @@ EXPORT_SYMBOL_GPL(dax_do_io);
*/
struct exceptional_entry_key {
struct address_space *mapping;
- unsigned long index;
+ pgoff_t entry_start;
};

struct wait_exceptional_entry_queue {
@@ -293,6 +285,26 @@ struct wait_exceptional_entry_queue {
struct exceptional_entry_key key;
};

+static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
+ pgoff_t index, void *entry, struct exceptional_entry_key *key)
+{
+ unsigned long hash;
+
+ /*
+ * If 'entry' is a PMD, align the 'index' that we use for the wait
+ * queue to the start of that PMD. This ensures that all offsets in
+ * the range covered by the PMD map to the same bit lock.
+ */
+ if (RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD)
+ index &= ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);
+
+ key->mapping = mapping;
+ key->entry_start = index;
+
+ hash = hash_long((unsigned long)mapping ^ index, DAX_WAIT_TABLE_BITS);
+ return wait_table + hash;
+}
+
static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned int mode,
int sync, void *keyp)
{
@@ -301,7 +313,7 @@ static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned int mode,
container_of(wait, struct wait_exceptional_entry_queue, wait);

if (key->mapping != ewait->key.mapping ||
- key->index != ewait->key.index)
+ key->entry_start != ewait->key.entry_start)
return 0;
return autoremove_wake_function(wait, mode, sync, NULL);
}
@@ -359,12 +371,10 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
{
void *entry, **slot;
struct wait_exceptional_entry_queue ewait;
- wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+ wait_queue_head_t *wq;

init_wait(&ewait.wait);
ewait.wait.func = wake_exceptional_entry_func;
- ewait.key.mapping = mapping;
- ewait.key.index = index;

for (;;) {
entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
@@ -375,6 +385,8 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
*slotp = slot;
return entry;
}
+
+ wq = dax_entry_waitqueue(mapping, index, entry, &ewait.key);
prepare_to_wait_exclusive(wq, &ewait.wait,
TASK_UNINTERRUPTIBLE);
spin_unlock_irq(&mapping->tree_lock);
@@ -447,10 +459,20 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index)
return entry;
}

+/*
+ * We do not necessarily hold the mapping->tree_lock when we call this
+ * function so it is possible that 'entry' is no longer a valid item in the
+ * radix tree. This is okay, though, because all we really need to do is to
+ * find the correct waitqueue where tasks might be sleeping waiting for that
+ * old 'entry' and wake them.
+ */
void dax_wake_mapping_entry_waiter(struct address_space *mapping,
- pgoff_t index, bool wake_all)
+ pgoff_t index, void *entry, bool wake_all)
{
- wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+ struct exceptional_entry_key key;
+ wait_queue_head_t *wq;
+
+ wq = dax_entry_waitqueue(mapping, index, entry, &key);

/*
* Checking for locked entry and prepare_to_wait_exclusive() happens
@@ -458,13 +480,8 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
* So at this point all tasks that could have seen our entry locked
* must be in the waitqueue and the following check will see them.
*/
- if (waitqueue_active(wq)) {
- struct exceptional_entry_key key;
-
- key.mapping = mapping;
- key.index = index;
+ if (waitqueue_active(wq))
__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
- }
}

void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
@@ -480,7 +497,7 @@ void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
}
unlock_slot(mapping, slot);
spin_unlock_irq(&mapping->tree_lock);
- dax_wake_mapping_entry_waiter(mapping, index, false);
+ dax_wake_mapping_entry_waiter(mapping, index, entry, false);
}

static void put_locked_mapping_entry(struct address_space *mapping,
@@ -505,7 +522,7 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
return;

/* We have to wake up next waiter for the radix tree entry lock */
- dax_wake_mapping_entry_waiter(mapping, index, false);
+ dax_wake_mapping_entry_waiter(mapping, index, entry, false);
}

/*
@@ -532,7 +549,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
radix_tree_delete(&mapping->page_tree, index);
mapping->nrexceptional--;
spin_unlock_irq(&mapping->tree_lock);
- dax_wake_mapping_entry_waiter(mapping, index, true);
+ dax_wake_mapping_entry_waiter(mapping, index, entry, true);

return 1;
}
diff --git a/include/linux/dax.h b/include/linux/dax.h
index add6c4b..a41a747 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -22,7 +22,7 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
void dax_wake_mapping_entry_waiter(struct address_space *mapping,
- pgoff_t index, bool wake_all);
+ pgoff_t index, void *entry, bool wake_all);

#ifdef CONFIG_FS_DAX
struct page *read_dax_sector(struct block_device *bdev, sector_t n);
diff --git a/mm/filemap.c b/mm/filemap.c
index 849f459..1ffb7dc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -143,7 +143,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
if (node)
workingset_node_pages_dec(node);
/* Wakeup waiters for exceptional entry lock */
- dax_wake_mapping_entry_waiter(mapping, page->index,
+ dax_wake_mapping_entry_waiter(mapping, page->index, p,
false);
}
}
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-19 19:34:34

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 15/16] xfs: use struct iomap based DAX PMD fault path

Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
improved dax_iomap_pmd_fault(). Also, now that it has no more users,
remove xfs_get_blocks_dax_fault().

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/xfs/xfs_aops.c | 26 +++++---------------------
fs/xfs/xfs_aops.h | 3 ---
fs/xfs/xfs_file.c | 2 +-
3 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3e57a56..561cf14 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1298,8 +1298,7 @@ __xfs_get_blocks(
sector_t iblock,
struct buffer_head *bh_result,
int create,
- bool direct,
- bool dax_fault)
+ bool direct)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -1420,13 +1419,8 @@ __xfs_get_blocks(
if (ISUNWRITTEN(&imap))
set_buffer_unwritten(bh_result);
/* direct IO needs special help */
- if (create) {
- if (dax_fault)
- ASSERT(!ISUNWRITTEN(&imap));
- else
- xfs_map_direct(inode, bh_result, &imap, offset,
- is_cow);
- }
+ if (create)
+ xfs_map_direct(inode, bh_result, &imap, offset, is_cow);
}

/*
@@ -1466,7 +1460,7 @@ xfs_get_blocks(
struct buffer_head *bh_result,
int create)
{
- return __xfs_get_blocks(inode, iblock, bh_result, create, false, false);
+ return __xfs_get_blocks(inode, iblock, bh_result, create, false);
}

int
@@ -1476,17 +1470,7 @@ xfs_get_blocks_direct(
struct buffer_head *bh_result,
int create)
{
- return __xfs_get_blocks(inode, iblock, bh_result, create, true, false);
-}
-
-int
-xfs_get_blocks_dax_fault(
- struct inode *inode,
- sector_t iblock,
- struct buffer_head *bh_result,
- int create)
-{
- return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
+ return __xfs_get_blocks(inode, iblock, bh_result, create, true);
}

/*
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index b3c6634..34dc00d 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -59,9 +59,6 @@ int xfs_get_blocks(struct inode *inode, sector_t offset,
struct buffer_head *map_bh, int create);
int xfs_get_blocks_direct(struct inode *inode, sector_t offset,
struct buffer_head *map_bh, int create);
-int xfs_get_blocks_dax_fault(struct inode *inode, sector_t offset,
- struct buffer_head *map_bh, int create);
-
int xfs_end_io_direct_write(struct kiocb *iocb, loff_t offset,
ssize_t size, void *private);
int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e7f35d54..ca2ab73 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1711,7 +1711,7 @@ xfs_filemap_pmd_fault(
}

xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- ret = dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault);
+ ret = dax_iomap_pmd_fault(vma, addr, pmd, flags, &xfs_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

if (flags & FAULT_FLAG_WRITE)
--
2.7.4

2016-10-19 19:34:28

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 09/16] dax: correct dax iomap code namespace

The recently added DAX functions that use the new struct iomap data
structure were named iomap_dax_rw(), iomap_dax_fault() and
iomap_dax_actor(). These are actually defined in fs/dax.c, though, so
should be part of the "dax" namespace and not the "iomap" namespace.
Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor()
respectively.

Signed-off-by: Ross Zwisler <[email protected]>
Suggested-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dax.c | 16 ++++++++--------
fs/ext2/file.c | 6 +++---
fs/xfs/xfs_file.c | 8 ++++----
include/linux/dax.h | 4 ++--
4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 3d0b103..fdbd7a1 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1031,7 +1031,7 @@ EXPORT_SYMBOL_GPL(dax_truncate_page);

#ifdef CONFIG_FS_IOMAP
static loff_t
-iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
+dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
struct iov_iter *iter = data;
@@ -1088,7 +1088,7 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
}

/**
- * iomap_dax_rw - Perform I/O to a DAX file
+ * dax_iomap_rw - Perform I/O to a DAX file
* @iocb: The control block for this I/O
* @iter: The addresses to do I/O from or to
* @ops: iomap ops passed from the file system
@@ -1098,7 +1098,7 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
* and evicting any page cache pages in the region under I/O.
*/
ssize_t
-iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
+dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
struct iomap_ops *ops)
{
struct address_space *mapping = iocb->ki_filp->f_mapping;
@@ -1128,7 +1128,7 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,

while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
- iter, iomap_dax_actor);
+ iter, dax_iomap_actor);
if (ret <= 0)
break;
pos += ret;
@@ -1138,10 +1138,10 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
iocb->ki_pos += done;
return done ? done : ret;
}
-EXPORT_SYMBOL_GPL(iomap_dax_rw);
+EXPORT_SYMBOL_GPL(dax_iomap_rw);

/**
- * iomap_dax_fault - handle a page fault on a DAX file
+ * dax_iomap_fault - handle a page fault on a DAX file
* @vma: The virtual memory area where the fault occurred
* @vmf: The description of the fault
* @ops: iomap ops passed from the file system
@@ -1150,7 +1150,7 @@ EXPORT_SYMBOL_GPL(iomap_dax_rw);
* or mkwrite handler for DAX files. Assumes the caller has done all the
* necessary locking for the page fault to proceed successfully.
*/
-int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
+int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap_ops *ops)
{
struct address_space *mapping = vma->vm_file->f_mapping;
@@ -1252,5 +1252,5 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
return VM_FAULT_SIGBUS | major;
return VM_FAULT_NOPAGE | major;
}
-EXPORT_SYMBOL_GPL(iomap_dax_fault);
+EXPORT_SYMBOL_GPL(dax_iomap_fault);
#endif /* CONFIG_FS_IOMAP */
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index fb88b51..b0f2415 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -38,7 +38,7 @@ static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
return 0; /* skip atime */

inode_lock_shared(inode);
- ret = iomap_dax_rw(iocb, to, &ext2_iomap_ops);
+ ret = dax_iomap_rw(iocb, to, &ext2_iomap_ops);
inode_unlock_shared(inode);

file_accessed(iocb->ki_filp);
@@ -62,7 +62,7 @@ static ssize_t ext2_dax_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret)
goto out_unlock;

- ret = iomap_dax_rw(iocb, from, &ext2_iomap_ops);
+ ret = dax_iomap_rw(iocb, from, &ext2_iomap_ops);
if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
i_size_write(inode, iocb->ki_pos);
mark_inode_dirty(inode);
@@ -99,7 +99,7 @@ static int ext2_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
}
down_read(&ei->dax_sem);

- ret = iomap_dax_fault(vma, vmf, &ext2_iomap_ops);
+ ret = dax_iomap_fault(vma, vmf, &ext2_iomap_ops);

up_read(&ei->dax_sem);
if (vmf->flags & FAULT_FLAG_WRITE)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a314fc7..e7f35d54 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -344,7 +344,7 @@ xfs_file_dax_read(
return 0; /* skip atime */

xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
- ret = iomap_dax_rw(iocb, to, &xfs_iomap_ops);
+ ret = dax_iomap_rw(iocb, to, &xfs_iomap_ops);
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);

file_accessed(iocb->ki_filp);
@@ -691,7 +691,7 @@ xfs_file_dax_write(

trace_xfs_file_dax_write(ip, count, pos);

- ret = iomap_dax_rw(iocb, from, &xfs_iomap_ops);
+ ret = dax_iomap_rw(iocb, from, &xfs_iomap_ops);
if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
i_size_write(inode, iocb->ki_pos);
error = xfs_setfilesize(ip, pos, ret);
@@ -1640,7 +1640,7 @@ xfs_filemap_page_mkwrite(
xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

if (IS_DAX(inode)) {
- ret = iomap_dax_fault(vma, vmf, &xfs_iomap_ops);
+ ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
} else {
ret = iomap_page_mkwrite(vma, vmf, &xfs_iomap_ops);
ret = block_page_mkwrite_return(ret);
@@ -1674,7 +1674,7 @@ xfs_filemap_fault(
* changes to xfs_get_blocks_direct() to map unwritten extent
* ioend for conversion on read-only mappings.
*/
- ret = iomap_dax_fault(vma, vmf, &xfs_iomap_ops);
+ ret = dax_iomap_fault(vma, vmf, &xfs_iomap_ops);
} else
ret = filemap_fault(vma, vmf);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0f74866..a3dfee4 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -11,13 +11,13 @@ struct iomap_ops;
/* We use lowest available exceptional entry bit for locking */
#define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)

-ssize_t iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
+ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
struct iomap_ops *ops);
ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *,
get_block_t, dio_iodone_t, int flags);
int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
int dax_truncate_page(struct inode *, loff_t from, get_block_t);
-int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
+int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap_ops *ops);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-19 19:34:29

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v8 10/16] dax: add dax_iomap_sector() helper function

To be able to correctly calculate the sector from a file position and a
struct iomap there is a complex little bit of logic that currently happens
in both dax_iomap_actor() and dax_iomap_fault(). This will need to be
repeated yet again in the DAX PMD fault handler when it is added, so break
it out into a helper function.

Signed-off-by: Ross Zwisler <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/dax.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index fdbd7a1..7737954 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1030,6 +1030,11 @@ int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
EXPORT_SYMBOL_GPL(dax_truncate_page);

#ifdef CONFIG_FS_IOMAP
+static sector_t dax_iomap_sector(struct iomap *iomap, loff_t pos)
+{
+ return iomap->blkno + (((pos & PAGE_MASK) - iomap->offset) >> 9);
+}
+
static loff_t
dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
@@ -1055,8 +1060,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct blk_dax_ctl dax = { 0 };
ssize_t map_len;

- dax.sector = iomap->blkno +
- (((pos & PAGE_MASK) - iomap->offset) >> 9);
+ dax.sector = dax_iomap_sector(iomap, pos);
dax.size = (length + offset + PAGE_SIZE - 1) & PAGE_MASK;
map_len = dax_map_atomic(iomap->bdev, &dax);
if (map_len < 0) {
@@ -1193,7 +1197,7 @@ int dax_iomap_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
goto unlock_entry;
}

- sector = iomap.blkno + (((pos & PAGE_MASK) - iomap.offset) >> 9);
+ sector = dax_iomap_sector(&iomap, pos);

if (vmf->cow_page) {
switch (iomap.type) {
--
2.7.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2016-10-27 14:41:55

by Ross Zwisler

[permalink] [raw]

Subject: Re: [PATCH v8 00/16] re-enable DAX PMD support

On Wed, Oct 19, 2016 at 01:34:19PM -0600, Ross Zwisler wrote:
> DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
> locking. This series allows DAX PMDs to participate in the DAX radix tree
> based locking scheme so that they can be re-enabled.
>
> Changes since v7:
> - Rebased on v4.9-rc1, dropping one ext4 patch that had already been merged.
> - Added Reviewed-by tags from Jan Kara.
>
> Here is a tree containing my changes:
> https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_v8
>
> Ross Zwisler (16):
> ext4: tell DAX the size of allocation holes
> dax: remove buffer_size_valid()
> ext2: remove support for DAX PMD faults
> dax: make 'wait_table' global variable static
> dax: remove the last BUG_ON() from fs/dax.c
> dax: consistent variable naming for DAX entries
> dax: coordinate locking for offsets in PMD range
> dax: remove dax_pmd_fault()
> dax: correct dax iomap code namespace
> dax: add dax_iomap_sector() helper function
> dax: dax_iomap_fault() needs to call iomap_end()
> dax: move RADIX_DAX_* defines to dax.h
> dax: move put_(un)locked_mapping_entry() in dax.c
> dax: add struct iomap based DAX PMD support
> xfs: use struct iomap based DAX PMD fault path
> dax: remove "depends on BROKEN" from FS_DAX_PMD
>
> fs/Kconfig | 1 -
> fs/dax.c | 826 +++++++++++++++++++++++++++++-----------------------
> fs/ext2/file.c | 35 +--
> fs/ext4/inode.c | 3 +
> fs/xfs/xfs_aops.c | 26 +-
> fs/xfs/xfs_aops.h | 3 -
> fs/xfs/xfs_file.c | 10 +-
> include/linux/dax.h | 58 +++-
> mm/filemap.c | 5 +-
> 9 files changed, 537 insertions(+), 430 deletions(-)
>
> --
> 2.7.4

Ping on this series. Dave, is the plan still for you to pull this in via the
XFS development tree? Do you need anything else on my side for this?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>