2016-09-27 20:47:51

by Ross Zwisler

Subject: [PATCH v3 00/11] re-enable DAX PMD support

DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
locking. This series allows DAX PMDs to participate in the DAX radix tree
based locking scheme so that they can be re-enabled.

Jan and Christoph, can you please help review these changes?

Andrew, when the time is right can you please push these changes to Linus via
the -mm tree?

In some simple mmap I/O testing with FIO, the use of PMD faults more than
doubles I/O performance compared with PTE faults. Here is the FIO job file I
used for my testing:

[global]
bs=4k
size=2G
directory=/mnt/pmem0
ioengine=mmap
[randrw]
rw=randrw
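
To run it, save the job file and point fio at it (the filename here is just
an example):

fio ./dax-pmd-randrw.fio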

Here are the performance results with XFS using only PTE faults:
READ: io=1022.7MB, aggrb=557610KB/s, minb=557610KB/s, maxb=557610KB/s, mint=1878msec, maxt=1878msec
WRITE: io=1025.4MB, aggrb=559084KB/s, minb=559084KB/s, maxb=559084KB/s, mint=1878msec, maxt=1878msec

Here are the performance numbers for that same test using PMD faults:
READ: io=1022.7MB, aggrb=1406.7MB/s, minb=1406.7MB/s, maxb=1406.7MB/s, mint=727msec, maxt=727msec
WRITE: io=1025.4MB, aggrb=1410.4MB/s, minb=1410.4MB/s, maxb=1410.4MB/s, mint=727msec, maxt=727msec

This was done on a random lab machine with a PMEM device made from memmap'd
RAM. To get XFS to use PMD faults, I did the following:

mkfs.xfs -f -d su=2m,sw=1 /dev/pmem0
mount -o dax /dev/pmem0 /mnt/pmem0
xfs_io -c "extsize 2m" /mnt/pmem0
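
One rough way to confirm that PMD faults are actually being used (my own
suggestion, not something the series documents): the PMD fault handler counts
THP_FAULT_FALLBACK whenever it degrades to PTE faults (see the fallback label
in patch 8), so that counter should stay flat across the fio run:

# If thp_fault_fallback climbs while fio runs, PMD faults are
# degrading to PTE faults.
grep thp_fault /proc/vmstat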

Changes since v2:
- Removed the struct buffer_head + get_block_t based dax_pmd_fault() handler.
All DAX PMD faults will now happen via the new struct iomap based
iomap_dax_pmd_fault().
- Added a new struct iomap based PMD path which is now used by XFS.
- Now that it is using struct iomap, ext2 no longer needs to be modified so
that ext2_get_block() will give us the size of a hole.
- Removed support for DAX PMD faults in ext2, as I couldn't get them to happen
reliably in my testing.
- Removed the unused xfs_get_blocks_dax_fault() wrapper.
- Added a bunch of comments around my changes in dax.c.

This was built upon xfs/for-next with PMD performance fixes from Toshi Kani and
Dan Williams. Dan's patch has already been merged for v4.8, and Toshi's
patches are currently queued in Andrew Morton's mm tree for v4.9 inclusion.

Here is a tree containing my changes and all the fixes that I've been testing:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_v3

Ross Zwisler (11):
ext4: allow DAX writeback for hole punch
ext4: tell DAX the size of allocation holes
dax: remove buffer_size_valid()
ext2: remove support for DAX PMD faults
dax: make 'wait_table' global variable static
dax: consistent variable naming for DAX entries
dax: coordinate locking for offsets in PMD range
dax: remove dax_pmd_fault()
dax: add struct iomap based DAX PMD support
xfs: use struct iomap based DAX PMD fault path
dax: remove "depends on BROKEN" from FS_DAX_PMD

 fs/Kconfig          |   1 -
 fs/dax.c            | 696 +++++++++++++++++++++++++++++-----------------------
 fs/ext2/file.c      |  24 +-
 fs/ext4/inode.c     |   7 +-
 fs/xfs/xfs_aops.c   |  25 +-
 fs/xfs/xfs_aops.h   |   3 -
 fs/xfs/xfs_file.c   |   2 +-
 include/linux/dax.h |  37 ++-
 mm/filemap.c        |   6 +-
 9 files changed, 434 insertions(+), 367 deletions(-)

--
2.7.4


2016-09-27 20:47:59

by Ross Zwisler

Subject: [PATCH v3 08/11] dax: remove dax_pmd_fault()

dax_pmd_fault() is the old struct buffer_head + get_block_t based 2 MiB DAX
fault handler. This fault handler has been disabled for several kernel
releases, and support for PMDs will be reintroduced using the struct iomap
interface instead.
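
For context, the "disabled" state mentioned above is a Kconfig guard that
patch 11 of this series lifts. From memory (a sketch, not part of this
patch), the option currently looks like this, with only the BROKEN line
being removed by patch 11:

config FS_DAX_PMD
	bool
	default FS_DAX
	depends on FS_DAX
	depends on BROKEN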

Signed-off-by: Ross Zwisler <[email protected]>
---
 fs/dax.c            | 213 ----------------------------------------------------
 include/linux/dax.h |   6 +-
 2 files changed, 1 insertion(+), 218 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 406feea..b5e7b13 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -909,219 +909,6 @@ int dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
}
EXPORT_SYMBOL_GPL(dax_fault);

-#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
-/*
- * The 'colour' (ie low bits) within a PMD of a page offset. This comes up
- * more often than one might expect in the below function.
- */
-#define PG_PMD_COLOUR ((PMD_SIZE >> PAGE_SHIFT) - 1)
-
-static void __dax_dbg(struct buffer_head *bh, unsigned long address,
- const char *reason, const char *fn)
-{
- if (bh) {
- char bname[BDEVNAME_SIZE];
- bdevname(bh->b_bdev, bname);
- pr_debug("%s: %s addr: %lx dev %s state %lx start %lld "
- "length %zd fallback: %s\n", fn, current->comm,
- address, bname, bh->b_state, (u64)bh->b_blocknr,
- bh->b_size, reason);
- } else {
- pr_debug("%s: %s addr: %lx fallback: %s\n", fn,
- current->comm, address, reason);
- }
-}
-
-#define dax_pmd_dbg(bh, address, reason) __dax_dbg(bh, address, reason, "dax_pmd")
-
-/**
- * dax_pmd_fault - handle a PMD fault on a DAX file
- * @vma: The virtual memory area where the fault occurred
- * @vmf: The description of the fault
- * @get_block: The filesystem method used to translate file offsets to blocks
- *
- * When a page fault occurs, filesystems may call this helper in their
- * pmd_fault handler for DAX files.
- */
-int dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
- pmd_t *pmd, unsigned int flags, get_block_t get_block)
-{
- struct file *file = vma->vm_file;
- struct address_space *mapping = file->f_mapping;
- struct inode *inode = mapping->host;
- struct buffer_head bh;
- unsigned blkbits = inode->i_blkbits;
- unsigned long pmd_addr = address & PMD_MASK;
- bool write = flags & FAULT_FLAG_WRITE;
- struct block_device *bdev;
- pgoff_t size, pgoff;
- sector_t block;
- int result = 0;
- bool alloc = false;
-
- /* dax pmd mappings require pfn_t_devmap() */
- if (!IS_ENABLED(CONFIG_FS_DAX_PMD))
- return VM_FAULT_FALLBACK;
-
- /* Fall back to PTEs if we're going to COW */
- if (write && !(vma->vm_flags & VM_SHARED)) {
- split_huge_pmd(vma, pmd, address);
- dax_pmd_dbg(NULL, address, "cow write");
- return VM_FAULT_FALLBACK;
- }
- /* If the PMD would extend outside the VMA */
- if (pmd_addr < vma->vm_start) {
- dax_pmd_dbg(NULL, address, "vma start unaligned");
- return VM_FAULT_FALLBACK;
- }
- if ((pmd_addr + PMD_SIZE) > vma->vm_end) {
- dax_pmd_dbg(NULL, address, "vma end unaligned");
- return VM_FAULT_FALLBACK;
- }
-
- pgoff = linear_page_index(vma, pmd_addr);
- size = (i_size_read(inode) + PAGE_SIZE - 1) >> PAGE_SHIFT;
- if (pgoff >= size)
- return VM_FAULT_SIGBUS;
- /* If the PMD would cover blocks out of the file */
- if ((pgoff | PG_PMD_COLOUR) >= size) {
- dax_pmd_dbg(NULL, address,
- "offset + huge page size > file size");
- return VM_FAULT_FALLBACK;
- }
-
- memset(&bh, 0, sizeof(bh));
- bh.b_bdev = inode->i_sb->s_bdev;
- block = (sector_t)pgoff << (PAGE_SHIFT - blkbits);
-
- bh.b_size = PMD_SIZE;
-
- if (get_block(inode, block, &bh, 0) != 0)
- return VM_FAULT_SIGBUS;
-
- if (!buffer_mapped(&bh) && write) {
- if (get_block(inode, block, &bh, 1) != 0)
- return VM_FAULT_SIGBUS;
- alloc = true;
- WARN_ON_ONCE(buffer_unwritten(&bh) || buffer_new(&bh));
- }
-
- bdev = bh.b_bdev;
-
- if (bh.b_size < PMD_SIZE) {
- dax_pmd_dbg(&bh, address, "allocated block too small");
- return VM_FAULT_FALLBACK;
- }
-
- /*
- * If we allocated new storage, make sure no process has any
- * zero pages covering this hole
- */
- if (alloc) {
- loff_t lstart = pgoff << PAGE_SHIFT;
- loff_t lend = lstart + PMD_SIZE - 1; /* inclusive */
-
- truncate_pagecache_range(inode, lstart, lend);
- }
-
- if (!write && !buffer_mapped(&bh)) {
- spinlock_t *ptl;
- pmd_t entry;
- struct page *zero_page = get_huge_zero_page();
-
- if (unlikely(!zero_page)) {
- dax_pmd_dbg(&bh, address, "no zero page");
- goto fallback;
- }
-
- ptl = pmd_lock(vma->vm_mm, pmd);
- if (!pmd_none(*pmd)) {
- spin_unlock(ptl);
- dax_pmd_dbg(&bh, address, "pmd already present");
- goto fallback;
- }
-
- dev_dbg(part_to_dev(bdev->bd_part),
- "%s: %s addr: %lx pfn: <zero> sect: %llx\n",
- __func__, current->comm, address,
- (unsigned long long) to_sector(&bh, inode));
-
- entry = mk_pmd(zero_page, vma->vm_page_prot);
- entry = pmd_mkhuge(entry);
- set_pmd_at(vma->vm_mm, pmd_addr, pmd, entry);
- result = VM_FAULT_NOPAGE;
- spin_unlock(ptl);
- } else {
- struct blk_dax_ctl dax = {
- .sector = to_sector(&bh, inode),
- .size = PMD_SIZE,
- };
- long length = dax_map_atomic(bdev, &dax);
-
- if (length < 0) {
- dax_pmd_dbg(&bh, address, "dax-error fallback");
- goto fallback;
- }
- if (length < PMD_SIZE) {
- dax_pmd_dbg(&bh, address, "dax-length too small");
- dax_unmap_atomic(bdev, &dax);
- goto fallback;
- }
- if (pfn_t_to_pfn(dax.pfn) & PG_PMD_COLOUR) {
- dax_pmd_dbg(&bh, address, "pfn unaligned");
- dax_unmap_atomic(bdev, &dax);
- goto fallback;
- }
-
- if (!pfn_t_devmap(dax.pfn)) {
- dax_unmap_atomic(bdev, &dax);
- dax_pmd_dbg(&bh, address, "pfn not in memmap");
- goto fallback;
- }
- dax_unmap_atomic(bdev, &dax);
-
- /*
- * For PTE faults we insert a radix tree entry for reads, and
- * leave it clean. Then on the first write we dirty the radix
- * tree entry via the dax_pfn_mkwrite() path. This sequence
- * allows the dax_pfn_mkwrite() call to be simpler and avoid a
- * call into get_block() to translate the pgoff to a sector in
- * order to be able to create a new radix tree entry.
- *
- * The PMD path doesn't have an equivalent to
- * dax_pfn_mkwrite(), though, so for a read followed by a
- * write we traverse all the way through dax_pmd_fault()
- * twice. This means we can just skip inserting a radix tree
- * entry completely on the initial read and just wait until
- * the write to insert a dirty entry.
- */
- if (write) {
- /*
- * We should insert radix-tree entry and dirty it here.
- * For now this is broken...
- */
- }
-
- dev_dbg(part_to_dev(bdev->bd_part),
- "%s: %s addr: %lx pfn: %lx sect: %llx\n",
- __func__, current->comm, address,
- pfn_t_to_pfn(dax.pfn),
- (unsigned long long) dax.sector);
- result |= vmf_insert_pfn_pmd(vma, address, pmd,
- dax.pfn, write);
- }
-
- out:
- return result;
-
- fallback:
- count_vm_event(THP_FAULT_FALLBACK);
- result = VM_FAULT_FALLBACK;
- goto out;
-}
-EXPORT_SYMBOL_GPL(dax_pmd_fault);
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
/**
* dax_pfn_mkwrite - handle first write to DAX page
* @vma: The virtual memory area where the fault occurred
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 4065601..d9a8350 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -48,16 +48,12 @@ static inline int __dax_zero_page_range(struct block_device *bdev,
}
#endif

-#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
-int dax_pmd_fault(struct vm_area_struct *, unsigned long addr, pmd_t *,
- unsigned int flags, get_block_t);
-#else
static inline int dax_pmd_fault(struct vm_area_struct *vma, unsigned long addr,
pmd_t *pmd, unsigned int flags, get_block_t gb)
{
return VM_FAULT_FALLBACK;
}
-#endif
+
int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
#define dax_mkwrite(vma, vmf, gb) dax_fault(vma, vmf, gb)

--
2.7.4


2016-09-27 20:48:01

by Ross Zwisler

Subject: [PATCH v3 10/11] xfs: use struct iomap based DAX PMD fault path

Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and
improved iomap_dax_pmd_fault(). Also, now that it has no more users,
remove xfs_get_blocks_dax_fault().

Signed-off-by: Ross Zwisler <[email protected]>
---
 fs/xfs/xfs_aops.c | 25 +++++--------------------
 fs/xfs/xfs_aops.h |  3 ---
 fs/xfs/xfs_file.c |  2 +-
 3 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4a28fa9..39c754f 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1170,8 +1170,7 @@ __xfs_get_blocks(
sector_t iblock,
struct buffer_head *bh_result,
int create,
- bool direct,
- bool dax_fault)
+ bool direct)
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
@@ -1265,12 +1264,8 @@ __xfs_get_blocks(
if (ISUNWRITTEN(&imap))
set_buffer_unwritten(bh_result);
/* direct IO needs special help */
- if (create) {
- if (dax_fault)
- ASSERT(!ISUNWRITTEN(&imap));
- else
- xfs_map_direct(inode, bh_result, &imap, offset);
- }
+ if (create)
+ xfs_map_direct(inode, bh_result, &imap, offset);
}

/*
@@ -1310,7 +1305,7 @@ xfs_get_blocks(
struct buffer_head *bh_result,
int create)
{
- return __xfs_get_blocks(inode, iblock, bh_result, create, false, false);
+ return __xfs_get_blocks(inode, iblock, bh_result, create, false);
}

int
@@ -1320,17 +1315,7 @@ xfs_get_blocks_direct(
struct buffer_head *bh_result,
int create)
{
- return __xfs_get_blocks(inode, iblock, bh_result, create, true, false);
-}
-
-int
-xfs_get_blocks_dax_fault(
- struct inode *inode,
- sector_t iblock,
- struct buffer_head *bh_result,
- int create)
-{
- return __xfs_get_blocks(inode, iblock, bh_result, create, true, true);
+ return __xfs_get_blocks(inode, iblock, bh_result, create, true);
}

/*
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 1950e3b..6779e9d 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -57,9 +57,6 @@ int xfs_get_blocks(struct inode *inode, sector_t offset,
struct buffer_head *map_bh, int create);
int xfs_get_blocks_direct(struct inode *inode, sector_t offset,
struct buffer_head *map_bh, int create);
-int xfs_get_blocks_dax_fault(struct inode *inode, sector_t offset,
- struct buffer_head *map_bh, int create);
-
int xfs_end_io_direct_write(struct kiocb *iocb, loff_t offset,
ssize_t size, void *private);
int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 882f264..e86b2be 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1539,7 +1539,7 @@ xfs_filemap_pmd_fault(
}

xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- ret = dax_pmd_fault(vma, addr, pmd, flags, xfs_get_blocks_dax_fault);
+ ret = iomap_dax_pmd_fault(vma, addr, pmd, flags, &xfs_iomap_ops);
xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);

if (flags & FAULT_FLAG_WRITE)
--
2.7.4


2016-09-27 20:47:58

by Ross Zwisler

Subject: [PATCH v3 07/11] dax: coordinate locking for offsets in PMD range

DAX radix tree locking currently locks entries based on the unique
combination of the 'mapping' pointer and the pgoff_t 'index' for the entry.
This works for PTEs, but as we move to PMDs we will need all the offsets
within the range covered by the PMD to map to the same bit lock. To
accomplish this, for ranges covered by a PMD entry we will instead lock based
on the page offset of the beginning of the PMD entry. The 'mapping' pointer
is still used in the same way.
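
For illustration only (not part of the patch): on x86_64, with 4 KiB pages
and 2 MiB PMDs, PMD_MASK >> PAGE_SHIFT clears the low 9 bits of the index, so
all 512 page offsets covered by one PMD collapse to a single lock key. A
minimal userspace sketch of the same masking (the kernel helper applies it
only to PMD-type radix tree entries):

/*
 * Illustrative sketch assuming PAGE_SHIFT == 12 and PMD_SHIFT == 21
 * (x86_64 defaults); the MY_ macros stand in for the kernel's.
 */
#include <stdio.h>

#define MY_PAGE_SHIFT	12
#define MY_PMD_SHIFT	21
#define MY_PMD_MASK	(~((1UL << MY_PMD_SHIFT) - 1))

static unsigned long entry_start(unsigned long index)
{
	/* Same masking as dax_entry_start(): round down to the PMD start. */
	return index & (MY_PMD_MASK >> MY_PAGE_SHIFT);
}

int main(void)
{
	/* Indices 0x200 through 0x3ff all share entry_start 0x200. */
	printf("%#lx %#lx %#lx\n",
	       entry_start(0x200), entry_start(0x2ff), entry_start(0x3ff));
	return 0;
}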

Signed-off-by: Ross Zwisler <[email protected]>
---
 fs/dax.c            | 37 ++++++++++++++++++++++++-------------
 include/linux/dax.h |  2 +-
 mm/filemap.c        |  2 +-
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index baef586..406feea 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -64,10 +64,17 @@ static int __init init_dax_wait_table(void)
}
fs_initcall(init_dax_wait_table);

+static pgoff_t dax_entry_start(pgoff_t index, void *entry)
+{
+ if (RADIX_DAX_TYPE(entry) == RADIX_DAX_PMD)
+ index &= (PMD_MASK >> PAGE_SHIFT);
+ return index;
+}
+
static wait_queue_head_t *dax_entry_waitqueue(struct address_space *mapping,
- pgoff_t index)
+ pgoff_t entry_start)
{
- unsigned long hash = hash_long((unsigned long)mapping ^ index,
+ unsigned long hash = hash_long((unsigned long)mapping ^ entry_start,
DAX_WAIT_TABLE_BITS);
return wait_table + hash;
}
@@ -285,7 +292,7 @@ EXPORT_SYMBOL_GPL(dax_do_io);
*/
struct exceptional_entry_key {
struct address_space *mapping;
- unsigned long index;
+ pgoff_t entry_start;
};

struct wait_exceptional_entry_queue {
@@ -301,7 +308,7 @@ static int wake_exceptional_entry_func(wait_queue_t *wait, unsigned int mode,
container_of(wait, struct wait_exceptional_entry_queue, wait);

if (key->mapping != ewait->key.mapping ||
- key->index != ewait->key.index)
+ key->entry_start != ewait->key.entry_start)
return 0;
return autoremove_wake_function(wait, mode, sync, NULL);
}
@@ -359,12 +366,10 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
{
void *entry, **slot;
struct wait_exceptional_entry_queue ewait;
- wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+ wait_queue_head_t *wq;

init_wait(&ewait.wait);
ewait.wait.func = wake_exceptional_entry_func;
- ewait.key.mapping = mapping;
- ewait.key.index = index;

for (;;) {
entry = __radix_tree_lookup(&mapping->page_tree, index, NULL,
@@ -375,6 +380,11 @@ static void *get_unlocked_mapping_entry(struct address_space *mapping,
*slotp = slot;
return entry;
}
+
+ wq = dax_entry_waitqueue(mapping,
+ dax_entry_start(index, entry));
+ ewait.key.mapping = mapping;
+ ewait.key.entry_start = dax_entry_start(index, entry);
prepare_to_wait_exclusive(wq, &ewait.wait,
TASK_UNINTERRUPTIBLE);
spin_unlock_irq(&mapping->tree_lock);
@@ -447,10 +457,11 @@ restart:
return entry;
}

-void dax_wake_mapping_entry_waiter(struct address_space *mapping,
+void dax_wake_mapping_entry_waiter(void *entry, struct address_space *mapping,
pgoff_t index, bool wake_all)
{
- wait_queue_head_t *wq = dax_entry_waitqueue(mapping, index);
+ wait_queue_head_t *wq = dax_entry_waitqueue(mapping,
+ dax_entry_start(index, entry));

/*
* Checking for locked entry and prepare_to_wait_exclusive() happens
@@ -462,7 +473,7 @@ void dax_wake_mapping_entry_waiter(struct address_space *mapping,
struct exceptional_entry_key key;

key.mapping = mapping;
- key.index = index;
+ key.entry_start = dax_entry_start(index, entry);
__wake_up(wq, TASK_NORMAL, wake_all ? 0 : 1, &key);
}
}
@@ -480,7 +491,7 @@ void dax_unlock_mapping_entry(struct address_space *mapping, pgoff_t index)
}
unlock_slot(mapping, slot);
spin_unlock_irq(&mapping->tree_lock);
- dax_wake_mapping_entry_waiter(mapping, index, false);
+ dax_wake_mapping_entry_waiter(entry, mapping, index, false);
}

static void put_locked_mapping_entry(struct address_space *mapping,
@@ -505,7 +516,7 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
return;

/* We have to wake up next waiter for the radix tree entry lock */
- dax_wake_mapping_entry_waiter(mapping, index, false);
+ dax_wake_mapping_entry_waiter(entry, mapping, index, false);
}

/*
@@ -532,7 +543,7 @@ int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
radix_tree_delete(&mapping->page_tree, index);
mapping->nrexceptional--;
spin_unlock_irq(&mapping->tree_lock);
- dax_wake_mapping_entry_waiter(mapping, index, true);
+ dax_wake_mapping_entry_waiter(entry, mapping, index, true);

return 1;
}
diff --git a/include/linux/dax.h b/include/linux/dax.h
index add6c4b..4065601 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -21,7 +21,7 @@ int iomap_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf,
struct iomap_ops *ops);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index);
-void dax_wake_mapping_entry_waiter(struct address_space *mapping,
+void dax_wake_mapping_entry_waiter(void *entry, struct address_space *mapping,
pgoff_t index, bool wake_all);

#ifdef CONFIG_FS_DAX
diff --git a/mm/filemap.c b/mm/filemap.c
index 8a287df..35e880d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -617,7 +617,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
if (node)
workingset_node_pages_dec(node);
/* Wakeup waiters for exceptional entry lock */
- dax_wake_mapping_entry_waiter(mapping, page->index,
+ dax_wake_mapping_entry_waiter(p, mapping, page->index,
false);
}
}
--
2.7.4


2016-09-28 02:08:42

by Christoph Hellwig

Subject: Re: [PATCH v3 00/11] re-enable DAX PMD support

On Tue, Sep 27, 2016 at 02:47:51PM -0600, Ross Zwisler wrote:
> DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
> locking. This series allows DAX PMDs to participate in the DAX radix tree
> based locking scheme so that they can be re-enabled.
>
> Jan and Christoph, can you please help review these changes?

About to get on a plane, so it might take a bit to do a real review.
In general this looks fine, but I guess the first two ext4 patches
should just go straight to Ted independent of the rest?

Also Jan just posted a giant DAX patchbomb, we'll need to find a way
to integrate all that work, and maybe prioritize things if we want
to get bits into 4.9 still.


2016-09-28 04:55:50

by Dave Chinner

Subject: Re: [PATCH v3 00/11] re-enable DAX PMD support

On Tue, Sep 27, 2016 at 07:08:42PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 27, 2016 at 02:47:51PM -0600, Ross Zwisler wrote:
> > DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
> > locking. This series allows DAX PMDs to participate in the DAX radix tree
> > based locking scheme so that they can be re-enabled.
> >
> > Jan and Christoph, can you please help review these changes?
>
> About to get on a plane, so it might take a bit to do a real review.
> In general this looks fine, but I guess the first two ext4 patches
> should just go straight to Ted independent of the rest?
>
> Also Jan just posted a giant DAX patchbomb, we'll need to find a way
> to integrate all that work, and maybe prioritize things if we want
> to get bits into 4.9 still.

I'm not going to have time to do much review or testing of the DAX
changes (apart from the cursory comments I've already made) because
of the huge pile of XFS reflink changes I've got to get through
first. However, I've already got the iomap dax bits in the XFS tree
so I can pull everything through there if review and testing is
covered otherwise......

Cheers,

Dave.
--
Dave Chinner
[email protected]


2016-09-29 18:23:37

by Ross Zwisler

Subject: Re: [PATCH v3 00/11] re-enable DAX PMD support

On Wed, Sep 28, 2016 at 02:55:50PM +1000, Dave Chinner wrote:
> On Tue, Sep 27, 2016 at 07:08:42PM -0700, Christoph Hellwig wrote:
> > On Tue, Sep 27, 2016 at 02:47:51PM -0600, Ross Zwisler wrote:
> > > DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based
> > > locking. This series allows DAX PMDs to participate in the DAX radix tree
> > > based locking scheme so that they can be re-enabled.
> > >
> > > Jan and Christoph, can you please help review these changes?
> >
> > About to get on a plane, so it might take a bit to do a real review.
> > In general this looks fine, but I guess the first two ext4 patches
> > should just go straight to Ted independent of the rest?
> >
> > Also Jan just posted a giant DAX patchbomb, we'll need to find a way
> > to integrate all that work, and maybe prioritize things if we want
> > to get bits into 4.9 still.
>
> I'm not going to have time to do much review or testing of the DAX
> changes (apart from the cursory comments I've already made) because
> of the huge pile of XFS reflink changes I've got to get through
> first. However, I've already got the iomap dax bits in the XFS tree
> so I can pull everything through there if review and testing is
> covered otherwise......

Frankly I'd love it if my changes could make it into v4.9 through the XFS
tree. They've passed xfstests both with and without DAX, and they've passed
all the targeted testing I've been able to throw at them. If that works, we
can integrate Jan's changes on top of them during the v4.9 cycle and merge for
v4.10.

I'll work on incorporating changes based on your feedback, Dave, and
hopefully have a v4 out by the end of the day.