Until now, dax has been disabled if media errors were found on
any device. This series attempts to address that.
The first three patches from Dan re-enable dax even when media
errors are present.
The fourth patch, from Matthew, removes the zeroout path from dax
entirely, making zeroout operations always go through the driver.
(The motivation is that if a backing device has media errors and we
create a sparse file on it, we don't want the initial zeroing to
happen via dax; we want to give the block driver a chance to clear
the errors.)
The fifth patch changes the behaviour of dax_do_io by adding a
wrapper around it, dax_io_fallback(), that is passed all the
arguments also needed by __blockdev_direct_IO. If the dax_do_io
call fails with -EIO due to a bad block, we simply retry via the
direct_IO path, which forces the IO to go through the block driver
and can attempt to clear the error.
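In rough form, the retry looks like this (a sketch only; the actual
wrapper added in patch 5 is dax_io_fallback(), which takes the full
set of __blockdev_direct_IO arguments):

    ret = dax_do_io(iocb, inode, iter, pos, get_block, end_io, flags);
    if (iov_iter_rw(iter) == WRITE && ret == -EIO) {
        /* a write hit a bad block: let the block driver retry it */
        ret = __blockdev_direct_IO(iocb, inode, bdev, iter, pos,
                    get_block, end_io, NULL, flags);
        /* preserve the original -EIO unless the IO was queued */
        if (ret < 0 && ret != -EIOCBQUEUED)
            ret = -EIO;
    }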
Patch 6 reduces our calls to clear_pmem from dax in the
truncate/hole-punch cases. We check whether the partial-page range
being zeroed is sector-aligned and sector-sized, and if so, use
blkdev_issue_zeroout instead of clear_pmem so that errors can be
handled better by the driver.
Patch 7 removes a now-redundant comment in DAX and is mostly
unrelated to the rest of this series.
This series also depends on/is based on Jan Kara's DAX Locking
fixes series [1].
[1]: http://www.spinics.net/lists/linux-mm/msg105819.html
v3:
- Wrapper-ize the direct_IO fallback again and make an exception
for -EIOCBQUEUED (Jeff, Dan)
- Reduce clear_pmem usage in DAX to the minimum
Dan Williams (3):
block, dax: pass blk_dax_ctl through to drivers
dax: fallback from pmd to pte on error
dax: enable dax in the presence of known media errors (badblocks)
Matthew Wilcox (1):
dax: use sb_issue_zeroout instead of calling dax_clear_sectors
Vishal Verma (3):
dax: handle media errors in dax_do_io
dax: for truncate/hole-punch, do zeroing through the driver if
possible
dax: fix a comment in dax_zero_page_range and dax_truncate_page
arch/powerpc/sysdev/axonram.c | 10 +++---
block/ioctl.c | 9 -----
drivers/block/brd.c | 9 ++---
drivers/nvdimm/pmem.c | 17 +++++++---
drivers/s390/block/dcssblk.c | 12 +++----
fs/block_dev.c | 7 ++--
fs/dax.c | 78 +++++++++++++++----------------------------
fs/ext2/inode.c | 12 +++----
fs/ext4/inode.c | 5 +--
fs/xfs/xfs_aops.c | 8 ++---
fs/xfs/xfs_bmap_util.c | 15 +++------
include/linux/blkdev.h | 3 +-
include/linux/dax.h | 31 ++++++++++++++++-
13 files changed, 108 insertions(+), 108 deletions(-)
--
2.5.5
From: Dan Williams <[email protected]>
This is in preparation for doing badblocks checking against the
requested sector range in the driver. Currently, we opportunistically
return as much data as can be "dax'd", starting at the given sector.
When errors are present, we want to limit that range to the first
encountered error, or fail the dax request if the range encompasses
an error.
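For reference, blk_dax_ctl is the control structure that
bdev_direct_access() already uses internally; it carries roughly the
following fields (see include/linux/blkdev.h), with the driver filling
in addr and pfn and returning the available length:

    struct blk_dax_ctl {
        sector_t    sector; /* input: sector relative to the bdev */
        void __pmem *addr;  /* output: kernel virtual address */
        long        size;   /* input: requested length in bytes */
        pfn_t       pfn;    /* output: page frame for the mapping */
    };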
Signed-off-by: Dan Williams <[email protected]>
---
arch/powerpc/sysdev/axonram.c | 10 +++++-----
drivers/block/brd.c | 9 +++++----
drivers/nvdimm/pmem.c | 9 +++++----
drivers/s390/block/dcssblk.c | 12 ++++++------
fs/block_dev.c | 2 +-
include/linux/blkdev.h | 3 +--
6 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 0d112b9..d85673f 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,17 +139,17 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
/**
* axon_ram_direct_access - direct_access() method for block device
- * @device, @sector, @data: see block_device_operations method
+ * @dax: see block_device_operations method
*/
static long
-axon_ram_direct_access(struct block_device *device, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+axon_ram_direct_access(struct block_device *device, struct blk_dax_ctl *dax)
{
+ sector_t sector = get_start_sect(device) + dax->sector;
struct axon_ram_bank *bank = device->bd_disk->private_data;
loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
- *kaddr = (void __pmem __force *) bank->io_addr + offset;
- *pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
+ dax->addr = (void __pmem __force *) bank->io_addr + offset;
+ dax->pfn = phys_to_pfn_t(bank->ph_addr + offset, PFN_DEV);
return bank->size - offset;
}
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 51a071e..71521c1 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -380,9 +380,10 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
}
#ifdef CONFIG_BLK_DEV_RAM_DAX
-static long brd_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+static long brd_direct_access(struct block_device *bdev,
+ struct blk_dax_ctl *dax)
{
+ sector_t sector = get_start_sect(bdev) + dax->sector;
struct brd_device *brd = bdev->bd_disk->private_data;
struct page *page;
@@ -391,8 +392,8 @@ static long brd_direct_access(struct block_device *bdev, sector_t sector,
page = brd_insert_page(brd, sector);
if (!page)
return -ENOSPC;
- *kaddr = (void __pmem *)page_address(page);
- *pfn = page_to_pfn_t(page);
+ dax->addr = (void __pmem *)page_address(page);
+ dax->pfn = page_to_pfn_t(page);
return PAGE_SIZE;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f798899..f72733c 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -181,14 +181,15 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
return rc;
}
-static long pmem_direct_access(struct block_device *bdev, sector_t sector,
- void __pmem **kaddr, pfn_t *pfn)
+static long pmem_direct_access(struct block_device *bdev,
+ struct blk_dax_ctl *dax)
{
+ sector_t sector = get_start_sect(bdev) + dax->sector;
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
- *kaddr = pmem->virt_addr + offset;
- *pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ dax->addr = pmem->virt_addr + offset;
+ dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
return pmem->size - pmem->pfn_pad - offset;
}
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index b839086..613f587 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -30,8 +30,8 @@ static int dcssblk_open(struct block_device *bdev, fmode_t mode);
static void dcssblk_release(struct gendisk *disk, fmode_t mode);
static blk_qc_t dcssblk_make_request(struct request_queue *q,
struct bio *bio);
-static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
- void __pmem **kaddr, pfn_t *pfn);
+static long dcssblk_direct_access(struct block_device *bdev,
+ struct blk_dax_ctl *dax);
static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
@@ -883,9 +883,9 @@ fail:
}
static long
-dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
- void __pmem **kaddr, pfn_t *pfn)
+dcssblk_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
{
+ sector_t secnum = get_start_sect(bdev) + dax->sector;
struct dcssblk_dev_info *dev_info;
unsigned long offset, dev_sz;
@@ -894,8 +894,8 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
return -ENODEV;
dev_sz = dev_info->end - dev_info->start;
offset = secnum * 512;
- *kaddr = (void __pmem *) (dev_info->start + offset);
- *pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
+ dax->addr = (void __pmem *) (dev_info->start + offset);
+ dax->pfn = __pfn_to_pfn_t(PFN_DOWN(dev_info->start + offset), PFN_DEV);
return dev_sz - offset;
}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index b25bb23..79defba 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -488,7 +488,7 @@ long bdev_direct_access(struct block_device *bdev, struct blk_dax_ctl *dax)
sector += get_start_sect(bdev);
if (sector % (PAGE_SIZE / 512))
return -EINVAL;
- avail = ops->direct_access(bdev, sector, &dax->addr, &dax->pfn);
+ avail = ops->direct_access(bdev, dax);
if (!avail)
return -ERANGE;
if (avail > 0 && avail & ~PAGE_MASK)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 669e419..9d8c6d5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1656,8 +1656,7 @@ struct block_device_operations {
int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
- long (*direct_access)(struct block_device *, sector_t, void __pmem **,
- pfn_t *);
+ long (*direct_access)(struct block_device *, struct blk_dax_ctl *dax);
unsigned int (*check_events) (struct gendisk *disk,
unsigned int clearing);
/* ->media_changed() is DEPRECATED, use ->check_events() instead */
--
2.5.5
From: Dan Williams <[email protected]>
In preparation for consulting a badblocks list in pmem_direct_access(),
teach dax_pmd_fault() to fall back rather than fail immediately upon
encountering an error. The idea is that reducing the span of the dax
request may avoid the error region.
Signed-off-by: Dan Williams <[email protected]>
---
fs/dax.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 5a34f08..52f0044 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1111,8 +1111,8 @@ int __dax_pmd_fault(struct vm_area_struct *vma, unsigned long address,
long length = dax_map_atomic(bdev, &dax);
if (length < 0) {
- result = VM_FAULT_SIGBUS;
- goto out;
+ dax_pmd_dbg(&bh, address, "dax-error fallback");
+ goto fallback;
}
if (length < PMD_SIZE) {
dax_pmd_dbg(&bh, address, "dax-length too small");
--
2.5.5
From: Dan Williams <[email protected]>
1/ If a mapping overlaps a bad sector, fail the request.
2/ Do not opportunistically report more dax-capable capacity than is
requested when errors are present.
[vishal: fix a conflict with system RAM collision patches]
Signed-off-by: Dan Williams <[email protected]>
---
block/ioctl.c | 9 ---------
drivers/nvdimm/pmem.c | 8 ++++++++
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index 4ff1f92..bf80bfd 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -423,15 +423,6 @@ bool blkdev_dax_capable(struct block_device *bdev)
|| (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
return false;
- /*
- * If the device has known bad blocks, force all I/O through the
- * driver / page cache.
- *
- * TODO: support finer grained dax error handling
- */
- if (disk->bb && disk->bb->count)
- return false;
-
return true;
}
#endif
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f72733c..4567d9a 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -188,9 +188,17 @@ static long pmem_direct_access(struct block_device *bdev,
struct pmem_device *pmem = bdev->bd_disk->private_data;
resource_size_t offset = sector * 512 + pmem->data_offset;
+ if (unlikely(is_bad_pmem(&pmem->bb, sector, dax->size)))
+ return -EIO;
dax->addr = pmem->virt_addr + offset;
dax->pfn = phys_to_pfn_t(pmem->phys_addr + offset, pmem->pfn_flags);
+ /*
+ * If badblocks are present, limit known good range to the
+ * requested range.
+ */
+ if (unlikely(pmem->bb.count))
+ return dax->size;
return pmem->size - pmem->pfn_pad - offset;
}
--
2.5.5
dax_do_io (called for read() or write() on a dax filesystem) may fail
in the presence of bad blocks or media errors. Since we expect that a
write should clear media errors on nvdimms, add a wrapper,
dax_io_fallback(), that falls back to the direct_IO path, which sends
a bio down to the driver so that it can attempt to clear the error.
Cc: Matthew Wilcox <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
---
fs/block_dev.c | 5 +++--
fs/ext2/inode.c | 5 +++--
fs/ext4/inode.c | 5 +++--
fs/xfs/xfs_aops.c | 8 ++++----
include/linux/dax.h | 30 ++++++++++++++++++++++++++++++
5 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 79defba..7c90516 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -168,8 +168,9 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
struct inode *inode = bdev_file_inode(file);
if (IS_DAX(inode))
- return dax_do_io(iocb, inode, iter, offset, blkdev_get_block,
- NULL, DIO_SKIP_DIO_COUNT);
+ return dax_io_fallback(iocb, inode, I_BDEV(inode), iter, offset,
+ blkdev_get_block, blkdev_get_block,
+ NULL, NULL, DIO_SKIP_DIO_COUNT);
return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter, offset,
blkdev_get_block, NULL, NULL,
DIO_SKIP_DIO_COUNT);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 35f2b0bf..1cec54b 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -862,8 +862,9 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter, loff_t offset)
ssize_t ret;
if (IS_DAX(inode))
- ret = dax_do_io(iocb, inode, iter, offset, ext2_get_block, NULL,
- DIO_LOCKING);
+ ret = dax_io_fallback(iocb, inode, inode->i_sb->s_bdev, iter,
+ offset, ext2_get_block, ext2_get_block,
+ NULL, NULL, DIO_LOCKING | DIO_SKIP_HOLES);
else
ret = blockdev_direct_IO(iocb, inode, iter, offset,
ext2_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6d5d5c1..d29848b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3411,8 +3411,9 @@ static ssize_t ext4_direct_IO_write(struct kiocb *iocb, struct iov_iter *iter,
BUG_ON(ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode));
#endif
if (IS_DAX(inode)) {
- ret = dax_do_io(iocb, inode, iter, offset, get_block_func,
- ext4_end_io_dio, dio_flags);
+ ret = dax_io_fallback(iocb, inode, inode->i_sb->s_bdev, iter,
+ offset, get_block_func, get_block_func,
+ ext4_end_io_dio, NULL, dio_flags);
} else
ret = __blockdev_direct_IO(iocb, inode,
inode->i_sb->s_bdev, iter, offset,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index e49b240..48fe10a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1412,7 +1412,7 @@ xfs_vm_direct_IO(
struct inode *inode = iocb->ki_filp->f_mapping->host;
dio_iodone_t *endio = NULL;
int flags = 0;
- struct block_device *bdev;
+ struct block_device *bdev = xfs_find_bdev_for_inode(inode);
if (iov_iter_rw(iter) == WRITE) {
endio = xfs_end_io_direct_write;
@@ -1420,11 +1420,11 @@ xfs_vm_direct_IO(
}
if (IS_DAX(inode)) {
- return dax_do_io(iocb, inode, iter, offset,
- xfs_get_blocks_direct, endio, 0);
+ return dax_io_fallback(iocb, inode, bdev, iter, offset,
+ xfs_get_blocks_direct, xfs_get_blocks_direct,
+ endio, NULL, flags);
}
- bdev = xfs_find_bdev_for_inode(inode);
return __blockdev_direct_IO(iocb, inode, bdev, iter, offset,
xfs_get_blocks_direct, endio, NULL, flags);
}
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 426841a..7200e6f 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -3,6 +3,7 @@
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/uio.h>
#include <linux/radix-tree.h>
#include <asm/pgtable.h>
@@ -64,4 +65,33 @@ static inline bool dax_mapping(struct address_space *mapping)
struct writeback_control;
int dax_writeback_mapping_range(struct address_space *mapping,
struct block_device *bdev, struct writeback_control *wbc);
+/*
+ * This is a wrapper around dax_do_io that, for writes, falls back to
+ * direct_IO semantics if dax_do_io fails due to a media error (reads
+ * are not retried).
+ */
+static inline ssize_t dax_io_fallback(struct kiocb *iocb, struct inode *inode,
+ struct block_device *bdev, struct iov_iter *iter, loff_t pos,
+ get_block_t dax_get_block, get_block_t dio_get_block,
+ dio_iodone_t end_io, dio_submit_t submit_io, int flags)
+{
+ ssize_t retval;
+
+ retval = dax_do_io(iocb, inode, iter, pos, dax_get_block, end_io,
+ flags);
+ if (iov_iter_rw(iter) == WRITE && retval == -EIO) {
+ /*
+ * dax_do_io may have failed a write due to a bad block.
+ * Retry with direct_io, and if the direct_IO also fails
+ * (with the exception of -EIOCBQUEUED), return -EIO as
+ * that was the original error that led us down the
+ * direct_IO path.
+ */
+ retval = __blockdev_direct_IO(iocb, inode, bdev, iter, pos,
+ dio_get_block, end_io, submit_io, flags);
+ if (retval < 0 && retval != -EIOCBQUEUED)
+ return -EIO;
+ }
+ return retval;
+}
#endif
--
2.5.5
From: Matthew Wilcox <[email protected]>
dax_clear_sectors() cannot handle poisoned blocks. These must be
zeroed using the BIO interface instead. Convert ext2 to
sb_issue_zeroout() and XFS to an equivalent blkdev_issue_zeroout()
call.
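For reference, sb_issue_zeroout() is a thin inline wrapper in
include/linux/blkdev.h that converts filesystem blocks to 512-byte
sectors and hands the zeroing to the block layer (shown here roughly
as it exists in the tree). XFS open-codes the equivalent
blkdev_issue_zeroout() call because it must target the bdev returned
by xfs_find_bdev_for_inode() rather than sb->s_bdev:

    static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
            sector_t nr_blocks, gfp_t gfp_mask)
    {
        return blkdev_issue_zeroout(sb->s_bdev,
                        block << (sb->s_blocksize_bits - 9),
                        nr_blocks << (sb->s_blocksize_bits - 9),
                        gfp_mask, true);
    }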
Signed-off-by: Matthew Wilcox <[email protected]>
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma <[email protected]>
---
fs/dax.c | 32 --------------------------------
fs/ext2/inode.c | 7 +++----
fs/xfs/xfs_bmap_util.c | 15 ++++-----------
include/linux/dax.h | 1 -
4 files changed, 7 insertions(+), 48 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 52f0044..5948d9b 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -116,38 +116,6 @@ struct page *read_dax_sector(struct block_device *bdev, sector_t n)
return page;
}
-/*
- * dax_clear_sectors() is called from within transaction context from XFS,
- * and hence this means the stack from this point must follow GFP_NOFS
- * semantics for all operations.
- */
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size)
-{
- struct blk_dax_ctl dax = {
- .sector = _sector,
- .size = _size,
- };
-
- might_sleep();
- do {
- long count, sz;
-
- count = dax_map_atomic(bdev, &dax);
- if (count < 0)
- return count;
- sz = min_t(long, count, SZ_128K);
- clear_pmem(dax.addr, sz);
- dax.size -= sz;
- dax.sector += sz / 512;
- dax_unmap_atomic(bdev, &dax);
- cond_resched();
- } while (dax.size);
-
- wmb_pmem();
- return 0;
-}
-EXPORT_SYMBOL_GPL(dax_clear_sectors);
-
static bool buffer_written(struct buffer_head *bh)
{
return buffer_mapped(bh) && !buffer_unwritten(bh);
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 1f07b75..35f2b0bf 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -26,6 +26,7 @@
#include <linux/highuid.h>
#include <linux/pagemap.h>
#include <linux/dax.h>
+#include <linux/blkdev.h>
#include <linux/quotaops.h>
#include <linux/writeback.h>
#include <linux/buffer_head.h>
@@ -737,10 +738,8 @@ static int ext2_get_blocks(struct inode *inode,
* so that it's not found by another thread before it's
* initialised
*/
- err = dax_clear_sectors(inode->i_sb->s_bdev,
- le32_to_cpu(chain[depth-1].key) <<
- (inode->i_blkbits - 9),
- 1 << inode->i_blkbits);
+ err = sb_issue_zeroout(inode->i_sb,
+ le32_to_cpu(chain[depth-1].key), 1, GFP_NOFS);
if (err) {
mutex_unlock(&ei->truncate_mutex);
goto cleanup;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3b63098..930ac6a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -72,18 +72,11 @@ xfs_zero_extent(
struct xfs_mount *mp = ip->i_mount;
xfs_daddr_t sector = xfs_fsb_to_db(ip, start_fsb);
sector_t block = XFS_BB_TO_FSBT(mp, sector);
- ssize_t size = XFS_FSB_TO_B(mp, count_fsb);
-
- if (IS_DAX(VFS_I(ip)))
- return dax_clear_sectors(xfs_find_bdev_for_inode(VFS_I(ip)),
- sector, size);
-
- /*
- * let the block layer decide on the fastest method of
- * implementing the zeroing.
- */
- return sb_issue_zeroout(mp->m_super, block, count_fsb, GFP_NOFS);
+ return blkdev_issue_zeroout(xfs_find_bdev_for_inode(VFS_I(ip)),
+ block << (mp->m_super->s_blocksize_bits - 9),
+ count_fsb << (mp->m_super->s_blocksize_bits - 9),
+ GFP_NOFS, true);
}
/*
diff --git a/include/linux/dax.h b/include/linux/dax.h
index ef94fa7..426841a 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -11,7 +11,6 @@
ssize_t dax_do_io(struct kiocb *, struct inode *, struct iov_iter *, loff_t,
get_block_t, dio_iodone_t, int flags);
-int dax_clear_sectors(struct block_device *bdev, sector_t _sector, long _size);
int dax_zero_page_range(struct inode *, loff_t from, unsigned len, get_block_t);
int dax_truncate_page(struct inode *, loff_t from, get_block_t);
int dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t);
--
2.5.5
In the truncate or hole-punch path in dax, we clear out sub-page
ranges. If these sub-page ranges are sector-aligned and sector-sized,
we can do the zeroing through the driver instead, so that error
clearing is handled automatically.
For sub-sector ranges, we still have to rely on clear_pmem and risk
tripping over errors.
Cc: Matthew Wilcox <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Jan Kara <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
---
fs/dax.c | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 5948d9b..d8c974e 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1196,6 +1196,20 @@ out:
}
EXPORT_SYMBOL_GPL(dax_pfn_mkwrite);
+static bool dax_range_is_aligned(struct block_device *bdev,
+ struct blk_dax_ctl *dax, unsigned int offset,
+ unsigned int length)
+{
+ unsigned short sector_size = bdev_logical_block_size(bdev);
+
+ if (offset % sector_size)
+ return false;
+ if (length % sector_size)
+ return false;
+
+ return true;
+}
+
/**
* dax_zero_page_range - zero a range within a page of a DAX file
* @inode: The file being truncated
@@ -1240,11 +1254,17 @@ int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
.size = PAGE_SIZE,
};
- if (dax_map_atomic(bdev, &dax) < 0)
- return PTR_ERR(dax.addr);
- clear_pmem(dax.addr + offset, length);
- wmb_pmem();
- dax_unmap_atomic(bdev, &dax);
+ if (dax_range_is_aligned(bdev, &dax, offset, length))
+ return blkdev_issue_zeroout(bdev,
+ dax.sector + (offset >> 9), length >> 9,
+ GFP_NOFS, true);
+ else {
+ if (dax_map_atomic(bdev, &dax) < 0)
+ return PTR_ERR(dax.addr);
+ clear_pmem(dax.addr + offset, length);
+ wmb_pmem();
+ dax_unmap_atomic(bdev, &dax);
+ }
}
return 0;
--
2.5.5
The distinction between PAGE_SIZE and PAGE_CACHE_SIZE was removed in
commit 09cbfea ("mm, fs: get rid of PAGE_CACHE_* and
page_cache_{get,release} macros").
The comments for dax_zero_page_range() and dax_truncate_page()
described that distinction, which is now redundant, so remove those
paragraphs.
Cc: Matthew Wilcox <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: Vishal Verma <[email protected]>
---
fs/dax.c | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index d8c974e..b8fa85a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1221,12 +1221,6 @@ static bool dax_range_is_aligned(struct block_device *bdev,
* page in a DAX file. This is intended for hole-punch operations. If
* you are truncating a file, the helper function dax_truncate_page() may be
* more convenient.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_zero_page_range(struct inode *inode, loff_t from, unsigned length,
get_block_t get_block)
@@ -1279,12 +1273,6 @@ EXPORT_SYMBOL_GPL(dax_zero_page_range);
*
* Similar to block_truncate_page(), this function can be called by a
* filesystem when it is truncating a DAX file to handle the partial page.
- *
- * We work in terms of PAGE_SIZE here for commonality with
- * block_truncate_page(), but we could go down to PAGE_SIZE if the filesystem
- * took care of disposing of the unnecessary blocks. Even if the filesystem
- * block size is smaller than PAGE_SIZE, we have to zero the rest of the page
- * since the file might be mmapped.
*/
int dax_truncate_page(struct inode *inode, loff_t from, get_block_t get_block)
{
--
2.5.5