2018-06-06 10:40:27

by Christoph Hellwig

[permalink] [raw]
Subject: iomap preparations for GFS2

Hi all,

this is a slight rework of the patches from Andreas to prepare for gfs2
using the iomap code. I'd like to start with an immutable branch for
iomap patches in either the XFS tree or a tree of mine own with something
like this so that we can have a nice common base for both the major
iomap changes for XFS and the GFS2 work.


2018-06-06 10:59:06

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/6] fs: factor out a __generic_write_end helper

On Wed, Jun 06, 2018 at 12:40:28PM +0200, Christoph Hellwig wrote:
> Bits of the buffer.c based write_end implementations that don't know
> about buffer_heads and can be reused by other implementations.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

This actually already had two reviews:

Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>

That got lost through the detour via gfs2.

2018-06-06 10:40:31

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 4/6] iomap: complete partial direct I/O writes synchronously

From: Andreas Gruenbacher <[email protected]>

According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated. This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O. The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.

The above mentioned fallback mode is needed for gfs2, which doesn't
allow block allocations under direct I/O to avoid taking cluster-wide
exclusive locks. As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write. In that
case, wait for the short write to complete to allow gfs2 to recover.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/iomap.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 206539d369a8..f428baa32185 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -811,6 +811,7 @@ struct iomap_dio {
atomic_t ref;
unsigned flags;
int error;
+ bool wait_for_completion;

union {
/* used during submission and for synchronous completion: */
@@ -914,9 +915,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));

if (atomic_dec_and_test(&dio->ref)) {
- if (is_sync_kiocb(dio->iocb)) {
+ if (dio->wait_for_completion) {
struct task_struct *waiter = dio->submit.waiter;
-
WRITE_ONCE(dio->submit.waiter, NULL);
wake_up_process(waiter);
} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -1130,13 +1130,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->end_io = end_io;
dio->error = 0;
dio->flags = 0;
+ dio->wait_for_completion = is_sync_kiocb(iocb);

dio->submit.iter = iter;
- if (is_sync_kiocb(iocb)) {
- dio->submit.waiter = current;
- dio->submit.cookie = BLK_QC_T_NONE;
- dio->submit.last_queue = NULL;
- }
+ dio->submit.waiter = current;
+ dio->submit.cookie = BLK_QC_T_NONE;
+ dio->submit.last_queue = NULL;

if (iov_iter_rw(iter) == READ) {
if (pos >= dio->i_size)
@@ -1186,7 +1185,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio_warn_stale_pagecache(iocb->ki_filp);
ret = 0;

- if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+ if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
!inode->i_sb->s_dio_done_wq) {
ret = sb_init_dio_done_wq(inode->i_sb);
if (ret < 0)
@@ -1201,8 +1200,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
iomap_dio_actor);
if (ret <= 0) {
/* magic error code to fall back to buffered I/O */
- if (ret == -ENOTBLK)
+ if (ret == -ENOTBLK) {
+ dio->wait_for_completion = true;
ret = 0;
+ }
break;
}
pos += ret;
@@ -1223,7 +1224,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->flags &= ~IOMAP_DIO_NEED_SYNC;

if (!atomic_dec_and_test(&dio->ref)) {
- if (!is_sync_kiocb(iocb))
+ if (!dio->wait_for_completion)
return -EIOCBQUEUED;

for (;;) {
--
2.14.2

2018-06-06 10:40:28

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 1/6] fs: factor out a __generic_write_end helper

Bits of the buffer.c based write_end implementations that don't know
about buffer_heads and can be reused by other implementations.

Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/buffer.c | 67 +++++++++++++++++++++++++++++++----------------------------
fs/internal.h | 2 ++
2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index cabc045f483d..aba2a948b235 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2076,6 +2076,40 @@ int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
}
EXPORT_SYMBOL(block_write_begin);

+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+ struct page *page)
+{
+ loff_t old_size = inode->i_size;
+ bool i_size_changed = false;
+
+ /*
+ * No need to use i_size_read() here, the i_size cannot change under us
+ * because we hold i_rwsem.
+ *
+ * But it's important to update i_size while still holding page lock:
+ * page writeout could otherwise come in and zero beyond i_size.
+ */
+ if (pos + copied > inode->i_size) {
+ i_size_write(inode, pos + copied);
+ i_size_changed = true;
+ }
+
+ unlock_page(page);
+ put_page(page);
+
+ if (old_size < pos)
+ pagecache_isize_extended(inode, old_size, pos);
+ /*
+ * Don't mark the inode dirty under page lock. First, it unnecessarily
+ * makes the holding time of page lock longer. Second, it forces lock
+ * ordering of page lock and transaction start for journaling
+ * filesystems.
+ */
+ if (i_size_changed)
+ mark_inode_dirty(inode);
+ return copied;
+}
+
int block_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
@@ -2116,39 +2150,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
{
- struct inode *inode = mapping->host;
- loff_t old_size = inode->i_size;
- int i_size_changed = 0;
-
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
-
- /*
- * No need to use i_size_read() here, the i_size
- * cannot change under us because we hold i_mutex.
- *
- * But it's important to update i_size while still holding page lock:
- * page writeout could otherwise come in and zero beyond i_size.
- */
- if (pos+copied > inode->i_size) {
- i_size_write(inode, pos+copied);
- i_size_changed = 1;
- }
-
- unlock_page(page);
- put_page(page);
-
- if (old_size < pos)
- pagecache_isize_extended(inode, old_size, pos);
- /*
- * Don't mark the inode dirty under page lock. First, it unnecessarily
- * makes the holding time of page lock longer. Second, it forces lock
- * ordering of page lock and transaction start for journaling
- * filesystems.
- */
- if (i_size_changed)
- mark_inode_dirty(inode);
-
- return copied;
+ return __generic_write_end(mapping->host, pos, copied, page);
}
EXPORT_SYMBOL(generic_write_end);

diff --git a/fs/internal.h b/fs/internal.h
index 980d005b21b4..4a18bdbd2214 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -43,6 +43,8 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
extern void guard_bio_eod(int rw, struct bio *bio);
extern int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
get_block_t *get_block, struct iomap *iomap);
+int __generic_write_end(struct inode *inode, loff_t pos, unsigned copied,
+ struct page *page);

/*
* char_dev.c
--
2.14.2

2018-06-06 10:40:32

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 5/6] iomap: generic inline data handling

From: Andreas Gruenbacher <[email protected]>

Add generic inline data handling by adding a pointer to the inline data
region to struct iomap. When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.

This doesn't cover inline data reads and direct I/O yet because so far,
we have no users.

Signed-off-by: Andreas Gruenbacher <[email protected]>
[hch: small cleanups to better fit in with other iomap work]
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/iomap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++------
include/linux/iomap.h | 1 +
2 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index f428baa32185..f2a491b98b7c 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -116,6 +116,26 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
truncate_pagecache_range(inode, max(pos, i_size), pos + len);
}

+static void
+iomap_read_inline_data(struct inode *inode, struct page *page,
+ struct iomap *iomap)
+{
+ size_t size = i_size_read(inode);
+ void *addr;
+
+ if (PageUptodate(page))
+ return;
+
+ BUG_ON(page->index);
+ BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+
+ addr = kmap_atomic(page);
+ memcpy(addr, iomap->inline_data, size);
+ memset(addr + size, 0, PAGE_SIZE - size);
+ kunmap_atomic(addr);
+ SetPageUptodate(page);
+}
+
static int
iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
struct page **pagep, struct iomap *iomap)
@@ -133,7 +153,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
if (!page)
return -ENOMEM;

- status = __block_write_begin_int(page, pos, len, NULL, iomap);
+ if (iomap->type == IOMAP_INLINE)
+ iomap_read_inline_data(inode, page, iomap);
+ else
+ status = __block_write_begin_int(page, pos, len, NULL, iomap);
+
if (unlikely(status)) {
unlock_page(page);
put_page(page);
@@ -146,14 +170,37 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
return status;
}

+static int
+iomap_write_end_inline(struct inode *inode, struct page *page,
+ struct iomap *iomap, loff_t pos, unsigned copied)
+{
+ void *addr;
+
+ WARN_ON_ONCE(!PageUptodate(page));
+ BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
+
+ addr = kmap_atomic(page);
+ memcpy(iomap->inline_data + pos, addr + pos, copied);
+ kunmap_atomic(addr);
+
+ mark_inode_dirty(inode);
+ __generic_write_end(inode, pos, copied, page);
+ return copied;
+}
+
static int
iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
- unsigned copied, struct page *page)
+ unsigned copied, struct page *page, struct iomap *iomap)
{
int ret;

- ret = generic_write_end(NULL, inode->i_mapping, pos, len,
- copied, page, NULL);
+ if (iomap->type == IOMAP_INLINE) {
+ ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
+ } else {
+ ret = generic_write_end(NULL, inode->i_mapping, pos, len,
+ copied, page, NULL);
+ }
+
if (ret < len)
iomap_write_failed(inode, pos, len);
return ret;
@@ -208,7 +255,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,

flush_dcache_page(page);

- status = iomap_write_end(inode, pos, bytes, copied, page);
+ status = iomap_write_end(inode, pos, bytes, copied, page,
+ iomap);
if (unlikely(status < 0))
break;
copied = status;
@@ -302,7 +350,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,

WARN_ON_ONCE(!PageUptodate(page));

- status = iomap_write_end(inode, pos, bytes, bytes, page);
+ status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
return -EIO;
@@ -354,7 +402,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
zero_user(page, offset, bytes);
mark_page_accessed(page);

- return iomap_write_end(inode, pos, bytes, bytes, page);
+ return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
}

static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 212f4d59bcbf..69ef622f650e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -56,6 +56,7 @@ struct iomap {
union {
struct block_device *bdev;
struct dax_device *dax_dev;
+ void *inline_data;
};
};

--
2.14.2

2018-06-06 10:40:29

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 2/6] iomap: move bdev and dax_dev in a union

We always either ask for a block device or DAX device mapping, so we can
use the same space for both.

Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/ext2/inode.c | 6 ++++--
fs/ext4/inode.c | 6 ++++--
fs/xfs/xfs_iomap.c | 6 ++++--
include/linux/iomap.h | 6 ++++--
4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..8aead4e9dbc1 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -815,9 +815,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
return ret;

iomap->flags = 0;
- iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
- iomap->dax_dev = sbi->s_daxdev;
+ if (IS_DAX(inode))
+ iomap->dax_dev = sbi->s_daxdev;
+ else
+ iomap->bdev = inode->i_sb->s_bdev;

if (ret == 0) {
iomap->type = IOMAP_HOLE;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2ea07efbe016..79027e99118f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3524,8 +3524,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
iomap->flags = 0;
if (ext4_inode_datasync_dirty(inode))
iomap->flags |= IOMAP_F_DIRTY;
- iomap->bdev = inode->i_sb->s_bdev;
- iomap->dax_dev = sbi->s_daxdev;
+ if (IS_DAX(inode))
+ iomap->dax_dev = sbi->s_daxdev;
+ else
+ iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
iomap->length = (u64)map.m_len << blkbits;

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index c6ce6f9335b6..587110db1ad2 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -70,8 +70,10 @@ xfs_bmbt_to_iomap(
}
iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
- iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
- iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
+ if (IS_DAX(VFS_I(ip)))
+ iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
+ else
+ iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
}

xfs_extlen_t
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index a044a824da85..212f4d59bcbf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,8 +53,10 @@ struct iomap {
u64 length; /* length of mapping, bytes */
u16 type; /* type of mapping */
u16 flags; /* flags for mapping */
- struct block_device *bdev; /* block device for I/O */
- struct dax_device *dax_dev; /* dax_dev for dax operations */
+ union {
+ struct block_device *bdev;
+ struct dax_device *dax_dev;
+ };
};

/*
--
2.14.2

2018-06-06 10:40:30

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 3/6] iomap: mark newly allocated buffer heads as new

From: Andreas Gruenbacher <[email protected]>

In iomap_to_bh, not only mark buffer heads in IOMAP_UNWRITTEN maps as
new, but also buffer heads in IOMAP_MAPPED maps with the IOMAP_F_NEW
flag set. This will be used by filesystems like gfs2, which allocate
blocks in iomap->begin.

Minor corrections to the comment for IOMAP_UNWRITTEN maps.

Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/buffer.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index aba2a948b235..c8c2b7d8b8d6 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,16 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
break;
case IOMAP_UNWRITTEN:
/*
- * For unwritten regions, we always need to ensure that
- * sub-block writes cause the regions in the block we are not
- * writing to are zeroed. Set the buffer as new to ensure this.
+ * For unwritten regions, we always need to ensure that regions
+ * in the block we are not writing to are zeroed. Mark the
+ * buffer as new to ensure this.
*/
set_buffer_new(bh);
set_buffer_unwritten(bh);
/* FALLTHRU */
case IOMAP_MAPPED:
- if (offset >= i_size_read(inode))
+ if ((iomap->flags & IOMAP_F_NEW) ||
+ offset >= i_size_read(inode))
set_buffer_new(bh);
bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
inode->i_blkbits;
--
2.14.2

2018-06-06 10:40:33

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 6/6] iomap: add a page_done callback

This will be used by gfs2 to attach data to transactions for the journaled
data mode. But the concept is generic enough that we might be able to
use it for other purposes like encryption/integrity post-processing in the
future.

Based on a patch from Andreas Gruenbacher.

Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/iomap.c | 3 +++
include/linux/iomap.h | 8 ++++++++
2 files changed, 11 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index f2a491b98b7c..a7ecdd5ddd17 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -201,6 +201,9 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
copied, page, NULL);
}

+ if (iomap->page_done)
+ iomap->page_done(inode, pos, copied, page, iomap);
+
if (ret < len)
iomap_write_failed(inode, pos, len);
return ret;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 69ef622f650e..474e2255ac6e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -58,6 +58,14 @@ struct iomap {
struct dax_device *dax_dev;
void *inline_data;
};
+
+ /*
+ * Called when finished processing a page in the mapping returned in
+ * thus iomap. At least for now this is only supported in the buffered
+ * write path.
+ */
+ void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
+ struct page *page, struct iomap *iomap);
};

/*
--
2.14.2

2018-06-06 11:09:09

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: [PATCH 1/6] fs: factor out a __generic_write_end helper

2018-06-06 12:59 GMT+02:00 Christoph Hellwig <[email protected]>:
> On Wed, Jun 06, 2018 at 12:40:28PM +0200, Christoph Hellwig wrote:
>> Bits of the buffer.c based write_end implementations that don't know
>> about buffer_heads and can be reused by other implementations.
>>
>> Signed-off-by: Christoph Hellwig <[email protected]>
>
> This actually already had two reviews:
>
> Reviewed-by: Brian Foster <[email protected]>
> Reviewed-by: Darrick J. Wong <[email protected]>
>
> That got lost through the detour via gfs2.

And one by me which has been dropped this time around, but I suppose
it doesn't matter.

Andreas

2018-06-06 11:37:02

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH 2/6] iomap: move bdev and dax_dev in a union

On 6 June 2018 at 12:40, Christoph Hellwig <[email protected]> wrote:
> We always either ask for a block device or DAX device mapping, so we can
> use the same space for both.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> fs/ext2/inode.c | 6 ++++--
> fs/ext4/inode.c | 6 ++++--
> fs/xfs/xfs_iomap.c | 6 ++++--
> include/linux/iomap.h | 6 ++++--
> 4 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 71635909df3b..8aead4e9dbc1 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -815,9 +815,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> return ret;
>
> iomap->flags = 0;
> - iomap->bdev = inode->i_sb->s_bdev;
> iomap->offset = (u64)first_block << blkbits;
> - iomap->dax_dev = sbi->s_daxdev;
> + if (IS_DAX(inode))
> + iomap->dax_dev = sbi->s_daxdev;
> + else
> + iomap->bdev = inode->i_sb->s_bdev;
>
> if (ret == 0) {
> iomap->type = IOMAP_HOLE;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2ea07efbe016..79027e99118f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3524,8 +3524,10 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
> iomap->flags = 0;
> if (ext4_inode_datasync_dirty(inode))
> iomap->flags |= IOMAP_F_DIRTY;
> - iomap->bdev = inode->i_sb->s_bdev;
> - iomap->dax_dev = sbi->s_daxdev;
> + if (IS_DAX(inode))
> + iomap->dax_dev = sbi->s_daxdev;
> + else
> + iomap->bdev = inode->i_sb->s_bdev;
> iomap->offset = (u64)first_block << blkbits;
> iomap->length = (u64)map.m_len << blkbits;
>
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index c6ce6f9335b6..587110db1ad2 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -70,8 +70,10 @@ xfs_bmbt_to_iomap(
> }
> iomap->offset = XFS_FSB_TO_B(mp, imap->br_startoff);
> iomap->length = XFS_FSB_TO_B(mp, imap->br_blockcount);
> - iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
> - iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));
> + if (IS_DAX(VFS_I(ip)))
> + iomap->bdev = xfs_find_bdev_for_inode(VFS_I(ip));
> + else
> + iomap->dax_dev = xfs_find_daxdev_for_inode(VFS_I(ip));

This seems backwards.

> }
>
> xfs_extlen_t
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index a044a824da85..212f4d59bcbf 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -53,8 +53,10 @@ struct iomap {
> u64 length; /* length of mapping, bytes */
> u16 type; /* type of mapping */
> u16 flags; /* flags for mapping */
> - struct block_device *bdev; /* block device for I/O */
> - struct dax_device *dax_dev; /* dax_dev for dax operations */
> + union {
> + struct block_device *bdev;
> + struct dax_device *dax_dev;
> + };
> };
>
> /*
> --
> 2.14.2
>

Andreas

2018-06-06 11:37:22

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: [PATCH 6/6] iomap: add a page_done callback

On 6 June 2018 at 12:40, Christoph Hellwig <[email protected]> wrote:
> This will be used by gfs2 to attach data to transactions for the journaled
> data mode. But the concept is generic enough that we might be able to
> use it for other purposes like encryption/integrity post-processing in the
> future.
>
> Based on a patch from Andreas Gruenbacher.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> fs/iomap.c | 3 +++
> include/linux/iomap.h | 8 ++++++++
> 2 files changed, 11 insertions(+)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f2a491b98b7c..a7ecdd5ddd17 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -201,6 +201,9 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> copied, page, NULL);
> }
>
> + if (iomap->page_done)
> + iomap->page_done(inode, pos, copied, page, iomap);
> +
> if (ret < len)
> iomap_write_failed(inode, pos, len);
> return ret;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 69ef622f650e..474e2255ac6e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -58,6 +58,14 @@ struct iomap {
> struct dax_device *dax_dev;
> void *inline_data;
> };
> +
> + /*
> + * Called when finished processing a page in the mapping returned in
> + * thus iomap. At least for now this is only supported in the buffered

"thus" -> "this"

> + * write path.
> + */
> + void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
> + struct page *page, struct iomap *iomap);
> };
>
> /*
> --
> 2.14.2
>

Andreas

2018-06-06 11:38:50

by Andreas Gruenbacher

[permalink] [raw]
Subject: Re: iomap preparations for GFS2

On 6 June 2018 at 12:40, Christoph Hellwig <[email protected]> wrote:
> Hi all,
>
> this is a slight rework of the patches from Andreas to prepare for gfs2
> using the iomap code. I'd like to start with an immutable branch for
> iomap patches in either the XFS tree or a tree of mine own with something
> like this so that we can have a nice common base for both the major
> iomap changes for XFS and the GFS2 work.

Okay.

Thanks,
Andreas

2018-06-06 12:05:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 2/6] iomap: move bdev and dax_dev in a union

On Wed, Jun 06, 2018 at 01:37:02PM +0200, Andreas Gruenbacher wrote:
> This seems backwards.

It is. So much for last minute polarity fixups..

2018-06-06 15:31:25

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: iomap preparations for GFS2

2018-06-06 13:38 GMT+02:00 Andreas Gruenbacher <[email protected]>:
> On 6 June 2018 at 12:40, Christoph Hellwig <[email protected]> wrote:
>> Hi all,
>>
>> this is a slight rework of the patches from Andreas to prepare for gfs2
>> using the iomap code. I'd like to start with an immutable branch for
>> iomap patches in either the XFS tree or a tree of mine own with something
>> like this so that we can have a nice common base for both the major
>> iomap changes for XFS and the GFS2 work.

When do you want to push this upstream? We've got the gfs2 patches
that depend on it, and we cannot make progress with those before this
set is pushed.

Thanks,
Andreas

2018-06-06 15:39:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: iomap preparations for GFS2

On Wed, Jun 06, 2018 at 05:31:25PM +0200, Andreas Gr?nbacher wrote:
> 2018-06-06 13:38 GMT+02:00 Andreas Gruenbacher <[email protected]>:
> > On 6 June 2018 at 12:40, Christoph Hellwig <[email protected]> wrote:
> >> Hi all,
> >>
> >> this is a slight rework of the patches from Andreas to prepare for gfs2
> >> using the iomap code. I'd like to start with an immutable branch for
> >> iomap patches in either the XFS tree or a tree of mine own with something
> >> like this so that we can have a nice common base for both the major
> >> iomap changes for XFS and the GFS2 work.
>
> When do you want to push this upstream? We've got the gfs2 patches
> that depend on it, and we cannot make progress with those before this
> set is pushed.

As soon as we have 4.18r-c1 can start work for the next merge
window.

2018-06-07 13:50:03

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: [PATCH 5/6] iomap: generic inline data handling

2018-06-06 12:40 GMT+02:00 Christoph Hellwig <[email protected]>:
> From: Andreas Gruenbacher <[email protected]>
>
> Add generic inline data handling by adding a pointer to the inline data
> region to struct iomap. When handling a buffered IOMAP_INLINE write,
> iomap_write_begin will copy the current inline data from the inline data
> region into the page cache, and iomap_write_end will copy the changes in
> the page cache back to the inline data region.
>
> This doesn't cover inline data reads and direct I/O yet because so far,
> we have no users.
>
> Signed-off-by: Andreas Gruenbacher <[email protected]>
> [hch: small cleanups to better fit in with other iomap work]
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> fs/iomap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++------
> include/linux/iomap.h | 1 +
> 2 files changed, 56 insertions(+), 7 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index f428baa32185..f2a491b98b7c 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -116,6 +116,26 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
> truncate_pagecache_range(inode, max(pos, i_size), pos + len);
> }
>
> +static void
> +iomap_read_inline_data(struct inode *inode, struct page *page,
> + struct iomap *iomap)
> +{
> + size_t size = i_size_read(inode);
> + void *addr;
> +
> + if (PageUptodate(page))
> + return;
> +
> + BUG_ON(page->index);
> + BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +
> + addr = kmap_atomic(page);
> + memcpy(addr, iomap->inline_data, size);
> + memset(addr + size, 0, PAGE_SIZE - size);
> + kunmap_atomic(addr);
> + SetPageUptodate(page);
> +}

Can you please move iomap_read_inline_data above iomap_write_failed in
fs/iomap.c so that it will end up above the readpage code? We'll need
iomap_read_inline_data in the readpage code.

> static int
> iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> struct page **pagep, struct iomap *iomap)
> @@ -133,7 +153,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> if (!page)
> return -ENOMEM;
>
> - status = __block_write_begin_int(page, pos, len, NULL, iomap);
> + if (iomap->type == IOMAP_INLINE)
> + iomap_read_inline_data(inode, page, iomap);
> + else
> + status = __block_write_begin_int(page, pos, len, NULL, iomap);
> +
> if (unlikely(status)) {
> unlock_page(page);
> put_page(page);
> @@ -146,14 +170,37 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> return status;
> }
>
> +static int
> +iomap_write_end_inline(struct inode *inode, struct page *page,
> + struct iomap *iomap, loff_t pos, unsigned copied)
> +{
> + void *addr;
> +
> + WARN_ON_ONCE(!PageUptodate(page));
> + BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +
> + addr = kmap_atomic(page);
> + memcpy(iomap->inline_data + pos, addr + pos, copied);
> + kunmap_atomic(addr);
> +
> + mark_inode_dirty(inode);
> + __generic_write_end(inode, pos, copied, page);
> + return copied;
> +}
> +
> static int
> iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> - unsigned copied, struct page *page)
> + unsigned copied, struct page *page, struct iomap *iomap)
> {
> int ret;
>
> - ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> - copied, page, NULL);
> + if (iomap->type == IOMAP_INLINE) {
> + ret = iomap_write_end_inline(inode, page, iomap, pos, copied);
> + } else {
> + ret = generic_write_end(NULL, inode->i_mapping, pos, len,
> + copied, page, NULL);
> + }
> +
> if (ret < len)
> iomap_write_failed(inode, pos, len);
> return ret;
> @@ -208,7 +255,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
> flush_dcache_page(page);
>
> - status = iomap_write_end(inode, pos, bytes, copied, page);
> + status = iomap_write_end(inode, pos, bytes, copied, page,
> + iomap);
> if (unlikely(status < 0))
> break;
> copied = status;
> @@ -302,7 +350,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>
> WARN_ON_ONCE(!PageUptodate(page));
>
> - status = iomap_write_end(inode, pos, bytes, bytes, page);
> + status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> if (unlikely(status <= 0)) {
> if (WARN_ON_ONCE(status == 0))
> return -EIO;
> @@ -354,7 +402,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
> zero_user(page, offset, bytes);
> mark_page_accessed(page);
>
> - return iomap_write_end(inode, pos, bytes, bytes, page);
> + return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
> }
>
> static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 212f4d59bcbf..69ef622f650e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -56,6 +56,7 @@ struct iomap {
> union {
> struct block_device *bdev;
> struct dax_device *dax_dev;
> + void *inline_data;
> };
> };
>
> --
> 2.14.2
>

Thanks,
Andreas