2016-11-01 21:06:10

by Jan Kara

[permalink] [raw]
Subject: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

Hello,

this patch set converts ext4 DAX IO paths to the new iomap framework and
removes the old bh-based DAX functions. As a result ext4 gains PMD page
fault support, also some other minor bugs get fixed. The patch set is based
on Ross' DAX PMD page fault support series [1]. It passes xfstests both in
DAX and non-DAX mode.

The question is how shall we merge this. If Dave is pulling PMD patches through
XFS tree, then these patches could go there as well (chances for conflicts
with other ext4 stuff are relatively low) or Dave could just export a stable
branch with PMD series which Ted would just pull...

Honza

[1] http://www.spinics.net/lists/linux-mm/msg115247.html


2016-11-01 21:06:14

by Jan Kara

[permalink] [raw]
Subject: [PATCH 04/11] ext4: Convert DAX reads to iomap infrastructure

Implement basic iomap_begin function that handles reading and use it for
DAX reads.

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/file.c | 40 +++++++++++++++++++++++++++++++++++++++-
fs/ext4/inode.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 282a51b07c57..098b39910001 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3271,6 +3271,8 @@ static inline bool ext4_aligned_io(struct inode *inode, loff_t off, loff_t len)
return IS_ALIGNED(off, blksize) && IS_ALIGNED(len, blksize);
}

+extern struct iomap_ops ext4_iomap_ops;
+
#endif /* __KERNEL__ */

#define EFSBADCRC EBADMSG /* Bad CRC detected */
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 55f8b922b76d..28ebc2418dc2 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -31,6 +31,44 @@
#include "xattr.h"
#include "acl.h"

+#ifdef CONFIG_FS_DAX
+static ssize_t ext4_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+ ssize_t ret;
+
+ inode_lock_shared(inode);
+ /*
+ * Recheck under inode lock - at this point we are sure it cannot
+ * change anymore
+ */
+ if (!IS_DAX(inode)) {
+ inode_unlock_shared(inode);
+ /* Fallback to buffered IO in case we cannot support DAX */
+ return generic_file_read_iter(iocb, to);
+ }
+ ret = dax_iomap_rw(iocb, to, &ext4_iomap_ops);
+ inode_unlock_shared(inode);
+
+ file_accessed(iocb->ki_filp);
+ return ret;
+}
+#endif
+
+static ssize_t ext4_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct inode *inode = iocb->ki_filp->f_mapping->host;
+
+ if (!iov_iter_count(to))
+ return 0; /* skip atime */
+
+#ifdef CONFIG_FS_DAX
+ if (IS_DAX(inode))
+ return ext4_dax_read_iter(iocb, to);
+#endif
+ return generic_file_read_iter(iocb, to);
+}
+
/*
* Called when an inode is released. Note that this is different
* from ext4_file_open: open gets called at every open, but release
@@ -691,7 +729,7 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int whence)

const struct file_operations ext4_file_operations = {
.llseek = ext4_llseek,
- .read_iter = generic_file_read_iter,
+ .read_iter = ext4_file_read_iter,
.write_iter = ext4_file_write_iter,
.unlocked_ioctl = ext4_ioctl,
#ifdef CONFIG_COMPAT
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4cbd1b24c237..cc10145ea98b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -37,6 +37,7 @@
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/bitops.h>
+#include <linux/iomap.h>

#include "ext4_jbd2.h"
#include "xattr.h"
@@ -3305,6 +3306,59 @@ int ext4_dax_get_block(struct inode *inode, sector_t iblock,
clear_buffer_new(bh_result);
return 0;
}
+
+static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
+ unsigned flags, struct iomap *iomap)
+{
+ unsigned int blkbits = inode->i_blkbits;
+ unsigned long first_block = offset >> blkbits;
+ unsigned long last_block = (offset + length - 1) >> blkbits;
+ struct ext4_map_blocks map;
+ int ret;
+
+ if (flags & IOMAP_WRITE)
+ return -EIO;
+
+ if (WARN_ON_ONCE(ext4_has_inline_data(inode)))
+ return -ERANGE;
+
+ map.m_lblk = first_block;
+ map.m_len = last_block - first_block + 1;
+
+ ret = ext4_map_blocks(NULL, inode, &map, 0);
+ if (ret < 0)
+ return ret;
+
+ iomap->flags = 0;
+ iomap->bdev = inode->i_sb->s_bdev;
+ iomap->offset = first_block << blkbits;
+
+ if (ret == 0) {
+ iomap->type = IOMAP_HOLE;
+ iomap->blkno = IOMAP_NULL_BLOCK;
+ iomap->length = (u64)map.m_len << blkbits;
+ } else {
+ if (map.m_flags & EXT4_MAP_MAPPED) {
+ iomap->type = IOMAP_MAPPED;
+ } else if (map.m_flags & EXT4_MAP_UNWRITTEN) {
+ iomap->type = IOMAP_UNWRITTEN;
+ } else {
+ WARN_ON_ONCE(1);
+ return -EIO;
+ }
+ iomap->blkno = (sector_t)map.m_pblk << (blkbits - 9);
+ iomap->length = (u64)map.m_len << blkbits;
+ }
+
+ if (map.m_flags & EXT4_MAP_NEW)
+ iomap->flags |= IOMAP_F_NEW;
+ return 0;
+}
+
+struct iomap_ops ext4_iomap_ops = {
+ .iomap_begin = ext4_iomap_begin,
+};
+
#else
/* Just define empty function, it will never get called. */
int ext4_dax_get_block(struct inode *inode, sector_t iblock,
--
2.6.6


2016-11-01 21:06:12

by Jan Kara

[permalink] [raw]
Subject: [PATCH 02/11] ext4: Allow unaligned unlocked DAX IO

Currently we don't allow unaligned writes without inode_lock. This is
because zeroing of partial blocks could cause data corruption for racing
unaligned writes to the same block. However DAX handles zeroing during
block allocation and thus zeroing of partial blocks cannot race. Allow
DAX unaligned IO to run without inode_lock.

Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index a6a7becb9465..55f8b922b76d 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -161,7 +161,8 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter *from)

iocb->private = &overwrite;
/* Check whether we do a DIO overwrite or not */
- if (o_direct && ext4_should_dioread_nolock(inode) && !unaligned_aio &&
+ if (((o_direct && !unaligned_aio) || IS_DAX(inode)) &&
+ ext4_should_dioread_nolock(inode) &&
ext4_overwrite_io(inode, iocb->ki_pos, iov_iter_count(from)))
overwrite = 1;

--
2.6.6


2016-11-01 22:12:35

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Tue, Nov 01, 2016 at 10:06:10PM +0100, Jan Kara wrote:
> Hello,
>
> this patch set converts ext4 DAX IO paths to the new iomap framework and
> removes the old bh-based DAX functions. As a result ext4 gains PMD page
> fault support, also some other minor bugs get fixed. The patch set is based
> on Ross' DAX PMD page fault support series [1]. It passes xfstests both in
> DAX and non-DAX mode.
>
> The question is how shall we merge this. If Dave is pulling PMD patches through
> XFS tree, then these patches could go there as well (chances for conflicts
> with other ext4 stuff are relatively low) or Dave could just export a stable
> branch with PMD series which Ted would just pull...

I plan to grab Ross's PMD series in the next couple of days and I'll
push it out as a stable topic branch once I've sanity tested it. I
don't really want to take a big chunk of ext4 stuff through the XFS
tree if it can be avoided....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-11-01 22:45:50

by Ross Zwisler

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Wed, Nov 02, 2016 at 09:12:35AM +1100, Dave Chinner wrote:
> On Tue, Nov 01, 2016 at 10:06:10PM +0100, Jan Kara wrote:
> > Hello,
> >
> > this patch set converts ext4 DAX IO paths to the new iomap framework and
> > removes the old bh-based DAX functions. As a result ext4 gains PMD page
> > fault support, also some other minor bugs get fixed. The patch set is based
> > on Ross' DAX PMD page fault support series [1]. It passes xfstests both in
> > DAX and non-DAX mode.
> >
> > The question is how shall we merge this. If Dave is pulling PMD patches through
> > XFS tree, then these patches could go there as well (chances for conflicts
> > with other ext4 stuff are relatively low) or Dave could just export a stable
> > branch with PMD series which Ted would just pull...
>
> I plan to grab Ross's PMD series in the next couple of days and I'll
> push it out as a stable topic branch once I've sanity tested it. I
> don't really want to take a big chunk of ext4 stuff through the XFS
> tree if it can be avoided....

Yea, we also need to figure out how to get Jan's "dax: Clear dirty bits after
flushing caches" set merged, which is mostly MM stuff and I think will go
through akpm's tree? That set is also based on my PMD stuff.

2016-11-01 23:09:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Tue 01-11-16 16:45:50, Ross Zwisler wrote:
> On Wed, Nov 02, 2016 at 09:12:35AM +1100, Dave Chinner wrote:
> > On Tue, Nov 01, 2016 at 10:06:10PM +0100, Jan Kara wrote:
> > > Hello,
> > >
> > > this patch set converts ext4 DAX IO paths to the new iomap framework and
> > > removes the old bh-based DAX functions. As a result ext4 gains PMD page
> > > fault support, also some other minor bugs get fixed. The patch set is based
> > > on Ross' DAX PMD page fault support series [1]. It passes xfstests both in
> > > DAX and non-DAX mode.
> > >
> > > The question is how shall we merge this. If Dave is pulling PMD patches through
> > > XFS tree, then these patches could go there as well (chances for conflicts
> > > with other ext4 stuff are relatively low) or Dave could just export a stable
> > > branch with PMD series which Ted would just pull...
> >
> > I plan to grab Ross's PMD series in the next couple of days and I'll
> > push it out as a stable topic branch once I've sanity tested it. I
> > don't really want to take a big chunk of ext4 stuff through the XFS
> > tree if it can be avoided....
>
> Yea, we also need to figure out how to get Jan's "dax: Clear dirty bits after
> flushing caches" set merged, which is mostly MM stuff and I think will go
> through akpm's tree? That set is also based on my PMD stuff.

Yeah, I've spoken to Andrew and he wants to take the MM changes through his
tree. I'll talk to him how to make this happen given the patches the series
depends on but the series still needs some review so "how to merge" is not
exactly a question of the day...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2016-11-02 13:03:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Wed, Nov 02, 2016 at 12:09:03AM +0100, Jan Kara wrote:
> > > I plan to grab Ross's PMD series in the next couple of days and I'll
> > > push it out as a stable topic branch once I've sanity tested it. I
> > > don't really want to take a big chunk of ext4 stuff through the XFS
> > > tree if it can be avoided....
> >
> > Yea, we also need to figure out how to get Jan's "dax: Clear dirty bits after
> > flushing caches" set merged, which is mostly MM stuff and I think will go
> > through akpm's tree? That set is also based on my PMD stuff.
>
> Yeah, I've spoken to Andrew and he wants to take the MM changes through his
> tree. I'll talk to him how to make this happen given the patches the series
> depends on but the series still needs some review so "how to merge" is not
> exactly a question of the day...

I assume there isn't a convenient git tree I can pull for the purposes
of testing Jan's patch series and for the purposes of seeing how
everything fits together? In other words, as far as I understand
things there currently isn't an unstable topic branch I can pull for
the purposes of review and testing, but hopefully Dave will make a
stable topic branch in a few days. Is that right?

- Ted

2016-11-02 14:27:46

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/11] ext4: Allow unaligned unlocked DAX IO

On Tue, Nov 01, 2016 at 10:06:12PM +0100, Jan Kara wrote:
> Currently we don't allow unaligned writes without inode_lock. This is
> because zeroing of partial blocks could cause data corruption for racing
> unaligned writes to the same block. However DAX handles zeroing during
> block allocation and thus zeroing of partial blocks cannot race. Allow
> DAX unaligned IO to run without inode_lock.

DAX writes should always take the inode lock. Without that you break
the expectations of existing applications that did not specify O_DIRECT.

2016-11-02 22:23:55

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Wed, Nov 02, 2016 at 09:03:53AM -0400, Theodore Ts'o wrote:
> On Wed, Nov 02, 2016 at 12:09:03AM +0100, Jan Kara wrote:
> > > > I plan to grab Ross's PMD series in the next couple of days and I'll
> > > > push it out as a stable topic branch once I've sanity tested it. I
> > > > don't really want to take a big chunk of ext4 stuff through the XFS
> > > > tree if it can be avoided....
> > >
> > > Yea, we also need to figure out how to get Jan's "dax: Clear dirty bits after
> > > flushing caches" set merged, which is mostly MM stuff and I think will go
> > > through akpm's tree? That set is also based on my PMD stuff.
> >
> > Yeah, I've spoken to Andrew and he wants to take the MM changes through his
> > tree. I'll talk to him how to make this happen given the patches the series
> > depends on but the series still needs some review so "how to merge" is not
> > exactly a question of the day...
>
> I assume there isn't a convenient git tree I can pull for the purposes
> of testing Jan's patch series and for the purposes of seeing how
> everything fits together?

I think Jan posted a test branch with all the current stuff under
review in it here:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git dax

> In other words, as far as I understand
> things there currently isn't an unstable topic branch I can pull for
> the purposes of review and testing, but hopefully Dave will make a
> stable topic branch in a few days. Is that right?

I've got the dax-pmd topic branch running regression tests right
now. I'm hoping that all goes well and if it does I'll get it out
by the end of the day.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2016-11-04 00:55:02

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 0/11] ext4: Convert ext4 DAX IO to iomap framework

On Wed 02-11-16 09:03:53, Ted Tso wrote:
> On Wed, Nov 02, 2016 at 12:09:03AM +0100, Jan Kara wrote:
> > > > I plan to grab Ross's PMD series in the next couple of days and I'll
> > > > push it out as a stable topic branch once I've sanity tested it. I
> > > > don't really want to take a big chunk of ext4 stuff through the XFS
> > > > tree if it can be avoided....
> > >
> > > Yea, we also need to figure out how to get Jan's "dax: Clear dirty bits after
> > > flushing caches" set merged, which is mostly MM stuff and I think will go
> > > through akpm's tree? That set is also based on my PMD stuff.
> >
> > Yeah, I've spoken to Andrew and he wants to take the MM changes through his
> > tree. I'll talk to him how to make this happen given the patches the series
> > depends on but the series still needs some review so "how to merge" is not
> > exactly a question of the day...
>
> I assume there isn't a convenient git tree I can pull for the purposes
> of testing Jan's patch series and for the purposes of seeing how
> everything fits together? In other words, as far as I understand
> things there currently isn't an unstable topic branch I can pull for
> the purposes of review and testing, but hopefully Dave will make a
> stable topic branch in a few days. Is that right?

So just for getting idea what the changes are about and to run some tests
you can use my unstable branch which Dave mentioned:

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git dax

As Christoph had some review comments, I'll be updating that probably
tomorrow when I have tested updated series. I'll also add some other iomap
related patches that were sitting later in my branch after DAX mmap changes
but which, as I realized, don't really depend on them.

WRT the merging the plan is Dave will take all patches that are
prerequisite for this series and push them out in a stable branch which you
can then pull into your tree to merge ext4 iomap changes.

Honza
--
Jan Kara <jack-IBi9RG/[email protected]>
SUSE Labs, CR

2016-11-04 04:46:52

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 02/11] ext4: Allow unaligned unlocked DAX IO

On Wed 02-11-16 07:27:46, Christoph Hellwig wrote:
> On Tue, Nov 01, 2016 at 10:06:12PM +0100, Jan Kara wrote:
> > Currently we don't allow unaligned writes without inode_lock. This is
> > because zeroing of partial blocks could cause data corruption for racing
> > unaligned writes to the same block. However DAX handles zeroing during
> > block allocation and thus zeroing of partial blocks cannot race. Allow
> > DAX unaligned IO to run without inode_lock.
>
> DAX writes should always take the inode lock. Without that you break
> the expectations of existing applications that did not specify O_DIRECT.

Yeah, actually this patch has no practical effect since later iomap
conversion just effectively reverts this patch and I didn't realize it.
I'll just drop it. Thanks.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR