2015-07-03 15:15:30

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH] ext4: Return the length of a hole from get_block

From: Matthew Wilcox <[email protected]>

Currently, if ext4's get_block encounters a hole, it does not modify the
buffer_head. That's fine for many callers, but for DAX, it's useful to
know how large the hole is. XFS already returns the length of the hole,
so this improvement should not confuse any callers.

Signed-off-by: Matthew Wilcox <[email protected]>
---
fs/ext4/ext4.h | 6 ++++--
fs/ext4/inode.c | 6 +++++-
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9a83f14..7b7c097 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -154,12 +154,14 @@ struct ext4_allocation_request {
* well as to store the information returned by ext4_map_blocks(). It
* takes less room on the stack than a struct buffer_head.
*/
+#define EXT4_MAP_UPTODATE (1 << BH_Uptodate)
#define EXT4_MAP_NEW (1 << BH_New)
#define EXT4_MAP_MAPPED (1 << BH_Mapped)
#define EXT4_MAP_UNWRITTEN (1 << BH_Unwritten)
#define EXT4_MAP_BOUNDARY (1 << BH_Boundary)
-#define EXT4_MAP_FLAGS (EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
- EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
+#define EXT4_MAP_FLAGS (EXT4_MAP_UPTODATE | EXT4_MAP_NEW | \
+ EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | \
+ EXT4_MAP_BOUNDARY)

struct ext4_map_blocks {
ext4_fsblk_t m_pblk;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9b46f6f..0fc49ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -494,7 +494,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
retval = map->m_len;
map->m_len = retval;
} else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
- retval = 0;
+ map->m_flags |= EXT4_MAP_UPTODATE;
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
} else {
BUG_ON(1);
}
--
2.1.4



2015-07-13 15:16:12

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> From: Matthew Wilcox <[email protected]>
>
> Currently, if ext4's get_block encounters a hole, it does not modify the
> buffer_head. That's fine for many callers, but for DAX, it's useful to
> know how large the hole is. XFS already returns the length of the hole,
> so this improvement should not confuse any callers.
>
> Signed-off-by: Matthew Wilcox <[email protected]>

So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
set? I can see the XFS sets it in some cases as well but the use of the
flag isn't really clear to me...

Honza

> ---
> fs/ext4/ext4.h | 6 ++++--
> fs/ext4/inode.c | 6 +++++-
> 2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9a83f14..7b7c097 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -154,12 +154,14 @@ struct ext4_allocation_request {
> * well as to store the information returned by ext4_map_blocks(). It
> * takes less room on the stack than a struct buffer_head.
> */
> +#define EXT4_MAP_UPTODATE (1 << BH_Uptodate)
> #define EXT4_MAP_NEW (1 << BH_New)
> #define EXT4_MAP_MAPPED (1 << BH_Mapped)
> #define EXT4_MAP_UNWRITTEN (1 << BH_Unwritten)
> #define EXT4_MAP_BOUNDARY (1 << BH_Boundary)
> -#define EXT4_MAP_FLAGS (EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
> - EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
> +#define EXT4_MAP_FLAGS (EXT4_MAP_UPTODATE | EXT4_MAP_NEW | \
> + EXT4_MAP_MAPPED | EXT4_MAP_UNWRITTEN | \
> + EXT4_MAP_BOUNDARY)
>
> struct ext4_map_blocks {
> ext4_fsblk_t m_pblk;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 9b46f6f..0fc49ac 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -494,7 +494,11 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
> retval = map->m_len;
> map->m_len = retval;
> } else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
> - retval = 0;
> + map->m_flags |= EXT4_MAP_UPTODATE;
> + retval = es.es_len - (map->m_lblk - es.es_lblk);
> + if (retval > map->m_len)
> + retval = map->m_len;
> + map->m_len = retval;
> } else {
> BUG_ON(1);
> }
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-07-13 15:26:17

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > From: Matthew Wilcox <[email protected]>
> >
> > Currently, if ext4's get_block encounters a hole, it does not modify the
> > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > know how large the hole is. XFS already returns the length of the hole,
> > so this improvement should not confuse any callers.
> >
> > Signed-off-by: Matthew Wilcox <[email protected]>
>
> So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> set? I can see the XFS sets it in some cases as well but the use of the
> flag isn't really clear to me...

No clue. I'm just following the documentation in buffer.c:

* NOTE! All mapped/uptodate combinations are valid:
*
* Mapped Uptodate Meaning
*
* No No "unknown" - must do get_block()
* No Yes "hole" - zero-filled
* Yes No "allocated" - allocated on disk, not read in
* Yes Yes "valid" - allocated and up-to-date in memory.


2015-07-14 09:02:49

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <[email protected]>
> > >
> > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > know how large the hole is. XFS already returns the length of the hole,
> > > so this improvement should not confuse any callers.
> > >
> > > Signed-off-by: Matthew Wilcox <[email protected]>
> >
> > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > set? I can see the XFS sets it in some cases as well but the use of the
> > flag isn't really clear to me...
>
> No clue. I'm just following the documentation in buffer.c:
>
> * NOTE! All mapped/uptodate combinations are valid:
> *
> * Mapped Uptodate Meaning
> *
> * No No "unknown" - must do get_block()
> * No Yes "hole" - zero-filled
> * Yes No "allocated" - allocated on disk, not read in
> * Yes Yes "valid" - allocated and up-to-date in memory.

OK, but that speaks about buffer head attached to a page. get_block()
callback gets a temporary bh (at least in some cases) only so that it can
communicate result of block mapping. And BH_Uptodate should be set only if
data in the buffer is properly filled (which cannot be the case for
temporary bh which doesn't have *any* data) and it simply isn't the case
even for bh attached to a page because ext4 get_block() functions don't
touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
at all..

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-07-14 13:49:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > From: Matthew Wilcox <[email protected]>
> > > >
> > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > know how large the hole is. XFS already returns the length of the hole,
> > > > so this improvement should not confuse any callers.
> > > >
> > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > >
> > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > set? I can see the XFS sets it in some cases as well but the use of the
> > > flag isn't really clear to me...
> >
> > No clue. I'm just following the documentation in buffer.c:
> >
> > * NOTE! All mapped/uptodate combinations are valid:
> > *
> > * Mapped Uptodate Meaning
> > *
> > * No No "unknown" - must do get_block()
> > * No Yes "hole" - zero-filled
> > * Yes No "allocated" - allocated on disk, not read in
> > * Yes Yes "valid" - allocated and up-to-date in memory.
>
> OK, but that speaks about buffer head attached to a page. get_block()
> callback gets a temporary bh (at least in some cases) only so that it can
> communicate result of block mapping. And BH_Uptodate should be set only if
> data in the buffer is properly filled (which cannot be the case for
> temporary bh which doesn't have *any* data) and it simply isn't the case
> even for bh attached to a page because ext4 get_block() functions don't
> touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> at all..

OK, but how should DAX then distinguish between an old-style filesystem
(like current ext4) which reports "unknown" and leaves b_size untouched
when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
this patch) which wants to report the size of a hole in b_size? The use
of Uptodate currently distinguishes the two cases.

Plus, why would you want bh's to be treated differently, depending on
whether they're stack-based or attached to a page? That seems even more
confusing than bh's already are.

2015-07-14 22:24:27

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Tue, Jul 14, 2015 at 09:48:51AM -0400, Matthew Wilcox wrote:
> On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > From: Matthew Wilcox <[email protected]>
> > > > >
> > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > > know how large the hole is. XFS already returns the length of the hole,
> > > > > so this improvement should not confuse any callers.
> > > > >
> > > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > > >
> > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > flag isn't really clear to me...
> > >
> > > No clue. I'm just following the documentation in buffer.c:
> > >
> > > * NOTE! All mapped/uptodate combinations are valid:
> > > *
> > > * Mapped Uptodate Meaning
> > > *
> > > * No No "unknown" - must do get_block()
> > > * No Yes "hole" - zero-filled
> > > * Yes No "allocated" - allocated on disk, not read in
> > > * Yes Yes "valid" - allocated and up-to-date in memory.
> >
> > OK, but that speaks about buffer head attached to a page. get_block()
> > callback gets a temporary bh (at least in some cases) only so that it can
> > communicate result of block mapping. And BH_Uptodate should be set only if
> > data in the buffer is properly filled (which cannot be the case for
> > temporary bh which doesn't have *any* data) and it simply isn't the case
> > even for bh attached to a page because ext4 get_block() functions don't
> > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > at all..
>
> OK, but how should DAX then distinguish between an old-style filesystem
> (like current ext4) which reports "unknown" and leaves b_size untouched
> when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> this patch) which wants to report the size of a hole in b_size? The use
> of Uptodate currently distinguishes the two cases.
>
> Plus, why would you want bh's to be treated differently, depending on
> whether they're stack-based or attached to a page? That seems even more
> confusing than bh's already are.

The best solution to this is to kill get_block() and move to an
iomap() interface using a struct iomap to pass the mapped region
back to the caller. We're already moving this way (*) and when I
remove buffer heads from XFS I'll be moving it to an iomap based
infrastructure and so I'll want to convert the DAX code at the same
time. Also, ISTR Christoph directed the GFS2 folk to implementing
the iomap interface to solve this same get_block hole problem the
are having with fiemap(?).

IMO we should just stop abusing bufferheads for this function and
add an iomap method that has sane, clear semantics that aren't
entangled with something carried on a page to track it's state....

(*) See https://lkml.org/lkml/2013/7/23/809 for an example of
multiple page write contexts using ->iomap callouts, and note how
similar that interface is to the PNFS ->map_blocks export operation
in include/linux/exportfs.h.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-07-15 09:59:09

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Tue 14-07-15 09:48:51, Matthew Wilcox wrote:
> On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > From: Matthew Wilcox <[email protected]>
> > > > >
> > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > > know how large the hole is. XFS already returns the length of the hole,
> > > > > so this improvement should not confuse any callers.
> > > > >
> > > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > > >
> > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > flag isn't really clear to me...
> > >
> > > No clue. I'm just following the documentation in buffer.c:
> > >
> > > * NOTE! All mapped/uptodate combinations are valid:
> > > *
> > > * Mapped Uptodate Meaning
> > > *
> > > * No No "unknown" - must do get_block()
> > > * No Yes "hole" - zero-filled
> > > * Yes No "allocated" - allocated on disk, not read in
> > > * Yes Yes "valid" - allocated and up-to-date in memory.
> >
> > OK, but that speaks about buffer head attached to a page. get_block()
> > callback gets a temporary bh (at least in some cases) only so that it can
> > communicate result of block mapping. And BH_Uptodate should be set only if
> > data in the buffer is properly filled (which cannot be the case for
> > temporary bh which doesn't have *any* data) and it simply isn't the case
> > even for bh attached to a page because ext4 get_block() functions don't
> > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > at all..
>
> OK, but how should DAX then distinguish between an old-style filesystem
> (like current ext4) which reports "unknown" and leaves b_size untouched
> when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> this patch) which wants to report the size of a hole in b_size? The use
> of Uptodate currently distinguishes the two cases.
>
> Plus, why would you want bh's to be treated differently, depending on
> whether they're stack-based or attached to a page? That seems even more
> confusing than bh's already are.

Well, you may want to treat them differently because they *are* different.
For example touching b_size of page-attached buffer_head is a no-go.
get_block() interface is abusing buffer_head structure for historical
reasons.

Seeing you have hit issues with using buffer_head for passing mapping
information I agree with Dave that we should convert DAX code to use
iomaps instead of cluttering get_block() via buffer_head further. You can
lift struct iomap from include/linux/exportfs.h (and related constant
definitions) and use it for passing map information. It should be quite
straightforward and simple now that DAX doesn't have many users. We will
have:

typedef int (iomap_fn_t)(struct inode *inode, loff_t offset, u64 length,
bool create, struct iomap *iomap);

and DAX functions will take this instead of get_block_t. Adding a wrapper
to ext4_map_blocks() to work as iomap_fn_t is pretty straightforward as
well. I'm sorry we didn't come up with this immediately when you started
implementing DAX...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2015-07-16 01:48:57

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Wed, Jul 15, 2015 at 11:59:03AM +0200, Jan Kara wrote:
> On Tue 14-07-15 09:48:51, Matthew Wilcox wrote:
> > On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > > From: Matthew Wilcox <[email protected]>
> > > > > >
> > > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > > > know how large the hole is. XFS already returns the length of the hole,
> > > > > > so this improvement should not confuse any callers.
> > > > > >
> > > > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > > > >
> > > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > > flag isn't really clear to me...
> > > >
> > > > No clue. I'm just following the documentation in buffer.c:
> > > >
> > > > * NOTE! All mapped/uptodate combinations are valid:
> > > > *
> > > > * Mapped Uptodate Meaning
> > > > *
> > > > * No No "unknown" - must do get_block()
> > > > * No Yes "hole" - zero-filled
> > > > * Yes No "allocated" - allocated on disk, not read in
> > > > * Yes Yes "valid" - allocated and up-to-date in memory.
> > >
> > > OK, but that speaks about buffer head attached to a page. get_block()
> > > callback gets a temporary bh (at least in some cases) only so that it can
> > > communicate result of block mapping. And BH_Uptodate should be set only if
> > > data in the buffer is properly filled (which cannot be the case for
> > > temporary bh which doesn't have *any* data) and it simply isn't the case
> > > even for bh attached to a page because ext4 get_block() functions don't
> > > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > > at all..
> >
> > OK, but how should DAX then distinguish between an old-style filesystem
> > (like current ext4) which reports "unknown" and leaves b_size untouched
> > when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> > this patch) which wants to report the size of a hole in b_size? The use
> > of Uptodate currently distinguishes the two cases.
> >
> > Plus, why would you want bh's to be treated differently, depending on
> > whether they're stack-based or attached to a page? That seems even more
> > confusing than bh's already are.
>
> Well, you may want to treat them differently because they *are* different.
> For example touching b_size of page-attached buffer_head is a no-go.
> get_block() interface is abusing buffer_head structure for historical
> reasons.
>
> Seeing you have hit issues with using buffer_head for passing mapping
> information I agree with Dave that we should convert DAX code to use
> iomaps instead of cluttering get_block() via buffer_head further. You can
> lift struct iomap from include/linux/exportfs.h (and related constant
> definitions) and use it for passing map information. It should be quite
> straightforward and simple now that DAX doesn't have many users. We will
> have:
>
> typedef int (iomap_fn_t)(struct inode *inode, loff_t offset, u64 length,
> bool create, struct iomap *iomap);
>
> and DAX functions will take this instead of get_block_t. Adding a wrapper
> to ext4_map_blocks() to work as iomap_fn_t is pretty straightforward as
> well.

I've got one of them sitting around in a multipage write patchset
somewhere. See below.

> I'm sorry we didn't come up with this immediately when you started
> implementing DAX...

I thought I did but, IIRC, there were more important things to get
working than an new iomap interface....

Cheers,

Dave.
--
Dave Chinner
[email protected]

multipage write: Add generic block write begin/end support

From: Dave Chinner <[email protected]>

Add support into the generic buffer based IO paths for iomap based
writes. This requires being able to pass the iomap rather than a
get_blocks callback to the mapping functions and mapping buffers
from the iomap rather than a callout. Also provide an exported
write_begin function that takes an iomap rather than an opaque
fsdata blob.

Signed-off-by: Dave Chinner <[email protected]>

diff --git a/fs/buffer.c b/fs/buffer.c
index 3e7dca2..02ddaf8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -21,6 +21,7 @@
#include <linux/kernel.h>
#include <linux/syscalls.h>
#include <linux/fs.h>
+#include <linux/iomap.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/slab.h>
@@ -1834,8 +1835,67 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
}
EXPORT_SYMBOL(page_zero_new_buffers);

-int block_prepare_write(struct page *page, unsigned from, unsigned to,
- get_block_t *get_block)
+static void
+iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
+ struct iomap *iomap)
+{
+ loff_t offset = block << inode->i_blkbits;
+
+ printk("iomap_to_bh block %lld\n", block);
+
+ /* XXX: bdev needs to be put into iomap */
+ bh->b_bdev = inode->i_sb->s_bdev;
+
+ /*
+ * block points to offset in file we need to map, iomap contains
+ * the offset at which the map starts. If the map ends before the
+ * current block, then do not map the buffer and let the caller
+ * handle it.
+ */
+ if (offset >= iomap->offset + iomap->length)
+ return;
+
+ switch (iomap->type) {
+ case IOMAP_HOLE:
+ /*
+ * if the buffer is not up to date or beyond the current EOF, we
+ * need to mark it as new to ensure sub-block zeroing is executed
+ * if necessary.
+ */
+ if (!buffer_uptodate(bh) ||
+ (offset >= i_size_read(inode)))
+ set_buffer_new(bh);
+ break;
+ case IOMAP_DELALLOC:
+ if (!buffer_uptodate(bh) ||
+ (offset >= i_size_read(inode)))
+ set_buffer_new(bh);
+ set_buffer_uptodate(bh);
+ set_buffer_mapped(bh);
+ set_buffer_delay(bh);
+ break;
+ case IOMAP_UNWRITTEN:
+ /*
+ * for unwritten regions, we always need to ensure that
+ * sub-block writes cause the regions in the block we are not
+ * writing to are zeroed. Set the buffer as new to ensre this.
+ */
+ set_buffer_new(bh);
+ set_buffer_unwritten(bh);
+ /* fall through */
+ case IOMAP_MAPPED:
+ if (offset >= i_size_read(inode))
+ set_buffer_new(bh);
+ bh->b_blocknr = iomap->blkno +
+ ((offset - iomap->offset) >> inode->i_blkbits);
+ set_buffer_mapped(bh);
+ break;
+ }
+
+}
+
+static int __block_prepare_write(struct page *page, unsigned from,
+ unsigned to, int use_get_block, void *map)
{
struct inode *inode = page->mapping->host;
unsigned block_start, block_end;
@@ -1849,6 +1909,9 @@ int block_prepare_write(struct page *page, unsigned from, unsigned to,
BUG_ON(to > PAGE_CACHE_SIZE);
BUG_ON(from > to);

+ if (!use_get_block)
+ printk("prep_write from %lld, to %lld\n", from, to);
+
blocksize = 1 << inode->i_blkbits;
if (!page_has_buffers(page))
create_empty_buffers(page, blocksize, 0);
@@ -1871,9 +1934,15 @@ int block_prepare_write(struct page *page, unsigned from, unsigned to,
clear_buffer_new(bh);
if (!buffer_mapped(bh)) {
WARN_ON(bh->b_size != blocksize);
- err = get_block(inode, block, bh, 1);
- if (err)
- break;
+ if (use_get_block) {
+ get_block_t *get_block = map;
+ err = get_block(inode, block, bh, 1);
+ if (err)
+ break;
+ } else {
+ iomap_to_bh(inode, block, bh, map);
+ }
+
if (buffer_new(bh)) {
unmap_underlying_metadata(bh->b_bdev,
bh->b_blocknr);
@@ -1916,6 +1985,12 @@ int block_prepare_write(struct page *page, unsigned from, unsigned to,
}
return err;
}
+
+int block_prepare_write(struct page *page, unsigned from, unsigned to,
+ get_block_t *get_block)
+{
+ return __block_prepare_write(page, from, to, 1, get_block);
+}
EXPORT_SYMBOL(block_prepare_write);

static int __block_commit_write(struct inode *inode, struct page *page,
@@ -1963,6 +2038,41 @@ int __block_write_begin(struct page *page, loff_t pos, unsigned len,
EXPORT_SYMBOL(__block_write_begin);

/*
+ * Filesystems implementing the new truncate sequence should use the
+ * _newtrunc postfix variant which won't incorrectly call vmtruncate.
+ * The filesystem needs to handle block truncation upon failure.
+ */
+int block_write_begin_iomap(struct address_space *mapping, loff_t pos,
+ unsigned len, unsigned flags, struct page **pagep,
+ void **fsdata)
+{
+ struct iomap *iomap = fsdata;
+ pgoff_t index = pos >> PAGE_CACHE_SHIFT;
+ unsigned start = pos & (PAGE_CACHE_SIZE - 1);
+ struct page *page;
+ int status = 0;
+
+ /* if we are off the iomap, then we are done with it */
+ if (pos >= iomap->offset + iomap->length)
+ return -ERANGE;
+
+ page = grab_cache_page_write_begin(mapping, index, flags);
+ if (!page)
+ return -ENOMEM;
+
+ status = __block_prepare_write(page, start, start + len, 0, fsdata);
+ if (unlikely(status)) {
+ unlock_page(page);
+ page_cache_release(page);
+ page = NULL;
+ }
+
+ *pagep = page;
+ return status;
+}
+EXPORT_SYMBOL(block_write_begin_iomap);
+
+/*
* block_write_begin takes care of the basic task of block allocation and
* bringing partial write blocks uptodate first.
*
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index ec94c12..a9ae5e0 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -205,6 +205,9 @@ int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc,
unsigned long from);
int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
unsigned flags, struct page **pagep, get_block_t *get_block);
+int block_write_begin_iomap(struct address_space *mapping, loff_t pos,
+ unsigned len, unsigned flags, struct page **pagep,
+ void **fsdata);
int __block_write_begin(struct page *page, loff_t pos, unsigned len,
get_block_t *get_block);
int block_write_end(struct file *, struct address_space *,

2015-07-16 07:08:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Thu 16-07-15 11:46:47, Dave Chinner wrote:
> On Wed, Jul 15, 2015 at 11:59:03AM +0200, Jan Kara wrote:
> > On Tue 14-07-15 09:48:51, Matthew Wilcox wrote:
> > > On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > > > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > > > From: Matthew Wilcox <[email protected]>
> > > > > > >
> > > > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > > > > know how large the hole is. XFS already returns the length of the hole,
> > > > > > > so this improvement should not confuse any callers.
> > > > > > >
> > > > > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > > > > >
> > > > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > > > flag isn't really clear to me...
> > > > >
> > > > > No clue. I'm just following the documentation in buffer.c:
> > > > >
> > > > > * NOTE! All mapped/uptodate combinations are valid:
> > > > > *
> > > > > * Mapped Uptodate Meaning
> > > > > *
> > > > > * No No "unknown" - must do get_block()
> > > > > * No Yes "hole" - zero-filled
> > > > > * Yes No "allocated" - allocated on disk, not read in
> > > > > * Yes Yes "valid" - allocated and up-to-date in memory.
> > > >
> > > > OK, but that speaks about buffer head attached to a page. get_block()
> > > > callback gets a temporary bh (at least in some cases) only so that it can
> > > > communicate result of block mapping. And BH_Uptodate should be set only if
> > > > data in the buffer is properly filled (which cannot be the case for
> > > > temporary bh which doesn't have *any* data) and it simply isn't the case
> > > > even for bh attached to a page because ext4 get_block() functions don't
> > > > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > > > at all..
> > >
> > > OK, but how should DAX then distinguish between an old-style filesystem
> > > (like current ext4) which reports "unknown" and leaves b_size untouched
> > > when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> > > this patch) which wants to report the size of a hole in b_size? The use
> > > of Uptodate currently distinguishes the two cases.
> > >
> > > Plus, why would you want bh's to be treated differently, depending on
> > > whether they're stack-based or attached to a page? That seems even more
> > > confusing than bh's already are.
> >
> > Well, you may want to treat them differently because they *are* different.
> > For example touching b_size of page-attached buffer_head is a no-go.
> > get_block() interface is abusing buffer_head structure for historical
> > reasons.
> >
> > Seeing you have hit issues with using buffer_head for passing mapping
> > information I agree with Dave that we should convert DAX code to use
> > iomaps instead of cluttering get_block() via buffer_head further. You can
> > lift struct iomap from include/linux/exportfs.h (and related constant
> > definitions) and use it for passing map information. It should be quite
> > straightforward and simple now that DAX doesn't have many users. We will
> > have:
> >
> > typedef int (iomap_fn_t)(struct inode *inode, loff_t offset, u64 length,
> > bool create, struct iomap *iomap);
> >
> > and DAX functions will take this instead of get_block_t. Adding a wrapper
> > to ext4_map_blocks() to work as iomap_fn_t is pretty straightforward as
> > well.
>
> I've got one of them sitting around in a multipage write patchset
> somewhere. See below.

Yeah, but Matthew actually doesn't need your patch. Your patch converts
fs/buffer.c so that iomap interface works with buffer heads but Matthew can
convert *just* the DAX code because that is standalone and doesn't need
buffer heads at all. So that is even simpler.

Honza

2015-07-17 22:12:06

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH] ext4: Return the length of a hole from get_block

On Thu, Jul 16, 2015 at 09:08:34AM +0200, Jan Kara wrote:
> On Thu 16-07-15 11:46:47, Dave Chinner wrote:
> > On Wed, Jul 15, 2015 at 11:59:03AM +0200, Jan Kara wrote:
> > > On Tue 14-07-15 09:48:51, Matthew Wilcox wrote:
> > > > On Tue, Jul 14, 2015 at 11:02:46AM +0200, Jan Kara wrote:
> > > > > On Mon 13-07-15 11:26:15, Matthew Wilcox wrote:
> > > > > > On Mon, Jul 13, 2015 at 05:16:10PM +0200, Jan Kara wrote:
> > > > > > > On Fri 03-07-15 11:15:11, Matthew Wilcox wrote:
> > > > > > > > From: Matthew Wilcox <[email protected]>
> > > > > > > >
> > > > > > > > Currently, if ext4's get_block encounters a hole, it does not modify the
> > > > > > > > buffer_head. That's fine for many callers, but for DAX, it's useful to
> > > > > > > > know how large the hole is. XFS already returns the length of the hole,
> > > > > > > > so this improvement should not confuse any callers.
> > > > > > > >
> > > > > > > > Signed-off-by: Matthew Wilcox <[email protected]>
> > > > > > >
> > > > > > > So I'm somewhat wondering: What is the reason of BH_Uptodate flag being
> > > > > > > set? I can see the XFS sets it in some cases as well but the use of the
> > > > > > > flag isn't really clear to me...
> > > > > >
> > > > > > No clue. I'm just following the documentation in buffer.c:
> > > > > >
> > > > > > * NOTE! All mapped/uptodate combinations are valid:
> > > > > > *
> > > > > > * Mapped Uptodate Meaning
> > > > > > *
> > > > > > * No No "unknown" - must do get_block()
> > > > > > * No Yes "hole" - zero-filled
> > > > > > * Yes No "allocated" - allocated on disk, not read in
> > > > > > * Yes Yes "valid" - allocated and up-to-date in memory.
> > > > >
> > > > > OK, but that speaks about buffer head attached to a page. get_block()
> > > > > callback gets a temporary bh (at least in some cases) only so that it can
> > > > > communicate result of block mapping. And BH_Uptodate should be set only if
> > > > > data in the buffer is properly filled (which cannot be the case for
> > > > > temporary bh which doesn't have *any* data) and it simply isn't the case
> > > > > even for bh attached to a page because ext4 get_block() functions don't
> > > > > touch bh->b_data at all. So I just wouldn't set BH_Uptodate in get_block()
> > > > > at all..
> > > >
> > > > OK, but how should DAX then distinguish between an old-style filesystem
> > > > (like current ext4) which reports "unknown" and leaves b_size untouched
> > > > when it encounters a hole, versus a new-style filesystem (XFS, ext4 with
> > > > this patch) which wants to report the size of a hole in b_size? The use
> > > > of Uptodate currently distinguishes the two cases.
> > > >
> > > > Plus, why would you want bh's to be treated differently, depending on
> > > > whether they're stack-based or attached to a page? That seems even more
> > > > confusing than bh's already are.
> > >
> > > Well, you may want to treat them differently because they *are* different.
> > > For example touching b_size of page-attached buffer_head is a no-go.
> > > get_block() interface is abusing buffer_head structure for historical
> > > reasons.
> > >
> > > Seeing you have hit issues with using buffer_head for passing mapping
> > > information I agree with Dave that we should convert DAX code to use
> > > iomaps instead of cluttering get_block() via buffer_head further. You can
> > > lift struct iomap from include/linux/exportfs.h (and related constant
> > > definitions) and use it for passing map information. It should be quite
> > > straightforward and simple now that DAX doesn't have many users. We will
> > > have:
> > >
> > > typedef int (iomap_fn_t)(struct inode *inode, loff_t offset, u64 length,
> > > bool create, struct iomap *iomap);
> > >
> > > and DAX functions will take this instead of get_block_t. Adding a wrapper
> > > to ext4_map_blocks() to work as iomap_fn_t is pretty straightforward as
> > > well.
> >
> > I've got one of them sitting around in a multipage write patchset
> > somewhere. See below.
>
> Yeah, but Matthew actually doesn't need your patch. Your patch converts
> fs/buffer.c so that iomap interface works with buffer heads but Matthew can
> convert *just* the DAX code because that is standalone and doesn't need
> buffer heads at all. So that is even simpler.

True, I was thinking that it would help not needing to convert ext4
initially, but it's the wrong conversion for that...

Cheers,

Dave.
--
Dave Chinner
[email protected]