LinuxLists.cc - ext4_page_mkwrite and delalloc

2008-06-12 18:14:18

Subject: ext4_page_mkwrite and delalloc

Hi,

With delalloc we should not do writepage in ext4_page_mkwrite. The idea
with delalloc is to delay the block allocation and make sure we allocate
chunks of blocks together at writepages. So i guess we should update
ext4_page_mkwrite to use write_begin and write_end instead of writepage.
Taking i_alloc_sem should protect against parallel truncate and the page
lock should protect against parallel write_begin/write_end.

How about the patch below ?

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cac132b..7f162cc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
return err;
}

-static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
-{
- if (!buffer_mapped(bh)) {
- /*
- * Mark buffer as dirty so that
- * block_write_full_page() writes it
- */
- set_buffer_dirty(bh);
- }
- return 0;
-}
-
static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
{
return !buffer_mapped(bh);
@@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
ext4_bh_unmapped))
goto out_unlock;
- /*
- * Now mark all the buffer head dirty so
- * that writepage can write it
- */
- walk_page_buffers(NULL, page_buffers(page), 0, len,
- NULL, ext4_bh_prepare_fill);
}
/*
- * OK, we need to fill the hole... Lock the page and do writepage.
- * We can't do write_begin and write_end here because we don't
- * have inode_mutex and that allow parallel write_begin, write_end call.
+ * OK, we need to fill the hole... Lock the page and do write_begin
+ * write_end. We are not holding inode.i__mutex here. That allow
+ * parallel write_begin, write_end call.
* (lock_page prevent this from happening on the same page though)
*/
- lock_page(page);
- wbc.range_start = page_offset(page);
- wbc.range_end = page_offset(page) + len;
- ret = mapping->a_ops->writepage(page, &wbc);
- /* writepage unlocks the page */
+ ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
+ len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
+ if (ret < 0)
+ goto out_unlock;
+ ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
+ len, len, page, NULL);
+ if (ret < 0)
+ goto out_unlock;
+ ret = 0;
out_unlock:
up_read(&inode->i_alloc_sem);
return ret;

If we agree i will send an updated ext4_page_mkwrite.patch and other
related patches that needed to be updated so that the patch queue apply
cleanly.

-aneesh

2008-06-12 21:00:48

by Mingming Cao

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> Hi,
>
> With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> with delalloc is to delay the block allocation and make sure we allocate
> chunks of blocks together at writepages. So i guess we should update
> ext4_page_mkwrite to use write_begin and write_end instead of writepage.

I agree with delayed allocation page_mkwrite is much simplier, just to
block reservation to prevent ENOSPC

> Taking i_alloc_sem should protect against parallel truncate and the page
> lock should protect against parallel write_begin/write_end.
>
> How about the patch below ?
>

Do we plan to support page_mkwrite for non delalloc? the following patch
seems suggesting that we only do page_mkwrite with delalloc?

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cac132b..7f162cc 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> return err;
> }
>
> -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> -{
> - if (!buffer_mapped(bh)) {
> - /*
> - * Mark buffer as dirty so that
> - * block_write_full_page() writes it
> - */
> - set_buffer_dirty(bh);
> - }
> - return 0;
> -}
> -
> static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> {
> return !buffer_mapped(bh);
> @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> ext4_bh_unmapped))
> goto out_unlock;
> - /*
> - * Now mark all the buffer head dirty so
> - * that writepage can write it
> - */
> - walk_page_buffers(NULL, page_buffers(page), 0, len,
> - NULL, ext4_bh_prepare_fill);
> }
> /*
> - * OK, we need to fill the hole... Lock the page and do writepage.
> - * We can't do write_begin and write_end here because we don't
> - * have inode_mutex and that allow parallel write_begin, write_end call.
> + * OK, we need to fill the hole... Lock the page and do write_begin
> + * write_end. We are not holding inode.i__mutex here. That allow
> + * parallel write_begin, write_end call.
> * (lock_page prevent this from happening on the same page though)
> */
> - lock_page(page);
> - wbc.range_start = page_offset(page);
> - wbc.range_end = page_offset(page) + len;
> - ret = mapping->a_ops->writepage(page, &wbc);
> - /* writepage unlocks the page */
> + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> + len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);

What is this AOP_FLAG_UNINTERRUPTIBLE flag ? Also shouldn't we test
delalloc is enabled?

> + if (ret < 0)
> + goto out_unlock;
> + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> + len, len, page, NULL);

I am still puzzled why we need to mark the page dirty in write_end here.
Thought only do block reservation in write_begin is enough, we haven't
write anything yet...

Mingming
> + if (ret < 0)
> + goto out_unlock;
> + ret = 0;
> out_unlock:
> up_read(&inode->i_alloc_sem);
> return ret;
>
>
>
> If we agree i will send an updated ext4_page_mkwrite.patch and other
> related patches that needed to be updated so that the patch queue apply
> cleanly.
>
> -aneesh

2008-06-13 03:24:02

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

On Thu, Jun 12, 2008 at 02:00:46PM -0700, Mingming Cao wrote:
> On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> > Hi,
> >
> > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > with delalloc is to delay the block allocation and make sure we allocate
> > chunks of blocks together at writepages. So i guess we should update
> > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
>
> I agree with delayed allocation page_mkwrite is much simplier, just to
> block reservation to prevent ENOSPC
>
> > Taking i_alloc_sem should protect against parallel truncate and the page
> > lock should protect against parallel write_begin/write_end.
> >
> > How about the patch below ?
> >
>
> Do we plan to support page_mkwrite for non delalloc? the following patch
> seems suggesting that we only do page_mkwrite with delalloc?

Yes it is needed for non delalloc also. The primary requirement is for
lock inversion patches. With lock inversion patches we don't do
block allocation in writepage

>
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index cac132b..7f162cc 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> > return err;
> > }
> >
> > -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> > -{
> > - if (!buffer_mapped(bh)) {
> > - /*
> > - * Mark buffer as dirty so that
> > - * block_write_full_page() writes it
> > - */
> > - set_buffer_dirty(bh);
> > - }
> > - return 0;
> > -}
> > -
> > static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> > {
> > return !buffer_mapped(bh);
> > @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> > if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> > ext4_bh_unmapped))
> > goto out_unlock;
> > - /*
> > - * Now mark all the buffer head dirty so
> > - * that writepage can write it
> > - */
> > - walk_page_buffers(NULL, page_buffers(page), 0, len,
> > - NULL, ext4_bh_prepare_fill);
> > }
> > /*
> > - * OK, we need to fill the hole... Lock the page and do writepage.
> > - * We can't do write_begin and write_end here because we don't
> > - * have inode_mutex and that allow parallel write_begin, write_end call.
> > + * OK, we need to fill the hole... Lock the page and do write_begin
> > + * write_end. We are not holding inode.i__mutex here. That allow
> > + * parallel write_begin, write_end call.
> > * (lock_page prevent this from happening on the same page though)
> > */
> > - lock_page(page);
> > - wbc.range_start = page_offset(page);
> > - wbc.range_end = page_offset(page) + len;
> > - ret = mapping->a_ops->writepage(page, &wbc);
> > - /* writepage unlocks the page */
> > + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> > + len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
>
> What is this AOP_FLAG_UNINTERRUPTIBLE flag ? Also shouldn't we test
> delalloc is enabled?
>

Since we are not doing any real copy here I guess we can say that
we don't do short write. The flag means that.

#define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */

> > + if (ret < 0)
> > + goto out_unlock;
> > + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > + len, len, page, NULL);
>
> I am still puzzled why we need to mark the page dirty in write_end here.
> Thought only do block reservation in write_begin is enough, we haven't
> write anything yet...

The reason is to get the ordered and journaled mode behavior correct.
We need ensure that the meta-data that got allocated in the write_begin
get commited in the right order. We need add the buffer_heads
corresponding to the data (page) to the right list in the journal.
write_end mostly does that.

-aneesh

2008-06-13 22:35:00

by Mingming Cao

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

On Fri, 2008-06-13 at 08:50 +0530, Aneesh Kumar K.V wrote:
> On Thu, Jun 12, 2008 at 02:00:46PM -0700, Mingming Cao wrote:
> > On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> > > Hi,
> > >
> > > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > > with delalloc is to delay the block allocation and make sure we allocate
> > > chunks of blocks together at writepages. So i guess we should update
> > > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> >
> > I agree with delayed allocation page_mkwrite is much simplier, just to
> > block reservation to prevent ENOSPC
> >
> > > Taking i_alloc_sem should protect against parallel truncate and the page
> > > lock should protect against parallel write_begin/write_end.
> > >
> > > How about the patch below ?
> > >
> >
> > Do we plan to support page_mkwrite for non delalloc? the following patch
> > seems suggesting that we only do page_mkwrite with delalloc?
>
> Yes it is needed for non delalloc also. The primary requirement is for
> lock inversion patches. With lock inversion patches we don't do
> block allocation in writepage
>
>
> >
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index cac132b..7f162cc 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> > > return err;
> > > }
> > >
> > > -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> > > -{
> > > - if (!buffer_mapped(bh)) {
> > > - /*
> > > - * Mark buffer as dirty so that
> > > - * block_write_full_page() writes it
> > > - */
> > > - set_buffer_dirty(bh);
> > > - }
> > > - return 0;
> > > -}
> > > -
> > > static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> > > {
> > > return !buffer_mapped(bh);
> > > @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> > > if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> > > ext4_bh_unmapped))
> > > goto out_unlock;
> > > - /*
> > > - * Now mark all the buffer head dirty so
> > > - * that writepage can write it
> > > - */
> > > - walk_page_buffers(NULL, page_buffers(page), 0, len,
> > > - NULL, ext4_bh_prepare_fill);
> > > }
> > > /*
> > > - * OK, we need to fill the hole... Lock the page and do writepage.
> > > - * We can't do write_begin and write_end here because we don't
> > > - * have inode_mutex and that allow parallel write_begin, write_end call.
> > > + * OK, we need to fill the hole... Lock the page and do write_begin
> > > + * write_end. We are not holding inode.i__mutex here. That allow
> > > + * parallel write_begin, write_end call.
> > > * (lock_page prevent this from happening on the same page though)
> > > */
> > > - lock_page(page);
> > > - wbc.range_start = page_offset(page);
> > > - wbc.range_end = page_offset(page) + len;
> > > - ret = mapping->a_ops->writepage(page, &wbc);
> > > - /* writepage unlocks the page */
> > > + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> > > + len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
> >
> > What is this AOP_FLAG_UNINTERRUPTIBLE flag ? Also shouldn't we test
> > delalloc is enabled?
> >
>
> Since we are not doing any real copy here I guess we can say that
> we don't do short write. The flag means that.
>
> #define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */
>
> > > + if (ret < 0)
> > > + goto out_unlock;
> > > + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > > + len, len, page, NULL);
> >
> > I am still puzzled why we need to mark the page dirty in write_end here.
> > Thought only do block reservation in write_begin is enough, we haven't
> > write anything yet...
>
>
> The reason is to get the ordered and journaled mode behavior correct.
> We need ensure that the meta-data that got allocated in the write_begin
> get commited in the right order.

I am confused here, I thought this patch is to take advantage of delayed
allocation, so that we could just call the write_begin in mkwrite, there
is only block reservation, but no real block allocation and meta-data
changes? Thus no need to worry about the ordering?

> We need add the buffer_heads
> corresponding to the data (page) to the right list in the journal.
> write_end mostly does that.
>
I probably missed the basic here, I was assuming the patch also based on
the new orderd mode? But with the new ordered mode, this part(using
buffer heads) is replaced with the journal inode list, and with delayed
allocation, the code to ensure the ordering is pushed later at
writepages() time.

> -aneesh
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-06-14 06:44:04

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

On Fri, Jun 13, 2008 at 03:35:21PM -0700, Mingming wrote:
> >
> > Since we are not doing any real copy here I guess we can say that
> > we don't do short write. The flag means that.
> >
> > #define AOP_FLAG_UNINTERRUPTIBLE 0x0001 /* will not do a short write */
> >
> > > > + if (ret < 0)
> > > > + goto out_unlock;
> > > > + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > > > + len, len, page, NULL);
> > >
> > > I am still puzzled why we need to mark the page dirty in write_end here.
> > > Thought only do block reservation in write_begin is enough, we haven't
> > > write anything yet...
> >
> >
> > The reason is to get the ordered and journaled mode behavior correct.
> > We need ensure that the meta-data that got allocated in the write_begin
> > get commited in the right order.
>
> I am confused here, I thought this patch is to take advantage of delayed
> allocation, so that we could just call the write_begin in mkwrite, there
> is only block reservation, but no real block allocation and meta-data
> changes? Thus no need to worry about the ordering?
>

The changes are update to ext4_page_mkwrite. This call back is used when
we try to write to page. With nodelalloc and ordered mode we need to
make sure we allocate blocks in ext4_page_mkwrite. Because we can't
allocate blocks in writepage with nodelalloc. So we use write_begin and
write_end. This will ensure that we use block reservation in case of
delayed allocation and do block allocation in case of nodelalloc.

Earlier it used writepage always. That would not work with delayed
allocation. Hence the changes.

> > We need add the buffer_heads
> > corresponding to the data (page) to the right list in the journal.
> > write_end mostly does that.
> >
> I probably missed the basic here, I was assuming the patch also based on
> the new orderd mode? But with the new ordered mode, this part(using
> buffer heads) is replaced with the journal inode list, and with delayed
> allocation, the code to ensure the ordering is pushed later at
> writepages() time.
>

-aneesh

2008-06-16 14:11:43

by Jan Kara

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

Hi Aneesh,

On Thu 12-06-08 23:44:07, Aneesh Kumar K.V wrote:
> With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> with delalloc is to delay the block allocation and make sure we allocate
> chunks of blocks together at writepages. So i guess we should update
> ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> Taking i_alloc_sem should protect against parallel truncate and the page
> lock should protect against parallel write_begin/write_end.
>
> How about the patch below ?
In principle the patch looks fine, I would only like to see two things
checked:
1) Did you do some stress testing of the patch - combining mmapped writes
with ordinary writes to the same file and truncation so that we detect
possible bugs in locking / data corruption due to some bad locking. This
significantly changes when write_begin / write_end can be called in ext4
(i.e., it is now called without i_mutex - BTW: that is probably worth a
comment before these functions).
2) How does this change influence CPU load for mmapped accesses - I worry
about write_begin / write_end path being significantly heavier than just
calling writepage. Probably just mmap a large file, write single byte
to every page and measure using oprofile whether accumulated time spent in
page_mkwrite didn't change to much.

Thanks.
Honza
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cac132b..7f162cc 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> return err;
> }
>
> -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> -{
> - if (!buffer_mapped(bh)) {
> - /*
> - * Mark buffer as dirty so that
> - * block_write_full_page() writes it
> - */
> - set_buffer_dirty(bh);
> - }
> - return 0;
> -}
> -
> static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> {
> return !buffer_mapped(bh);
> @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> ext4_bh_unmapped))
> goto out_unlock;
> - /*
> - * Now mark all the buffer head dirty so
> - * that writepage can write it
> - */
> - walk_page_buffers(NULL, page_buffers(page), 0, len,
> - NULL, ext4_bh_prepare_fill);
> }
> /*
> - * OK, we need to fill the hole... Lock the page and do writepage.
> - * We can't do write_begin and write_end here because we don't
> - * have inode_mutex and that allow parallel write_begin, write_end call.
> + * OK, we need to fill the hole... Lock the page and do write_begin
> + * write_end. We are not holding inode.i__mutex here. That allow
> + * parallel write_begin, write_end call.
> * (lock_page prevent this from happening on the same page though)
> */
> - lock_page(page);
> - wbc.range_start = page_offset(page);
> - wbc.range_end = page_offset(page) + len;
> - ret = mapping->a_ops->writepage(page, &wbc);
> - /* writepage unlocks the page */
> + ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> + len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
> + if (ret < 0)
> + goto out_unlock;
> + ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> + len, len, page, NULL);
> + if (ret < 0)
> + goto out_unlock;
> + ret = 0;
> out_unlock:
> up_read(&inode->i_alloc_sem);
> return ret;
>
> If we agree i will send an updated ext4_page_mkwrite.patch and other
> related patches that needed to be updated so that the patch queue apply
> cleanly.
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-06-16 16:09:35

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

On Mon, Jun 16, 2008 at 04:11:41PM +0200, Jan Kara wrote:
> Hi Aneesh,
>
> On Thu 12-06-08 23:44:07, Aneesh Kumar K.V wrote:
> > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > with delalloc is to delay the block allocation and make sure we allocate
> > chunks of blocks together at writepages. So i guess we should update
> > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> > Taking i_alloc_sem should protect against parallel truncate and the page
> > lock should protect against parallel write_begin/write_end.
> >
> > How about the patch below ?
> In principle the patch looks fine, I would only like to see two things
> checked:
> 1) Did you do some stress testing of the patch - combining mmapped writes
> with ordinary writes to the same file and truncation so that we detect
> possible bugs in locking / data corruption due to some bad locking. This
> significantly changes when write_begin / write_end can be called in ext4
> (i.e., it is now called without i_mutex - BTW: that is probably worth a
> comment before these functions).
> 2) How does this change influence CPU load for mmapped accesses - I worry
> about write_begin / write_end path being significantly heavier than just
> calling writepage. Probably just mmap a large file, write single byte
> to every page and measure using oprofile whether accumulated time spent in
> page_mkwrite didn't change to much.
>

We can actually get away with page_mkwrite if we agree that SIGBUS on
ENOSPC is not the right way. We can implement writepages for different
data mode, and allocate blocks in writepages. With those changes we don't
allocate blocks for mmap area maping holes upon write. Instead we
allocate block during writepages. Since we can start a transaction
during writepages we should be ok with respect to new locking.?

-aneesh

2008-06-16 17:34:35

by Jan Kara

[permalink] [raw]

Subject: Re: ext4_page_mkwrite and delalloc

> On Mon, Jun 16, 2008 at 04:11:41PM +0200, Jan Kara wrote:
> > Hi Aneesh,
> >
> > On Thu 12-06-08 23:44:07, Aneesh Kumar K.V wrote:
> > > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > > with delalloc is to delay the block allocation and make sure we allocate
> > > chunks of blocks together at writepages. So i guess we should update
> > > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> > > Taking i_alloc_sem should protect against parallel truncate and the page
> > > lock should protect against parallel write_begin/write_end.
> > >
> > > How about the patch below ?
> > In principle the patch looks fine, I would only like to see two things
> > checked:
> > 1) Did you do some stress testing of the patch - combining mmapped writes
> > with ordinary writes to the same file and truncation so that we detect
> > possible bugs in locking / data corruption due to some bad locking. This
> > significantly changes when write_begin / write_end can be called in ext4
> > (i.e., it is now called without i_mutex - BTW: that is probably worth a
> > comment before these functions).
> > 2) How does this change influence CPU load for mmapped accesses - I worry
> > about write_begin / write_end path being significantly heavier than just
> > calling writepage. Probably just mmap a large file, write single byte
> > to every page and measure using oprofile whether accumulated time spent in
> > page_mkwrite didn't change to much.
> >
>
> We can actually get away with page_mkwrite if we agree that SIGBUS on
> ENOSPC is not the right way. We can implement writepages for different
> data mode, and allocate blocks in writepages. With those changes we don't
> allocate blocks for mmap area maping holes upon write. Instead we
> allocate block during writepages. Since we can start a transaction
> during writepages we should be ok with respect to new locking.?
Yes, but this has the disadvantage that with this solution you are
unable to free memory by writeback under memory pressure in some cases
(at that path we are not called via writepages()). It may or may not
matter, I'm not sure.
I personally prefer returning SIGBUS on ENOSPC for ext4 (for ext2/ext3
I tend to agree with Andrew that we probably shouldn't change the
behavior).

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs