2013-03-11 18:54:25

by Theodore Ts'o

[permalink] [raw]
Subject: [REGRESSION] allocated N with only M reserved metadata blocks

We discussed this regression on today's conference call, and I noted
that that I thought it was something that had been introduced in new
bug/regresion fix commits that are in the dev branch but not yet in
Linus's tree.

This turns out to be incorrect. I went back through my testing logs,
and from what I can tell, the warning message in question[1] was not
present in 3.8, but was present in the xfstests auto run which I did
(using the standard mount options and 4k block size on test #127) with
a commit which corresponds to the upstream commit id 1231b3a1eb5

So it is a regression, but it's not a regression caused by the most
recent set of changes which we have been testing/validating.

I missed it because my get-results scripts doesn't show me warning
messages (something which I need to fix for the next development
cycle).

- Ted

[1] 127 2431s ...[12774.037351] EXT4-fs warning (device vdb): ext4_da_update_reserve_space:360: ino 1133, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 16 data blocks)
[12774.041006] ------------[ cut here ]------------
[12774.042175] WARNING: at /tyt/linux/ext4/fs/ext4/inode.c:361 ext4_da_update_reserve_space+0xdf/0x1e0()
[12774.044166] Hardware name: Bochs
[12774.044882] Modules linked in:
[12774.045685] Pid: 24352, comm: fsx Tainted: G W 3.8.0-rc3-00064-ge759417 #35
[12774.047371] Call Trace:
[12774.047914] [<c015c534>] warn_slowpath_common+0x68/0x7d
[12774.049138] [<c02759c2>] ? ext4_da_update_reserve_space+0xdf/0x1e0
[12774.050360] [<c015c55d>] warn_slowpath_null+0x14/0x18
[12774.051022] [<c02759c2>] ext4_da_update_reserve_space+0xdf/0x1e0
[12774.051807] [<c0297edb>] ext4_ext_map_blocks+0x12b4/0x13a9
[12774.052581] [<c0199693>] ? trace_hardirqs_off+0xb/0xd
[12774.053233] [<c019ba23>] ? __lock_acquire+0x534/0xc07
[12774.053893] [<c0182bc3>] ? sched_clock_cpu+0x112/0x122
[12774.054570] [<c0199693>] ? trace_hardirqs_off+0xb/0xd
[12774.055236] [<c019a8b3>] ? lock_acquired+0x1d2/0x1da
[12774.055944] [<c0275cd5>] ext4_map_blocks+0x212/0x344
[12774.056589] [<c0278bef>] mpage_da_map_and_submit+0x94/0x3eb
[12774.057323] [<c01e0c8f>] ? rcu_read_unlock+0x17/0x19
[12774.057972] [<c01e0d5f>] ? find_get_pages_tag+0xce/0xed
[12774.058658] [<c027903a>] write_cache_pages_da+0xf4/0x31f
[12774.059412] [<c02794c0>] ext4_da_writepages+0x25b/0x3c3
[12774.060140] [<c01e85d4>] do_writepages+0x23/0x2d
[12774.060754] [<c01e1d85>] __filemap_fdatawrite_range+0x5a/0x62
[12774.061506] [<c01e1db8>] filemap_write_and_wait_range+0x2b/0x58
[12774.062346] [<c0272027>] ext4_sync_file+0x7f/0x2c0
[12774.062975] [<c0182c05>] ? local_clock+0x32/0x49
[12774.063593] [<c022d756>] vfs_fsync_range+0x36/0x41
[12774.064228] [<c0271fa8>] ? ext4_file_write+0x42c/0x42c
[12774.064902] [<c022d77a>] vfs_fsync+0x19/0x1b
[12774.065526] [<c01ffb20>] sys_msync+0xec/0x154
[12774.066093] [<c080e1d0>] ? __do_page_fault+0x346/0x346
[12774.066767] [<c080b6c8>] syscall_call+0x7/0xb
[12774.067348] [<c0800000>] ? acpi_processor_add+0x182/0x3e8
[12774.068059] ---[ end trace c26cb86224ce73de ]---


2013-03-11 21:02:06

by Eric Whitney

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

* Theodore Ts'o <[email protected]>:
> We discussed this regression on today's conference call, and I noted
> that that I thought it was something that had been introduced in new
> bug/regresion fix commits that are in the dev branch but not yet in
> Linus's tree.
>
> This turns out to be incorrect. I went back through my testing logs,
> and from what I can tell, the warning message in question[1] was not
> present in 3.8, but was present in the xfstests auto run which I did
> (using the standard mount options and 4k block size on test #127) with
> a commit which corresponds to the upstream commit id 1231b3a1eb5
>
> So it is a regression, but it's not a regression caused by the most
> recent set of changes which we have been testing/validating.
>
> I missed it because my get-results scripts doesn't show me warning
> messages (something which I need to fix for the next development
> cycle).
>
> - Ted
>
> [1] 127 2431s ...[12774.037351] EXT4-fs warning (device vdb): ext4_da_update_reserve_space:360: ino 1133, allocated 1 with only 0 reserved metadata blocks (releasing 1 blocks with reserved 16 data blocks)
> [12774.041006] ------------[ cut here ]------------
> [12774.042175] WARNING: at /tyt/linux/ext4/fs/ext4/inode.c:361 ext4_da_update_reserve_space+0xdf/0x1e0()
> [12774.044166] Hardware name: Bochs
> [12774.044882] Modules linked in:
> [12774.045685] Pid: 24352, comm: fsx Tainted: G W 3.8.0-rc3-00064-ge759417 #35
> [12774.047371] Call Trace:
> [12774.047914] [<c015c534>] warn_slowpath_common+0x68/0x7d
> [12774.049138] [<c02759c2>] ? ext4_da_update_reserve_space+0xdf/0x1e0
> [12774.050360] [<c015c55d>] warn_slowpath_null+0x14/0x18
> [12774.051022] [<c02759c2>] ext4_da_update_reserve_space+0xdf/0x1e0
> [12774.051807] [<c0297edb>] ext4_ext_map_blocks+0x12b4/0x13a9
> [12774.052581] [<c0199693>] ? trace_hardirqs_off+0xb/0xd
> [12774.053233] [<c019ba23>] ? __lock_acquire+0x534/0xc07
> [12774.053893] [<c0182bc3>] ? sched_clock_cpu+0x112/0x122
> [12774.054570] [<c0199693>] ? trace_hardirqs_off+0xb/0xd
> [12774.055236] [<c019a8b3>] ? lock_acquired+0x1d2/0x1da
> [12774.055944] [<c0275cd5>] ext4_map_blocks+0x212/0x344
> [12774.056589] [<c0278bef>] mpage_da_map_and_submit+0x94/0x3eb
> [12774.057323] [<c01e0c8f>] ? rcu_read_unlock+0x17/0x19
> [12774.057972] [<c01e0d5f>] ? find_get_pages_tag+0xce/0xed
> [12774.058658] [<c027903a>] write_cache_pages_da+0xf4/0x31f
> [12774.059412] [<c02794c0>] ext4_da_writepages+0x25b/0x3c3
> [12774.060140] [<c01e85d4>] do_writepages+0x23/0x2d
> [12774.060754] [<c01e1d85>] __filemap_fdatawrite_range+0x5a/0x62
> [12774.061506] [<c01e1db8>] filemap_write_and_wait_range+0x2b/0x58
> [12774.062346] [<c0272027>] ext4_sync_file+0x7f/0x2c0
> [12774.062975] [<c0182c05>] ? local_clock+0x32/0x49
> [12774.063593] [<c022d756>] vfs_fsync_range+0x36/0x41
> [12774.064228] [<c0271fa8>] ? ext4_file_write+0x42c/0x42c
> [12774.064902] [<c022d77a>] vfs_fsync+0x19/0x1b
> [12774.065526] [<c01ffb20>] sys_msync+0xec/0x154
> [12774.066093] [<c080e1d0>] ? __do_page_fault+0x346/0x346
> [12774.066767] [<c080b6c8>] syscall_call+0x7/0xb
> [12774.067348] [<c0800000>] ? acpi_processor_add+0x182/0x3e8
> [12774.068059] ---[ end trace c26cb86224ce73de ]---
> --

FWIW, this might not be a regression. I believe I've got this warning in my
testing logs from both 3.8 and 3.8-rc7 (commit 01a523eb51 in 3.9-rc1 affects
message format and line numbering) on both x86 and ARM. I didn't run xfstest
127 prior to 3.8-rc7, so I don't know how far back the warning may have
occurred for that particular test.

For 3.8 on x86, 127 triggered the warning in the 4k block, no journal,
metadata_csum, dioread_nolock, and bigalloc test cases.

For 3.8 on ARM/Pandaboard, 127 triggered the warning on metadata_csum.

The 3.8-rc7 results are a little different with respect to number of warnings
and test cases, so I'm thinking this one isn't completely deterministic,
testing-wise. Multiple test runs may be required to see it.

x86 trace from 3.8 follows.

Regards,
Eric


[ 2918.672138] EXT4-fs (vdb): ext4_da_update_reserve_space: ino 18944, allocated 1 with only 0 reserved metadata blocks
[ 2918.672138]
[ 2918.672144] ------------[ cut here ]------------
[ 2918.676682] WARNING: at fs/ext4/inode.c:362 ext4_da_update_reserve_space+0x249/0x260()
[ 2918.677882] Hardware name: Bochs
[ 2918.678812] Modules linked in: kvm_intel kvm snd_hda_intel snd_hda_codec snd_hwdep microcode snd_pcm snd_timer psmouse snd serio_raw soundcore snd_page_alloc i2c_piix4 virtio_balloon mac_hid lp parport floppy
[ 2918.681396] Pid: 13118, comm: fsx Not tainted 3.8.0-ext4testing #1
[ 2918.682268] Call Trace:
[ 2918.683129] [<ffffffff81058d5f>] warn_slowpath_common+0x7f/0xc0
[ 2918.684045] [<ffffffff81058dba>] warn_slowpath_null+0x1a/0x20
[ 2918.684922] [<ffffffff8123d469>] ext4_da_update_reserve_space+0x249/0x260
[ 2918.685821] [<ffffffff8126d36b>] ext4_ext_map_blocks+0x119b/0x16e0
[ 2918.686715] [<ffffffff8123d6a5>] ext4_map_blocks+0x225/0x2c0
[ 2918.687591] [<ffffffff81241f2d>] mpage_da_map_and_submit+0x15d/0x420
[ 2918.688485] [<ffffffff8124272e>] write_cache_pages_da+0x45e/0x4b0
[ 2918.689327] [<ffffffff81242ac9>] ext4_da_writepages+0x349/0x600
[ 2918.690170] [<ffffffff811494f1>] do_writepages+0x21/0x50
[ 2918.690989] [<ffffffff8113e8c9>] __filemap_fdatawrite_range+0x59/0x60
[ 2918.691794] [<ffffffff8113e920>] filemap_write_and_wait_range+0x50/0x70
[ 2918.692633] [<ffffffff81238671>] ext4_sync_file+0x71/0x350
[ 2918.693440] [<ffffffff811ca7eb>] vfs_fsync+0x2b/0x40
[ 2918.694233] [<ffffffff81170835>] sys_msync+0x145/0x1c0
[ 2918.695013] [<ffffffff816f27d9>] system_call_fastpath+0x16/0x1b
[ 2918.695792] ---[ end trace a322ff0e61019890 ]---




2013-03-11 21:22:41

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Mon, Mar 11, 2013 at 05:02:01PM -0400, Eric Whitney wrote:
>
> FWIW, this might not be a regression. I believe I've got this warning in my
> testing logs from both 3.8 and 3.8-rc7 (commit 01a523eb51 in 3.9-rc1 affects
> message format and line numbering) on both x86 and ARM. I didn't run xfstest
> 127 prior to 3.8-rc7, so I don't know how far back the warning may have
> occurred for that particular test.
>
> The 3.8-rc7 results are a little different with respect to number of warnings
> and test cases, so I'm thinking this one isn't completely deterministic,
> testing-wise. Multiple test runs may be required to see it.

Thanks for the extra data; this saved be a whole bunch of time, since
I probably would have started doing a series git bisects tonight. :-)

- Ted


2013-03-12 07:59:05

by Lukas Czerner

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Mon, 11 Mar 2013, Theodore Ts'o wrote:

> Date: Mon, 11 Mar 2013 17:22:39 -0400
> From: Theodore Ts'o <[email protected]>
> To: Eric Whitney <[email protected]>
> Cc: [email protected]
> Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
>
> On Mon, Mar 11, 2013 at 05:02:01PM -0400, Eric Whitney wrote:
> >
> > FWIW, this might not be a regression. I believe I've got this warning in my
> > testing logs from both 3.8 and 3.8-rc7 (commit 01a523eb51 in 3.9-rc1 affects
> > message format and line numbering) on both x86 and ARM. I didn't run xfstest
> > 127 prior to 3.8-rc7, so I don't know how far back the warning may have
> > occurred for that particular test.
> >
> > The 3.8-rc7 results are a little different with respect to number of warnings
> > and test cases, so I'm thinking this one isn't completely deterministic,
> > testing-wise. Multiple test runs may be required to see it.
>
> Thanks for the extra data; this saved be a whole bunch of time, since
> I probably would have started doing a series git bisects tonight. :-)
>
> - Ted

I found that the problem is present even in 3.7.0, however it is not
present in 3.6.0. I guess we need to bisect now :)

-Lukas

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2013-03-12 09:48:33

by Lukas Czerner

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Tue, 12 Mar 2013, Luk?? Czerner wrote:

> Date: Tue, 12 Mar 2013 08:58:53 +0100 (CET)
> From: Luk?? Czerner <[email protected]>
> To: Theodore Ts'o <[email protected]>
> Cc: Eric Whitney <[email protected]>, [email protected]
> Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
>
> On Mon, 11 Mar 2013, Theodore Ts'o wrote:
>
> > Date: Mon, 11 Mar 2013 17:22:39 -0400
> > From: Theodore Ts'o <[email protected]>
> > To: Eric Whitney <[email protected]>
> > Cc: [email protected]
> > Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
> >
> > On Mon, Mar 11, 2013 at 05:02:01PM -0400, Eric Whitney wrote:
> > >
> > > FWIW, this might not be a regression. I believe I've got this warning in my
> > > testing logs from both 3.8 and 3.8-rc7 (commit 01a523eb51 in 3.9-rc1 affects
> > > message format and line numbering) on both x86 and ARM. I didn't run xfstest
> > > 127 prior to 3.8-rc7, so I don't know how far back the warning may have
> > > occurred for that particular test.
> > >
> > > The 3.8-rc7 results are a little different with respect to number of warnings
> > > and test cases, so I'm thinking this one isn't completely deterministic,
> > > testing-wise. Multiple test runs may be required to see it.
> >
> > Thanks for the extra data; this saved be a whole bunch of time, since
> > I probably would have started doing a series git bisects tonight. :-)
> >
> > - Ted
>
> I found that the problem is present even in 3.7.0, however it is not
> present in 3.6.0. I guess we need to bisect now :)
>
> -Lukas

Apparently the problem is caused by the commit:

67a5da564f97f31c4054d358e00b34d7ee570da5 ext4: make the zero-out
chunk size tunable

Reverting that commit on 3.7 makes the warning disappear. I'll try
to see what's going on there, however from the nature of the change
it seems to me that the problem might have always been there but the
change just makes it easier/possible to hit. We'll see...

-Lukas

2013-03-12 14:11:33

by Lukas Czerner

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Tue, 12 Mar 2013, Luk?? Czerner wrote:

> Date: Tue, 12 Mar 2013 10:48:27 +0100 (CET)
> From: Luk?? Czerner <[email protected]>
> To: Luk?? Czerner <[email protected]>
> Cc: Theodore Ts'o <[email protected]>, Eric Whitney <[email protected]>,
> [email protected]
> Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
>
> On Tue, 12 Mar 2013, Luk?? Czerner wrote:
>
> > Date: Tue, 12 Mar 2013 08:58:53 +0100 (CET)
> > From: Luk?? Czerner <[email protected]>
> > To: Theodore Ts'o <[email protected]>
> > Cc: Eric Whitney <[email protected]>, [email protected]
> > Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
> >
> > On Mon, 11 Mar 2013, Theodore Ts'o wrote:
> >
> > > Date: Mon, 11 Mar 2013 17:22:39 -0400
> > > From: Theodore Ts'o <[email protected]>
> > > To: Eric Whitney <[email protected]>
> > > Cc: [email protected]
> > > Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks
> > >
> > > On Mon, Mar 11, 2013 at 05:02:01PM -0400, Eric Whitney wrote:
> > > >
> > > > FWIW, this might not be a regression. I believe I've got this warning in my
> > > > testing logs from both 3.8 and 3.8-rc7 (commit 01a523eb51 in 3.9-rc1 affects
> > > > message format and line numbering) on both x86 and ARM. I didn't run xfstest
> > > > 127 prior to 3.8-rc7, so I don't know how far back the warning may have
> > > > occurred for that particular test.
> > > >
> > > > The 3.8-rc7 results are a little different with respect to number of warnings
> > > > and test cases, so I'm thinking this one isn't completely deterministic,
> > > > testing-wise. Multiple test runs may be required to see it.
> > >
> > > Thanks for the extra data; this saved be a whole bunch of time, since
> > > I probably would have started doing a series git bisects tonight. :-)
> > >
> > > - Ted
> >
> > I found that the problem is present even in 3.7.0, however it is not
> > present in 3.6.0. I guess we need to bisect now :)
> >
> > -Lukas
>
> Apparently the problem is caused by the commit:
>
> 67a5da564f97f31c4054d358e00b34d7ee570da5 ext4: make the zero-out
> chunk size tunable
>
> Reverting that commit on 3.7 makes the warning disappear. I'll try
> to see what's going on there, however from the nature of the change
> it seems to me that the problem might have always been there but the
> change just makes it easier/possible to hit. We'll see...
>
> -Lukas

So there is indeed a problem with the mentioned commit

67a5da564f97f31c4054d358e00b34d7ee570da5

Due to the bug in that code is has exactly the opposite result -
with this commit we will _never_ zero out blocks instead of creating
uninitialized extents. In other words, we will always create
uninitialized extent.

This can be easily fixed by the following patch (which makes the
warning go away), but it brings up a question whether this "optimization"
was worth it in the first place since noone noticed that it had exactly
the opposite effect than it should have had :)

However it still does not resolve the issue completely, because even
without the zeroout we should have had reserved enough metadata
blocks to cover the extent split. I still need to investigate
a little bit further.

-Lukas

---

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index cac80120..aeb80bc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3227,7 +3227,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,

if (EXT4_EXT_MAY_ZEROOUT & split_flag)
max_zeroout = sbi->s_extent_max_zeroout_kb >>
- inode->i_sb->s_blocksize_bits;
+ (inode->i_sb->s_blocksize_bits - 10);

/* If extent is less than s_max_zeroout_kb, zeroout directly */
if (max_zeroout && (ee_len <= max_zeroout)) {

2013-03-12 16:19:22

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Tue, Mar 12, 2013 at 03:11:23PM +0100, Lukáš Czerner wrote:
> So there is indeed a problem with the mentioned commit
>
> 67a5da564f97f31c4054d358e00b34d7ee570da5
>
> Due to the bug in that code is has exactly the opposite result -
> with this commit we will _never_ zero out blocks instead of creating
> uninitialized extents. In other words, we will always create
> uninitialized extent.

Whoops. I even remember how this bug happened. Originally
max_zeroout was in file system blocks, and it was suggested that we
change this to use units of kilobytes instead. Unfortunately, this
change wasn't done completely. :-(

> This can be easily fixed by the following patch (which makes the
> warning go away), but it brings up a question whether this "optimization"
> was worth it in the first place since noone noticed that it had exactly
> the opposite effect than it should have had :)

Well, I had noticed that random AIO workloads resulted in the extent
tree getting far more fragmented than I had expected. (See previous
discusisons about how we really need to improve our ability to merge
empty leaf and index nodes in the extent tree.)

It will be worthwhile to fix this bug and then see how much remains of
extent tree fragmentation problem once this is fixed....

Thanks for the catch! When you have a chance, could you resend with a
commit description?

> However it still does not resolve the issue completely, because even
> without the zeroout we should have had reserved enough metadata
> blocks to cover the extent split. I still need to investigate
> a little bit further.

Yep, agreed. Sounds like there is more than one bug hiding here.

- Ted

2013-03-13 08:27:59

by Zheng Liu

[permalink] [raw]
Subject: Re: [REGRESSION] allocated N with only M reserved metadata blocks

On Tue, Mar 12, 2013 at 12:19:13PM -0400, Theodore Ts'o wrote:
> On Tue, Mar 12, 2013 at 03:11:23PM +0100, Lukáš Czerner wrote:
> > So there is indeed a problem with the mentioned commit
> >
> > 67a5da564f97f31c4054d358e00b34d7ee570da5
> >
> > Due to the bug in that code is has exactly the opposite result -
> > with this commit we will _never_ zero out blocks instead of creating
> > uninitialized extents. In other words, we will always create
> > uninitialized extent.
>
> Whoops. I even remember how this bug happened. Originally
> max_zeroout was in file system blocks, and it was suggested that we
> change this to use units of kilobytes instead. Unfortunately, this
> change wasn't done completely. :-(
>
> > This can be easily fixed by the following patch (which makes the
> > warning go away), but it brings up a question whether this "optimization"
> > was worth it in the first place since noone noticed that it had exactly
> > the opposite effect than it should have had :)
>
> Well, I had noticed that random AIO workloads resulted in the extent
> tree getting far more fragmented than I had expected. (See previous
> discusisons about how we really need to improve our ability to merge
> empty leaf and index nodes in the extent tree.)

Agree, we could do better in merging extent tree. I will take a look at
it, but, sorry, my plate is too full recently. So it has a low priority.

Regards,
- Zheng