Hi All,
While running xfstests against the upstream kernel (commit id:
d43b7167), I got a warning in dmesg. The warning is as follows:
Oct 8 22:39:45 lz-desktop kernel: EXT4-fs (sda1):
ext4_da_update_reserve_space: ino 21, allocated 1 with only 0 reserved
metadata blocks
Oct 8 22:39:45 lz-desktop kernel:
Oct 8 22:39:45 lz-desktop kernel: ------------[ cut here ]------------
Oct 8 22:39:45 lz-desktop kernel: WARNING: at fs/ext4/inode.c:362
ext4_da_update_reserve_space+0x116/0x224 [ext4]()
Oct 8 22:39:45 lz-desktop kernel: Hardware name: OptiPlex 780
Oct 8 22:39:45 lz-desktop kernel: Modules linked in: ext4(O) jbd2
crc16 autofs4 cpufreq_ondemand acpi_cpufreq mperf ipv6 dm_mirror
dm_region_hash dm_log dm_mod dcdbas pcspkr serio_raw i2c_i801 i2c_core
sg parport_pc parport snd_hda_codec_analog snd_hda_intel snd_hda_codec
snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc
e1000e button ext3 jbd sd_mod ahci libahci libata scsi_mod ehci_hcd
uhci_hcd
Oct 8 22:39:45 lz-desktop kernel: Pid: 9239, comm: fsx Tainted: G
W O 3.6.0+ #23
Oct 8 22:39:45 lz-desktop kernel: Call Trace:
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff8203212e>]
warn_slowpath_common+0x85/0x9d
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff82032160>]
warn_slowpath_null+0x1a/0x1c
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa0299c63>]
ext4_da_update_reserve_space+0x116/0x224 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa02c0a5d>]
ext4_ext_map_blocks+0xd42/0xf36 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa029a66e>] ?
ext4_map_blocks+0x10c/0x1f8 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa029a66e>] ?
ext4_map_blocks+0x10c/0x1f8 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa029a6a0>]
ext4_map_blocks+0x13e/0x1f8 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa029af29>]
mpage_da_map_and_submit+0x117/0x497 [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa029ba55>]
ext4_da_writepages+0x37d/0x52b [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff8205fef5>] ?
sched_clock_local+0x1c/0x82
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff820ca586>] do_writepages+0x23/0x2c
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff820c2eb6>]
__filemap_fdatawrite_range+0x53/0x55
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff820c2ee8>]
filemap_write_and_wait_range+0x30/0x59
Oct 8 22:39:45 lz-desktop kernel: [<ffffffffa0292916>]
ext4_sync_file+0x9a/0x35c [ext4]
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff82060057>] ? local_clock+0x2b/0x3c
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff82071b1f>] ?
lock_release_holdtime+0x1c/0x11f
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff821261ff>]
vfs_fsync_range+0x1d/0x26
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff82126224>] vfs_fsync+0x1c/0x1e
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff820e7b3a>] sys_msync+0x116/0x190
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff821cc7de>] ?
trace_hardirqs_on_thunk+0x3a/0x3f
Oct 8 22:39:45 lz-desktop kernel: [<ffffffff823a5492>]
system_call_fastpath+0x16/0x1b
Oct 8 22:39:45 lz-desktop kernel: ---[ end trace f6002ce762f77c1e ]---
...
This warning is printed when test cases #127 and #225 are run with
bigalloc and delalloc. You can use the following commands to
reproduce it with xfstests:
1. mkfs.ext4 -O bigalloc /dev/XXX1
2. mkfs.ext4 -O bigalloc /dev/XXX2
3. export MKFS_OPTIONS="-O bigalloc"
4. export MOUNT_OPTIONS="-o acl,user_xattr"
5. ./check 127
I noticed that the warning comes from ext4_da_update_reserve_space(),
where a new 'if' statement was added (commit id: 97795d2a).
Hopefully this is useful for fixing the problem. Thanks.
Regards,
Zheng
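[Editorial sketch] The consistency check behind the warning above can be modelled roughly as follows. This is a toy Python model, not the kernel code: the class, the field names, and the function signature are hypothetical stand-ins based only on the wording of the warning message, and clamping the counter after the warning is an assumption about what the added 'if' does.

```python
import warnings
from dataclasses import dataclass


@dataclass
class InodeReservation:
    """Hypothetical per-inode counters for delayed-allocation metadata."""
    reserved_meta_blocks: int = 0
    allocated_meta_blocks: int = 0


def update_reserve_space(res, ino, newly_allocated_meta):
    """Account for metadata blocks allocated at writeback time.

    Returns True if the counters are consistent, False if more metadata
    blocks were allocated than had been reserved (the warning case).
    """
    res.allocated_meta_blocks += newly_allocated_meta
    if res.allocated_meta_blocks > res.reserved_meta_blocks:
        warnings.warn(
            f"ino {ino}, allocated {res.allocated_meta_blocks} "
            f"with only {res.reserved_meta_blocks} reserved metadata blocks"
        )
        # Assumed behaviour: clamp so later arithmetic cannot go negative.
        res.allocated_meta_blocks = res.reserved_meta_blocks
        return False
    return True
```

Calling update_reserve_space(InodeReservation(), 21, 1) in this model takes the warning path, which mirrors the "allocated 1 with only 0 reserved" situation in the log.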
On Tue, Oct 09, 2012 at 01:48:20AM -0400, Theodore Ts'o wrote:
> On Mon, Oct 08, 2012 at 10:59:34PM +0800, Zheng Liu wrote:
> >
> > This warning is printed when test cases #127 and #225 are run with
> > bigalloc and delalloc. You can use the following commands to
> > reproduce it with xfstests.
>
> Ah, I see I misunderstood you; I thought you had said that the tests
> had failed. I didn't realize that what you meant was that they
> were triggering a WARN_ON.
>
> I believe this warning is related to the general class of bugs
> surrounding bigalloc and delalloc. Once we get the extent status tree
> into mainline, I'm hoping we will be able to use it to significantly
> improve bigalloc's delalloc handling.
>
> Here are the results I have from running my tests, as well as the
> results from right after the previous merge window for comparison....
Sorry for not expressing myself clearly. :-(
OK, I see. Thanks for the results. They are very useful for
checking whether or not my patches break anything.
Regards,
Zheng
On Tue, Oct 09, 2012 at 02:29:31PM +0800, Zheng Liu wrote:
>
> OK, I see. Thanks for the results. They are very useful for
> checking whether or not my patches break anything.
... and indeed, this is a WARN_ON(1) which is present in the
3.5.0-07665-gf7da9cd kernel, so it's also not a regression.
It is indeed something we need to fix, and it's part of the problem
where the delayed allocation for bigalloc is completely screwed
up. Part of the problem is that when we write into a cluster which has
not yet been mapped in the extent tree, but which might (or might not)
have had other blocks in the cluster that have already been subject to
delayed allocation, we don't know whether to reserve clusters for the
purposes of doing the delayed allocation accounting. Fixing this
w/o the extent status tree means having to search the page cache
for other pages in the cluster, which is not only painful, but tricky
from the perspective of lock ordering.
Unfortunately, I didn't notice this problem originally because I
hadn't been doing regular xfstests runs with bigalloc, and most of my
testing had been with direct I/O, where these issues didn't come up.
- Ted
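[Editorial sketch] The accounting problem Ted describes can be illustrated with a toy model. This is not how ext4 implements reservations: the cluster size of 16 blocks, the function names, and the use of a plain set to remember which clusters already have delayed blocks are all assumptions made for illustration. The set plays the role of the per-inode record that the message above says is missing without the extent status tree.

```python
BLOCKS_PER_CLUSTER = 16  # assumed bigalloc cluster ratio for this sketch


def cluster_of(block):
    """Map a logical block number to its cluster number."""
    return block // BLOCKS_PER_CLUSTER


def reserve_without_tracking(delayed_blocks, mapped_clusters):
    """Reserve a cluster for every delayed block in an unmapped cluster.

    With no memory of earlier delayed writes, two blocks that share a
    cluster each trigger a reservation, so the total is too high.
    """
    return sum(1 for b in delayed_blocks
               if cluster_of(b) not in mapped_clusters)


def reserve_with_tracking(delayed_blocks, mapped_clusters):
    """Reserve each unmapped cluster once, remembering pending clusters."""
    pending = set()
    for b in delayed_blocks:
        c = cluster_of(b)
        if c not in mapped_clusters:
            pending.add(c)  # a second block in the same cluster is a no-op
    return len(pending)


if __name__ == "__main__":
    writes = [0, 1, 2, 17, 40]  # logical blocks written via delalloc
    mapped = {2}                # cluster 2 (blocks 32..47) is already mapped
    print(reserve_without_tracking(writes, mapped))
    print(reserve_with_tracking(writes, mapped))
```

In this model the untracked version reserves four clusters for writes that touch only two unmapped clusters; knowing which clusters already carry delayed blocks is what makes the count come out right.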
On Tue, Oct 9, 2012 at 9:51 AM, Theodore Ts'o wrote:
> It is indeed something we need to fix, and it's part of the problem
> where the delayed allocation for bigalloc is completely screwed
> up. Part of the problem is that when we write into a cluster which has
> not yet been mapped in the extent tree, but which might (or might not)
> have had other blocks in the cluster that have already been subject to
> delayed allocation, we don't know whether to reserve clusters for the
> purposes of doing the delayed allocation accounting. Fixing this
> w/o the extent status tree means having to search the page cache
> for other pages in the cluster, which is not only painful, but tricky
> from the perspective of lock ordering.
>
> Unfortunately, I didn't notice this problem originally because I
> hadn't been doing regular xfstests runs with bigalloc, and most of my
> testing had been with direct I/O, where these issues didn't come up.
>
> - Ted
Hi Ted,
Does this mean I should turn off delalloc if I use bigalloc with Linux 3.5.3?
Regards,
Andrey.
On Wed, Oct 10, 2012 at 01:02:52PM -0400, Andrey Sidorov wrote:
> On Tue, Oct 9, 2012 at 9:51 AM, Theodore Ts'o wrote:
>
> > It is indeed something we need to fix, and it's part of the problem
> > where the delayed allocation for bigalloc is completely screwed
> > up. Part of the problem is that when we write into a cluster which has
> > not yet been mapped in the extent tree, but which might (or might not)
> > have had other blocks in the cluster that have already been subject to
> > delayed allocation, we don't know whether to reserve clusters for the
> > purposes of doing the delayed allocation accounting. Fixing this
> > w/o the extent status tree means having to search the page cache
> > for other pages in the cluster, which is not only painful, but tricky
> > from the perspective of lock ordering.
> >
> > Unfortunately, I didn't notice this problem originally because I
> > hadn't been doing regular xfstests runs with bigalloc, and most of my
> > testing had been with direct I/O, where these issues didn't come up.
> >
> > - Ted
>
> Hi Ted,
>
> Does this mean I should turn off delalloc if I use bigalloc with Linux 3.5.3?
Hi Andrey,
This warning is only triggered by a stress test case. We have never hit
it on our production systems, although admittedly we run bigalloc
backported to a 2.6.32 kernel. So IMHO there is no need to turn off
delalloc.
Regards,
Zheng
On Thu, Oct 11, 2012 at 11:57 AM, Zheng Liu wrote:
> Hi Andrey,
>
> This warning is only triggered by a stress test case. We have never hit
> it on our production systems, although admittedly we run bigalloc
> backported to a 2.6.32 kernel. So IMHO there is no need to turn off
> delalloc.
>
> Regards,
> Zheng
Hi Zheng,
I haven't seen it either, but I wanted to be sure it won't break the
file system if it does occur.
Our use case is mostly dio and pre-fallocated writes, so I think we'll be fine.
Thanks!
Regards,
Andrey.
On Sat, Oct 13, 2012 at 02:29:48PM -0400, Andrey Sidorov wrote:
> I haven't seen it either, but I wanted to be sure it won't break the
> file system if it does occur.
> Our use case is mostly dio and pre-fallocated writes, so I think we'll be fine.
> Thanks!
Yes, that will work just fine. That in fact was the use case I was
primarily interested in when I did the bigalloc work. The main
weaknesses come from using buffered writes with delayed allocation and
when the disk is almost full.
- Ted
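[Editorial sketch] For readers unfamiliar with the "pre-fallocated writes" pattern mentioned above, a minimal Python illustration follows. It is a sketch under stated assumptions, not code from anyone in this thread: the helper name and its parameters are hypothetical, it uses ordinary buffered pwrite rather than O_DIRECT (direct I/O needs aligned buffers and is left out to keep the example short), and it makes no claim about what ext4 does internally. The point is only that space is requested up front with posix_fallocate before any data is written.

```python
import os
import tempfile


def preallocate_and_write(path, size, payload, offset=0):
    """Preallocate `size` bytes for `path`, then write `payload` at `offset`.

    Returns the number of bytes written. `path`, `size`, `payload`, and
    `offset` are illustrative parameters, not part of any ext4 interface.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the file system to allocate the space before any write.
        os.posix_fallocate(fd, 0, size)
        written = os.pwrite(fd, payload, offset)
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "prealloc.bin")
        preallocate_and_write(target, 1 << 20, b"hello", offset=4096)
        print(os.path.getsize(target))
```

The file reaches its full size at preallocation time, so later writes land in space that has already been requested from the file system.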