Hello everyone,
I decided to switch to ext4 today. My kernel is 2.6.27-rc3 with some
additional patches, but none to the ext4 code. Here is my system setup:
* /dev/sda3 is encrypted with LUKS/device-mapper
* lvm2 on the encrypted partition
* two logical volumes with ext4 inside the lvm2
The ext4 filesystems have been converted from ext3 using this command:
$ tune2fs -O extents -E test_fs /dev/mapper/<device>
After mounting the partitions and logging in it took half a minute to hang
the system (or at least freeze all applications that access the fs). The
log contains the following:
kernel BUG at fs/ext4/mballoc.c:3963!
invalid opcode: 0000 [#1] SMP
Modules linked in: iwl3945 snd_hda_intel vboxdrv [last unloaded: iwl3945]
Pid: 5487, comm: opera Not tainted (2.6.27-rc3 #1)
EIP: 0060:[<c0232a93>] EFLAGS: 00210202 CPU: 0
EIP is at ext4_mb_free_blocks+0x593/0x5a0
EAX: ede02038 EBX: 0000000c ECX: 00000173 EDX: ede0223c
ESI: 00000000 EDI: 00000000 EBP: ed8cdc78 ESP: ed8cdbfc
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process opera (pid: 5487, ti=ed8cc000 task=ed8b6570 task.ti=ed8cc000)
Stack: 0000000c ed8cdc68 c07163c0 ed8cdc30 c015e36b ede020b8 ede043ec f5e85880
edd72a94 f5c14e00 ede0d76c f609f100 00000000 f5e85800 ede02038 ede0223c
000405cc c15b1620 ed8b1000 c15cf060 ee783000 f5dc8cc0 f5c14e00 0000000c
Call Trace:
[<c015e36b>] ? find_get_page+0x2b/0xc0
[<c0215887>] ? ext4_free_blocks+0x87/0x160
[<c021d0f3>] ? ext4_clear_blocks+0xd3/0x100
[<c021d295>] ? ext4_free_data+0x175/0x1d0
[<c021da6e>] ? ext4_truncate+0x56e/0x640
[<c0167e41>] ? truncate_inode_pages_range+0x181/0x350
[<c016e027>] ? unmap_mapping_range+0x87/0x250
[<c016e3c5>] ? vmtruncate+0xc5/0x190
[<c0196885>] ? inode_setattr+0x65/0x1a0
[<c021980a>] ? ext4_setattr+0x23a/0x320
[<c0196ab5>] ? notify_change+0xf5/0x350
[<c0180648>] ? do_truncate+0x68/0x90
[<c018b4ca>] ? path_walk+0x9a/0xb0
[<c018a41e>] ? may_open+0x1be/0x250
[<c018c43f>] ? do_filp_open+0xff/0x7e0
[<c0101d03>] ? __switch_to+0x153/0x160
[<c019788e>] ? alloc_fd+0x6e/0x100
[<c017f789>] ? do_sys_open+0x59/0xf0
[<c017f889>] ? sys_open+0x29/0x40
[<c01032b1>] ? sysenter_do_call+0x12/0x25
=======================
Code: 90 8d b4 26 00 00 00 00 85 c0 75 83 8b 5d a4 f0 ff 4b 30 8b 45 b4 8b
75 c4 89 45 08 e9 57 fb ff ff bf fb ff ff ff e9 89 fc ff ff <0f> 0b eb fe
89 f6 8d bc 27 00 00 00 00 55 89 e5 57 56 53 83 ec
EIP: [<c0232a93>] ext4_mb_free_blocks+0x593/0x5a0 SS:ESP 0068:ed8cdbfc
Any help is welcome. ;)
Regards,
Chris
On Wed, Aug 13, 2008 at 08:28:18PM +0200, [email protected] wrote:
>
> After mounting the partitions and logging in it took half a minute to hang
> the system (or at least freeze all applications that access the fs). The
> log contains the following:
>
> kernel BUG at fs/ext4/mballoc.c:3963!
This means that we tried to truncate/delete a file while there were
still blocks on i_prealloc_list. I think I see the problem. And the
reason why we haven't noticed it is that it only shows up if you have
an indirect block-based file, and you truncate it when you have
previously been writing to it (so i_prealloc_list is not empty).
The problem is that we call ext4_discard_reservation() too late, after
we've started calling ext4_free_branches(), which calls
ext4_free_blocks(), which ultimately calls
ext4_mb_return_to_preallocation(), which is what is BUG-checking.
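The ordering problem described above can be sketched as a small user-space toy model. This is not kernel code: `struct toy_inode` and the `toy_*` functions are hypothetical stand-ins, and the only thing the sketch tries to capture is that freeing blocks is checked against a non-empty preallocation list, so the discard has to come first.

```c
#include <stdbool.h>

/* Toy stand-in for the in-memory inode; not the kernel's structure. */
struct toy_inode {
    int prealloc_entries;   /* models the length of i_prealloc_list */
    int data_blocks;        /* blocks still owned by the file */
};

/* Models ext4_discard_reservation(): drop every preallocation. */
static void toy_discard_reservation(struct toy_inode *inode)
{
    inode->prealloc_entries = 0;
}

/*
 * Models the check in ext4_mb_return_to_preallocation(): freeing a
 * block is only legal once the preallocation list is empty. Returns
 * false where the kernel would hit BUG_ON().
 */
static bool toy_free_block(struct toy_inode *inode)
{
    if (inode->prealloc_entries != 0)
        return false;
    inode->data_blocks--;
    return true;
}

/* Truncate to zero length; discard_first selects the call ordering. */
static bool toy_truncate(struct toy_inode *inode, bool discard_first)
{
    if (discard_first)
        toy_discard_reservation(inode);

    while (inode->data_blocks > 0) {
        if (!toy_free_block(inode))
            return false;                 /* the BUG_ON() would fire here */
    }

    if (!discard_first)
        toy_discard_reservation(inode);   /* too late to help */
    return true;
}
```

In this model, `toy_truncate(&inode, false)` fails for any inode that still has preallocations, while `toy_truncate(&inode, true)` completes; an inode with an empty list succeeds either way, which would fit the bug going unnoticed for so long.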
Can you reproduce the bug? Things are a little busy on my end, so I
don't have time to try to create a reproducer and test the patch, at
least not for a day or so. The following patch passes the "It Builds,
Ship It!" test, but not much else. :-)
If you could report (a) whether or not you can reproduce the failure,
and (b) whether this patch fixes things, I would be most grateful.
Thanks, regards,
- Ted
commit b86b40e630893e74d3259f129060cfcb115f7fb9
Author: Theodore Ts'o <[email protected]>
Date: Wed Aug 13 16:07:32 2008 -0400
ext4: Fix potential truncate BUG due to i_prealloc_list being non-empty
We need to call ext4_discard_reservation() earlier in ext4_truncate(),
to avoid a BUG() in ext4_mb_return_to_preallocation(), which is called
(ultimately) by ext4_free_blocks(). So we must ditch the blocks on
i_prealloc_list before we start freeing the data blocks.
Signed-off-by: "Theodore Ts'o" <[email protected]>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 368ec6b..7f7b0c5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3512,6 +3512,9 @@ void ext4_truncate(struct inode *inode)
* modify the block allocation tree.
*/
down_write(&ei->i_data_sem);
+
+ ext4_discard_reservation(inode);
+
/*
* The orphan list entry will now protect us from any crash which
* occurs before the truncate completes, so it is now safe to propagate
@@ -3581,8 +3584,6 @@ do_indirects:
;
}
- ext4_discard_reservation(inode);
On Wed, Aug 13, 2008 at 10:55:07PM +0200, Christian Hesse wrote:
> >
> > Can you reproduce the bug?
>
> I can. ;)
Great! I haven't been able to reproduce it myself, but I was using a
very quick test case on a small toy filesystem. My ext4 root filesystem
is 100% extent based files --- and this bug is only happening on
old-style files with indirect-block-style inodes. And I'm trying to
get things done before leaving for a family reunion on Friday, so I
just don't have the time to set up a reproducer.
Can you verify whether or not my patch fixes things for you?
Thanks, regards,
- Ted
On Wednesday 13 August 2008, Theodore Tso wrote:
> On Wed, Aug 13, 2008 at 10:55:07PM +0200, Christian Hesse wrote:
> > > Can you reproduce the bug?
> >
> > I can. ;)
>
> Great! I haven't been able to reproduce it myself, but I was using a
> very quick test case on a small toy filesystem. My ext4 root filesystem
> is 100% extent based files --- and this bug is only happening on
> old-style files with indirect-block-style inodes. And I'm trying to
> get things done before leaving for a family reunion on Friday, so I
> just don't have the time to set up a reproducer.
>
> Can you verify whether or not my patch fixes things for you?
Please look at the bottom of my last two mails... That was with your patch
applied.
--
Regards,
Chris
I'm not sure whether my last mail left my system, so I'm sending it again...
On Wednesday 13 August 2008, you wrote:
> On Wed, Aug 13, 2008 at 08:28:18PM +0200, [email protected] wrote:
> > After mounting the partitions and logging in it took half a minute to
> > hang the system (or at least freeze all applications that access the fs).
> > The log contains the following:
> >
> > kernel BUG at fs/ext4/mballoc.c:3963!
>
> This means that we tried to truncate/delete a file while there were
> still blocks on i_prealloc_list. I think I see the problem. And the
> reason why we haven't noticed it is that it only shows up if you have
> an indirect block-based file, and you truncate it when you have
> previously been writing to it (so i_prealloc_list is not empty).
>
> The problem is that we call ext4_discard_reservation() too late, after
> we've started calling ext4_free_branches(), which calls
> ext4_free_blocks(), which ultimately calls
> ext4_mb_return_to_preallocation(), which is what is BUG-checking.
>
> Can you reproduce the bug?
I can. ;)
> Things are a little busy on my end, so I
> don't have time to try to create a reproducer and test the patch, at
> least not for a day or so. The following patch passes the "It Builds,
> Ship It!" test, but not much else. :-)
>
> If you could report (a) whether or not you can reproduce the failure,
> and (b) whether this patch fixes things, I would be most grateful.
This time I got the following:
kernel BUG at fs/ext4/inode.c:1568!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_hda_intel vboxdrv iwl3945
Pid: 4049, comm: kontact Not tainted (2.6.27-rc3 #1)
EIP: 0060:[<c021aac5>] EFLAGS: 00010202 CPU: 0
EIP is at ext4_da_invalidatepage+0xa5/0x120
EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 000003ff
ESI: eeb900b8 EDI: eeb90138 EBP: ef165d94 ESP: ef165d70
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process kontact (pid: 4049, ti=ef164000 task=ef16c430 task.ti=ef164000)
Stack: 00000000 eeb902d8 00000000 c1d7f600 f7314000 00000000 c021aa20 00000001
c1d7f600 ef165da0 c0167799 c1d7f600 ef165dac c0167ca9 00000000 ef165e2c
c0167dd1 0000000e eeb6e2a8 00000001 00000003 f7380078 00000000 00000000
Call Trace:
[<c021aa20>] ? ext4_da_invalidatepage+0x0/0x120
[<c0167799>] ? do_invalidatepage+0x19/0x20
[<c0167ca9>] ? truncate_complete_page+0x49/0x60
[<c0167dd1>] ? truncate_inode_pages_range+0x111/0x350
[<c023d7ec>] ? jbd2_journal_stop+0x14c/0x1d0
[<c016802a>] ? truncate_inode_pages+0x1a/0x20
[<c021db6e>] ? ext4_delete_inode+0x2e/0x290
[<c021db40>] ? ext4_delete_inode+0x0/0x290
[<c01964ac>] ? generic_delete_inode+0x7c/0x120
[<c0196685>] ? generic_drop_inode+0x135/0x160
[<c0195547>] ? iput+0x47/0x50
[<c0192cd7>] ? dentry_iput+0x67/0xb0
[<c0192da5>] ? d_kill+0x35/0x60
[<c0193496>] ? dput+0x76/0x120
[<c018b9bb>] ? sys_renameat+0x1cb/0x200
[<c01768dc>] ? free_pages_and_swap_cache+0x7c/0xa0
[<c0171156>] ? remove_vma+0x46/0x60
[<c01720eb>] ? do_munmap+0x1db/0x230
[<c018ba19>] ? sys_rename+0x29/0x30
[<c01032b1>] ? sysenter_do_call+0x12/0x25
=======================
Code: 87 a0 01 00 00 89 45 e0 e8 09 33 32 00 8b 5d f0 89 f8 8b 96 10 02 00 00
29 da e8 17 ff ff ff 89 c3 8b 86 14 02 00 00 39 c3 76 2a <0f> 0b eb fe 89 9e
14 02 00 00 8b 55 e0 fe 87 a0 01 00 00 8b 55
EIP: [<c021aac5>] ext4_da_invalidatepage+0xa5/0x120 SS:ESP 0068:ef165d70
And another one:
kernel BUG at fs/ext4/inode.c:1568!
invalid opcode: 0000 [#1] SMP
Modules linked in: snd_hda_intel vboxdrv iwl3945
Pid: 4097, comm: kontact Not tainted (2.6.27-rc3 #1)
EIP: 0060:[<c021aac5>] EFLAGS: 00010202 CPU: 1
EIP is at ext4_da_invalidatepage+0xa5/0x120
EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 000003ff
ESI: ed9404c0 EDI: ed940540 EBP: ee7d7dfc ESP: ee7d7dd8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process kontact (pid: 4097, ti=ee7d6000 task=ee7d51e0 task.ti=ee7d6000)
Stack: ec9ef3ec ed9406e0 00000000 c1e586c0 f5a3f800 00000000 c021aa20 00000001
c1e586c0 ee7d7e08 c0167799 c1e586c0 ee7d7e14 c0167ca9 00000000 ee7d7e94
c0167dd1 0000000e 00000296 ee6a04c0 ee6a04fc ec9ef3ec 00000000 00000000
Call Trace:
[<c021aa20>] ? ext4_da_invalidatepage+0x0/0x120
[<c0167799>] ? do_invalidatepage+0x19/0x20
[<c0167ca9>] ? truncate_complete_page+0x49/0x60
[<c0167dd1>] ? truncate_inode_pages_range+0x111/0x350
[<c0227517>] ? __ext4_journal_stop+0x27/0x60
[<c02195a5>] ? ext4_dirty_inode+0x55/0x80
[<c016802a>] ? truncate_inode_pages+0x1a/0x20
[<c021db6e>] ? ext4_delete_inode+0x2e/0x290
[<c021db40>] ? ext4_delete_inode+0x0/0x290
[<c01964ac>] ? generic_delete_inode+0x7c/0x120
[<c0196685>] ? generic_drop_inode+0x135/0x160
[<c0195547>] ? iput+0x47/0x50
[<c0192cd7>] ? dentry_iput+0x67/0xb0
[<c0192da5>] ? d_kill+0x35/0x60
[<c0193496>] ? dput+0x76/0x120
[<c0182550>] ? __fput+0x110/0x160
[<c01825bf>] ? fput+0x1f/0x30
[<c017113d>] ? remove_vma+0x2d/0x60
[<c01720eb>] ? do_munmap+0x1db/0x230
[<c0172170>] ? sys_munmap+0x30/0x50
[<c01032b1>] ? sysenter_do_call+0x12/0x25
=======================
Code: 87 a0 01 00 00 89 45 e0 e8 09 33 32 00 8b 5d f0 89 f8 8b 96 10 02 00 00
29 da e8 17 ff ff ff 89 c3 8b 86 14 02 00 00 39 c3 76 2a <0f> 0b eb fe 89 9e
14 02 00 00 8b 55 e0 fe 87 a0 01 00 00 8b 55
EIP: [<c021aac5>] ext4_da_invalidatepage+0xa5/0x120 SS:ESP 0068:ee7d7dd8
--
Regards,
Chris
On Wed, Aug 13, 2008 at 11:07:06PM +0200, Christian Hesse wrote:
>
> Please look at the bottom of my last two mails... That was with your patch
> applied.
Sorry, I missed it. The new BUG seems to be a bug in the delayed
allocation code, specifically here, in fs/ext4/inode.c:ext4_da_release_space():
/* figure out how many metablocks to release */
BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
I've quickly looked at the code, and how i_reserved_meta_blocks gets
updated, and nothing *obviously* wrong is jumping out at me. Anyone
else have time to investigate this a bit more deeply?
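To make the failing check concrete, here is a minimal user-space sketch of just that accounting step. It is a toy model under stated assumptions: `struct toy_reservation` and `toy_meta_blocks_to_free()` are hypothetical names, and `mdb` is passed in already computed, whereas the kernel derives it from the number of data blocks that remain reserved.

```c
#include <stdbool.h>

/* Toy model of the per-inode delayed-allocation counters. */
struct toy_reservation {
    int reserved_data_blocks;   /* models i_reserved_data_blocks */
    int reserved_meta_blocks;   /* models i_reserved_meta_blocks */
};

/*
 * mdb is the number of metadata blocks that must stay reserved after
 * the release. On success, store the number of metadata blocks that
 * can be released in *mdb_free and return true. Return false where the
 * kernel's BUG_ON(mdb > i_reserved_meta_blocks) would fire.
 */
static bool toy_meta_blocks_to_free(const struct toy_reservation *r,
                                    int mdb, int *mdb_free)
{
    if (mdb > r->reserved_meta_blocks)
        return false;
    *mdb_free = r->reserved_meta_blocks - mdb;
    return true;
}
```

The check can only fail if the estimate of what must stay reserved exceeds what is currently reserved, so either the counter or the estimate has to be off for some input.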
- Ted
Theodore Tso wrote:
> On Wed, Aug 13, 2008 at 11:07:06PM +0200, Christian Hesse wrote:
>> Please look at the bottom of my last two mails... That was with your patch
>> applied.
>
> Sorry, I missed it. The new BUG seems to be a bug in the delayed
> allocation code, specifically here, in fs/ext4/inode.c:ext4_da_release_space():
>
> /* figure out how many metablocks to release */
> BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
> mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
>
> I've quickly looked at the code, and how i_reserved_meta_blocks gets
> updated, and nothing *obviously* wrong is jumping out at me. Anyone
> else have time to investigate this a bit more deeply?
I don't :), but I tried a quick reproducer anyway and couldn't hit it ...
mkfs.ext3, mount, create non-extents file
umount, tune2fs to ext4
mount as ext4, write to file, open file O_TRUNC
... didn't oops for me :(
-Eric
On Wed, Aug 13, 2008 at 05:19:55PM -0500, Eric Sandeen wrote:
>
> I don't :), but I tried a quick reproducer anyway and couldn't hit it ...
>
> mkfs.ext3, mount, create non-extents file
> umount, tune2fs to ext4
> mount as ext4, write to file, open file O_TRUNC
>
> ... didn't oops for me :(
>
In this particular case, we need to open the file for appending, write
to it, and then unlink it in order to trigger it, I think.
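A minimal sketch of that sequence in C, in case it helps anyone trying to build a reproducer. The helper name `append_then_unlink()` and the amount of data written are arbitrary choices for illustration. Whether this actually triggers the BUG is only a guess; it would presumably need to run against a non-extent file on a filesystem converted from ext3.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Append some data to path (creating it if needed), then unlink it.
 * Returns 0 on success, -1 on the first failing system call.
 */
static int append_then_unlink(const char *path)
{
    char buf[4096];
    int fd, i;

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    memset(buf, 'x', sizeof(buf));
    for (i = 0; i < 256; i++) {           /* about 1 MiB in total */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            close(fd);
            return -1;
        }
    }

    if (close(fd) < 0)
        return -1;
    return unlink(path);
}
```

The file is unlinked right after the write, without an fsync() in between, so the freshly written data has probably not been written back yet when the inode is deleted.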
- Ted
On Wed, 2008-08-13 at 18:01 -0400, Theodore Tso wrote:
> On Wed, Aug 13, 2008 at 11:07:06PM +0200, Christian Hesse wrote:
> >
> > Please look at the bottom of my last two mails... That was with your patch
> > applied.
>
> Sorry, I missed it. The new BUG seems to be a bug in the delayed
> allocation code, specifically here, in fs/ext4/inode.c:ext4_da_release_space():
>
> /* figure out how many metablocks to release */
> BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
> mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
>
> I've quickly looked at the code, and how i_reserved_meta_blocks gets
> updated, and nothing *obviously* wrong is jumping out at me. Anyone
> else have time to investigate this a bit more deeply?
>
I could reproduce it.
This patch works for me on top of Ted's change. Christian, could you
try it?
---------------------------------------------------
Ext4: Fix delalloc release block reservation for truncate
From: Mingming Cao <[email protected]>
Ext4 releases the blocks reserved for delalloc when an inode is
truncated/unlinked. If there are no reserved blocks at all, there is
nothing to release, but the current code still tries to release reserved
blocks regardless of whether the counter's value is 0. Doing so makes the
later calculation go wrong, and a kernel BUG_ON() catches that. This
doesn't happen for files originally in extent format, as the calculation
for 0 reserved blocks was right for extent-based files.
This patch fixes the kernel BUG() caused by the above. It adds checks for 0
to avoid the unnecessary release and fixes the calculation for non-extent
files.
Signed-off-by: Mingming Cao <[email protected]>
Index: linux-2.6.27-rc1/fs/ext4/inode.c
===================================================================
--- linux-2.6.27-rc1.orig/fs/ext4/inode.c 2008-08-13 15:29:35.000000000 -0700
+++ linux-2.6.27-rc1/fs/ext4/inode.c 2008-08-13 16:22:14.000000000 -0700
@@ -1007,6 +1007,9 @@ static int ext4_indirect_calc_metadata_a
*/
static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
{
+ if (!blocks)
+ return 0;
+
if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
return ext4_ext_calc_metadata_amount(inode, blocks);
@@ -1553,8 +1556,27 @@ static void ext4_da_release_space(struct
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int total, mdb, mdb_free, release;
+ if (!to_free){
+ /* Nothing to release, exit */
+ return;
+ }
+
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
+ if (!EXT4_I(inode)->i_reserved_data_blocks){
+ /*
+ * if there is no reserved blocks, but we try to free some
+ * then the counter is messed up somewhere.
+ * but since this function is called from invalidate
+ * page, it's harmless to return without any action
+ */
+ printk(KERN_INFO "ext4 delalloc try to release %d reserved"
+ "blocks for inode %lu, but there is no reserved"
+ "data blocks\n", inode->i_ino, to_free);
+ spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+ return;
+ }
+
/* recalculate the number of metablocks still need to be reserved */
total = EXT4_I(inode)->i_reserved_data_blocks - to_free;
mdb = ext4_calc_metadata_amount(inode, total);
@@ -3592,7 +3614,7 @@ void ext4_truncate(struct inode *inode)
* ext4 *really* writes onto the disk inode.
*/
ei->i_disksize = inode->i_size;
-
+
if (n == 1) { /* direct blocks */
ext4_free_data(handle, inode, NULL, i_data+offsets[0],
i_data + EXT4_NDIR_BLOCKS);
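The effect of the zero-block guard in the patch above can be illustrated with a small user-space sketch. The formula in `toy_indirect_metadata()` is an assumption made for illustration (an estimate for indirect-mapped files that always counts one triple-indirect block), not a quotation of the kernel source, and `TOY_ADDR_PER_BLOCK` is a hypothetical constant that would correspond to 4 KiB blocks with 4-byte block numbers.

```c
/* Hypothetical: block addresses that fit in one indirect block. */
#define TOY_ADDR_PER_BLOCK 1024

/*
 * Assumed shape of the estimate for indirect-mapped files: indirect
 * and double-indirect blocks scale with the request, plus one
 * triple-indirect block that is counted unconditionally.
 */
static int toy_indirect_metadata(int blocks)
{
    int ind = (blocks + TOY_ADDR_PER_BLOCK - 1) / TOY_ADDR_PER_BLOCK;
    int dind = (ind + TOY_ADDR_PER_BLOCK - 1) / TOY_ADDR_PER_BLOCK;

    return ind + dind + 1;
}

/* Same estimate with the patch's early return for zero blocks. */
static int toy_indirect_metadata_guarded(int blocks)
{
    if (!blocks)
        return 0;
    return toy_indirect_metadata(blocks);
}
```

Under that assumption the unguarded estimate for zero blocks is still one metadata block. With nothing reserved, that is more than the reserved count, which is the condition the BUG_ON() tests. The guarded version returns 0 for zero blocks and is unchanged for every other input.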
On Wed, Aug 13, 2008 at 05:10:53PM -0700, Mingming Cao wrote:
> I could reproduce it.
>
> This patch works for me on top of Ted's change. Christian, could you
> try it?
Thanks, mingming! Looks good and I've added it to the patch queue.
- Ted
[email protected] wrote on 08/14/2008 04:15:34 AM:
> Theodore Tso <[email protected]>
> Sent by: [email protected]
> 08/14/2008 04:15 AM
> To: Eric Sandeen <[email protected]>
> cc: Christian Hesse <[email protected]>, [email protected]
> Subject: Re: Oops with ext4 from 2.6.27-rc3
>
> On Wed, Aug 13, 2008 at 05:19:55PM -0500, Eric Sandeen wrote:
> >
> > I don't :), but I tried a quick reproducer anyway and couldn't hit it ...
> >
> > mkfs.ext3, mount, create non-extents file
> > umount, tune2fs to ext4
> > mount as ext4, write to file, open file O_TRUNC
> >
> > ... didn't oops for me :(
> >
>
> In this particular case, we need to open the file for appending, write
> to it, and then unlink it in order to trigger it, I think.
>
> - Ted
Hi all,
I actually hit the same oops with 2.6.27-rc1. I was about to open a bug,
but found this one already active. My scenario was the same as the one Ted
describes.
I created an LVM partition of 30 GB and wrote a 2.3 GB file to it. When I
tried to remove this file with rm -rf "file_name", I hit the call trace
below. In my case the machine still responds to ping, but I cannot execute
any command inside the ext4-mounted directory, and I cannot even umount the
filesystem :(
Here is the call trace I am getting. It seems to be the same as the one
Christian reported. I have also pasted the code where it oopses.
------------[ cut here ]------------
kernel BUG at fs/ext4/mballoc.c:3963!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc
dm_mirror dm_log dm_multipath dm_mod sbs sbshc battery ac parport_pc lp
parport sr_mod sg ide_cd_mod cdrom serio_raw button rtc_cmos rtc_core
rtc_lib tg3 libphy k8temp hwmon i2c_piix4 pcspkr i2c_core usb_storage
sata_svw libata mptspi mptscsih scsi_transport_spi mptbase sd_mod scsi_mod
ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8852, comm: rm Not tainted (2.6.27-rc1 #2)
EIP: 0060:[<c04c98f0>] EFLAGS: 00010297 CPU: 0
EIP is at ext4_mb_free_blocks+0x3b6/0x51a
EAX: e56a5fcc EBX: 0000033f ECX: 00000000 EDX: e56a5e40
ESI: 0000000c EDI: 00000000 EBP: f5fb7de8 ESP: f5fb7d74
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rm (pid: 8852, ti=f5fb7000 task=f43b3b80 task.ti=f5fb7000)
Stack: 0067ee0d 00000000 0067ee01 e56a5e40 df0cc490 0067ee01 00000000 cca06a48
ebfe5df0 d14d2f80 ddbcffe0 00000000 e7ad0000 00000000 0067ee0d c966c860
f35c3c00 c966c860 f35c3800 e7d2b7b8 ebfe5df0 0000000a 0000033f 0000033f
Call Trace:
[<c04b02df>] ? ext4_free_blocks+0x9b/0x100
[<c04b4c84>] ? ext4_clear_blocks+0xcb/0xd6
[<c04b4d54>] ? ext4_free_data+0xc5/0x117
[<c04b5056>] ? ext4_truncate+0x129/0x3d5
[<c046ef0d>] ? kmem_cache_alloc+0x75/0xad
[<c04cf0ff>] ? jbd2_journal_start+0x53/0xb4
[<c04cf0ff>] ? jbd2_journal_start+0x53/0xb4
[<c04cf133>] ? jbd2_journal_start+0x87/0xb4
[<c04bbff7>] ? ext4_journal_start_sb+0x40/0x42
[<c04b7153>] ? ext4_delete_inode+0xab/0x10a
[<c04b70a8>] ? ext4_delete_inode+0x0/0x10a
[<c04828c5>] ? generic_delete_inode+0x98/0xff
[<c048293e>] ? generic_drop_inode+0x12/0x126
[<c0482221>] ? iput+0x4b/0x4e
[<c047ad43>] ? do_unlinkat+0xa9/0x112
[<c047c98c>] ? vfs_readdir+0x7e/0x8f
[<c047c780>] ? filldir64+0x0/0xcd
[<c044a123>] ? audit_syscall_entry+0x101/0x12b
[<c047adbc>] ? sys_unlink+0x10/0x12
[<c040383d>] ? sysenter_do_call+0x12/0x21
=======================
Code: 19 c0 85 c0 75 f0 8b 45 98 8d 55 c8 8b 4d f0 ff 75 08 e8 80 c2 ff ff
8b 45 98 8b 55 98 5e 05 8c 01 00 00 39 82 8c 01 00 00 74 04 <0f> 0b eb fe
8b 4d ac 8b 5d e8 8b 81 a0 01 00 00 89 da 8b 48 58
EIP: [<c04c98f0>] ext4_mb_free_blocks+0x3b6/0x51a SS:ESP 0068:f5fb7d74
---[ end trace d6eba9fe5baf0f00 ]---
================ Code of fs/ext4/mballoc.c ================
3952 /*
3953 * finds all preallocated spaces and return blocks being freed to them
3954 * if preallocated space becomes full (no block is used from the space)
3955 * then the function frees space in buddy
3956 * XXX: at the moment, truncate (which is the only way to free blocks)
3957 * discards all preallocations
3958 */
3959 static void ext4_mb_return_to_preallocation(struct inode *inode,
3960 struct ext4_buddy *e4b,
3961 sector_t block, int count)
3962 {
3963 BUG_ON(!list_empty(&EXT4_I(inode)->i_prealloc_list));
3964 }
-Rishi
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Mingming & Ted,
Thanks for your patch. It is now working fine for me.
Tested-by: Rishikesh K Rajak <[email protected]>
> Ext4: Fix delalloc release block reservation for truncate
>
> From: Mingming Cao <[email protected]>
>
> Ext4 releases the blocks reserved for delalloc when an inode is
> truncated/unlinked. If there are no reserved blocks at all, there is
> nothing to release, but the current code still tries to release reserved
> blocks regardless of whether the counter's value is 0. Doing so makes the
> later calculation go wrong, and a kernel BUG_ON() catches that. This
> doesn't happen for files originally in extent format, as the calculation
> for 0 reserved blocks was right for extent-based files.
>
> This patch fixes the kernel BUG() caused by the above. It adds checks for 0
> to avoid the unnecessary release and fixes the calculation for non-extent
> files.
>
> Signed-off-by: Mingming Cao <[email protected]>
Tested-by: Rishikesh K Rajak <[email protected]>
> Index: linux-2.6.27-rc1/fs/ext4/inode.c
> ===================================================================
> --- linux-2.6.27-rc1.orig/fs/ext4/inode.c 2008-08-13 15:29:35.000000000 -0700
> +++ linux-2.6.27-rc1/fs/ext4/inode.c 2008-08-13 16:22:14.000000000 -0700
> @@ -1007,6 +1007,9 @@ static int ext4_indirect_calc_metadata_a
> */
> static int ext4_calc_metadata_amount(struct inode *inode, int blocks)
> {
> + if (!blocks)
> + return 0;
> +
> if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
> return ext4_ext_calc_metadata_amount(inode, blocks);
>
> @@ -1553,8 +1556,27 @@ static void ext4_da_release_space(struct
> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> int total, mdb, mdb_free, release;
>
> + if (!to_free){
> + /* Nothing to release, exit */
> + return;
> + }
> +
> spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>
> + if (!EXT4_I(inode)->i_reserved_data_blocks){
> + /*
> + * if there is no reserved blocks, but we try to free some
> + * then the counter is messed up somewhere.
> + * but since this function is called from invalidate
> + * page, it's harmless to return without any action
> + */
> + printk(KERN_INFO "ext4 delalloc try to release %d reserved"
> + "blocks for inode %lu, but there is no reserved"
> + "data blocks\n", inode->i_ino, to_free);
> + spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> + return;
> + }
> +
> /* recalculate the number of metablocks still need to be reserved */
> total = EXT4_I(inode)->i_reserved_data_blocks - to_free;
> mdb = ext4_calc_metadata_amount(inode, total);
> @@ -3592,7 +3614,7 @@ void ext4_truncate(struct inode *inode)
> * ext4 *really* writes onto the disk inode.
> */
> ei->i_disksize = inode->i_size;
> -
> +
> if (n == 1) { /* direct blocks */
> ext4_free_data(handle, inode, NULL, i_data+offsets[0],
> i_data + EXT4_NDIR_BLOCKS);
On Thursday 14 August 2008, Dragon kumar wrote:
> This mail is from < [email protected]>
> ===============================
>
> Hi Mingming,
>
> I am still facing the same problem after applying this patch.
>
> The error is the same for me, and it is reproducible.
>
> Steps to reproduce:
> - Create an LVM partition larger than 30 GB
> - Copy 6+ GB of data onto it
> - Try to remove the directory with rm -rf "/mnt/dir_name"
> - Get the call trace below.
>
> ------------[ cut here ]------------
> kernel BUG at fs/ext4/mballoc.c:3963!
Hi Dragon,
I think you missed Ted's first patch... Look for it in the first reply to my
initial mail.
--
Regards,
Chris
> I think you missed Ted's first patch... Look for it in the first reply to
> my initial mail.
Thanks Christian, it worked fine for me after applying it.
- Rishi ( dragon )
> --
> Regards,
> Chris
On Thursday 14 August 2008, Theodore Tso wrote:
> On Wed, Aug 13, 2008 at 05:10:53PM -0700, Mingming Cao wrote:
> > I could reproduce it.
> >
> > This patch works for me on top of Ted's change. Christian, could you
> > try it?
>
> Thanks, mingming! Looks good and I've added it to the patch queue.
08:58:16 eworm@io:~$ uptime
08:58:18 up 37 min, 2 users, load average: 3.57, 2.36, 1.12
This is *a lot* longer than the system survived without the patches. I hope it
is fixed now. Ted and Mingming, thanks a lot!
A short side note... I get warnings when compiling the new code:
fs/ext4/inode.c: In function 'ext4_da_release_space':
fs/ext4/inode.c:1581: warning: format '%d' expects type 'int', but argument 2
has type 'long unsigned int'
fs/ext4/inode.c:1581: warning: format '%lu' expects type 'long unsigned int',
but argument 3 has type 'int'
--
Regards,
Chris
On Thu, 2008-08-14 at 08:59 +0200, Christian Hesse wrote:
> On Thursday 14 August 2008, Theodore Tso wrote:
> > On Wed, Aug 13, 2008 at 05:10:53PM -0700, Mingming Cao wrote:
> > > I could reproduce it.
> > >
> > > This patch works for me on top of Ted's change. Christian, could you
> > > try it?
> >
> > Thanks, mingming! Looks good and I've added it to the patch queue.
>
> 08:58:16 eworm@io:~$ uptime
> 08:58:18 up 37 min, 2 users, load average: 3.57, 2.36, 1.12
>
> This is *a lot* longer than the system survived without the patches. I hope it
> is fixed now. Ted and Mingming, thanks a lot!
>
> A short side note... I get warnings when compiling the new code:
>
> fs/ext4/inode.c: In function 'ext4_da_release_space':
> fs/ext4/inode.c:1581: warning: format '%d' expects type 'int', but argument 2
> has type 'long unsigned int'
> fs/ext4/inode.c:1581: warning: format '%lu' expects type 'long unsigned int',
> but argument 3 has type 'int'
Thanks to all for quick verification of the fix.
Here is the fix for the compile warning. This fix has been folded into
the parent patch in the ext4 patch queue:
fix-delalloc-release-block-reservation-for-truncate
http://repo.or.cz/w/ext4-patch-queue.git
Thanks,
Mingming
Index: linux-2.6.27-rc3/fs/ext4/inode.c
===================================================================
--- linux-2.6.27-rc3.orig/fs/ext4/inode.c 2008-08-14 07:49:14.000000000 -0700
+++ linux-2.6.27-rc3/fs/ext4/inode.c 2008-08-14 07:49:45.000000000 -0700
@@ -1576,7 +1576,7 @@ static void ext4_da_release_space(struct
*/
printk(KERN_INFO "ext4 delalloc try to release %d reserved"
"blocks for inode %lu, but there is no reserved"
- "data blocks\n", inode->i_ino, to_free);
+ "data blocks\n", to_free, inode->i_ino);
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
return;
}
On Thu, Aug 14, 2008 at 07:58:23AM -0700, Mingming Cao wrote:
>
>
> Index: linux-2.6.27-rc3/fs/ext4/inode.c
> ===================================================================
> --- linux-2.6.27-rc3.orig/fs/ext4/inode.c 2008-08-14 07:49:14.000000000 -0700
> +++ linux-2.6.27-rc3/fs/ext4/inode.c 2008-08-14 07:49:45.000000000 -0700
> @@ -1576,7 +1576,7 @@ static void ext4_da_release_space(struct
> */
> printk(KERN_INFO "ext4 delalloc try to release %d reserved"
> "blocks for inode %lu, but there is no reserved"
> - "data blocks\n", inode->i_ino, to_free);
> + "data blocks\n", to_free, inode->i_ino);
> spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> return;
> }
I guess you would need a space at the end of each line.
printk(KERN_INFO "ext4 delalloc try to release %d reserved "
"blocks for inode %lu, but there is no reserved "
"data blocks\n", to_free, inode->i_ino);
-aneesh
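The point about trailing spaces follows from how C joins adjacent string literals: they are concatenated with nothing in between. A small user-space sketch, using snprintf() in place of printk() and two hypothetical helper names:

```c
#include <stdio.h>

/* Message split as in the patch: no space before each closing quote. */
static int format_without_spaces(char *buf, size_t len,
                                 int to_free, unsigned long ino)
{
    return snprintf(buf, len,
                    "ext4 delalloc try to release %d reserved"
                    "blocks for inode %lu, but there is no reserved"
                    "data blocks\n", to_free, ino);
}

/* Same message with a trailing space at the end of each fragment. */
static int format_with_spaces(char *buf, size_t len,
                              int to_free, unsigned long ino)
{
    return snprintf(buf, len,
                    "ext4 delalloc try to release %d reserved "
                    "blocks for inode %lu, but there is no reserved "
                    "data blocks\n", to_free, ino);
}
```

The first variant produces "reservedblocks" and "reserveddata" in the output; the second reads as intended. In both, the arguments are passed in the order the format string consumes them: the int for %d, then the unsigned long for %lu.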