LinuxLists.cc - [Bug 15018] New: ext4 backtraces out of nowhere

2010-01-09 19:01:10

Subject: [Bug 15018] New: ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

Summary: ext4 backtraces out of nowhere
Product: File System
Version: 2.5
Kernel Version: 2.6.33-rc3
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
AssignedTo: [email protected]
ReportedBy: [email protected]
Regression: No

Hi,

While doing an rsync to an ext4 filesystem, suddenly the above showed up in my
dmesg:

Jan 9 15:55:20 pannekake kernel: [61568.551238] Pid: 28573, comm: rsync Not
tainted 2.6.33-rc3 #2
Jan 9 15:55:20 pannekake kernel: [61568.557085] Call Trace:
Jan 9 15:55:20 pannekake kernel: [61568.559623] [<ffffffffa0253aa1>]
ext4_write_inode+0x4d/0xc7 [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.566074] [<ffffffff810dabe7>]
writeback_single_inode+0x125/0x2df
Jan 9 15:55:20 pannekake kernel: [61568.572563] [<ffffffff810dae48>]
write_inode_now+0x78/0xba
Jan 9 15:55:20 pannekake kernel: [61568.578287] [<ffffffffa0255b2d>]
ext4_da_get_block_prep+0x28b/0x36e [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.585367] [<ffffffff810e1b69>]
__block_prepare_write+0x1f3/0x3c0
Jan 9 15:55:20 pannekake kernel: [61568.591792] [<ffffffffa02558a2>] ?
ext4_da_get_block_prep+0x0/0x36e [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.598903] [<ffffffff810e1eb1>]
block_write_begin+0x80/0xcf
Jan 9 15:55:20 pannekake kernel: [61568.604840] [<ffffffffa02587be>]
ext4_da_write_begin+0x196/0x218 [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.611683] [<ffffffffa02558a2>] ?
ext4_da_get_block_prep+0x0/0x36e [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.618803] [<ffffffff810911c1>] ?
iov_iter_copy_from_user_atomic+0x84/0xf6
Jan 9 15:55:20 pannekake kernel: [61568.626052] [<ffffffff8109133f>]
generic_file_buffered_write+0x10c/0x230
Jan 9 15:55:20 pannekake kernel: [61568.632961] [<ffffffffa025375b>] ?
ext4_dirty_inode+0x40/0x45 [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.639536] [<ffffffff81092b9c>]
__generic_file_aio_write+0x347/0x37b
Jan 9 15:55:20 pannekake kernel: [61568.646245] [<ffffffff81302709>] ?
__mutex_lock_slowpath+0x22d/0x23c
Jan 9 15:55:20 pannekake kernel: [61568.652819] [<ffffffff81092c2b>]
generic_file_aio_write+0x5b/0xa2
Jan 9 15:55:20 pannekake kernel: [61568.659194] [<ffffffffa024fa93>]
ext4_file_write+0x8e/0x97 [ext4]
Jan 9 15:55:20 pannekake kernel: [61568.665577] [<ffffffff810bfbc8>]
do_sync_write+0xc0/0x106
Jan 9 15:55:20 pannekake kernel: [61568.671195] [<ffffffff810cdb68>] ?
poll_select_copy_remaining+0xc5/0xe9
Jan 9 15:55:20 pannekake kernel: [61568.678015] [<ffffffff81125b50>] ?
security_file_permission+0x11/0x13
Jan 9 15:55:20 pannekake kernel: [61568.684678] [<ffffffff810c04df>]
vfs_write+0xae/0x14a
Jan 9 15:55:20 pannekake kernel: [61568.689954] [<ffffffff810c063f>]
sys_write+0x47/0x6e
Jan 9 15:55:20 pannekake kernel: [61568.695111] [<ffffffff81001feb>]
system_call_fastpath+0x16/0x1b

It showed up again ~20 minutes later, and since then I haven't seen it. I'm a
bit puzzled since I don't actually see any error messages (just the backtrace),
and I'm not able to find any ill effects except for the traceback, but I guess
it still shouldn't happen.

FWIW, the ext4 is on an LVM whose one of the component RAID-5s was resyncing at
the time. I don't know if that is relevant or not.

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-10 10:46:03

by Andreas Dilger

[permalink] [raw]

Subject: Re: [Bug 15018] New: ext4 backtraces out of nowhere

On 2010-01-09, at 12:01, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
> While doing an rsync to an ext4 filesystem, suddenly the above
> showed up in my dmesg:
>
> Jan 9 15:55:20 pannekake kernel: [61568.551238] Pid: 28573, comm:
> rsync Not
> tainted 2.6.33-rc3 #2
> Jan 9 15:55:20 pannekake kernel: [61568.557085] Call Trace:
> Jan 9 15:55:20 pannekake kernel: [61568.559623] [<ffffffffa0253aa1>]
> ext4_write_inode+0x4d/0xc7 [ext4]
>
> It showed up again ~20 minutes later, and since then I haven't seen
> it. I'm a bit puzzled since I don't actually see any error messages
> (just the backtrace), and I'm not able to find any ill effects
> except for the traceback, but I guess it still shouldn't happen.
>
> FWIW, the ext4 is on an LVM whose one of the component RAID-5s was
> resyncing at the time. I don't know if that is relevant or not.

In ext4_write_inode it has a dump_stack() in the case where
ext4_journal_current_handle() is non-NULL.

This is a bug, because later on it calls ext4_force_commit(), which
would end up blocking on the journal commit to finish, which is
blocked waiting on the journal handle held by the selfsame thread that
started the journal commit in the first place.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2010-01-10 10:46:04

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #1 from Anonymous Emailer <[email protected]> 2010-01-10 10:46:03 ---
Reply-To: [email protected]

On 2010-01-09, at 12:01, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
> While doing an rsync to an ext4 filesystem, suddenly the above
> showed up in my dmesg:
>
> Jan 9 15:55:20 pannekake kernel: [61568.551238] Pid: 28573, comm:
> rsync Not
> tainted 2.6.33-rc3 #2
> Jan 9 15:55:20 pannekake kernel: [61568.557085] Call Trace:
> Jan 9 15:55:20 pannekake kernel: [61568.559623] [<ffffffffa0253aa1>]
> ext4_write_inode+0x4d/0xc7 [ext4]
>
> It showed up again ~20 minutes later, and since then I haven't seen
> it. I'm a bit puzzled since I don't actually see any error messages
> (just the backtrace), and I'm not able to find any ill effects
> except for the traceback, but I guess it still shouldn't happen.
>
> FWIW, the ext4 is on an LVM whose one of the component RAID-5s was
> resyncing at the time. I don't know if that is relevant or not.

In ext4_write_inode it has a dump_stack() in the case where
ext4_journal_current_handle() is non-NULL.

This is a bug, because later on it calls ext4_force_commit(), which
would end up blocking on the journal commit to finish, which is
blocked waiting on the journal handle held by the selfsame thread that
started the journal commit in the first place.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-18 23:06:11

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

Jan Kara <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected], [email protected]

--- Comment #2 from Jan Kara <[email protected]> 2010-01-18 23:06:07 ---
Ah, inlining and such stuff made reading this stack trace quite difficult.
Anyway, the problem is that we just should not call write_inode_now() in
ext4_da_reserve_space() when we have transaction already started. And as
Andreas correctly points out, the patch even introduces a deadlock because it
can call write_inode_now with wait==1. Moreover we hold page lock when holding
write_inode_now which introduces a plenty of nasty locking issues I believe.
The bug has been introduced by commit 0637c6f4135f592f094207c7c21e7c0fc5557834.

Ted, I think that if we are failing to do the write because of ENOSPC, the only
reasonable non-deadlocky thing to do is to restart the write completely - i.e.,
bail out up to ext4_da_write_begin, stop the transaction, drop the page, *do
some magic*, and start again...

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-20 17:10:31

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: [Bug 15018] ext4 backtraces out of nowhere

On Mon, 18 Jan 2010 23:06:08 GMT, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
>
> Jan Kara <[email protected]> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |[email protected], [email protected]
>
>
>
>
> --- Comment #2 from Jan Kara <[email protected]> 2010-01-18 23:06:07 ---
> Ah, inlining and such stuff made reading this stack trace quite difficult.
> Anyway, the problem is that we just should not call write_inode_now() in
> ext4_da_reserve_space() when we have transaction already started. And as
> Andreas correctly points out, the patch even introduces a deadlock because it
> can call write_inode_now with wait==1. Moreover we hold page lock when holding
> write_inode_now which introduces a plenty of nasty locking issues I believe.
> The bug has been introduced by commit 0637c6f4135f592f094207c7c21e7c0fc5557834.
>
> Ted, I think that if we are failing to do the write because of ENOSPC, the only
> reasonable non-deadlocky thing to do is to restart the write completely - i.e.,
> bail out up to ext4_da_write_begin, stop the transaction, drop the page, *do
> some magic*, and start again...
>

How about the below. I am yet to test the change

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c818972..5af483a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1835,24 +1835,12 @@ repeat:
* later. Real quota accounting is done at pages writeout
* time.
*/
- if (vfs_dq_reserve_block(inode, md_needed + 1)) {
- /*
- * We tend to badly over-estimate the amount of
- * metadata blocks which are needed, so if we have
- * reserved any metadata blocks, try to force out the
- * inode and see if we have any better luck.
- */
- if (md_reserved && retries++ <= 3)
- goto retry;
+ if (vfs_dq_reserve_block(inode, md_needed + 1))
return -EDQUOT;
- }

if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
vfs_dq_release_reservation_block(inode, md_needed + 1);
if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
- retry:
- if (md_reserved)
- write_inode_now(inode, (retries == 3));
yield();
goto repeat;
}
@@ -3032,7 +3020,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
- int ret, retries = 0;
+ int ret, retries = 0, quota_retries = 0;
struct page *page;
pgoff_t index;
unsigned from, to;
@@ -3091,6 +3079,21 @@ retry:

if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
+
+ if ((ret == -EDQUOT) && (quota_retries <= 3)) {
+ /*
+ * Since we do aggressive quota reservation may get no quota error
+ * even though we may be having quota. Forcing the inode write will
+ * free up the extra quota reserved. We do it only if we have
+ * meta data blocks reserved because we do aggressive reservation only
+ * for meta data blocks.
+ */
+ if (EXT4_I(inode)->i_reserved_meta_blocks) {
+ write_inode_now(inode, (quota_retries == 3));
+ quota_retries++;
+ goto retry;
+ }
+ }
out:
return ret;
}

2010-01-20 17:10:33

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #3 from Aneesh Kumar K.V <[email protected]> 2010-01-20 17:10:31 ---
On Mon, 18 Jan 2010 23:06:08 GMT, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
>
> Jan Kara <[email protected]> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |[email protected], [email protected]
>
>
>
>
> --- Comment #2 from Jan Kara <[email protected]> 2010-01-18 23:06:07 ---
> Ah, inlining and such stuff made reading this stack trace quite difficult.
> Anyway, the problem is that we just should not call write_inode_now() in
> ext4_da_reserve_space() when we have transaction already started. And as
> Andreas correctly points out, the patch even introduces a deadlock because it
> can call write_inode_now with wait==1. Moreover we hold page lock when holding
> write_inode_now which introduces a plenty of nasty locking issues I believe.
> The bug has been introduced by commit 0637c6f4135f592f094207c7c21e7c0fc5557834.
>
> Ted, I think that if we are failing to do the write because of ENOSPC, the only
> reasonable non-deadlocky thing to do is to restart the write completely - i.e.,
> bail out up to ext4_da_write_begin, stop the transaction, drop the page, *do
> some magic*, and start again...
>

How about the below. I am yet to test the change

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c818972..5af483a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1835,24 +1835,12 @@ repeat:
* later. Real quota accounting is done at pages writeout
* time.
*/
- if (vfs_dq_reserve_block(inode, md_needed + 1)) {
- /*
- * We tend to badly over-estimate the amount of
- * metadata blocks which are needed, so if we have
- * reserved any metadata blocks, try to force out the
- * inode and see if we have any better luck.
- */
- if (md_reserved && retries++ <= 3)
- goto retry;
+ if (vfs_dq_reserve_block(inode, md_needed + 1))
return -EDQUOT;
- }

if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
vfs_dq_release_reservation_block(inode, md_needed + 1);
if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
- retry:
- if (md_reserved)
- write_inode_now(inode, (retries == 3));
yield();
goto repeat;
}
@@ -3032,7 +3020,7 @@ static int ext4_da_write_begin(struct file *file, struct
address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
- int ret, retries = 0;
+ int ret, retries = 0, quota_retries = 0;
struct page *page;
pgoff_t index;
unsigned from, to;
@@ -3091,6 +3079,21 @@ retry:

if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
+
+ if ((ret == -EDQUOT) && (quota_retries <= 3)) {
+ /*
+ * Since we do aggressive quota reservation may get no quota error
+ * even though we may be having quota. Forcing the inode write will
+ * free up the extra quota reserved. We do it only if we have
+ * meta data blocks reserved because we do aggressive reservation only
+ * for meta data blocks.
+ */
+ if (EXT4_I(inode)->i_reserved_meta_blocks) {
+ write_inode_now(inode, (quota_retries == 3));
+ quota_retries++;
+ goto retry;
+ }
+ }
out:
return ret;
}

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-26 23:19:24

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #4 from Jan Kara <[email protected]> 2010-01-26 23:19:18 ---
Hmm, I don't like the code in ext4_da_reserve_space even after your patch - the
retries are duplicated with the ones from ext4_da_write_begin and calling
write_inode_now from ext4_da_write_begin doesn't make too much sense either -
why not write all inodes with delayed allocation if we want to write something?

Actually, instead of retrying delayed allocation, I think it makes more sense
to just fall back to normal allocation. Attached patch does that and it also
allows me to completely fill user's quota.

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-26 23:20:21

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #5 from Jan Kara <[email protected]> 2010-01-26 23:20:19 ---
Created an attachment (id=24739)
--> (http://bugzilla.kernel.org/attachment.cgi?id=24739)
Patch fixing deadlock in ext4_da_reserve_space

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-27 09:46:37

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: [Bug 15018] ext4 backtraces out of nowhere

On Tue, 26 Jan 2010 23:19:23 GMT, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
>
>
>
>
> --- Comment #4 from Jan Kara <[email protected]> 2010-01-26 23:19:18 ---
> Hmm, I don't like the code in ext4_da_reserve_space even after your patch - the
> retries are duplicated with the ones from ext4_da_write_begin and calling
> write_inode_now from ext4_da_write_begin doesn't make too much sense either -
> why not write all inodes with delayed allocation if we want to write something?
>
> Actually, instead of retrying delayed allocation, I think it makes more sense
> to just fall back to normal allocation. Attached patch does that and it also
> allows me to completely fill user's quota.
>

Switching to non delayed allocation may not guarantee that we succeed in block
allocation. For quota we really need to flush this inode. Even if we
flush other inodes of other user we may still get a quota error

-aneesh

2010-01-27 09:46:44

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #6 from Aneesh Kumar K.V <[email protected]> 2010-01-27 09:46:39 ---
On Tue, 26 Jan 2010 23:19:23 GMT, [email protected] wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15018
>
>
>
>
>
> --- Comment #4 from Jan Kara <[email protected]> 2010-01-26 23:19:18 ---
> Hmm, I don't like the code in ext4_da_reserve_space even after your patch - the
> retries are duplicated with the ones from ext4_da_write_begin and calling
> write_inode_now from ext4_da_write_begin doesn't make too much sense either -
> why not write all inodes with delayed allocation if we want to write something?
>
> Actually, instead of retrying delayed allocation, I think it makes more sense
> to just fall back to normal allocation. Attached patch does that and it also
> allows me to completely fill user's quota.
>

Switching to non delayed allocation may not guarantee that we succeed in block
allocation. For quota we really need to flush this inode. Even if we
flush other inodes of other user we may still get a quota error

-aneesh

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-01-27 11:51:49

by bugzilla-daemon

[permalink] [raw]

Subject: [Bug 15018] ext4 backtraces out of nowhere

http://bugzilla.kernel.org/show_bug.cgi?id=15018

--- Comment #7 from Jan Kara <[email protected]> 2010-01-27 11:51:47 ---
Aneesh, I agree that switching to non-delayed allocation does not guarantee a
success in block allocation. But flushing inode does not guarantee that either
(and IMHO does not bring us that much). The user might have other inodes which
have delayed-allocated writes still pending and overestimation of reserved
space on these inodes causes EDQUOT. So the ultimately correct way would be to
flush all user's inodes but I'm not sure we want to do that from
ext4_da_write_begin and it wouldn't be an easy thing to do. A middle ground
might be to call writeback_inodes_sb_if_idle (similarly as we do that from
ext4_nonda_switch) when we hit EDQUOT but I'm not too satisfied with that
either...

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-03-18 11:11:54

by bugzilla-daemon

[permalink] [raw]