2009-03-05 07:34:13

by Theodore Ts'o

Subject: [STABLE, 2.6.27.y] ext4: Add fallback for find_group_flex

This is a workaround for find_group_flex(), which badly needs to be
replaced. One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances. This can lead to spurious ENOSPC failures
even when there are inodes still available.

Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1. If
find_group_other() succeeds when find_group_flex() has failed, log a
warning message.

A better block/inode allocator that will fix this problem for real has
been queued up for the next merge window.

Signed-off-by: "Theodore Ts'o" <[email protected]>
(cherry picked from commit 05bf9e839d9de4e8a094274a0a2fd07beb47eaf1)
---
fs/ext4/ialloc.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index b994854..cce841f 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -702,6 +702,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode * dir, int mode)

 	if (sbi->s_log_groups_per_flex) {
 		ret2 = find_group_flex(sb, dir, &group);
+		if (ret2 == -1) {
+			ret2 = find_group_other(sb, dir, &group);
+			if (ret2 == 0 && printk_ratelimit())
+				printk(KERN_NOTICE "ext4: find_group_flex "
+				       "failed, fallback succeeded dir %lu\n",
+				       dir->i_ino);
+		}
 		goto got_group;
 	}

--
1.5.6.3
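
For readers who want the shape of the workaround outside the kernel, here is a minimal user-space sketch of the same "strict search first, permissive search on -1, log when the fallback rescues the request" pattern. Everything in it is hypothetical: `struct group_table`, `find_group_strict()`, `find_group_any()` and `pick_group()` are illustrative stand-ins, not the real find_group_flex() or find_group_other(), and the selection logic is deliberately trivial.

```c
#include <stdio.h>

/* Hypothetical stand-in for per-group inode accounting; not ext4's layout. */
struct group_table {
	const unsigned *free_inodes;	/* free inode count for each group */
	unsigned ngroups;
};

/* Strict primary search (assumption): only accepts the parent's own group. */
static int find_group_strict(const struct group_table *t, unsigned parent,
			     unsigned *group)
{
	if (parent < t->ngroups && t->free_inodes[parent] > 0) {
		*group = parent;
		return 0;
	}
	return -1;
}

/* Permissive fallback (assumption): linear scan from the parent, wrapping. */
static int find_group_any(const struct group_table *t, unsigned parent,
			  unsigned *group)
{
	for (unsigned i = 0; i < t->ngroups; i++) {
		unsigned g = (parent + i) % t->ngroups;

		if (t->free_inodes[g] > 0) {
			*group = g;
			return 0;
		}
	}
	return -1;
}

/* Retry with the fallback only when the primary search reports -1. */
static int pick_group(const struct group_table *t, unsigned parent,
		      unsigned *group)
{
	int ret = find_group_strict(t, parent, group);

	if (ret == -1) {
		ret = find_group_any(t, parent, group);
		if (ret == 0)
			fprintf(stderr, "primary search failed, "
				"fallback succeeded (parent %u)\n", parent);
	}
	return ret;
}
```

With this structure the caller still sees -1 only when both searches fail, which is the property the patch relies on to avoid spurious ENOSPC.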



2009-03-05 07:34:13

by Theodore Ts'o

Subject: [STABLE, 2.6.27.y] ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()

From: Jan Kara <[email protected]>

Functions ext4_write_begin() and ext4_da_write_begin() call
grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it
can happen that page reclaim is triggered in that function
and it recurses back into the filesystem (or some other filesystem).
But this can lead to various problems as a transaction is already
started at that point. Add the necessary flag.

http://bugzilla.kernel.org/show_bug.cgi?id=11688

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
(cherry picked from commit ebd3610b110bbb18ea6f9f2aeed1e1068c537227)
---
fs/ext4/inode.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7b063d4..b233ade 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1372,6 +1372,10 @@ retry:
 		goto out;
 	}

+	/* We cannot recurse into the filesystem as the transaction is already
+	 * started */
+	flags |= AOP_FLAG_NOFS;
+
 	page = grab_cache_page_write_begin(mapping, index, flags);
 	if (!page) {
 		ext4_journal_stop(handle);
@@ -1381,7 +1385,7 @@ retry:
*pagep = page;

ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
- ext4_get_block);
+ ext4_get_block);

if (!ret && ext4_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
@@ -2465,6 +2469,9 @@ retry:
ret = PTR_ERR(handle);
goto out;
}
+	/* We cannot recurse into the filesystem as the transaction is already
+	 * started */
+	flags |= AOP_FLAG_NOFS;

page = grab_cache_page_write_begin(mapping, index, flags);
if (!page) {
--
1.5.6.3
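
To illustrate why setting the flag helps, the following user-space sketch models the idea under a stated assumption: that the page-cache helper reacts to AOP_FLAG_NOFS by stripping the "may re-enter the filesystem" bit from the allocation mask before allocating a page. The `SKETCH_*` constants and both functions are hypothetical and use made-up bit values; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real constants differ. */
#define SKETCH_GFP_WAIT		0x1u	/* allocation may sleep */
#define SKETCH_GFP_IO		0x2u	/* allocation may start I/O */
#define SKETCH_GFP_FS		0x4u	/* reclaim may call into a filesystem */
#define SKETCH_AOP_FLAG_NOFS	0x1u	/* caller already holds a transaction */

/* Compute the mask a write_begin-style page allocation would use. */
static unsigned alloc_mask_for_write_begin(unsigned mapping_gfp,
					   unsigned aop_flags)
{
	unsigned notmask = 0;

	/* With NOFS requested, forbid reclaim from re-entering the fs. */
	if (aop_flags & SKETCH_AOP_FLAG_NOFS)
		notmask = SKETCH_GFP_FS;

	return mapping_gfp & ~notmask;
}

/* True if reclaim triggered by this allocation could recurse into the fs. */
static bool reclaim_may_enter_fs(unsigned gfp)
{
	return (gfp & SKETCH_GFP_FS) != 0;
}
```

In this model the other bits of the mapping's mask are preserved, so the allocation may still sleep and start I/O; only the filesystem re-entry that is unsafe with a running transaction is ruled out.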


2009-03-13 06:37:53

by Greg KH

Subject: patch ext4-add-fallback-for-find_group_flex.patch added to 2.6.27-stable tree


This is a note to let you know that we have just queued up the patch titled

Subject: ext4: Add fallback for find_group_flex

to the 2.6.27-stable tree. Its filename is

ext4-add-fallback-for-find_group_flex.patch

A git repo of this tree can be found at
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From [email protected] Thu Mar 12 23:32:27 2009
From: "Theodore Ts'o" <[email protected]>
Date: Thu, 5 Mar 2009 02:34:06 -0500
Subject: ext4: Add fallback for find_group_flex
To: [email protected]
Cc: Ext4 Developers List <[email protected]>, "Theodore Ts'o" <[email protected]>
Message-ID: <[email protected]>

From: "Theodore Ts'o" <[email protected]>

(cherry picked from commit 05bf9e839d9de4e8a094274a0a2fd07beb47eaf1)

This is a workaround for find_group_flex(), which badly needs to be
replaced. One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances. This can lead to spurious ENOSPC failures
even when there are inodes still available.

Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1. If
find_group_other() succeeds when find_group_flex() has failed, log a
warning message.

A better block/inode allocator that will fix this problem for real has
been queued up for the next merge window.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/ialloc.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -702,6 +702,13 @@ struct inode *ext4_new_inode(handle_t *h

 	if (sbi->s_log_groups_per_flex) {
 		ret2 = find_group_flex(sb, dir, &group);
+		if (ret2 == -1) {
+			ret2 = find_group_other(sb, dir, &group);
+			if (ret2 == 0 && printk_ratelimit())
+				printk(KERN_NOTICE "ext4: find_group_flex "
+				       "failed, fallback succeeded dir %lu\n",
+				       dir->i_ino);
+		}
 		goto got_group;
 	}



Patches currently in stable-queue which might be from [email protected] are

queue-2.6.27/jbd2-fix-return-value-of-jbd2_journal_start_commit.patch
queue-2.6.27/revert-ext4-wait-on-all-pending-commits-in-ext4_sync_fs.patch
queue-2.6.27/jbd2-avoid-possible-null-dereference-in-jbd2_journal_begin_ordered_truncate.patch
queue-2.6.27/ext4-fix-to-read-empty-directory-blocks-correctly-in-64k.patch
queue-2.6.27/ext4-fix-lockdep-warning.patch
queue-2.6.27/ext4-initialize-preallocation-list_head-s-properly.patch
queue-2.6.27/ext4-fix-null-dereference-in-ext4_ext_migrate-s-error-handling.patch
queue-2.6.27/ext4-add-fallback-for-find_group_flex.patch
queue-2.6.27/ext4-fix-deadlock-in-ext4_write_begin-and-ext4_da_write_begin.patch

2009-03-13 06:37:55

by Greg KH

Subject: patch ext4-fix-deadlock-in-ext4_write_begin-and-ext4_da_write_begin.patch added to 2.6.27-stable tree


This is a note to let you know that we have just queued up the patch titled

Subject: ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()

to the 2.6.27-stable tree. Its filename is

ext4-fix-deadlock-in-ext4_write_begin-and-ext4_da_write_begin.patch

A git repo of this tree can be found at
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary


>From [email protected] Thu Mar 12 23:32:47 2009
From: Jan Kara <[email protected]>
Date: Thu, 5 Mar 2009 02:34:07 -0500
Subject: ext4: Fix deadlock in ext4_write_begin() and ext4_da_write_begin()
To: [email protected]
Cc: "Theodore Ts'o" <[email protected]>, Ext4 Developers List <[email protected]>, Jan Kara <[email protected]>
Message-ID: <[email protected]>

From: Jan Kara <[email protected]>

(cherry picked from commit ebd3610b110bbb18ea6f9f2aeed1e1068c537227)

Functions ext4_write_begin() and ext4_da_write_begin() call
grab_cache_page_write_begin() without AOP_FLAG_NOFS. Thus it
can happen that page reclaim is triggered in that function
and it recurses back into the filesystem (or some other filesystem).
But this can lead to various problems as a transaction is already
started at that point. Add the necessary flag.

http://bugzilla.kernel.org/show_bug.cgi?id=11688

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/inode.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1372,6 +1372,10 @@ retry:
 		goto out;
 	}

+	/* We cannot recurse into the filesystem as the transaction is already
+	 * started */
+	flags |= AOP_FLAG_NOFS;
+
 	page = grab_cache_page_write_begin(mapping, index, flags);
 	if (!page) {
 		ext4_journal_stop(handle);
@@ -1381,7 +1385,7 @@ retry:
*pagep = page;

ret = block_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
- ext4_get_block);
+ ext4_get_block);

if (!ret && ext4_should_journal_data(inode)) {
ret = walk_page_buffers(handle, page_buffers(page),
@@ -2465,6 +2469,9 @@ retry:
ret = PTR_ERR(handle);
goto out;
}
+	/* We cannot recurse into the filesystem as the transaction is already
+	 * started */
+	flags |= AOP_FLAG_NOFS;

page = grab_cache_page_write_begin(mapping, index, flags);
if (!page) {


Patches currently in stable-queue which might be from [email protected] are

queue-2.6.27/fs-new-inode-i_state-corruption-fix.patch
queue-2.6.27/jbd2-fix-return-value-of-jbd2_journal_start_commit.patch
queue-2.6.27/revert-ext4-wait-on-all-pending-commits-in-ext4_sync_fs.patch
queue-2.6.27/jbd2-avoid-possible-null-dereference-in-jbd2_journal_begin_ordered_truncate.patch
queue-2.6.27/ext4-fix-deadlock-in-ext4_write_begin-and-ext4_da_write_begin.patch