2009-09-30 05:25:31

by Theodore Ts'o

Subject: [GIT PULL] ext4 for v2.6.32 round II

Hi Linus,

Please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

to grab the following changes for v2.6.32.

Thanks!!

- Ted

Curt Wohlgemuth (2):
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
ext4: Handle nested ext4_journal_start/stop calls without a journal

Frank Mayhar (1):
ext4: Avoid updating the inode table bh twice in no journal mode

Jan Kara (1):
ext4: Update documentation about quota mount options

Josh Stone (1):
ext4: Add a stub for mpage_da_data in the trace header

Mingming Cao (4):
ext4: release reserved quota when block reservation for delalloc retry
ext4: Split uninitialized extents for direct I/O
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
ext4: async direct IO for holes and fallocate support

Theodore Ts'o (8):
ext4: Use ext4_msg() for ext4_da_writepage() errors
ext4: Fix hueristic which avoids group preallocation for closed files
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
ext4, jbd2: Drop unneeded printks at mount and unmount time
ext4: Use tracepoints for mb_history trace file
jbd2: Use tracepoints for history file
ext4: Fix time encoding with extra epoch bits

Documentation/filesystems/ext4.txt | 13 +-
Documentation/filesystems/proc.txt | 1 -
fs/ext4/ext4.h | 54 +++-
fs/ext4/ext4_extents.h | 7 +-
fs/ext4/ext4_jbd2.h | 6 +-
fs/ext4/extents.c | 444 +++++++++++++++++++++++++---
fs/ext4/fsync.c | 5 +
fs/ext4/inode.c | 574 +++++++++++++++++++++++++++++++-----
fs/ext4/mballoc.c | 305 +------------------
fs/ext4/mballoc.h | 35 +---
fs/ext4/migrate.c | 2 +-
fs/ext4/move_extent.c | 20 +-
fs/ext4/namei.c | 3 +-
fs/ext4/super.c | 99 ++++---
fs/jbd2/checkpoint.c | 7 +
fs/jbd2/commit.c | 59 ++--
fs/jbd2/journal.c | 196 +------------
include/linux/jbd2.h | 27 +--
include/trace/events/ext4.h | 178 +++++++++++-
include/trace/events/jbd2.h | 78 +++++
20 files changed, 1362 insertions(+), 751 deletions(-)


2009-10-01 01:41:31

by Markus Trippelsdorf

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II

On Wed, Sep 30, 2009 at 01:25:31AM -0400, Theodore Ts'o wrote:
>
> Please pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus
>
> to grab the following changes for v2.6.32.
>
>
> Curt Wohlgemuth (2):
> ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
> ext4: Handle nested ext4_journal_start/stop calls without a journal
>
> Frank Mayhar (1):
> ext4: Avoid updating the inode table bh twice in no journal mode
>
> Jan Kara (1):
> ext4: Update documentation about quota mount options
>
> Josh Stone (1):
> ext4: Add a stub for mpage_da_data in the trace header
>
> Mingming Cao (4):
> ext4: release reserved quota when block reservation for delalloc retry
> ext4: Split uninitialized extents for direct I/O
> ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
> ext4: async direct IO for holes and fallocate support
>
> Theodore Ts'o (8):
> ext4: Use ext4_msg() for ext4_da_writepage() errors
> ext4: Fix hueristic which avoids group preallocation for closed files
> ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
> ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
> ext4, jbd2: Drop unneeded printks at mount and unmount time
> ext4: Use tracepoints for mb_history trace file
> jbd2: Use tracepoints for history file
> ext4: Fix time encoding with extra epoch bits

Running latest git I get the following kernel BUG message:

------------[ cut here ]------------
Kernel BUG at ffffffff810efa89 [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
CPU 3
Pid: 1930, comm: flush-8:16 Not tainted 2.6.32-rc2-00134-g84d88d5 #4 System Product Name
RIP: 0010:[<ffffffff810efa89>] [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213
RSP: 0018:ffff88011e7d7a80 EFLAGS: 00010246
RAX: 4000000000020039 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffea00030497e8 RSI: 0000000000000002 RDI: ffffea000300ad60
RBP: ffff88011e7d7b80 R08: ffffea000300ad68 R09: 0000000000000003
R10: 000000000000000e R11: 0000000000000000 R12: 0000000000000000
R13: ffff88011e7d7ac0 R14: ffff8800df4acee0 R15: ffff88011e7d7b48
FS: 00007eff036f26f0(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f24c4fa0000 CR3: 000000011e68d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-8:16 (pid: 1930, threadinfo ffff88011e7d6000, task ffff88011ca2abe0)
Stack:
ffffffff810f174e ffff88011e5808e0 ffffea00030497e8 0000000000008000
<0> ffff88011e7d7ad0 0000000e00000286 ffff88011e7d7ad0 0000000000000000
<0> 000000000000000e 0000000000000000 ffffea00030497e8 ffffea00030497b0
Call Trace:
[<ffffffff810f174e>] ? __mpage_da_writepage+0x0/0x14e
[<ffffffff810f134f>] ext4_da_writepages+0x135/0x48f
[<ffffffff8104c642>] ? find_busiest_group+0x3ea/0x950
[<ffffffff81084178>] do_writepages+0x1c/0x25
[<ffffffff810c07fe>] writeback_single_inode+0xea/0x2e3
[<ffffffff810c14d8>] writeback_inodes_wb+0x443/0x520
[<ffffffff810c16d8>] wb_writeback+0x123/0x1a1
[<ffffffff810c1933>] wb_do_writeback+0x137/0x14d
[<ffffffff810c1974>] bdi_writeback_task+0x2b/0x84
[<ffffffff8108e0f3>] ? bdi_start_fn+0x0/0xcd
[<ffffffff8108e15f>] bdi_start_fn+0x6c/0xcd
[<ffffffff8108e0f3>] ? bdi_start_fn+0x0/0xcd
[<ffffffff810657d7>] kthread+0x7a/0x82
[<ffffffff810292da>] child_rip+0xa/0x20
[<ffffffff8106575d>] ? kthread+0x0/0x82
[<ffffffff810292d0>] ? child_rip+0x0/0x20
Code: 72 18 75 15 48 8b 02 a8 10 74 0e f6 c4 20 75 09 48 8b 4a 20 48 39 d9 74 0d 48 89 d7 e8 3b ea f8 ff 48 89 d9 eb 5b f6 c4 08 75 04 <0f> 0b eb fe 48 8b 72 10 48 89 f0 48 8b 38 f7 c7 00 02 00 00 75
RIP [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213
RSP <ffff88011e7d7a80>
---[ end trace 8a998a2f9968ef87 ]---

or

------------[ cut here ]------------
Kernel BUG at ffffffff810efa89 [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:16/ATK0110:00/hwmon/hwmon0/temp2_input
CPU 1
Pid: 1916, comm: flush-8:16 Not tainted 2.6.32-rc2-00134-g84d88d5 #4 System Product Name
RIP: 0010:[<ffffffff810efa89>] [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213
RSP: 0018:ffff88011cf61a80 EFLAGS: 00010246
RAX: 4000000000020039 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffea000301bc30 RSI: 0000000000000002 RDI: ffffea000301c058
RBP: ffff88011cf61b80 R08: ffffea000301c060 R09: 0000000000000003
R10: 000000000000000e R11: 0000000000000000 R12: 0000000000000000
R13: ffff88011cf61ac0 R14: ffff8800df4339f0 R15: ffff88011cf61b48
FS: 00007f04ceb40910(0000) GS:ffff880028280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f04ce33f000 CR3: 000000011bb91000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-8:16 (pid: 1916, threadinfo ffff88011cf60000, task ffff88011e6f2be0)
Stack:
0000000100000001 00000000007745d5 ffffea000301bc30 0000000000008000
<0> ffff88011cf61ad0 0000000e03efe400 ffff88011cf61ad0 0000000000000000
<0> 000000000000000e 0000000000000000 ffffea000301bc30 ffffea000301bca0
Call Trace:
[<ffffffff810f134f>] ext4_da_writepages+0x135/0x48f
[<ffffffff81084178>] do_writepages+0x1c/0x25
[<ffffffff810c07fe>] writeback_single_inode+0xea/0x2e3
[<ffffffff810c14d8>] writeback_inodes_wb+0x443/0x520
[<ffffffff810c16d8>] wb_writeback+0x123/0x1a1
[<ffffffff810c1933>] wb_do_writeback+0x137/0x14d
[<ffffffff810c1974>] bdi_writeback_task+0x2b/0x84
[<ffffffff8108e0f3>] ? bdi_start_fn+0x0/0xcd
[<ffffffff8108e15f>] bdi_start_fn+0x6c/0xcd
[<ffffffff8108e0f3>] ? bdi_start_fn+0x0/0xcd
[<ffffffff810657d7>] kthread+0x7a/0x82
[<ffffffff810292da>] child_rip+0xa/0x20
[<ffffffff8106575d>] ? kthread+0x0/0x82
[<ffffffff810292d0>] ? child_rip+0x0/0x20
Code: 72 18 75 15 48 8b 02 a8 10 74 0e f6 c4 20 75 09 48 8b 4a 20 48 39 d9 74 0d 48 89 d7 e8 3b ea f8 ff 48 89 d9 eb 5b f6 c4 08 75 04 <0f> 0b eb fe 48 8b 72 10 48 89 f0 48 8b 38 f7 c7 00 02 00 00 75
RIP [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213
RSP <ffff88011cf61a80>
---[ end trace 534f3a080f2b5f80 ]---

I haven't bisected yet, but maybe it is obvious to someone what went wrong.
--
Markus

2009-10-01 01:57:42

by Linus Torvalds

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II



On Thu, 1 Oct 2009, Markus Trippelsdorf wrote:
>
> Running latest git I get the following kernel BUG message:
>
> ------------[ cut here ]------------
> Kernel BUG at ffffffff810efa89 [verbose debug info unavailable]

You really shouldn't turn off verbose debug info unless you are _really_
tight on memory in some embedded device. It's not that complex, nor does
it use a lot of memory, and it makes the error report much harder to
figure out when the verbose debug output information isn't available.

But in this case I can do it by just looking at the function/offset:

> RIP: 0010:[<ffffffff810efa89>] [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213

The only BUG_ON() that seems relevant in that function (at least with the
config I tried with) ends up being

fs/ext4/inode.c:1184

in my sources, which is from "page_buffers(page)", which has a

BUG_ON(!PagePrivate(page));

in it. And that would have been much easier to figure out if you had had
CONFIG_DEBUG_BUGVERBOSE enabled..

Now over to the ext4 people to actually hopefully _solve_ the bug. Maybe
bisection would help.

Linus
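
To make the difference between the two report styles concrete, here is a minimal userspace sketch, not kernel code. The helper name toy_format_bug() and its parameters are hypothetical; only the two message formats are taken from the reports quoted in this thread. It shows that the verbose form carries the file and line directly, while the non-verbose form leaves only an address to decode.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): formats the first line of
 * a BUG report in the two styles seen in this thread. With verbose set,
 * the source location is printed; without it, only the faulting address
 * is available and the reader has to map it back to a source line.
 */
static int toy_format_bug(char *buf, size_t len, int verbose,
                          const char *file, int line,
                          unsigned long long addr)
{
    if (verbose)
        return snprintf(buf, len, "kernel BUG at %s:%d!", file, line);

    return snprintf(buf, len,
                    "Kernel BUG at %016llx [verbose debug info unavailable]",
                    addr);
}
```

Calling it with verbose set, "fs/ext4/inode.c" and 1184 produces the line a CONFIG_DEBUG_BUGVERBOSE=y kernel reports for this crash; calling it with verbose cleared and the RIP value produces the first line of the original report.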

2009-10-01 02:15:29

by Markus Trippelsdorf

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II

On Wed, Sep 30, 2009 at 06:57:00PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 1 Oct 2009, Markus Trippelsdorf wrote:
> >
> > Running latest git I get the following kernel BUG message:
> >
> > ------------[ cut here ]------------
> > Kernel BUG at ffffffff810efa89 [verbose debug info unavailable]
>
> You really shouldn't turn off verbose debug info unless you are _really_
> tight on memory in some embedded device. It's not that complex, nor does
> it use a lot of memory, and it makes the error report much harder to
> figure out when the verbose debug output information isn't available.
>
> But in this case I can do it by just looking at the function/offset:
>
> > RIP: 0010:[<ffffffff810efa89>] [<ffffffff810efa89>] ext4_num_dirty_pages+0x113/0x213
>
> The only BUG_ON() that seems relevant in that function (at least with the
> config I tried with) ends up being
>
> fs/ext4/inode.c:1184
>
> in my sources, which is from "page_buffers(page)", which has a
>
> BUG_ON(!PagePrivate(page));
>
> in it. And that would have been much easier to figure out if you had had
> CONFIG_DEBUG_BUGVERBOSE enabled..
>

You're right (with CONFIG_DEBUG_BUGVERBOSE=y):

------------[ cut here ]------------
kernel BUG at fs/ext4/inode.c:1184!
...

> Now over to the ext4 people to actually hopefully _solve_ the bug. Maybe
> bisection would help.

--
Markus

2009-10-01 03:02:02

by Theodore Ts'o

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II

Hi Markus,

I'm pretty sure the following should fix your problems; can you
confirm?

- Ted

commit 1f94533d9cd75f6d2826018d54a971b9cc085992
Author: Theodore Ts'o <[email protected]>
Date: Wed Sep 30 22:57:41 2009 -0400

ext4: fix a BUG_ON crash by checking that page has buffers attached to it

In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had buffers attached to it; this
would cause a BUG check crash in the inline function page_buffers().

Thanks to Markus Trippelsdorf for reporting this bug.

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ec367bc..6e65d0e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1146,8 +1146,8 @@ static int check_block_validity(struct inode *inode, const char *msg,
}

/*
- * Return the number of dirty pages in the given inode starting at
- * page frame idx.
+ * Return the number of contiguous dirty pages in a given inode
+ * starting at page frame idx.
*/
static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
unsigned int max_pages)
@@ -1181,15 +1181,15 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
 				unlock_page(page);
 				break;
 			}
-			head = page_buffers(page);
-			bh = head;
-			do {
-				if (!buffer_delay(bh) &&
-				    !buffer_unwritten(bh)) {
-					done = 1;
-					break;
-				}
-			} while ((bh = bh->b_this_page) != head);
+			if (page_has_buffers(page)) {
+				bh = head = page_buffers(page);
+				do {
+					if (!buffer_delay(bh) &&
+					    !buffer_unwritten(bh))
+						done = 1;
+					bh = bh->b_this_page;
+				} while (!done && (bh != head));
+			}
 			unlock_page(page);
 			if (done)
 				break;
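
The guard added by the patch can be tried outside the kernel with a small userspace model. This is only a sketch under stated assumptions: struct toy_bh, struct toy_page and the toy_* functions are hypothetical stand-ins for struct buffer_head, struct page, page_has_buffers() and page_buffers(), and assert() stands in for BUG_ON(). The loop has the same shape as the patched one: the circular buffer list is walked only when the page has buffers attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head. */
struct toy_bh {
    bool delay;               /* models buffer_delay(bh) */
    bool unwritten;           /* models buffer_unwritten(bh) */
    struct toy_bh *this_page; /* models bh->b_this_page (circular list) */
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
    struct toy_bh *buffers;   /* NULL models a page with no buffers */
};

static bool toy_page_has_buffers(const struct toy_page *page)
{
    return page->buffers != NULL;
}

/* Models page_buffers(): asserts where the kernel version would BUG. */
static struct toy_bh *toy_page_buffers(const struct toy_page *page)
{
    assert(toy_page_has_buffers(page));
    return page->buffers;
}

/*
 * Returns true if the scan for contiguous dirty pages should stop at
 * this page, i.e. some buffer is neither delayed nor unwritten.
 * A page without buffers never trips the assertion and never stops
 * the scan, matching the behaviour of the patched loop.
 */
static bool toy_page_ends_scan(const struct toy_page *page)
{
    bool done = false;
    struct toy_bh *bh, *head;

    if (toy_page_has_buffers(page)) {
        bh = head = toy_page_buffers(page);
        do {
            if (!bh->delay && !bh->unwritten)
                done = true;
            bh = bh->this_page;
        } while (!done && bh != head);
    }
    return done;
}
```

Dropping the toy_page_has_buffers() check in toy_page_ends_scan() and passing a page whose buffers pointer is NULL makes the assertion fire, which is the userspace analogue of the crash Markus reported.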

2009-10-01 03:47:15

by Markus Trippelsdorf

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II

On Wed, Sep 30, 2009 at 11:01:51PM -0400, Theodore Tso wrote:
> Hi Markus,
>
> I'm pretty sure the following should fix your problems; can you
> confirm?

Yes, everything is fine again.
Thank you.

--
Markus

2009-10-01 06:40:54

by Theodore Ts'o

Subject: Re: [Kernel BUG] ext4 for v2.6.32 round II

On Thu, Oct 01, 2009 at 05:47:16AM +0200, Markus Trippelsdorf wrote:
> On Wed, Sep 30, 2009 at 11:01:51PM -0400, Theodore Tso wrote:
> > Hi Markus,
> >
> > I'm pretty sure the following should fix your problems; can you
> > confirm?
>
> Yes, everything is fine again.

Linus, can you pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

This will pick up the fix to the problem Markus reported, plus some
old code that we had promised would be removed for 2.6.31.

- Ted

Eric Sandeen (1):
ext4: drop ext4dev compat

Theodore Ts'o (1):
ext4: fix a BUG_ON crash by checking that page has buffers attached to it

fs/ext4/Kconfig | 14 --------------
fs/ext4/inode.c | 22 +++++++++++-----------
fs/ext4/super.c | 31 -------------------------------
3 files changed, 11 insertions(+), 56 deletions(-)