LinuxLists.cc - ext3: kernel BUG at fs/jbd/journal.c:412!

2008-11-06 15:58:48

Subject: ext3: kernel BUG at fs/jbd/journal.c:412!

I get this with ext4-patchqueue. I guess we have some ext3 patches queued there.

------------[ cut here ]------------
kernel BUG at fs/jbd/journal.c:412!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.2/0000:05:01.1/irq
Modules linked in: autofs4 hidp rfkill input_polldev sbs sbshc battery ac parport_pc lp parport i6300esb i2c_i801 i2c_core e752x_edac edac_core tg3 libphy qla2xxx scsi_transport_fc dm_multipath dm_mirror dm_region_hash dm_log dm_mod loop xt_tcpudp ip6t_REJECT ipv6 ipt_REJECT x_tables sunrpc rfcomm bnep l2cap bluetooth bridge stp sg rtc_cmos rtc_core rtc_lib pcspkr button ata_piix libata mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: ip_tables]

Pid: 29321, comm: fsstress Not tainted (2.6.28-rc3-autokern1 #2) IBM BladeCenter HS20 -[88432RG]-
EIP: 0060:[<c88808a0>] EFLAGS: 00210246 CPU: 0
EIP is at __log_space_left+0x15/0x2d [jbd]
EAX: 000000dd EBX: 00002000 ECX: 000024b1 EDX: 0000dddd
ESI: c68fc800 EDI: 00000000 EBP: c1a8cef8 ESP: c1a8cef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process fsstress (pid: 29321, ti=c1a8c000 task=c375b160 task.ti=c1a8c000)
Stack:
c1a8cf10 c88800bd 00000000 c6df1b3c c6df1b00 c68fc800 c1a8cf4c c887ce08
c4e38288 c68fc800 c68fc814 00000001 00000000 c1a8cf4c c044333c c4e3829c
c1a8cf50 00200282 c4e38288 c68fc800 c4e3829c c1a8cf70 c887ceee 00000000
Call Trace:
[<c88800bd>] ? __log_wait_for_space+0xdd/0x137 [jbd]
[<c887ce08>] ? start_this_handle+0x2de/0x31d [jbd]
[<c044333c>] ? lockdep_init_map+0x74/0x317
[<c887ceee>] ? journal_start+0xa7/0xf2 [jbd]
[<c887cf46>] ? journal_force_commit+0xd/0x1f [jbd]
[<c88c069c>] ? ext3_force_commit+0x22/0x24 [ext3]
[<c88c0f82>] ? ext3_sync_fs+0x10/0x29 [ext3]
[<c0482e6c>] ? sync_filesystems+0xa4/0xe9
[<c0499c4f>] ? do_sync+0x31/0x5a
[<c0499c85>] ? sys_sync+0xd/0x14
[<c0403a65>] ? sysenter_do_call+0x12/0x31
Code: 88 c8 00 00 00 00 a1 a0 82 88 c8 85 c0 74 05 e8 46 d0 bf f7 5d c3 55 8b 50 14 8b 88 cc 01 00 00 89 e5 89 d0 c1 f8 08 38 d0 75 04 <0f> 0b eb fe 8d 51 e0 31 c0 85 d2 7e 09 89 d0 c1 f8 03 29 c2 89
EIP: [<c88808a0>] __log_space_left+0x15/0x2d [jbd] SS:ESP 0068:c1a8cef8
---[ end trace e677f87f47dd07ca ]---
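
For anyone decoding the oops above: the bytes just before the trapping <0f> 0b (89 d0 c1 f8 08 38 d0 75 04) look like an inlined "is this spinlock held?" test, which would make line 412 the assert_spin_locked() in __log_space_left. Below is a minimal user-space sketch of that test. It assumes the x86 ticket-spinlock layout of this kernel era with fewer than 256 CPUs (low byte = ticket now being served, next byte = next free ticket). The function name is made up for illustration, and this is not the kernel's own code.

```c
/*
 * Sketch only: models the assumed ticket-lock test, not the kernel source.
 * The lock word packs two 8-bit counters:
 *   bits 0..7  = ticket currently being served
 *   bits 8..15 = next ticket to hand out
 * The lock is held whenever the two counters differ.
 */
static int ticket_lock_is_held(unsigned int slock)
{
	unsigned int serving = slock & 0xff;
	unsigned int next = (slock >> 8) & 0xff;

	return serving != next;
}
```

Fed the EDX value from the register dump (0x0000dddd), both bytes are 0xdd, so the test reports the lock as not held. That is consistent with the assertion firing because the caller no longer held j_state_lock.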

2008-11-06 16:12:28

by Arthur Jones

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

Hi Aneesh, ...

On Thu, Nov 06, 2008 at 07:50:25AM -0800, Aneesh Kumar K.V wrote:
> I get this with ext4-patchqueue. I guess we have some ext3 patches queued there.

This could be related to a patch I just posted.

I'd like to try to reproduce this. What is the
ext4-patchqueue? What were you running that caused
this to pop up?

See the thread on linux-ext4 called "ext3: slow symlink corruption
on umount" for details on how this patch came about...

Thanks...

Arthur

2008-11-06 16:51:16

by Aneesh Kumar K.V

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

2008-11-06 17:13:27

by Theodore Ts'o

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

On Thu, Nov 06, 2008 at 08:12:27AM -0800, Arthur Jones wrote:
> Hi Aneesh, ...
>
> On Thu, Nov 06, 2008 at 07:50:25AM -0800, Aneesh Kumar K.V wrote:
> > I get this with ext4-patchqueue. I guess we have some ext3 patches queued there.
>
> This could be related to a patch I just posted.
>
> See the thread on linux-ext4 called "ext3: slow symlink corruption
> on umount" for details on how this patch came about...

No, this is the other ext3 patch we have in the patch tree. I see the
problem: we're calling __log_space_left in a diagnostic printk after
we've released the j_state_lock. Here's the incremental fix:

diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
index 5e856de..18e5137 100644
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -115,7 +115,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh)
  */
 void __log_wait_for_space(journal_t *journal)
 {
-	int nblocks;
+	int nblocks, space_left;
 	assert_spin_locked(&journal->j_state_lock);
 
 	nblocks = jbd_space_needed(journal);
@@ -139,7 +139,8 @@ void __log_wait_for_space(journal_t *journal)
 		spin_lock(&journal->j_state_lock);
 		spin_lock(&journal->j_list_lock);
 		nblocks = jbd_space_needed(journal);
-		if (__log_space_left(journal) < nblocks) {
+		space_left = __log_space_left(journal);
+		if (space_left < nblocks) {
 			int chkpt = journal->j_checkpoint_transactions != NULL;
 			int tid = 0;
 
@@ -157,8 +158,7 @@ void __log_wait_for_space(journal_t *journal)
 			} else {
 				printk(KERN_ERR "%s: needed %d blocks and "
 				       "only had %d space available\n",
-				       __func__, nblocks,
-				       __log_space_left(journal));
+				       __func__, nblocks, space_left);
 				printk(KERN_ERR "%s: no way to get more "
 				       "journal space\n", __func__);
 				WARN_ON(1);

This is in a "should never happen" path, though. What were you doing
to trigger it?

- Ted
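
The pattern behind the fix, as a user-space sketch: read the value once while the lock is held, then report the cached copy after the lock has been dropped, rather than calling the lock-asserting helper a second time. All names here (toy_journal, toy_space_left, toy_check_space) are hypothetical stand-ins, the "lock" is a plain flag and not a real spinlock, and the space calculation is reduced to returning a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for journal_t; only what the sketch needs. */
struct toy_journal {
	int lock_held;    /* toy flag playing the role of j_state_lock */
	int free_blocks;  /* toy counter playing the role of the free space */
};

static void toy_lock(struct toy_journal *j)   { j->lock_held = 1; }
static void toy_unlock(struct toy_journal *j) { j->lock_held = 0; }

/* Like __log_space_left(): only legal while the lock is held. */
static int toy_space_left(struct toy_journal *j)
{
	assert(j->lock_held);
	return j->free_blocks;
}

/*
 * Returns 1 if there is enough space. Otherwise writes a diagnostic
 * into msg and returns 0. The diagnostic is built after the lock is
 * released, so it must use the cached space_left value.
 */
static int toy_check_space(struct toy_journal *j, int needed,
			   char *msg, size_t len)
{
	int space_left;

	toy_lock(j);
	space_left = toy_space_left(j);  /* sampled under the lock */
	toy_unlock(j);

	if (space_left >= needed)
		return 1;

	/* Lock is no longer held: calling toy_space_left() here would assert. */
	snprintf(msg, len, "needed %d blocks and only had %d space available",
		 needed, space_left);
	return 0;
}
```

Calling toy_space_left() again inside the snprintf() arguments would trip the assert, which is the user-space analogue of the BUG reported at the top of this thread.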

2008-11-06 17:23:38

by Arthur Jones

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

On Thu, Nov 06, 2008 at 09:13:22AM -0800, Theodore Tso wrote:
> [...]
> No, this is the other ext3 patch we have in the patch tree.

Phew... :-)

Arthur

2008-11-06 17:24:29

by Aneesh Kumar K.V

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

On Thu, Nov 06, 2008 at 12:13:22PM -0500, Theodore Tso wrote:
> On Thu, Nov 06, 2008 at 08:12:27AM -0800, Arthur Jones wrote:
> > Hi Aneesh, ...
> >
> > On Thu, Nov 06, 2008 at 07:50:25AM -0800, Aneesh Kumar K.V wrote:
> > > I get this with ext4-patchqueue. I guess we have some ext3 patches queued there.
> >
> > This could be related to a patch I just posted.
> >
> > See the thread on linux-ext4 called "ext3: slow symlink corruption
> > on umount" for details on how this patch came about...
>
> No, this is the other ext3 patch we have in the patch tree. I see the
> problem: we're calling __log_space_left in a diagnostic printk after
> we've released the j_state_lock. Here's the incremental fix:
>
> diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
> index 5e856de..18e5137 100644
> --- a/fs/jbd/checkpoint.c
> +++ b/fs/jbd/checkpoint.c
> @@ -115,7 +115,7 @@ static int __try_to_free_cp_buf(struct journal_head *jh)
>   */
>  void __log_wait_for_space(journal_t *journal)
>  {
> -	int nblocks;
> +	int nblocks, space_left;
>  	assert_spin_locked(&journal->j_state_lock);
>
>  	nblocks = jbd_space_needed(journal);
> @@ -139,7 +139,8 @@ void __log_wait_for_space(journal_t *journal)
>  		spin_lock(&journal->j_state_lock);
>  		spin_lock(&journal->j_list_lock);
>  		nblocks = jbd_space_needed(journal);
> -		if (__log_space_left(journal) < nblocks) {
> +		space_left = __log_space_left(journal);
> +		if (space_left < nblocks) {
>  			int chkpt = journal->j_checkpoint_transactions != NULL;
>  			int tid = 0;
>
> @@ -157,8 +158,7 @@ void __log_wait_for_space(journal_t *journal)
>  			} else {
>  				printk(KERN_ERR "%s: needed %d blocks and "
>  				       "only had %d space available\n",
> -				       __func__, nblocks,
> -				       __log_space_left(journal));
> +				       __func__, nblocks, space_left);
>  				printk(KERN_ERR "%s: no way to get more "
>  				       "journal space\n", __func__);
>  				WARN_ON(1);
>
>

You would need the same patch for jbd2.

-aneesh

2008-11-06 17:32:40

by Theodore Ts'o

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

On Thu, Nov 06, 2008 at 10:46:39PM +0530, Aneesh Kumar K.V wrote:
>
> you would need the same patch for jbd2

I know, already done, and checked into the ext4 patch queue.

- Ted

2008-11-06 17:46:52

by Aneesh Kumar K.V

Subject: Re: ext3: kernel BUG at fs/jbd/journal.c:412!

On Thu, Nov 06, 2008 at 12:32:36PM -0500, Theodore Tso wrote:
> On Thu, Nov 06, 2008 at 10:46:39PM +0530, Aneesh Kumar K.V wrote:
> >
> > you would need the same patch for jbd2
>
> I know, already done, and checked into the ext4 patch queue.

Now I am hitting this on ext4:

__jbd2_log_wait_for_space: needed 1024 blocks and only had 1023 space available
__jbd2_log_wait_for_space: no way to get more journal space in hdc:8
------------[ cut here ]------------
kernel BUG at fs/jbd2/checkpoint.c:164!
invalid opcode: 0000 [#1] SMP

I also have the change applied that makes sure we use the right chunk
value in ext4_index_trans_blocks.

-aneesh