2005-03-03 13:49:25

by Brice Figureau

[permalink] [raw]
Subject: 2.6.10-ac10 oops in journal_commit_transaction

Hi,


I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10
quite similar to:
http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2

I also got another server crashing (a mail server this time), but I
couldn't get the oops/panic.

This was after more than two weeks of uptime, I was running 2.6.10-ac1
before and never got this problem.

Here are the oops information:

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
printing eip:
c01a858d
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
CPU: 2
EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI
EFLAGS: 00010286 (2.6.10-ac10)
EIP is at journal_commit_transaction+0x36d/0x1490
eax: db38a56c ebx: 00000000 ecx: 00000000 edx: f7779480
esi: f76fa000 edi: db38a56c ebp: f76fbf60 esp: f76fbdc8
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 1206, threadinfo=f76fa000 task=f7454020)
Stack: f191fadc f191fadc 00000008 00000aa2 f76fbe04 f7fea4c0 f7c305b0 00000000
f77794b8 f7fea414 00000000 00000000 00000000 00000000 00000000 db313efc
f7779480 e4079c2c 00000aa2 00000001 f76fbe28 c01239b0 00000001 f76fbea8
Call Trace:
[show_stack+127/160] show_stack+0x7f/0xa0
[show_registers+351/464] show_registers+0x15f/0x1d0
[die+256/400] die+0x100/0x190
[do_page_fault+672/1712] do_page_fault+0x2a0/0x6b0
[error_code+43/48] error_code+0x2b/0x30
[kjournald+212/576] kjournald+0xd4/0x240
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Code: 8b 85 a0 fe ff ff 85 c0 0f 85 4f 0e 00 00 8b 95 a8 fe ff ff 8b 42 18 85 c0 0f 84 85 00 00 00 be 00 e0 ff ff 21 e6 8b 78 20 8b 1f <f0> ff 43 0c 8b 03 a8 04 0f 85 de 0d 00 00 89 5c 24 04 8b 4d 08
<6>note: kjournald[1206] exited with preempt_count 1

The code crashes in fs/jbd/commit.c journal_commit_transaction in this
particular area at line 314:

...
/*
* Wait for all previously submitted IO to complete.
*/
while (commit_transaction->t_locked_list) {
struct buffer_head *bh;

jh = commit_transaction->t_locked_list->b_tprev;
bh = jh2bh(jh);
get_bh(bh); <--- crash here because bh is NULL
if (buffer_locked(bh)) {
spin_unlock(&journal->j_list_lock);
wait_on_buffer(bh);
if (unlikely(!buffer_uptodate(bh)))
err = -EIO;
spin_lock(&journal->j_list_lock);
}
if (!inverted_lock(journal, bh)) {
put_bh(bh);
spin_lock(&journal->j_list_lock);
continue;
}
if (buffer_jbd(bh) && jh->b_jlist == BJ_Locked) {
__journal_unfile_buffer(jh);
jbd_unlock_bh_state(bh);
journal_remove_journal_head(bh);
put_bh(bh);
} else {
jbd_unlock_bh_state(bh);
}
put_bh(bh);
if (need_resched()) {
spin_unlock(&journal->j_list_lock);
cond_resched();
spin_lock(&journal->j_list_lock);
}
}
...

And more precisely at this stage of the code:

jh = commit_transaction->t_locked_list->b_tprev;
8b 78 20 mov 0x20(%eax),%edi

bh = jh2bh(jh);
8b 1f mov (%edi),%ebx

get_bh(bh);
f0 ff 43 0c lock incl 0xc(%ebx) <-- crash because ebx is null
8b 03 mov (%ebx),%eax

Unfortunately I don't have the knowledge (and time to aquire it) that
will help me chase down this bug/problem.

If you need more information (including .config and other) I'll be happy
to provide it.

Can you CC: me as I'm not subscribed to the list.

Regards,
--
Brice Figureau <[email protected]>


2005-03-04 00:10:21

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction

Brice Figureau <[email protected]> wrote:
>
> I'm reporting an oops on a bi-Xeon database server under 2.6.10-ac10
> quite similar to:
> http://marc.theaimsgroup.com/?l=ext3-users&m=110848085314238&w=2
>
> I also got another server crashing (a mail server this time), but I
> couldn't get the oops/panic.
>
> This was after more than two weeks of uptime, I was running 2.6.10-ac1
> before and never got this problem.
>
> Here are the oops information:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> printing eip:
> c01a858d
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT SMP
> Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> CPU: 2
> EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI
> EFLAGS: 00010286 (2.6.10-ac10)
> EIP is at journal_commit_transaction+0x36d/0x1490

Please do:

gdb vmlinux
(gdb) l *0xc01a858d

2005-03-04 10:52:27

by Brice Figureau

[permalink] [raw]
Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction

Hi Andrew,

On Thu, 2005-03-03 at 15:37 -0800, Andrew Morton wrote:
> Brice Figureau <[email protected]> wrote:
[snip]
> > Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> > printing eip:
> > c01a858d
> > *pde = 00000000
> > Oops: 0002 [#1]
> > PREEMPT SMP
> > Modules linked in: i2c_i801 i2c_core ip_conntrack_ftp ipt_LOG ipt_limit ipt_REJECT ipt_state iptable_filter ip_conntrack ip_tables
> > CPU: 2
> > EIP: 0060:[journal_commit_transaction+877/5264] Not tainted VLI
> > EFLAGS: 00010286 (2.6.10-ac10)
> > EIP is at journal_commit_transaction+0x36d/0x1490
>
> Please do:
>
> gdb vmlinux
> (gdb) l *0xc01a858d

Unfortunately this kernel is not compiled with CONFIG_DEBUG_INFO=y, so
the above command does not work.

But:
(gdb) disassemble 0xc01a858d
Dump of assembler code for function journal_commit_transaction:
[snipped]
...
0xc01a8568 <journal_commit_transaction+840>: test %eax,%eax
0xc01a856a <journal_commit_transaction+842>: jne 0xc01a93bf <journal_commit_transaction+4511>
0xc01a8570 <journal_commit_transaction+848>: mov 0xfffffea8(%ebp),%edx
0xc01a8576 <journal_commit_transaction+854>: mov 0x18(%edx),%eax
0xc01a8579 <journal_commit_transaction+857>: test %eax,%eax
0xc01a857b <journal_commit_transaction+859>: je 0xc01a8606 <journal_commit_transaction+998>
0xc01a8581 <journal_commit_transaction+865>: mov $0xffffe000,%esi
0xc01a8586 <journal_commit_transaction+870>: and %esp,%esi
0xc01a8588 <journal_commit_transaction+872>: mov 0x20(%eax),%edi
0xc01a858b <journal_commit_transaction+875>: mov (%edi),%ebx
0xc01a858d <journal_commit_transaction+877>: lock incl 0xc(%ebx)
0xc01a8591 <journal_commit_transaction+881>: mov (%ebx),%eax
0xc01a8593 <journal_commit_transaction+883>: test $0x4,%al
0xc01a8595 <journal_commit_transaction+885>: jne 0xc01a9379 <journal_commit_transaction+4441>
0xc01a859b <journal_commit_transaction+891>: mov %ebx,0x4(%esp)
0xc01a859f <journal_commit_transaction+895>: mov 0x8(%ebp),%ecx
0xc01a85a2 <journal_commit_transaction+898>: mov %ecx,(%esp)
0xc01a85a5 <journal_commit_transaction+901>: call 0xc01a81d0 <inverted_lock>
0xc01a85aa <journal_commit_transaction+906>: test %eax,%eax
0xc01a85ac <journal_commit_transaction+908>: je 0xc01a9373 <journal_commit_transaction+4435>
0xc01a85b2 <journal_commit_transaction+914>: mov (%ebx),%eax
0xc01a85b4 <journal_commit_transaction+916>: test $0x20,%ah

So I recompiled my kernel with DEBUG_CONFIG_INFO with the hope that the
code won't move too far and I could find the code:

On the kernel with *debug* enabled:
(gdb) l *0xc01a858d
0xc01a858d is in journal_commit_transaction (buffer_head.h:104).
99 * Emit the buffer bitops functions. Note that there are also functions
100 * of the form "mark_buffer_foo()". These are higher-level functions which
101 * do something in addition to setting a b_state bit.
102 */
103 BUFFER_FNS(Uptodate, uptodate)
104 BUFFER_FNS(Dirty, dirty)
105 TAS_BUFFER_FNS(Dirty, dirty)
106 BUFFER_FNS(Lock, locked)
107 TAS_BUFFER_FNS(Lock, locked)
108 BUFFER_FNS(Req, req)

Which does not seem to match the code included in the oops.

(gdb) disassemble 0xc01a858d
[snip]
0xc01a85c8 <journal_commit_transaction+840>: test %eax,%eax
0xc01a85ca <journal_commit_transaction+842>: jne 0xc01a941f <journal_commit_transaction+4511>
0xc01a85d0 <journal_commit_transaction+848>: mov 0xfffffea8(%ebp),%edx
0xc01a85d6 <journal_commit_transaction+854>: mov 0x18(%edx),%eax
0xc01a85d9 <journal_commit_transaction+857>: test %eax,%eax
0xc01a85db <journal_commit_transaction+859>: je 0xc01a8666 <journal_commit_transaction+998>
0xc01a85e1 <journal_commit_transaction+865>: mov $0xffffe000,%esi
0xc01a85e6 <journal_commit_transaction+870>: and %esp,%esi
0xc01a85e8 <journal_commit_transaction+872>: mov 0x20(%eax),%edi
0xc01a85eb <journal_commit_transaction+875>: mov (%edi),%ebx
0xc01a85ed <journal_commit_transaction+877>: lock incl 0xc(%ebx)
0xc01a85f1 <journal_commit_transaction+881>: mov (%ebx),%eax
0xc01a85f3 <journal_commit_transaction+883>: test $0x4,%al
0xc01a85f5 <journal_commit_transaction+885>: jne 0xc01a93d9 <journal_commit_transaction+4441>
0xc01a85fb <journal_commit_transaction+891>: mov %ebx,0x4(%esp)
0xc01a85ff <journal_commit_transaction+895>: mov 0x8(%ebp),%ecx
0xc01a8602 <journal_commit_transaction+898>: mov %ecx,(%esp)
0xc01a8605 <journal_commit_transaction+901>: call 0xc01a8230 <inverted_lock>
0xc01a860a <journal_commit_transaction+906>: test %eax,%eax
0xc01a860c <journal_commit_transaction+908>: je 0xc01a93d3 <journal_commit_transaction+4435>
0xc01a8612 <journal_commit_transaction+914>: mov (%ebx),%eax
0xc01a8614 <journal_commit_transaction+916>: test $0x20,%ah

So the same code is now at 0xc01a85ed:
(gdb) l *0xc01a85ed
0xc01a85ed is in journal_commit_transaction (atomic.h:103).
98 *
99 * Atomically increments @v by 1.
100 */
101 static __inline__ void atomic_inc(atomic_t *v)
102 {
103 __asm__ __volatile__(
104 LOCK "incl %0"
105 :"=m" (v->counter)
106 :"m" (v->counter));
107 }

It seems to me that get_bh is the culprit because of the following
definition from include/linux/buffer_head.h:
static inline void get_bh(struct buffer_head *bh)
{
atomic_inc(&bh->b_count);
}

I hope this will help you. Let me know if you need more information.
Thanks for taking care of that problem,
Regards,
--
Brice Figureau <[email protected]>

2005-03-06 23:09:40

by Alan

[permalink] [raw]
Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction

FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
this one.

Alan

2005-04-21 22:10:53

by Zou, Nanhai

[permalink] [raw]
Subject: RE: 2.6.10-ac10 oops in journal_commit_transaction

Hi Alan,
We have seen the same oops on the same point.
Can you point to me the URL where the patch is?
I am not sure which patch should I get.

Thanks
Zou Nan hai

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Alan Cox
> Sent: Monday, March 07, 2005 6:59 AM
> To: Brice Figureau
> Cc: Andrew Morton; Linux Kernel Mailing List
> Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction
>
> FYI Stephen Tweedie has now posted a patch for 2.6.x that ought to fix
> this one.
>
> Alan
>
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2005-04-21 22:33:25

by Chris Wright

[permalink] [raw]
Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction

* Zou, Nanhai ([email protected]) wrote:
> We have seen the same oops on the same point.
> Can you point to me the URL where the patch is?
> I am not sure which patch should I get.

I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
2.6.11.7 tree. The following patch is what went into 2.6.11.7:
---

From: Stephen Tweedie
Subject: Prevent race condition in jbd

This patch from Stephen Tweedie which fixes a race in jbd code (it
demonstrated itself as more or less random NULL dereferences in the
journal code).

Acked-by: Jan Kara <[email protected]>
Acked-by: Chris Mason <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

--- linux-2.6-ext3/fs/jbd/transaction.c.=K0000=.orig
+++ linux-2.6-ext3/fs/jbd/transaction.c
@@ -1775,10 +1775,10 @@ static int journal_unmap_buffer(journal_
JBUFFER_TRACE(jh, "checkpointed: add to BJ_Forget");
ret = __dispose_buffer(jh,
journal->j_running_transaction);
+ journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
- journal_put_journal_head(jh);
return ret;
} else {
/* There is no currently-running transaction. So the
@@ -1789,10 +1789,10 @@ static int journal_unmap_buffer(journal_
JBUFFER_TRACE(jh, "give to committing trans");
ret = __dispose_buffer(jh,
journal->j_committing_transaction);
+ journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
- journal_put_journal_head(jh);
return ret;
} else {
/* The orphan record's transaction has
@@ -1813,10 +1813,10 @@ static int journal_unmap_buffer(journal_
journal->j_running_transaction);
jh->b_next_transaction = NULL;
}
+ journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
- journal_put_journal_head(jh);
return 0;
} else {
/* Good, the buffer belongs to the running transaction.


2005-04-22 10:24:53

by Alan

[permalink] [raw]
Subject: Re: 2.6.10-ac10 oops in journal_commit_transaction

On Iau, 2005-04-21 at 23:29, Chris Wright wrote:
> I believe it's fixed in 2.6.11-ac, and we fixed it in the current stable
> 2.6.11.7 tree. The following patch is what went into 2.6.11.7:

2.6.11.7 or 2.6.11ac7 (ie 2.6.11.7-ac 8)) both have this fixed.