2002-01-24 15:54:52

by frode

Subject: OOPS: kernel BUG at transaction.c:1857 on 2.4.17 while rm'ing 700mb file on ext3 partition.

             total       used       free     shared    buffers     cached
Mem:        383828     249824     134004          0       6072     161672
-/+ buffers/cache:      82080     301748
Swap:       160608      23072     137536


Attachments:
dmesg.txt (8.23 kB)
ksymoops.txt (3.01 kB)
lspci.txt (0.98 kB)
lsmod.txt (348.00 B)
free.txt (230.00 B)

2002-01-24 19:19:57

by Stephen C. Tweedie

Subject: Re: OOPS: kernel BUG at transaction.c:1857 on 2.4.17 while rm'ing 700mb file on ext3 partition.

Hi,

On Thu, Jan 24, 2002 at 04:54:34PM +0100, frode wrote:
>
> I got the following error while rm'ing a 700mb file from an ext3 partition:
>
> Assertion failure in journal_unmap_buffer() at transaction.c:1857:
> "transaction == journal->j_running_transaction"

Hmm --- this is not one I think I've ever seen before.

> >>EIP; c015ea1a <journal_unmap_buffer+fa/1b0> <=====
> Trace; c015eb6e <journal_flushpage+9e/140>
> Trace; c0156ae2 <ext3_flushpage+22/30>
> Trace; c0125738 <do_flushpage+18/30>
> Trace; c0125762 <truncate_complete_page+12/50>
> Trace; c01258c6 <truncate_list_pages+126/190>
> Trace; c0125970 <truncate_inode_pages+40/70>
> Trace; c014485e <iput+ae/200>
> Trace; c0142e4c <d_delete+4c/70>
> Trace; c013c69c <vfs_unlink+13c/170>
> Trace; c013c778 <sys_unlink+a8/120>
> Trace; c0106e8a <system_call+32/38>

Well, that's a straightforward trace, and looks perfectly normal for
a delete operation. The buffer_head is locked at this point, and the
transaction itself is pinned, so I can't see any way to have an
unrecognised transaction here.

> I use the 'mem=nopentium' option on the lilo prompt while booting, hoping to
> reduce the rather large amount of oopses I have had recently, as I read
> something about AMD Athlons and AGP causing troubles.

Those problems included AGP cache coherency problems, but I didn't see
any mention of other instabilities as a result. Also,

> NVRM: loading NVIDIA NVdriver Kernel Module 1.0.2313 Tue Nov 27 12:01:24 PST 2001

with this driver loaded we really can't make any guarantees about your
system stability at all. If you manage to eliminate other oopses and
still get the ext3 one, even without the NVidia driver loaded, then
there would be a much better chance of debugging things, but right now
it sounds like a hardware problem.

Cheers,
Stephen

2002-01-24 22:53:55

by frode

Subject: Re: OOPS: kernel BUG at transaction.c:1857 on 2.4.17 while rm'ing 700mb file on ext3 partition.

Stephen C. Tweedie wrote:
> On Thu, Jan 24, 2002 at 04:54:34PM +0100, frode wrote:
>>I got the following error while rm'ing a 700mb file from an ext3 partition:
>>Assertion failure in journal_unmap_buffer() at transaction.c:1857:
>>"transaction == journal->j_running_transaction"
> Hmm --- this is not one I think I've ever seen before.

[oops trace snipped]

>>NVRM: loading NVIDIA NVdriver Kernel Module 1.0.2313 Tue Nov 27 12:01:24 PST 2001
> with this driver loaded we really can't make any guarantees about your
> system stability at all. If you manage to eliminate other oopses and
> still get the ext3 one, even without the NVidia driver loaded, then
> there would be a much better chance of debugging things, but right now
> it sounds like a hardware problem.

OK, I rebooted and gzip'ed the NVdriver in /lib/modules... to make sure the
module doesn't load (lsmod now says my kernel isn't tainted). I'll try using the
plain 'nv' driver shipped with XFree instead for a while. I tried making another
700mb iso image and fooling around with it (loopback mount it, umount it, then rm
it) but couldn't trigger anything - but I just spent five minutes trying.

As I mentioned I have had quite a few oopses lately, most of them regarding
paging etc. (but I'm no kernel expert). See for example
http://marc.theaimsgroup.com/?l=linux-kernel&m=101096234600708&w=2
and
http://marc.theaimsgroup.com/?l=linux-kernel&m=101128528029736&w=2

I'm running linux on an old p100 as well but don't see any problems, so as you
say I suspected a hardware problem. I ran MemTest86 for about half an hour
without any errors (but of course there's plenty of other things that may be wrong).

Do you have any suggestions on other ways I could try to put my hardware
stability on trial, or try to reproduce the bug (to see if it occurs on a
non-tainted kernel)?

- Frode

2002-01-29 14:12:12

by Stephen C. Tweedie

Subject: Re: OOPS: kernel BUG at transaction.c:1857 on 2.4.17 while rm'ing 700mb file on ext3 partition.

Hi,

On Thu, Jan 24, 2002 at 11:53:27PM +0100, frode wrote:

> >>I got the following error while rm'ing a 700mb file from an ext3 partition:
> >>Assertion failure in journal_unmap_buffer() at transaction.c:1857:
> >>"transaction == journal->j_running_transaction"
> > Hmm --- this is not one I think I've ever seen before.

> OK, I rebooted and gzip'ed the NVdriver in /lib/modules... to make sure the
> module doesn't load (lsmod now says my kernel isn't tainted). I'll try using the
> plain 'nv' driver shipped with XFree instead for a while. I tried making another
> 700mb iso image and fooling around with it (loopback mount it, umount it, then rm
> it) but couldn't trigger anything - but I just spent five minutes trying.

Have you been able to reproduce any problems yet?

> As I mentioned I have had quite a few oopses lately, most of them regarding
> paging etc. (but I'm no kernel expert). See for example
> http://marc.theaimsgroup.com/?l=linux-kernel&m=101096234600708&w=2
> and
> http://marc.theaimsgroup.com/?l=linux-kernel&m=101128528029736&w=2

iput() crash; page list crash; jbd transaction crash. These look
perfectly consistent with random memory corruption.
>
> I'm running linux on an old p100 as well but don't see any problems, so as you
> say I suspected a hardware problem. I ran MemTest86 for about half an hour

Try leaving it running overnight --- half an hour is very little time
for a proper memory test.

Cheers,
Stephen

2002-01-29 14:26:02

by frode

Subject: Re: OOPS: kernel BUG at transaction.c:1857 on 2.4.17 while rm'ing 700mb file on ext3 partition.

Stephen C. Tweedie wrote:
> On Thu, Jan 24, 2002 at 11:53:27PM +0100, frode wrote:
>>>>I got the following error while rm'ing a 700mb file from an ext3 partition:
>>>>Assertion failure in journal_unmap_buffer() at transaction.c:1857:
>>>>"transaction == journal->j_running_transaction"
>>>Hmm --- this is not one I think I've ever seen before.
> Have you been able to reproduce any problems yet?

No reproducible problems, just more random oopses (like in
http://marc.theaimsgroup.com/?l=linux-kernel&m=101205570604468&w=2)

> iput() crash; page list crash; jbd transaction crash. These look
> perfectly consistent with random memory corruption.
[memtest86]
> Try leaving it running overnight --- half an hour is very little time
> for a proper memory test.


Others have suggested this by mail also, and after running memtest for 4 hours,
what do you know, a bit error occurred.

Test#: 4
Pass#: 6
Failing address: 0aed1f64 - 174.8mb
Good pattern: 00080000
Bad pattern: 000a0000
Error bits: 00020000
Count: 1
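
For anyone reading a report like the one above, a quick consistency check is to XOR the good and bad patterns (the result should equal the reported error bits) and to convert the failing address to megabytes. The following is only a minimal sketch: the helper names are hypothetical and not part of memtest86, and it assumes the "mb" figure means multiples of 2**20 bytes.

```python
# Values copied from the memtest86 report above.
FAILING_ADDRESS = 0x0AED1F64
GOOD_PATTERN = 0x00080000
BAD_PATTERN = 0x000A0000


def error_bits(good, bad):
    """Return a mask of the bits that differ between the two patterns."""
    return good ^ bad


def flipped_bit_positions(mask):
    """List the positions (0 = least significant) of the set bits in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def address_in_mb(address):
    """Convert a byte address to megabytes, assuming 1 MB = 2**20 bytes."""
    return address / 2**20


mask = error_bits(GOOD_PATTERN, BAD_PATTERN)
print(f"Error bits:   {mask:08x}")
print(f"Flipped bits: {flipped_bit_positions(mask)}")
print(f"Address:      {address_in_mb(FAILING_ADDRESS):.1f} MB")
```

With these values the mask comes out as 00020000, a single flipped bit (bit 17), at about 174.8 MB, which agrees with the report.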

I'm pretty sure that 256mb no-brand memory chip I added one year ago is to blame.
I'll try running memtest86 for 8+ hours as soon as feasible, but I guess I
should just throw out the old RAM and put in some new.

I guess all I have to say is, sorry for wasting your time! :(

Anyway, thanks for your interest - at least I'm close to a solution now! :)

- Frode