2004-09-06 20:42:42

by Florian Lohoff

[permalink] [raw]
Subject: [2.6.7] kernel BUG at fs/jbd/transaction.c:1227!


Hi,
once again i have had an oops on a web/ftp/rsync server:

2xP4 Xeon 2.4Ghz
2GB Ram
ext3
Kernel 2.6.7

Before the machine went into production it ran an memtest86 for ~48
hours and showed no problems.

3ware CLI> info c0
Controller: c0
-------------
Driver: 1.26.00.039
Model: 8506-8
FW: FE7S 1.05.00.065
BIOS: BE7X 1.08.00.048
Monitor: ME7X 1.01.00.038
Serial #: F14505A4080259
PCB: Rev5
PCHIP: 1.30-66
ACHIP: 3.20


# of units: 2
Unit 0: RAID 1 74.53 GB ( 156299440 blocks): OK
Unit 2: RAID 0 465.77 GB ( 976789504 blocks): OK

# of ports: 8
Port 0: ST380013AS 5JV8P5W9 74.53 GB (156301488 blocks): OK(unit 0)
Port 1: ST380013AS 5JV83ST6 74.53 GB (156301488 blocks): OK(unit 0)
Port 2: WDC WD2500JD-00GBB0 WD-WMAEP2753130 232.88 GB (488397168 blocks): OK(unit 2)
Port 3: WDC WD2500JD-00GBB0 WD-WMAEP2753448 232.88 GB (488397168 blocks): OK(unit 2)


After some sudden filesystem problems:
Sep 6 20:22:03 source kernel: attempt to access beyond end of device
Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1803231528, limit=976784067
Sep 6 20:22:03 source kernel: attempt to access beyond end of device
Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1080256520, limit=976784067
Sep 6 20:22:03 source kernel: attempt to access beyond end of device
Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1121190792, limit=976784067
Sep 6 20:22:03 source kernel: attempt to access beyond end of device
[...]
Sep 6 20:22:04 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 1836016676, count = 1Sep 6 20:22:04 source kernel: Remounting filesystem read-only
Sep 6 20:22:04 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: bit already cleared for block 7235937
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks in system zones - Block = 33, count = 1
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 140148848, count = 1
Sep 6 20:22:05 source kernel: attempt to access beyond end of device
Sep 6 20:22:05 source kernel: sdb1: rw=0, want=1080256520, limit=976784067
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_branches: Read failure, inode=13459886, block=135032064
Sep 6 20:22:05 source kernel: attempt to access beyond end of device
Sep 6 20:22:05 source kernel: sdb1: rw=0, want=3901751304, limit=976784067
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_branches: Read failure, inode=13459886, block=1024589824
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks in system zones - Block = 65569188, count = 1
Sep 6 20:22:05 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: bit already cleared for block 63230
[...]
Sep 6 20:28:50 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 805335040, count = 1
Sep 6 20:28:50 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 4294959104, count = 1Sep 6 20:28:51 source kernel: EXT3-fs error (device sdb1): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1Sep 6 20:29:22 source last message repeated 244 times
Sep 6 20:30:02 source last message repeated 318 times
Sep 6 20:30:02 source kernel: Assertion failure in journal_forget() at fs/jbd/transaction.c:1227: "!jh->b_committed_data"
Sep 6 20:30:02 source kernel: ------------[ cut here ]------------
Sep 6 20:30:04 source kernel: kernel BUG at fs/jbd/transaction.c:1227!
Sep 6 20:30:04 source kernel: invalid operand: 0000 [#1]
Sep 6 20:30:04 source kernel: SMP
Sep 6 20:30:04 source kernel: Modules linked in: af_packet sch_red i8xx_tco dm_mod w83627hf lm85 i2c_sensor i2c_isa i2c_core unix
Sep 6 20:30:04 source kernel: CPU: 3
Sep 6 20:30:04 source kernel: EIP: 0060:[journal_forget+455/532] Not tainted
Sep 6 20:30:04 source kernel: EFLAGS: 00010282 (2.6.7-source)
Sep 6 20:30:04 source kernel: EIP is at journal_forget+0x1c7/0x214
Sep 6 20:30:04 source kernel: eax: 0000005f ebx: c3e7814c ecx: 00000002 edx: c039bddc
Sep 6 20:30:04 source kernel: esi: dea50e9c edi: c2380800 ebp: cf5d33d8 esp: d01f9d80
Sep 6 20:30:04 source kernel: ds: 007b es: 007b ss: 0068
Sep 6 20:30:04 source kernel: Process rsync (pid: 7205, threadinfo=d01f8000 task=f48896f0)
Sep 6 20:30:04 source kernel: Stack: c0355e20 c0337be7 c0354030 000004cb c0354146 c3e7814c df261404 c3e7814c
Sep 6 20:30:04 source kernel: cb2841d0 c0187554 cb2841d0 c3e7814c c7582000 c7581ffc ffffffff c3e7814c
Sep 6 20:30:04 source kernel: cb2841d0 02ff0000 ead1bfc0 c0189bef cb2841d0 00000001 df261404 c3e7814c
Sep 6 20:30:04 source kernel: Call Trace:
Sep 6 20:30:04 source kernel: [ext3_forget+224/228] ext3_forget+0xe0/0xe4
Sep 6 20:30:04 source kernel: [ext3_free_branches+260/607] ext3_free_branches+0x104/0x25f
Sep 6 20:30:04 source kernel: [ext3_free_branches+225/607] ext3_free_branches+0xe1/0x25f
Sep 6 20:30:04 source kernel: [ext3_truncate+1213/1534] ext3_truncate+0x4bd/0x5fe
Sep 6 20:30:04 source kernel: [journal_start+171/210] journal_start+0xab/0xd2
Sep 6 20:30:04 source kernel: [start_transaction+35/88] start_transaction+0x23/0x58
Sep 6 20:30:04 source kernel: [ext3_delete_inode+192/230] ext3_delete_inode+0xc0/0xe6
Sep 6 20:30:04 source kernel: [ext3_delete_inode+0/230] ext3_delete_inode+0x0/0xe6
Sep 6 20:30:04 source kernel: [generic_delete_inode+89/224] generic_delete_inode+0x59/0xe0
Sep 6 20:30:04 source kernel: [iput+98/124] iput+0x62/0x7c
Sep 6 20:30:04 source kernel: [dput+231/403] dput+0xe7/0x193
Sep 6 20:30:04 source kernel: [sys_rename+511/544] sys_rename+0x1ff/0x220
Sep 6 20:30:04 source kernel: [__fput+176/263] __fput+0xb0/0x107
Sep 6 20:30:04 source kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Sep 6 20:30:04 source kernel:
Sep 6 20:30:04 source kernel: Code: 0f 0b cb 04 30 40 35 c0 e9 58 ff ff ff c7 44 24 10 5c 41 35
Sep 6 20:30:04 source kernel: <0>Fatal exception: panic in 5 seconds

Mount options for /dev/sdb1:
/dev/sdb1 on /data type ext3 (rw,noatime,data=journal,errors=remount-ro)

tune2fs 1.35 (28-Feb-2004)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 6e3266b5-b248-412d-a287-09159fcc487b
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery sparse_super large_file
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 61063168
Block count: 122098008
Reserved block count: 6104900
Free blocks: 49951840
Free inodes: 60157918
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Filesystem created: Fri Jun 18 23:45:42 2004
Last mount time: Mon Sep 6 21:25:35 2004
Last write time: Mon Sep 6 21:25:35 2004
Mount count: 1
Maximum mount count: -1
Last checked: Mon Sep 6 21:25:16 2004
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 19a42cdd-6f23-4e26-a863-df8ae830741c
Journal backup: inode blocks

config attached

Flo
--
Florian Lohoff [email protected] +49-171-2280134
Heisenberg may have been here.


Attachments:
(No filename) (0.00 B)
(No filename) (189.00 B)
Download all attachments

2004-09-06 21:28:30

by Alan

[permalink] [raw]
Subject: Re: [2.6.7] kernel BUG at fs/jbd/transaction.c:1227!

On Llu, 2004-09-06 at 21:41, Florian Lohoff wrote:
> Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1803231528, limit=976784067
> Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1080256520, limit=976784067
> Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1121190792, limit=976784067
> Sep 6 20:22:03 source kernel: attempt to access beyond end of devic

All of these seem to be sensible block numbers if the top bit is flipped
back. How much do you trust the hardware ?

2004-10-14 16:00:17

by Florian Lohoff

[permalink] [raw]
Subject: Re: [2.6.7] kernel BUG at fs/jbd/transaction.c:1227!

On Mon, Sep 06, 2004 at 09:25:26PM +0100, Alan Cox wrote:
> On Llu, 2004-09-06 at 21:41, Florian Lohoff wrote:
> > Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> > Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1803231528, limit=976784067
> > Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> > Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1080256520, limit=976784067
> > Sep 6 20:22:03 source kernel: attempt to access beyond end of device
> > Sep 6 20:22:03 source kernel: sdb1: rw=0, want=1121190792, limit=976784067
> > Sep 6 20:22:03 source kernel: attempt to access beyond end of devic
>
> All of these seem to be sensible block numbers if the top bit is flipped
> back. How much do you trust the hardware ?

Too much - It seems there is a 3Ware HW bug notice saying that for
64Bit 3Ware Controllers on specific Board and or Chipsets (My case
I7501) there are some unwanted noises on the PCI lines which is fixed in
newer Board revisions of the Controllers.

After replacing the 8506 with a newer revision the bug is gone.

Flo
--
Florian Lohoff [email protected] +49-171-2280134
Heisenberg may have been here.


Attachments:
(No filename) (1.20 kB)
(No filename) (189.00 B)
Download all attachments