2003-01-14 13:06:38

by Lukasz Trabinski

Subject: 2.4.21-pre3 - problems with ext3

Hello

Since 2.4.20, we have problems with ext3. Machine is 2xPentium III (1GHz),
2GB RAM, 1GB swap. RH 8.0 (glibc-2.3.1-21), gcc (GCC) 3.2 20020903

We have a lot of users:

oceanic:~# wc -l /etc/passwd
6694 /etc/passwd

They connect via Samba (2.2.7) from 200-300 Windows-XX workstations.

The ext3 partitions look like this:

oceanic:~# mount |grep ext3
/dev/sdb5 on /home1 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdb6 on /home2 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdc5 on /home3 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdc6 on /home4 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdc7 on /home5 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdc8 on /home6 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdb7 on /home7 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdb8 on /home8 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdd5 on /home9 type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdd6 on /homea type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdd7 on /homeb type ext3 (rw,nosuid,nodev,usrquota)
/dev/sdd8 on /homec type ext3 (rw,nosuid,nodev,usrquota)
/dev/sda7 on /opt/windows type ext3 (rw,nosuid,nodev)



My question is:
Is it a problem with ext3 or problems with disk (hardware problem) or
something else?

The oops looks like this:


ksymoops 2.4.5 on i686 2.4.21-pre3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-pre3/ (default)
-m /boot/System.map-2.4.21-pre3 (default)

Jan 14 12:53:52 oceanic kernel: EIP: 0010:[<f88ab5df>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jan 14 12:53:52 oceanic kernel: EFLAGS: 00010282
Jan 14 12:53:52 oceanic kernel: eax: 0000007a ebx: efeed7a0 ecx: 00000092 edx: e9925f7c
Jan 14 12:53:52 oceanic kernel: esi: ef818000 edi: f7719000 ebp: 00000020 esp: ef819b74
Jan 14 12:53:52 oceanic kernel: ds: 0018 es: 0018 ss: 0018
Jan 14 12:53:52 oceanic kernel: Process smbd (pid: 7566, stackpage=ef819000)
Jan 14 12:53:52 oceanic kernel: Stack: f88b5be0 f88b5063 f88b4ff4 000000f9 f88b5da0 efeed7a0 f758e2c0 f758e2c0
Jan 14 12:53:52 oceanic kernel: f88c4bc0 f7719000 00000001 f7b73218 00000002 00000001 f758e2c0 f7638c00
Jan 14 12:53:52 oceanic kernel: c0157ff5 f758e2c0 f758e2c0 f7638c00 ffffffff c0131f7a f758e2c0 00000001
Jan 14 12:53:52 oceanic kernel: Call Trace: [<f88b5be0>] [<f88b5063>] [<f88b4ff4>] [<f88b5da0>] [<f88c4bc0>]
Jan 14 12:53:52 oceanic kernel: [<c0157ff5>] [<c0131f7a>] [<c01de57f>] [<c0138bb5>] [<f88bf0c9>] [<c015e991>]
Jan 14 12:53:52 oceanic kernel: [<c015ee7c>] [<c015ff11>] [<c01586ce>] [<c01587ac>] [<c0156590>] [<c0158a42>]
Jan 14 12:53:52 oceanic kernel: [<c0158af4>] [<c0137323>] [<c0137386>] [<c01382d2>] [<c0138564>] [<c012f9b6>]
Jan 14 12:53:52 oceanic kernel: [<f88b44dc>] [<f88c2c75>] [<f88ab55b>] [<f88ab625>] [<f88c0b44>] [<f88c3858>]
Jan 14 12:53:52 oceanic kernel: [<c020c28c>] [<c0209fe1>] [<c012c063>] [<c0159c76>] [<f88c473c>] [<c024601e>]
Jan 14 12:53:52 oceanic kernel: [<c0159fce>] [<c013e2e6>] [<c0148bb3>] [<c013f98d>] [<c013e397>] [<c010770f>]
Jan 14 12:53:52 oceanic kernel: Code: 0f 0b f9 00 f4 4f 8b f8 ff 43 08 89 d8 8b 5c 24 14 8b 74 24


>>EIP; f88ab5df <[jbd]journal_start+5f/c0> <=====

>>ebx; efeed7a0 <___strtok+2fb896e0/38546fa0>
>>edx; e9925f7c <___strtok+295c1ebc/38546fa0>
>>esi; ef818000 <___strtok+2f4b3f40/38546fa0>
>>edi; f7719000 <___strtok+373b4f40/38546fa0>
>>esp; ef819b74 <___strtok+2f4b5ab4/38546fa0>

Trace; f88b5be0 <[jbd]__kstrtab_journal_enable_debug+651/5be5>
Trace; f88b5063 <[jbd]__ksymtab_journal_enable_debug+1f/28>
Trace; f88b4ff4 <[jbd]__ksymtab_journal_ack_err+0/8>
Trace; f88b5da0 <[jbd]__kstrtab_journal_enable_debug+811/5be5>
Trace; f88c4bc0 <[ext3]ext3_dirty_inode+160/180>
Trace; c0157ff5 <__mark_inode_dirty+b5/4e0>
Trace; c0131f7a <generic_file_write+2ba/2bb0>
Trace; c01de57f <scsi_io_completion+16f/850>
Trace; c0138bb5 <free_pages+535/27c0>
Trace; f88bf0c9 <[ext3]ext3_file_write+39/d0>
Trace; c015e991 <seq_printf+f01/a9c0>
Trace; c015ee7c <seq_printf+13ec/a9c0>
Trace; c015ff11 <seq_printf+2481/a9c0>
Trace; c01586ce <clear_inode+8e/260>
Trace; c01587ac <clear_inode+16c/260>
Trace; c0156590 <dput+30/190>
Trace; c0158a42 <invalidate_device+102/240>
Trace; c0158af4 <invalidate_device+1b4/240>
Trace; c0137323 <kmem_find_general_cachep+1583/24b0>
Trace; c0137386 <kmem_find_general_cachep+15e6/24b0>
Trace; c01382d2 <_alloc_pages+82/210>
Trace; c0138564 <__alloc_pages+104/1a0>
Trace; c012f9b6 <find_or_create_page+86/110>
Trace; f88b44dc <[jbd]__jbd_kmalloc+2c/c0>
Trace; f88c2c75 <[ext3]ext3_block_truncate_page+85/490>
Trace; f88ab55b <[jbd]new_handle+4b/70>
Trace; f88ab625 <[jbd]journal_start+a5/c0>
Trace; f88c0b44 <[ext3]start_transaction+94/a0>
Trace; f88c3858 <[ext3]ext3_truncate+d8/480>
Trace; c020c28c <skb_copy_datagram_iovec+4c/690>
Trace; c0209fe1 <alloc_skb+361/370>
Trace; c012c063 <vmtruncate+d3/9e0>
Trace; c0159c76 <inode_setattr+106/190>
Trace; f88c473c <[ext3]ext3_setattr+25c/320>
Trace; c024601e <inet_recvmsg_R__ver_inet_recvmsg+4e/70>
Trace; c0159fce <notify_change+2ce/350>
Trace; c013e2e6 <fd_install+b6/b70>
Trace; c0148bb3 <cdput+8c3/bb0>
Trace; c013f98d <generic_file_open+55d/650>
Trace; c013e397 <fd_install+167/b70>
Trace; c010770f <__read_lock_failed+144b/183c>

Code; f88ab5df <[jbd]journal_start+5f/c0>
00000000 <_EIP>:
Code; f88ab5df <[jbd]journal_start+5f/c0> <=====
0: 0f 0b ud2a <=====
Code; f88ab5e1 <[jbd]journal_start+61/c0>
2: f9 stc
Code; f88ab5e2 <[jbd]journal_start+62/c0>
3: 00 f4 add %dh,%ah
Code; f88ab5e4 <[jbd]journal_start+64/c0>
5: 4f dec %edi
Code; f88ab5e5 <[jbd]journal_start+65/c0>
6: 8b f8 mov %eax,%edi
Code; f88ab5e7 <[jbd]journal_start+67/c0>
8: ff 43 08 incl 0x8(%ebx)
Code; f88ab5ea <[jbd]journal_start+6a/c0>
b: 89 d8 mov %ebx,%eax
Code; f88ab5ec <[jbd]journal_start+6c/c0>
d: 8b 5c 24 14 mov 0x14(%esp,1),%ebx
Code; f88ab5f0 <[jbd]journal_start+70/c0>
11: 8b 74 24 00 mov 0x0(%esp,1),%esi


--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl


2003-01-14 15:07:20

by Stephen C. Tweedie

Subject: Re: 2.4.21-pre3 - problems with ext3

Hi,

On Tue, 2003-01-14 at 13:15, Lukasz Trabinski wrote:

> Jan 14 12:53:52 oceanic kernel: Code: 0f 0b f9 00 f4 4f 8b f8 ff 43 08 89 d8 8b 5c 24 14 8b 74 24

That's a BUG(), and you should have had some form of ext3 or jbd assert
failure in the logs just before this oops --- could you supply that,
please?

Thanks,
Stephen

2003-01-14 15:40:40

by Lukasz Trabinski

Subject: Re: 2.4.21-pre3 - problems with ext3

On Tue, 14 Jan 2003, Stephen C. Tweedie wrote:

> Hi,
>
> On Tue, 2003-01-14 at 13:15, Lukasz Trabinski wrote:
>
> > Jan 14 12:53:52 oceanic kernel: Code: 0f 0b f9 00 f4 4f 8b f8 ff 43 08 89 d8 8b 5c 24 14 8b 74 24
>
> That's a BUG(), and you should have had some form of ext3 or jbd assert
> failure in the logs just before this oops --- could you supply that,
> please?

Here it is:

Jan 14 12:53:52 oceanic kernel: Assertion failure in
journal_start_Rsmp_909c88ec() at transaction.c:249:
"handle->h_transaction->t_journal == journal"
Jan 14 12:53:52 oceanic kernel: kernel BUG at transaction.c:249!
Jan 14 12:53:52 oceanic kernel: invalid operand: 0000
Jan 14 12:53:52 oceanic kernel: CPU: 1

Thank you for your answer.

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl

2003-01-20 22:30:05

by Stephen C. Tweedie

Subject: Re: 2.4.21-pre3 - problems with ext3

Hi,

On Tue, 2003-01-14 at 13:15, Lukasz Trabinski wrote:

> Since 2.4.20, we have problems with ext3. Machine is 2xPentium III (1GHz),
> 2GB RAM, 1GB swap. RH 8.0 (glibc-2.3.1-21), gcc (GCC) 3.2 20020903

So it was stable under earlier kernels?

> Is it a problem with ext3 or problems with disk (hardware problem) or
> something else?

There's nothing in the trace to indicate a hardware fault. Is there
anything else showing up in the logs which might indicate the kernel
getting into a tangle?

From the assert failure:

> Jan 14 12:53:52 oceanic kernel: Assertion failure in
> journal_start_Rsmp_909c88ec() at transaction.c:249:
> "handle->h_transaction->t_journal == journal"

it looks as if either the VM has been recursing into the filesystem
(filesystem problem or a missing GFP_NOFS somewhere), or there has been
a stack overflow (the data structure that the mismatch is on is just
about the very last thing on the task structure that the stack grows
towards.)
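
The check that fires here can be illustrated with a small Python model. This is a sketch under assumptions, not the jbd source: the names journal_info and h_ref, and the idea that a nested journal_start() reuses the task's current handle, are a simplified reading of the 2.4 code. It shows why re-entering the filesystem on a different journal while a handle is already open trips the assertion.

```python
class Journal:
    def __init__(self, name):
        self.name = name


class Transaction:
    def __init__(self, journal):
        self.t_journal = journal


class Handle:
    def __init__(self, transaction):
        self.h_transaction = transaction
        self.h_ref = 1  # number of nested journal_start() calls sharing this handle


class Task:
    """Stands in for the per-process slot that remembers the open handle (assumed name)."""

    def __init__(self):
        self.journal_info = None


def journal_start(task, journal):
    handle = task.journal_info
    if handle is not None:
        # Nested start: only legal if it targets the journal the handle belongs to.
        if handle.h_transaction.t_journal is not journal:
            raise AssertionError("handle->h_transaction->t_journal == journal")
        handle.h_ref += 1
        return handle
    handle = Handle(Transaction(journal))
    task.journal_info = handle
    return handle


def demo():
    """Open a handle on one journal, then re-enter on another, as a recursing VM would."""
    task = Task()
    home1, home2 = Journal("home1"), Journal("home2")
    journal_start(task, home1)
    journal_start(task, home1)  # same journal: fine, just bumps h_ref
    try:
        journal_start(task, home2)  # different journal while a handle is open
    except AssertionError as exc:
        return str(exc)
    return None
```

In this model the second filesystem is reached only because something called back into it while the first handle was still open, which is the situation a missing GFP_NOFS would allow.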

But as for the decoded OOPS, I can't immediately trace through it
successfully. There's no syscall entry point at the top of the stack,
and there appear to be two separate possible interpretations of the call
trace. Do you have any other captured oopses that I might be able to
find some common threads in?

Cheers,
Stephen

2003-01-21 00:16:52

by Lukasz Trabinski

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

On Mon, 20 Jan 2003, Stephen C. Tweedie wrote:

> > Since 2.4.20, we have problems with ext3. Machine is 2xPentium III (1GHz),
> > 2GB RAM, 1GB swap. RH 8.0 (glibc-2.3.1-21), gcc (GCC) 3.2 20020903
>
> So it was stable under earlier kernels?

Yes, it was stable.

Some results from zcat messages.*.gz | grep "Assertion failure":

system boot 2.4.21-pre3
Jan 14 12:53:52 oceanic kernel: Assertion failure in journal_start_Rsmp_909c88ec
() at transaction.c:249: "handle->h_transaction->t_journal == journal"

system boot 2.4.21-pre2
Jan 10 13:00:11 oceanic kernel: Assertion failure in journal_start_Rsmp_c2be780a
() at transaction.c:248: "handle->h_transaction->t_journal == journal"

system boot 2.4.20:
Dec 15 15:27:01 oceanic kernel: Assertion failure in journal_start_Rsmp_c2be780a
() at transaction.c:248: "handle->h_transaction->t_journal == journal
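
The same search can be sketched in Python for anyone who wants to keep the file name next to each hit. The function name grep_gz and the glob pattern are illustrative assumptions; only the standard gzip and glob modules are used.

```python
import glob
import gzip


def grep_gz(pattern, needle="Assertion failure"):
    """Return (path, line) pairs for every line containing needle in the gzip files."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # "rt" decompresses and decodes to text; undecodable bytes are replaced.
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

For example, grep_gz("/var/log/messages.*.gz") would list the matching lines per rotated log, assuming that is where the logs live on the machine in question.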


With earlier kernels 2.4.X (for example 2.4.20-rc2) this machine has much
longer uptime.

[...]
> But as for the decoded OOPS, I can't immediately trace through it
> successfully. There's no syscall entry point at the top of the stack,
> and there appear to be two separate possible interpretations of the call
> trace. Do you have any other captured oopses that I might be able to
> find some common threads in?


By the way, last crash was with messages:

Jan 19 11:50:20 oceanic kernel: kernel BUG at highmem.c:159!
Jan 19 11:50:20 oceanic kernel: invalid operand: 0000
Jan 19 11:50:20 oceanic kernel: CPU: 1

But that is all that was in the log files.


Here are two earlier oopses. If you want, I can send you the fragments
directly from the log files rather than the ksymoops output. All of the
oopses came from the smbd process (samba-2.2.7-1.7.3 from RH updates).


Oops 1:

ksymoops 2.4.5 on i686 2.4.21-pre3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-pre2/System.map (specified)
-m /boot/System.map-2.4.21-pre3 (default)

Jan 10 13:00:11 oceanic kernel: kernel BUG at transaction.c:248!
Jan 10 13:00:11 oceanic kernel: invalid operand: 0000
Jan 10 13:00:11 oceanic kernel: CPU: 1
Jan 10 13:00:11 oceanic kernel: EIP: 0010:[<f88ab368>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Jan 10 13:00:11 oceanic kernel: EFLAGS: 00010296
Jan 10 13:00:11 oceanic kernel: eax: 0000007a ebx: e03e2420 ecx: ea894000 edx: f744df44
Jan 10 13:00:11 oceanic kernel: esi: f7b59400 edi: ea894000 ebp: d73bc960 esp: ea8959e4
Jan 10 13:00:11 oceanic kernel: ds: 0018 es: 0018 ss: 0018
Jan 10 13:00:11 oceanic kernel: Process smbd (pid: 13129, stackpage=ea895000)
Jan 10 13:00:11 oceanic kernel: Stack: f88b2300 f88b3ffb f88b3fd0 000000f8 f88b23a0 e03e2420 ffffffe2 f7563d60
Jan 10 13:00:11 oceanic kernel: f88bf7ae f7b59400 00000001 0076c129 00000400 ec001e60 f76a51a0 ea895a48
Jan 10 13:00:11 oceanic kernel: 00000000 00000000 f76ed400 f7563d60 f7b58c00 00000001 c014dc02 f7563d60
Jan 10 13:00:11 oceanic kernel: Call Trace: [<f88b2300>] [<f88b3ffb>] [<f88b3fd0>] [<f88b23a0>] [<f88bf7ae>]
Jan 10 13:00:11 oceanic kernel: [<c014dc02>] [<c012c337>] [<f88abd0e>] [<c01a385a>] [<c0129882>] [<c0132555>]
Jan 10 13:00:11 oceanic kernel: [<f88babc2>] [<c0153abb>] [<c0153f50>] [<c0154e9b>] [<c014e797>] [<c014da7d>]
Jan 10 13:00:11 oceanic kernel: [<c014e86d>] [<c014eaed>] [<c014eb30>] [<c01312fe>] [<c013135c>] [<c0131e52>]
Jan 10 13:00:11 oceanic kernel: [<c013210b>] [<c0129db8>] [<f88bdd75>] [<f88bcc6e>] [<c021445f>] [<c0209b59>]
Jan 10 13:00:11 oceanic kernel: [<c013aeb8>] [<f88b3ff0>] [<f88b1c17>] [<f88ab285>] [<f88ab39d>] [<f88be710>]
Jan 10 13:00:11 oceanic kernel: [<f88bc198>] [<f88be710>] [<f88be7c9>] [<c01eea7d>] [<c020abf5>] [<f88be710>]
Jan 10 13:00:11 oceanic kernel: [<c0126d99>] [<c014f836>] [<f88bf525>] [<c01eb561>] [<c0132170>] [<c014f9d9>]
Jan 10 13:00:11 oceanic kernel: [<c01eb668>] [<c01374a7>] [<c0137b22>] [<c01070c3>]
Jan 10 13:00:11 oceanic kernel: Code: 0f 0b f8 00 d0 3f 8b f8 83 c4 14 ff 43 08 eb 40 8b 4c 24 14


>>EIP; f88ab368 <[jbd].text.start+308/9ea9> <=====

>>ebx; e03e2420 <___strtok+2007e360/38546fa0>
>>ecx; ea894000 <___strtok+2a52ff40/38546fa0>
>>edx; f744df44 <___strtok+370e9e84/38546fa0>
>>esi; f7b59400 <___strtok+377f5340/38546fa0>
>>edi; ea894000 <___strtok+2a52ff40/38546fa0>
>>ebp; d73bc960 <___strtok+170588a0/38546fa0>
>>esp; ea8959e4 <___strtok+2a531924/38546fa0>

Trace; f88b2300 <[jbd].text.start+72a0/9ea9>
Trace; f88b3ffb <[jbd].text.start+8f9b/9ea9>
Trace; f88b3fd0 <[jbd].text.start+8f70/9ea9>
Trace; f88b23a0 <[jbd].text.start+7340/9ea9>
Trace; f88bf7ae <[ext3].text.start+274e/dc9d>
Trace; c014dc02 <vfs_mkdir+232/350>
Trace; c012c337 <vmtruncate+3a7/9e0>
Trace; f88abd0e <[jbd].text.start+cae/9ea9>
Trace; c01a385a <vc_resize+170a/3a40>
Trace; c0129882 <in_egroup_p+922/c70>
Trace; c0132555 <generic_file_write+895/2bb0>
Trace; f88babc2 <[jbd].rodata.end+5bdf/6191>
Trace; c0153abb <locks_copy_lock+72b/a30>
Trace; c0153f50 <posix_locks_deadlock+f0/110>
Trace; c0154e9b <lease_get_mtime+fb/e60>
Trace; c014e797 <vfs_symlink+257/2c0>
Trace; c014da7d <vfs_mkdir+ad/350>
Trace; c014e86d <vfs_link+6d/da0>
Trace; c014eaed <vfs_link+2ed/da0>
Trace; c014eb30 <vfs_link+330/da0>
Trace; c01312fe <generic_file_mmap+40e/d20>
Trace; c013135c <generic_file_mmap+46c/d20>
Trace; c0131e52 <generic_file_write+192/2bb0>
Trace; c013210b <generic_file_write+44b/2bb0>
Trace; c0129db8 <exec_usermodehelper+1e8/210>
Trace; f88bdd75 <[ext3].text.start+d15/dc9d>
Trace; f88bcc6e <[jbd].bss.end+1ac7/1eb9>
Trace; c021445f <__rta_fill+9f/150>
Trace; c0209b59 <sock_init_data+2e9/370>
Trace; c013aeb8 <alloc_pages_node+78/2480>
Trace; f88b3ff0 <[jbd].text.start+8f90/9ea9>
Trace; f88b1c17 <[jbd].text.start+6bb7/9ea9>
Trace; f88ab285 <[jbd].text.start+225/9ea9>
Trace; f88ab39d <[jbd].text.start+33d/9ea9>
Trace; f88be710 <[ext3].text.start+16b0/dc9d>
Trace; f88bc198 <[jbd].bss.end+ff1/1eb9>
Trace; f88be710 <[ext3].text.start+16b0/dc9d>
Trace; f88be7c9 <[ext3].text.start+1769/dc9d>
Trace; c01eea7d <scsi_free+e33d/1c220>
Trace; c020abf5 <__pskb_pull_tail+115/300>
Trace; f88be710 <[ext3].text.start+16b0/dc9d>
Trace; c0126d99 <notify_parent+599/d50>
Trace; c014f836 <vfs_follow_link+b6/d0>
Trace; f88bf525 <[ext3].text.start+24c5/dc9d>
Trace; c01eb561 <scsi_free+ae21/1c220>
Trace; c0132170 <generic_file_write+4b0/2bb0>
Trace; c014f9d9 <page_follow_link+e9/15e0>
Trace; c01eb668 <scsi_free+af28/1c220>
Trace; c01374a7 <kmem_find_general_cachep+1707/24b0>
Trace; c0137b22 <kmem_find_general_cachep+1d82/24b0>
Trace; c01070c3 <__read_lock_failed+dff/183c>

Code; f88ab368 <[jbd].text.start+308/9ea9>
00000000 <_EIP>:
Code; f88ab368 <[jbd].text.start+308/9ea9> <=====
0: 0f 0b ud2a <=====
Code; f88ab36a <[jbd].text.start+30a/9ea9>
2: f8 clc
Code; f88ab36b <[jbd].text.start+30b/9ea9>
3: 00 d0 add %dl,%al
Code; f88ab36d <[jbd].text.start+30d/9ea9>
5: 3f aas
Code; f88ab36e <[jbd].text.start+30e/9ea9>
6: 8b f8 mov %eax,%edi
Code; f88ab370 <[jbd].text.start+310/9ea9>
8: 83 c4 14 add $0x14,%esp
Code; f88ab373 <[jbd].text.start+313/9ea9>
b: ff 43 08 incl 0x8(%ebx)
Code; f88ab376 <[jbd].text.start+316/9ea9>
e: eb 40 jmp 50 <_EIP+0x50>
Code; f88ab378 <[jbd].text.start+318/9ea9>
10: 8b 4c 24 14 mov 0x14(%esp,1),%ecx

Oops 2:

ksymoops 2.4.5 on i686 2.4.21-pre3. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20/System.map (specified)
-m /boot/System.map-2.4.21-pre3 (default)
Dec 15 15:27:01 oceanic kernel: Kernel panic: EXT3-fs panic (device sd(8,23)): load_block_bitmap: block_group >= groups_count - block_group = 524287, groups_count = 2126
Dec 15 15:27:01 oceanic kernel: kernel BUG at transaction.c:248!
Dec 15 15:27:01 oceanic kernel: invalid operand: 0000
Dec 15 15:27:01 oceanic kernel: CPU: 0
Dec 15 15:27:01 oceanic kernel: EIP: 0010:[<f88ab368>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Dec 15 15:27:01 oceanic kernel: EFLAGS: 00010202
Dec 15 15:27:01 oceanic kernel: eax: 0000007a ebx: f5c4fd40 ecx: c029e0c0 edx: 00018765
Dec 15 15:27:01 oceanic kernel: esi: f7b57000 edi: f7530000 ebp: f6295920 esp: f753178c
Dec 15 15:27:01 oceanic kernel: ds: 0018 es: 0018 ss: 0018
Dec 15 15:27:01 oceanic kernel: Process smbd (pid: 21636, stackpage=f7531000)
Dec 15 15:27:01 oceanic kernel: Stack: f88b2300 f88b3ffb f88b3fd0 000000f8 f88b23a0 f5c4fd40 ffffffe2 f77020c0
Dec 15 15:27:01 oceanic kernel: f88bf7de f7b57000 00000001 c013afe6 d585ae60 c013bc31 00000400 00000000
Dec 15 15:27:01 oceanic kernel: 00014140 00000000 c2382900 f77020c0 f7b56800 00000001 c014db12 f77020c0
Dec 15 15:27:01 oceanic kernel: Call Trace: [<f88b2300>] [<f88b3ffb>] [<f88b3fd0>] [<f88b23a0>] [<f88bf7de>]
Dec 15 15:27:01 oceanic kernel: [<c013afe6>] [<c013bc31>] [<c014db12>] [<c012c2f7>] [<c012c4e0>] [<c013b188>]
Dec 15 15:27:01 oceanic kernel: [<c0199cfb>] [<f88babc2>] [<c0153aeb>] [<c0153dac>] [<c013a1d3>] [<c013a40e>]
Dec 15 15:27:01 oceanic kernel: [<f88c7320>] [<c013a467>] [<c0117599>] [<f88c247f>] [<f88c5e40>] [<f88c7320>]
Dec 15 15:27:01 oceanic kernel: [<f88c9540>] [<f88c9540>] [<f88c4960>] [<f88b919d>] [<f88c7320>] [<f88c4960>]
Dec 15 15:27:01 oceanic kernel: [<c014db12>] [<f88b9452>] [<f88bf229>] [<f88bf287>] [<f88abd6f>] [<f88bf6b1>]
Dec 15 15:27:01 oceanic kernel: [<f88bf645>] [<f88bf656>] [<f88bf757>] [<f88bf819>] [<f88ac466>] [<f88bc0f3>]
Dec 15 15:27:01 oceanic kernel: [<c01fefb2>] [<f88be345>] [<f88aff50>] [<f88abd0e>] [<f88b1d40>] [<f88be42c>]
Dec 15 15:27:01 oceanic kernel: [<c01c35d2>] [<f88abd0e>] [<c01b72bc>] [<f88be72f>] [<c011c6ed>] [<c013a04e>]
Dec 15 15:27:01 oceanic kernel: [<c013af18>] [<c013b1b6>] [<f88be5d6>] [<c013af18>] [<f88abd0e>] [<f88bf645>]
Dec 15 15:27:01 oceanic kernel: [<f88bf656>] [<f88be9c5>] [<f88b3ff0>] [<f88b1c27>] [<f88ab285>] [<f88ab39d>]
Dec 15 15:27:01 oceanic kernel: [<f88bc1c8>] [<f88ac705>] [<f88bc35c>] [<f88bc280>] [<f88bc280>] [<c014f178>]
Dec 15 15:27:01 oceanic kernel: [<c014d216>] [<c014551e>] [<c0143090>] [<c01440ba>] [<c01455d9>] [<c01070c3>]
Dec 15 15:27:01 oceanic kernel: Code: 0f 0b f8 00 d0 3f 8b f8 83 c4 14 ff 43 08 eb 40 8b 4c 24 14


>>EIP; f88ab368 <[jbd].text.start+308/9ea9> <=====

>>ebx; f5c4fd40 <___strtok+358ebc80/38546fa0>
>>ecx; c029e0c0 <scsi_device_types+374a0/43060>
>>edx; 00018765 Before first symbol
>>esi; f7b57000 <___strtok+377f2f40/38546fa0>
>>edi; f7530000 <___strtok+371cbf40/38546fa0>
>>ebp; f6295920 <___strtok+35f31860/38546fa0>
>>esp; f753178c <___strtok+371cd6cc/38546fa0>

Trace; f88b2300 <[jbd].text.start+72a0/9ea9>
Trace; f88b3ffb <[jbd].text.start+8f9b/9ea9>
Trace; f88b3fd0 <[jbd].text.start+8f70/9ea9>
Trace; f88b23a0 <[jbd].text.start+7340/9ea9>
Trace; f88bf7de <[ext3].text.start+277e/dc9d>
Trace; c013afe6 <alloc_pages_node+1a6/2480>
Trace; c013bc31 <alloc_pages_node+df1/2480>
Trace; c014db12 <vfs_mkdir+142/350>
Trace; c012c2f7 <vmtruncate+367/9e0>
Trace; c012c4e0 <vmtruncate+550/9e0>
Trace; c013b188 <alloc_pages_node+348/2480>
Trace; c0199cfb <n_tty_ioctl+76b/19e0>
Trace; f88babc2 <[jbd].rodata.end+5bdf/6191>
Trace; c0153aeb <locks_copy_lock+75b/a30>
Trace; c0153dac <locks_copy_lock+a1c/a30>
Trace; c013a1d3 <free_pages+1b53/27c0>
Trace; c013a40e <free_pages+1d8e/27c0>
Trace; f88c7320 <[ext3].text.start+a2c0/dc9d>
Trace; c013a467 <free_pages+1de7/27c0>
Trace; c0117599 <change_page_attr+e9/3c0>
Trace; f88c247f <[ext3].text.start+541f/dc9d>
Trace; f88c5e40 <[ext3].text.start+8de0/dc9d>
Trace; f88c7320 <[ext3].text.start+a2c0/dc9d>
Trace; f88c9540 <[ext3].text.start+c4e0/dc9d>
Trace; f88c9540 <[ext3].text.start+c4e0/dc9d>
Trace; f88c4960 <[ext3].text.start+7900/dc9d>
Trace; f88b919d <[jbd].rodata.end+41ba/6191>
Trace; f88c7320 <[ext3].text.start+a2c0/dc9d>
Trace; f88c4960 <[ext3].text.start+7900/dc9d>
Trace; c014db12 <vfs_mkdir+142/350>
Trace; f88b9452 <[jbd].rodata.end+446f/6191>
Trace; f88bf229 <[ext3].text.start+21c9/dc9d>
Trace; f88bf287 <[ext3].text.start+2227/dc9d>
Trace; f88abd6f <[jbd].text.start+d0f/9ea9>
Trace; f88bf6b1 <[ext3].text.start+2651/dc9d>
Trace; f88bf645 <[ext3].text.start+25e5/dc9d>
Trace; f88bf656 <[ext3].text.start+25f6/dc9d>
Trace; f88bf757 <[ext3].text.start+26f7/dc9d>
Trace; f88bf819 <[ext3].text.start+27b9/dc9d>
Trace; f88ac466 <[jbd].text.start+1406/9ea9>
Trace; f88bc0f3 <[jbd].bss.end+f4c/1eb9>
Trace; c01fefb2 <cdrom_ioctl+652/1b60>
Trace; f88be345 <[ext3].text.start+12e5/dc9d>
Trace; f88aff50 <[jbd].text.start+4ef0/9ea9>
Trace; f88abd0e <[jbd].text.start+cae/9ea9>
Trace; f88b1d40 <[jbd].text.start+6ce0/9ea9>
Trace; f88be42c <[ext3].text.start+13cc/dc9d>
Trace; c01c35d2 <ide_taskfile_ioctl+302/550>
Trace; f88abd0e <[jbd].text.start+cae/9ea9>
Trace; c01b72bc <get_gendisk+62ec/ad20>
Trace; f88be72f <[ext3].text.start+16cf/dc9d>
Trace; c011c6ed <inter_module_put+5fd/900>
Trace; c013a04e <free_pages+19ce/27c0>
Trace; c013af18 <alloc_pages_node+d8/2480>
Trace; c013b1b6 <alloc_pages_node+376/2480>
Trace; f88be5d6 <[ext3].text.start+1576/dc9d>
Trace; c013af18 <alloc_pages_node+d8/2480>
Trace; f88abd0e <[jbd].text.start+cae/9ea9>
Trace; f88bf645 <[ext3].text.start+25e5/dc9d>
Trace; f88bf656 <[ext3].text.start+25f6/dc9d>
Trace; f88be9c5 <[ext3].text.start+1965/dc9d>
Trace; f88b3ff0 <[jbd].text.start+8f90/9ea9>
Trace; f88b1c27 <[jbd].text.start+6bc7/9ea9>
Trace; f88ab285 <[jbd].text.start+225/9ea9>
Trace; f88ab39d <[jbd].text.start+33d/9ea9>
Trace; f88bc1c8 <[jbd].bss.end+1021/1eb9>
Trace; f88ac705 <[jbd].text.start+16a5/9ea9>
Trace; f88bc35c <[jbd].bss.end+11b5/1eb9>
Trace; f88bc280 <[jbd].bss.end+10d9/1eb9>
Trace; f88bc280 <[jbd].bss.end+10d9/1eb9>
Trace; c014f178 <vfs_link+978/da0>
Trace; c014d216 <vfs_create+4e6/8f0>
Trace; c014551e <set_buffer_async_io+5e/330>
Trace; c0143090 <create_empty_buffers+3a0/720>
Trace; c01440ba <block_write_full_page+12a/140>
Trace; c01455d9 <set_buffer_async_io+119/330>
Trace; c01070c3 <__read_lock_failed+dff/183c>

Code; f88ab368 <[jbd].text.start+308/9ea9>
00000000 <_EIP>:
Code; f88ab368 <[jbd].text.start+308/9ea9> <=====
0: 0f 0b ud2a <=====
Code; f88ab36a <[jbd].text.start+30a/9ea9>
2: f8 clc
Code; f88ab36b <[jbd].text.start+30b/9ea9>
3: 00 d0 add %dl,%al
Code; f88ab36d <[jbd].text.start+30d/9ea9>
5: 3f aas
Code; f88ab36e <[jbd].text.start+30e/9ea9>
6: 8b f8 mov %eax,%edi
Code; f88ab370 <[jbd].text.start+310/9ea9>
8: 83 c4 14 add $0x14,%esp
Code; f88ab373 <[jbd].text.start+313/9ea9>
b: ff 43 08 incl 0x8(%ebx)
Code; f88ab376 <[jbd].text.start+316/9ea9>
e: eb 40 jmp 50 <_EIP+0x50>
Code; f88ab378 <[jbd].text.start+318/9ea9>
10: 8b 4c 24 14 mov 0x14(%esp,1),%ecx


Greetings

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl

2003-01-21 13:48:07

by Stephen C. Tweedie

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

Hi,

On Tue, 2003-01-21 at 00:25, Lukasz Trabinski wrote:

> system boot 2.4.20:
> Dec 15 15:27:01 oceanic kernel: Assertion failure in journal_start_Rsmp_c2be780a
> () at transaction.c:248: "handle->h_transaction->t_journal == journal

> With earlier kernels 2.4.X (for example 2.4.20-rc2) this machine has much
> longer uptime.

OK, which was the last one which ran stable for you? I note that you've
got a failure marked "2.4.20" in the log.

> Dec 15 15:27:01 oceanic kernel: Kernel panic: EXT3-fs panic (device sd(8,23)): load_block_bitmap: block_group >= groups_count - block_group = 524287, groups_count = 2126

Do you have the backtrace for that? I can't see any way that particular
error can happen unless the kernel's memory is getting corrupt, or
there's a corrupt superblock coming in from the disk.
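
To see which disk that panic refers to, the sd(8,23) pair can be turned into a device name. A minimal sketch, assuming the conventional 2.4 layout for SCSI disk major 8 (16 minors per disk, minor 0 of each group being the whole disk); the helper names are made up for illustration.

```python
import re
import string


def sd_device_name(major, minor):
    """Map a (major, minor) pair on SCSI disk major 8 to a name such as 'sdb7'."""
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    disk, part = divmod(minor, 16)  # 16 minors per disk: whole disk + 15 partitions
    name = "sd" + string.ascii_lowercase[disk]
    return name if part == 0 else f"{name}{part}"


def parse_sd(text):
    """Find the first 'sd(major,minor)' in a log line and return its device name."""
    match = re.search(r"sd\((\d+),(\d+)\)", text)
    if match is None:
        return None
    return sd_device_name(int(match.group(1)), int(match.group(2)))
```

Under that assumption sd(8,23) is sdb7, which the mount listing earlier in the thread shows as /home7.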

Also, are you sure you've been ksymoops'ing these from the right
System.map files? The traces really don't make a lot of sense.
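
One quick sanity check is to compare the kernel version embedded in the -o and -m paths of the ksymoops option header with the kernel that actually produced the oops. A rough sketch; the function name and the version regex are assumptions, and it can only catch mismatches that are visible in the path names.

```python
import re

# Matches strings such as "2.4.20" or "2.4.21-pre3" inside a path.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:-[A-Za-z0-9]+)?")


def mismatched_maps(option_lines, oops_kernel):
    """Return (flag, path, version) for -o/-m options naming a different kernel."""
    bad = []
    for line in option_lines:
        parts = line.split()
        if len(parts) < 2 or parts[0] not in ("-o", "-m"):
            continue  # -V, -k and -l carry no versioned path
        match = VERSION_RE.search(parts[1])
        if match and match.group(0) != oops_kernel:
            bad.append((parts[0], parts[1], match.group(0)))
    return bad
```

Run against the option header of the Jan 10 oops with oops_kernel set to "2.4.21-pre2", this flags the -m /boot/System.map-2.4.21-pre3 line. Note that /proc/ksyms and /proc/modules always describe the currently running kernel, which this check cannot see.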

Finally,

> By the way, last crash was with messages:
> Jan 19 11:50:20 oceanic kernel: kernel BUG at highmem.c:159!
> Jan 19 11:50:20 oceanic kernel: invalid operand: 0000
> Jan 19 11:50:20 oceanic kernel: CPU: 1

If that happens again, serial console is the best way of getting the
full oops. How much memory does your system have? Have you ever seen
this error before?

Cheers,
Stephen

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

At 13:56 2003-01-21 +0000, Stephen C. Tweedie wrote:

>If that happens again, serial console is the best way of getting the
>full oops. How much memory does your system have? Have you ever seen
>this error before?

Yes - we have seen this error before.....

The system has 2GB RAM.



>Cheers,
> Stephen

--
Bartlomiej Solarz-Niesluchowski, Administrator WSISiZ
e-mail: [email protected]
01-447 Warszawa, ul. Newelska 6, room 404, Mon-Fri 8-16, tel. 836-92-53
Motto: don't break Win'9x, they break on their own anyway....
As you make your bed, so you will sleep in it

2003-01-21 14:29:11

by Stephen C. Tweedie

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

Hi,

On Tue, 2003-01-21 at 14:22, Bartlomiej Solarz-Niesluchowski wrote:
> At 13:56 2003-01-21 +0000, Stephen C. Tweedie wrote:
>
> >If that happens again, serial console is the best way of getting the
> >full oops. How much memory does your system have? Have you ever seen
> >this error before?
>
> Yes - we have seen this error before.....

Well, the kmap() bug looks like kunmap() being done twice on a page. If
that's happening, we really do need to find out where, so capturing that
trace via serial console would be a _big_ help, thanks.
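
As an illustration of what an unbalanced kunmap() means, here is a toy reference-count model in Python. It is not the highmem.c logic: the class name and the simple per-page counter are assumptions chosen only to show how a second kunmap() on the same page becomes detectable.

```python
class KmapModel:
    """Toy per-page mapping counter; real kernels track this differently."""

    def __init__(self):
        self.count = {}

    def kmap(self, page):
        # Each kmap() takes one reference on the page's mapping.
        self.count[page] = self.count.get(page, 0) + 1

    def kunmap(self, page):
        # Dropping a reference that was never taken is the bug being modelled.
        if self.count.get(page, 0) == 0:
            raise RuntimeError(f"BUG: kunmap() on unmapped page {page!r}")
        self.count[page] -= 1
```

The model only reports that the imbalance happened, not who caused it, which is why the full backtrace from the serial console matters.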

Cheers,
Stephen

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

At 14:38 2003-01-21 +0000, Stephen C. Tweedie wrote:
>Hi,
>
>On Tue, 2003-01-21 at 14:22, Bartlomiej Solarz-Niesluchowski wrote:
> > At 13:56 2003-01-21 +0000, Stephen C. Tweedie wrote:
> >
> > >If that happens again, serial console is the best way of getting the
> > >full oops. How much memory does your system have? Have you ever seen
> > >this error before?
> >
> > Yes - we have seen this error before.....
>
>Well, the kmap() bug looks like kunmap() being done twice on a page. If
>that's happening, we really do need to find out where, so capturing that
>trace via serial console would be a _big_ help, thanks.

OK, I will set up a serial console especially for this. See you soon: I
think the system will crash around tomorrow (the estimated uptime is
currently about 3-4 days).


--
Bartlomiej Solarz-Niesluchowski, Administrator WSISiZ
e-mail: [email protected]

Subject: Re: 2.4.21-pre3 - problems with ext3 (long)

At 14:38 2003-01-21 +0000, Stephen C. Tweedie wrote:
>Hi,
>
>On Tue, 2003-01-21 at 14:22, Bartlomiej Solarz-Niesluchowski wrote:
> > At 13:56 2003-01-21 +0000, Stephen C. Tweedie wrote:
> >
> > >If that happens again, serial console is the best way of getting the
> > >full oops. How much memory does your system have? Have you ever seen
> > >this error before?
> >
> > Yes - we have seen this error before.....
>
>Well, the kmap() bug looks like kunmap() being done twice on a page. If
>that's happening, we really do need to find out where, so capturing that
>trace via serial console would be a _big_ help, thanks.

OK, here is the next oops in ext3 (this time at an uptime of 8.5 days, on
kernel 2.4.21-pre4):

Feb 6 15:10:52 oceanic kernel: Assertion failure in journal_start_Rsmp_909c88ec() at transaction.c:249: "handle->h_transaction->t_journal == journal"

ksymoops 2.4.5 on i686 2.4.21-pre4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.21-pre4/ (default)
-m /lib/modules/2.4.21-pre4/System.map (specified)

Feb 6 15:10:52 oceanic kernel: kernel BUG at transaction.c:249!
Feb 6 15:10:52 oceanic kernel: invalid operand: 0000
Feb 6 15:10:52 oceanic kernel: CPU: 0
Feb 6 15:10:52 oceanic kernel: EIP: 0010:[<f88ab5df>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Feb 6 15:10:52 oceanic kernel: EFLAGS: 00010282
Feb 6 15:10:52 oceanic kernel: eax: 0000007a ebx: ecf8b760 ecx: 00000012 edx: f613ff7c
Feb 6 15:10:52 oceanic kernel: esi: cd456000 edi: f76e0c00 ebp: 00000020 esp: cd457b74
Feb 6 15:10:52 oceanic kernel: ds: 0018 es: 0018 ss: 0018
Feb 6 15:10:52 oceanic kernel: Process smbd (pid: 22109, stackpage=cd457000)
Feb 6 15:10:52 oceanic kernel: Stack: f88b5c00 f88b5083 f88b5014 000000f9 f88b5dc0 ecf8b760 f7725620 f7725620
Feb 6 15:10:52 oceanic kernel: f88c4bc0 f76e0c00 00000002 00000000 f7baee18 00000001 f7725620 f76a4c00
Feb 6 15:10:52 oceanic kernel: c0157f05 f7725620 f7725620 f76a4c00 ffffffff c0131e6a f7725620 00000001
Feb 6 15:10:52 oceanic kernel: Call Trace: [<f88b5c00>] [<f88b5083>] [<f88b5014>] [<f88b5dc0>] [<f88c4bc0>]
Feb 6 15:10:52 oceanic kernel: [<c0157f05>] [<c0131e6a>] [<c01e3dec>] [<f88bf0c9>] [<c015e8a1>] [<c015ed8c>]
Feb 6 15:10:52 oceanic kernel: [<c015fe21>] [<c01585de>] [<c01586bc>] [<c0158952>] [<c0158a04>] [<c0137223>]
Feb 6 15:10:52 oceanic kernel: [<c0137286>] [<c01381d2>] [<c0138464>] [<c012f896>] [<f88b0018>] [<f88c2c75>]
Feb 6 15:10:52 oceanic kernel: [<f88ab55b>] [<f88ab625>] [<f88c0b44>] [<f88c3858>] [<c020c57c>] [<c012bf43>]
Feb 6 15:10:52 oceanic kernel: [<c0159b86>] [<f88c473c>] [<c024631e>] [<c0159ede>] [<c013e1e6>] [<c0148ab3>]
Feb 6 15:10:52 oceanic kernel: [<c013f88d>] [<c013e297>] [<c010770f>]
Feb 6 15:10:52 oceanic kernel: Code: 0f 0b f9 00 14 50 8b f8 ff 43 08 89 d8 8b 5c 24 14 8b 74 24


>>EIP; f88ab5df <[jbd]journal_start+5f/c0> <=====

>>ebx; ecf8b760 <_end+2cc275bc/38546ebc>
>>edx; f613ff7c <_end+35ddbdd8/38546ebc>
>>esi; cd456000 <_end+d0f1e5c/38546ebc>
>>edi; f76e0c00 <_end+3737ca5c/38546ebc>
>>esp; cd457b74 <_end+d0f39d0/38546ebc>

Trace; f88b5c00 <[jbd]__kstrtab_journal_enable_debug+651/5be5>
Trace; f88b5083 <[jbd]__ksymtab_journal_enable_debug+1f/28>
Trace; f88b5014 <[jbd]__ksymtab_journal_ack_err+0/8>
Trace; f88b5dc0 <[jbd]__kstrtab_journal_enable_debug+811/5be5>
Trace; f88c4bc0 <[ext3]ext3_dirty_inode+160/180>
Trace; c0157f05 <__mark_inode_dirty+b5/c0>
Trace; c0131e6a <generic_file_write+2ba/800>
Trace; c01e3dec <ahc_linux_run_device_queue+3fc/8c0>
Trace; f88bf0c9 <[ext3]ext3_file_write+39/d0>
Trace; c015e8a1 <write_dquot+b1/100>
Trace; c015ed8c <dqput+4c/f0>
Trace; c015fe21 <dquot_drop+51/60>
Trace; c01585de <clear_inode+8e/130>
Trace; c01586bc <dispose_list+3c/80>
Trace; c0158952 <prune_icache+82/110>
Trace; c0158a04 <shrink_icache_memory+24/40>
Trace; c0137223 <shrink_caches+83/b0>
Trace; c0137286 <try_to_free_pages_zone+36/50>
Trace; c01381d2 <balance_classzone+62/1f0>
Trace; c0138464 <__alloc_pages+104/1a0>
Trace; c012f896 <find_or_create_page+86/110>
Trace; f88b0018 <[jbd]journal_recover+168/1d0>
Trace; f88c2c75 <[ext3]ext3_block_truncate_page+85/490>
Trace; f88ab55b <[jbd]new_handle+4b/70>
Trace; f88ab625 <[jbd]journal_start+a5/c0>
Trace; f88c0b44 <[ext3]start_transaction+94/a0>
Trace; f88c3858 <[ext3]ext3_truncate+d8/480>
Trace; c020c57c <skb_copy_datagram_iovec+4c/280>
Trace; c012bf43 <vmtruncate+d3/1b0>
Trace; c0159b86 <inode_setattr+106/120>
Trace; f88c473c <[ext3]ext3_setattr+25c/320>
Trace; c024631e <inet_recvmsg+4e/70>
Trace; c0159ede <notify_change+2ce/2e0>
Trace; c013e1e6 <do_truncate+66/90>
Trace; c0148ab3 <cp_new_stat64+f3/120>
Trace; c013f88d <do_sys_ftruncate+10d/16c>
Trace; c013e297 <sys_ftruncate64+27/30>
Trace; c010770f <system_call+33/38>

Code; f88ab5df <[jbd]journal_start+5f/c0>
00000000 <_EIP>:
Code; f88ab5df <[jbd]journal_start+5f/c0> <=====
0: 0f 0b ud2a <=====
Code; f88ab5e1 <[jbd]journal_start+61/c0>
2: f9 stc
Code; f88ab5e2 <[jbd]journal_start+62/c0>
3: 00 14 50 add %dl,(%eax,%edx,2)
Code; f88ab5e5 <[jbd]journal_start+65/c0>
6: 8b f8 mov %eax,%edi
Code; f88ab5e7 <[jbd]journal_start+67/c0>
8: ff 43 08 incl 0x8(%ebx)
Code; f88ab5ea <[jbd]journal_start+6a/c0>
b: 89 d8 mov %ebx,%eax
Code; f88ab5ec <[jbd]journal_start+6c/c0>
d: 8b 5c 24 14 mov 0x14(%esp,1),%ebx
Code; f88ab5f0 <[jbd]journal_start+70/c0>
11: 8b 74 24 00 mov 0x0(%esp,1),%esi


Best Regards



--
Bartlomiej Solarz-Niesluchowski, Administrator WSISiZ
e-mail: [email protected]