I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
lots of disk- and network I/O).
The last crash I posted here was a bit garbled. So I tried again,
this one is clean. Crash appears to be timer related ?
kernel BUG at kernel/timer.c:333!
invalid operand: 0000
CPU: 1
EIP: 0060:[<c01227a9>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010083
eax: c1a16140 ebx: c1a1638c ecx: f70b65e4 edx: c1a16140
esi: c1a15aa0 edi: f70b65e4 ebp: c1a162b0 esp: c1b25db4
ds: 0068 es: 0068 ss: 0068
Stack: c1a16ac0 c1a15aa0 c1b24000 00000001 c0122ca9 c1a15aa0 c1a162b0 c1a16ac0
00000000 c1b24000 00000001 00000001 c011f7d5 00000000 00000001 c03d9660
fffffffe 00000020 c042f424 c042f424 c011f4da c03d9660 00000004 00000001
[<c0122ca9>] run_timer_tasklet+0x5d/0x15c
[<c011f7d5>] tasklet_hi_action+0x85/0xe0
[<c011f4da>] do_softirq+0x5a/0xac
[<c0112adf>] smp_apic_timer_interrupt+0x113/0x124
[<c0109476>] apic_timer_interrupt+0x1a/0x20
[<c0135fd5>] shrink_cache+0x185/0x350
[<c0136850>] shrink_zone+0x80/0x88
[<c0136a5e>] balance_pgdat+0x9e/0xfc
[<c0136bc4>] kswapd+0x108/0x112
[<c0136abc>] kswapd+0x0/0x112
[<c0118058>] autoremove_wake_function+0x0/0x40
[<c0106e8d>] kernel_thread_helper+0x5/0xc
Code: 0f 0b 4d 01 ed d2 29 c0 8b 39 8b 41 08 89 c2 2b 56 04 81 fa
>>EIP; c01227a9 <cascade+25/f8> <=====
>>eax; c1a16140 <END_OF_CODE+158eb98/????>
>>ebx; c1a1638c <END_OF_CODE+158ede4/????>
>>ecx; f70b65e4 <END_OF_CODE+36c2f03c/????>
>>edx; c1a16140 <END_OF_CODE+158eb98/????>
>>esi; c1a15aa0 <END_OF_CODE+158e4f8/????>
>>edi; f70b65e4 <END_OF_CODE+36c2f03c/????>
>>ebp; c1a162b0 <END_OF_CODE+158ed08/????>
>>esp; c1b25db4 <END_OF_CODE+169e80c/????>
Code; c01227a9 <cascade+25/f8>
00000000 <_EIP>:
Code; c01227a9 <cascade+25/f8> <=====
0: 0f 0b ud2a <=====
Code; c01227ab <cascade+27/f8>
2: 4d dec %ebp
Code; c01227ac <cascade+28/f8>
3: 01 ed add %ebp,%ebp
Code; c01227ae <cascade+2a/f8>
5: d2 29 shrb %cl,(%ecx)
Code; c01227b0 <cascade+2c/f8>
7: c0 8b 39 8b 41 08 89 rorb $0x89,0x8418b39(%ebx)
Code; c01227b7 <cascade+33/f8>
e: c2 2b 56 ret $0x562b
Code; c01227ba <cascade+36/f8>
11: 04 81 add $0x81,%al
Code; c01227bc <cascade+38/f8>
13: fa cli
Perhaps it has something to do with the following debug message
I see during boot:
... bootmessages ...
slab: reap timer started for cpu 0
slab: reap timer started for cpu 1
Starting kswapd
highmem bounce pool size: 64 pages
aio_setup: sizeof(struct page) = 40
[f7f8e040] eventpoll: driver installed.
Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
Call Trace:
[<c0117be8>] __might_sleep+0x54/0x58
[<c013578c>] set_shrinker+0x3c/0x7c
[<c01686b4>] mb_cache_create+0x1c4/0x244
[<c0168380>] mb_cache_shrink_fn+0x0/0x170
[<c01050ab>] init+0x47/0x1ac
[<c0105064>] init+0x0/0x1ac
[<c0106e8d>] kernel_thread_helper+0x5/0xc
Journalled Block Device driver loaded
Capability LSM initialized
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
... continuess booting ...
Mike.
Miquel van Smoorenburg wrote:
>
> I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> lots of disk- and network I/O).
>
> The last crash I posted here was a bit garbled. So I tried again,
> this one is clean. Crash appears to be timer related ?
>
> kernel BUG at kernel/timer.c:333!
There are timer fixes in Linus's current tree. The problem which
they address could cause this BUG.
>..
> Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
> Call Trace:
> [<c0117be8>] __might_sleep+0x54/0x58
> [<c013578c>] set_shrinker+0x3c/0x7c
> [<c01686b4>] mb_cache_create+0x1c4/0x244
> [<c0168380>] mb_cache_shrink_fn+0x0/0x170
> [<c01050ab>] init+0x47/0x1ac
> [<c0105064>] init+0x0/0x1ac
> [<c0106e8d>] kernel_thread_helper+0x5/0xc
That's different. A fix for this is in Linus's tree.
So.. Please grab an update from
ftp://ftp.kernel.org/pub/linux/kernel/v2.5/snapshots/
or retest 2.5.47.
According to Andrew Morton:
> Miquel van Smoorenburg wrote:
> > I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> > lots of disk- and network I/O).
> >
> > kernel BUG at kernel/timer.c:333!
>
> There are timer fixes in Linus's current tree. The problem which
> they address could cause this BUG.
I've booted 2.5.46bk5 on the machine, and it has been running for over
2 hours with extra heavy diskio. That reliably crashed the machine
in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
> > Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
> > Call Trace:
> > [<c0117be8>] __might_sleep+0x54/0x58
> > [<c013578c>] set_shrinker+0x3c/0x7c
> > [<c01686b4>] mb_cache_create+0x1c4/0x244
> > [<c0168380>] mb_cache_shrink_fn+0x0/0x170
> > [<c01050ab>] init+0x47/0x1ac
> > [<c0105064>] init+0x0/0x1ac
> > [<c0106e8d>] kernel_thread_helper+0x5/0xc
>
> That's different. A fix for this is in Linus's tree.
Indeed, I don't see that one anymore in -bk5
I'm still seeing the buffer layer error at fs/buffer.c:1623,
though. Happens when a blockdev is close()d. Is a fix for
this in -mm2? Does -mm2 include -bk5 ? If so I'll put that
on it and keep an eye on it tomorrow, see what happens.
Debug messages I'm still seeing:
(note that I compiled IPv6 into the kernel since we're slowly moving
our network to IPv6 but that it is otherwise unused right now, and
that the previous kernels that crashed on me didn't have IPv6 in it)
Uninitialised timer!
This is just a warning. Your computer is OK
function=0xc0285748, data=0xf78a6680
Call Trace:
[<c0122610>] check_timer_failed+0x40/0x54
[<c0285748>] igmp6_timer_handler+0x0/0x58
[<c0122b12>] del_timer+0x16/0x84
[<c02855d8>] igmp6_join_group+0x94/0x124
[<c028464c>] igmp6_group_added+0xcc/0xd8
[<c02885ab>] tcp_v6_err+0x3a7/0x63c
[<c0284a46>] ipv6_dev_mc_inc+0x31e/0x330
[<c027443e>] addrconf_join_solict+0x3a/0x44
[<c0275a3f>] addrconf_dad_start+0x13/0x15c
[<c0275354>] addrconf_add_linklocal+0x28/0x44
[<c0275407>] addrconf_dev_config+0x97/0xa4
[<c02754ee>] addrconf_notify+0x52/0xc0
[<c0126037>] notifier_call_chain+0x1f/0x38
[<c02259f6>] dev_open+0xa6/0xb0
[<c0226ca1>] dev_change_flags+0x51/0x104
[<c025bad0>] devinet_ioctl+0x2bc/0x598
[<c025e2e7>] inet_ioctl+0x77/0xb4
[<c021f41b>] sock_ioctl+0x267/0x298
[<c0153c09>] sys_ioctl+0x22d/0x27c
[<c01095b1>] error_code+0x2d/0x38
[<c0108b6f>] syscall_call+0x7/0xb
buffer layer error at fs/buffer.c:1623
Pass this trace through ksymoops for reporting
Call Trace:
[<c0145753>] __buffer_error+0x33/0x38
[<c0146e6f>] __block_write_full_page+0x7f/0x3bc
[<c0147ef9>] block_write_full_page+0x2d/0x9c
[<c014a414>] blkdev_get_block+0x0/0x48
[<c014a513>] blkdev_writepage+0xf/0x14
[<c014a414>] blkdev_get_block+0x0/0x48
[<c0161857>] mpage_writepages+0x21b/0x398
[<c014a504>] blkdev_writepage+0x0/0x14
[<c0137cfb>] __pagevec_free+0x1f/0x28
[<c0135895>] release_pages+0x171/0x17c
[<c014b5ad>] generic_writepages+0x11/0x15
[<c013e068>] do_writepages+0x18/0x30
[<c012f035>] filemap_fdatawrite+0x51/0x68
[<c014596f>] sync_blockdev+0x1b/0x3c
[<c014b3bf>] blkdev_put+0x7b/0x19c
[<c014b4f2>] blkdev_close+0x12/0x18
[<c0145410>] __fput+0x30/0xf8
[<c01453dc>] fput+0x14/0x18
[<c01437cc>] filp_close+0xb0/0xbc
[<c014384e>] sys_close+0x76/0xa4
[<c0108b6f>] syscall_call+0x7/0xb
Thanks,
Mike.
Miquel van Smoorenburg wrote:
>
> According to Andrew Morton:
> > Miquel van Smoorenburg wrote:
> > > I can reliably crash 2.5.X on one of our newsservers (dual PIII/450, GigE,
> > > lots of disk- and network I/O).
> > >
> > > kernel BUG at kernel/timer.c:333!
> >
> > There are timer fixes in Linus's current tree. The problem which
> > they address could cause this BUG.
>
> I've booted 2.5.46bk5 on the machine, and it has been running for over
> 2 hours with extra heavy diskio. That reliably crashed the machine
> in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
OK, thanks.
>
> I'm still seeing the buffer layer error at fs/buffer.c:1623,
> though. Happens when a blockdev is close()d. Is a fix for
> this in -mm2? Does -mm2 include -bk5 ? If so I'll put that
> on it and keep an eye on it tomorrow, see what happens.
This is a blockdev which was under mmap(), yes? No, I haven't looked at
that yet. It'll be a matter of just killing the warning.
mmapping a blockdev is a pretty dopey thing to do, btw. It doesn't
allow the use of highmem, the IO uses tiny BIOs (in fact I think
it uses 512-byte or 1k blocksize too) and there are buffer_heads
all over the place. You'll get better results from mmapping a
regular file.
> Debug messages I'm still seeing:
> (note that I compiled IPv6 into the kernel since we're slowly moving
> our network to IPv6 but that it is otherwise unused right now, and
> that the previous kernels that crashed on me didn't have IPv6 in it)
>
> Uninitialised timer!
> This is just a warning. Your computer is OK
> function=0xc0285748, data=0xf78a6680
> Call Trace:
> [<c0122610>] check_timer_failed+0x40/0x54
> [<c0285748>] igmp6_timer_handler+0x0/0x58
> [<c0122b12>] del_timer+0x16/0x84
> [<c02855d8>] igmp6_join_group+0x94/0x124
I missed one there.
--- 25/net/ipv6/mcast.c~ip6-mcast-timer Sun Nov 10 09:12:28 2002
+++ 25-akpm/net/ipv6/mcast.c Sun Nov 10 09:12:44 2002
@@ -296,6 +296,7 @@ int ipv6_dev_mc_inc(struct net_device *d
}
memset(mc, 0, sizeof(struct ifmcaddr6));
+ init_timer(&mc->mca_timer);
mc->mca_timer.function = igmp6_timer_handler;
mc->mca_timer.data = (unsigned long) mc;
_
According to Andrew Morton:
> Miquel van Smoorenburg wrote:
> >
> > I've booted 2.5.46bk5 on the machine, and it has been running for over
> > 2 hours with extra heavy diskio. That reliably crashed the machine
> > in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
>
> OK, thanks.
It survived the night and is still up. Looks like it runs slightly
faster than 2.4.20-X.
> This is a blockdev which was under mmap(), yes? No, I haven't looked at
> that yet. It'll be a matter of just killing the warning.
OK.
> mmapping a blockdev is a pretty dopey thing to do, btw. It doesn't
> allow the use of highmem, the IO uses tiny BIOs (in fact I think
> it uses 512-byte or 1k blocksize too) and there are buffer_heads
> all over the place. You'll get better results from mmapping a
> regular file.
It's just that the news server uses its own 'filesystem'. It does
normal read/write i/o on it, but the allocation bitmap at the
beginning of the 'file' is mmap()ed. Using a regular file means
creating a 160 GB file, the triple indirect blocks will probably
kill performance.
I guess that means I have to resurrect rawfs, then (a filesystem
I wrote for 2.2 that shows partitions as fixed-size files). But
that seems so .. unnecessary.
Mike.
Miquel van Smoorenburg wrote:
>
> According to Andrew Morton:
> > Miquel van Smoorenburg wrote:
> > >
> > > I've booted 2.5.46bk5 on the machine, and it has been running for over
> > > 2 hours with extra heavy diskio. That reliably crashed the machine
> > > in about 45 minutes with 2.4.45 and 2.5.46, machine is still up now.
> >
> > OK, thanks.
>
> It survived the night and is still up. Looks like it runs slightly
> faster than 2.4.20-X.
Good, thanks.
> ..
> > mmapping a blockdev is a pretty dopey thing to do, btw. It doesn't
> > allow the use of highmem, the IO uses tiny BIOs (in fact I think
> > it uses 512-byte or 1k blocksize too) and there are buffer_heads
> > all over the place. You'll get better results from mmapping a
> > regular file.
>
> It's just that the news server uses its own 'filesystem'. It does
> normal read/write i/o on it, but the allocation bitmap at the
> beginning of the 'file' is mmap()ed. Using a regular file means
> creating a 160 GB file, the triple indirect blocks will probably
> kill performance.
hm, OK. mmapping just a little bit in that manner isn't so bad I
guess. But the general read() and write() implementation for blockdevs
is not as efficient as that for, say, ext2 files.
Probably I could turn on the direct-to-bio stuff for blockdevs if there is
no filesystem mounted. But with the regular open of /dev/hda1 there's no
way to prevent that.
The really scary problem is if some smarty tries to modify a blockdev
with MAP_SHARED while there's a live filesystem mounted. Even if they
modify "safe" parts of the filesystem, this will wreck the fs if it has
blocksize < pagesize. I'm not aware of a robust way of preventing this.
> I guess that means I have to resurrect rawfs, then (a filesystem
> I wrote for 2.2 that shows partitions as fixed-size files). But
> that seems so .. unnecessary.
>
What you're doing now should be OK.