While running "dbench 32" on 2.5.2-pre1:
I noticed the test was taking much longer than usual,
and I could not do a new "login".
vmstat 8 looked like this:
r b w swpd free buff cache si so bi bo in cs us sy id
0 34 1 0 222504 12248 736088 0 0 0 0 103 59 0 0 100
1 34 1 0 222504 12248 736088 0 0 0 0 100 56 0 0 100
0 34 1 0 222504 12248 736088 0 0 0 0 103 59 0 0 100
<sysrq Sync Umount> did not print their "done" messages.
The "b" and "w" columns went up though:
r b w swpd free buff cache si so bi bo in cs us sy id
0 37 3 0 222456 12280 736092 0 0 0 0 222 269 0 0 100
There was no Oops.
2.5.1-dj3 completed dbench normally.
Configs between the 2 kernels:
diff 2.5.2-pre1 2.5.1-dj3
> CONFIG_IP_NF_QUEUE=m
2.5.1-pre1[01] and 2.5.1-final did not exhibit this behavior.
Hardware:
1333 Athlon
1GB RAM
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
--
Randy Hron
On Fri, Dec 21 2001, [email protected] wrote:
> While running "dbench 32" on 2.5.2-pre1:
>
> I noticed the test was taking much longer than usual,
> and I could not do a new "login".
>
> vmstat 8 looked like this:
You neglected to mention what disk I/O system you are using? IDE or
SCSI, and if the latter what host adapter?
--
Jens Axboe
On Fri, Dec 21, 2001 at 03:46:54PM +0100, Jens Axboe wrote:
> You neglected to mention what disk I/O system you are using? IDE or
> SCSI, and if the latter what host adapter?
>
> --
> Jens Axboe
Sorry about that. It's an IDE drive.
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
--
Randy Hron
On Fri, Dec 21 2001, [email protected] wrote:
> On Fri, Dec 21, 2001 at 03:46:54PM +0100, Jens Axboe wrote:
> > You neglected to mention what disk I/O system you are using? IDE or
> > SCSI, and if the latter what host adapter?
> >
> > --
> > Jens Axboe
>
> Sorry about that. It's an IDE drive.
>
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
> 00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
> 01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
>
> CONFIG_IDE=y
> CONFIG_BLK_DEV_IDE=y
> CONFIG_BLK_DEV_IDEDISK=y
> CONFIG_IDEDISK_MULTI_MODE=y
> CONFIG_BLK_DEV_IDECD=m
> CONFIG_BLK_DEV_IDEPCI=y
> CONFIG_BLK_DEV_IDEDMA_PCI=y
> CONFIG_IDEDMA_PCI_AUTO=y
> CONFIG_BLK_DEV_IDEDMA=y
> CONFIG_IDEDMA_AUTO=y
> CONFIG_BLK_DEV_IDE_MODES=y
Thanks -- could you also try and do sysrq-t back traces when it seems
stuck?
Does a non-highmem kernel run ok?
--
Jens Axboe
On Fri, Dec 21, 2001 at 06:01:56PM +0100, Jens Axboe wrote:
> Thanks -- could you also try and do sysrq-t back traces when it seems
> stuck?
>
> Does a non-highmem kernel run ok?
>
> --
> Jens Axboe
I recompiled with highmem turned off.
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
I run a script that executes dbench 32, then dbench 128.
dbench 32 completed this time.
dbench 128 hung similar to dbench 32 in the previous message.
I don't have the vmstat output captured, but "b" was 128,
bi and bo were 0, and idle was 100.
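
The hang signature described above (processes in "b", bi and bo at 0, idle
at 100) can be checked mechanically. The following is a minimal sketch, not
part of the original report. It assumes the 16-column vmstat layout quoted
in this thread; the names vmstat_row, parse_vmstat_row and looks_stalled are
hypothetical, and the "stalled" rule is only a heuristic.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the 16-column vmstat layout quoted in this thread:
 * r b w swpd free buff cache si so bi bo in cs us sy id */
struct vmstat_row {
    long r, b, w, swpd, free_mem, buff, cache;
    long si, so, bi, bo, in, cs, us, sy, id;
};

/* Parse one data row. Returns 1 on success, 0 for header or short lines. */
int parse_vmstat_row(const char *line, struct vmstat_row *row)
{
    assert(line != NULL && row != NULL);
    int n = sscanf(line,
                   "%ld %ld %ld %ld %ld %ld %ld %ld "
                   "%ld %ld %ld %ld %ld %ld %ld %ld",
                   &row->r, &row->b, &row->w, &row->swpd,
                   &row->free_mem, &row->buff, &row->cache,
                   &row->si, &row->so, &row->bi, &row->bo,
                   &row->in, &row->cs, &row->us, &row->sy, &row->id);
    return n == 16;
}

/* Heuristic (an assumption, not a kernel definition): tasks are blocked,
 * yet no blocks are read or written and the CPU is reported fully idle. */
int looks_stalled(const struct vmstat_row *row)
{
    assert(row != NULL);
    return row->b > 0 && row->bi == 0 && row->bo == 0 && row->id == 100;
}
```

Feeding each vmstat line through parse_vmstat_row() and then
looks_stalled() would flag rows such as "0 34 1 0 222504 ..." from the
first message, while rows that still show block I/O are not flagged.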
I couldn't save a stack trace because /bin/ed would not open a file.
That is, ed printed nothing, not even the usual message that the file
does not exist, and "w" would not save. The vmstat "b" column went up
by 2 after I started ed and tried another console login.
--
Before running dbench, I normally create a small loopback reiserfs
filesystem. This worked okay the first time I did it (with highmem).
After recompiling without highmem, I ran my "build_rootfs" script
to create a small uml root fs, and got an Oops. The same script
was fine on 2.5.1-pre[5-9] and 2.5.1-pre1[01]. (you fixed
something like this in the patches between 2.5.1-pre3 and pre4.)
I rebooted after each Oops, so the dbench runs above each started
from a fresh boot.
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012fbf0>] Not tainted
EFLAGS: 00010287
eax: 00000070 ebx: 00000700 ecx: c02a45dc edx: 00038001
esi: 00000000 edi: 00000000 ebp: f4a5a000 esp: f4a8fe38
ds: 0018 es: 0018 ss: 0018
Process mkreiserfs (pid: 135, stackpage=f4a8f000)
Stack: 00000700 00000000 00000000 f4a5a000 c023896c 00000246 f7ef1740 00000000
00000000 fac4a887 00038001 00000070 f4a8fe98 00000700 00000000 c02a45dc
f7ef1740 00000000 00000001 00000030 00000000 00000000 c018a4a0 c02a45dc
Call Trace: [<fac4a887>] [<c018a4a0>] [<c018a54c>] [<c018a5f6>] [<c01340f0>]
[<c012c923>] [<c0136aff>] [<c0136a60>] [<c0126ab5>] [<c0126ee5>] [<c0126e00>]
[<c0131ae6>] [<c01086eb>]
Code: 0f 0b 8b 35 04 59 29 c0 c7 44 24 18 70 00 00 00 89 74 24 14
>>EIP; c012fbf0 <create_bounce+40/250> <=====
Trace; fac4a886 <END_OF_CODE+207b8/????>
Trace; c018a4a0 <generic_make_request+170/190>
Trace; c018a54c <submit_bio+4c/60>
Trace; c018a5f6 <submit_bh+96/a0>
Trace; c01340f0 <block_read_full_page+1a0/1c0>
Trace; c012c922 <__alloc_pages+32/170>
Trace; c0136afe <blkdev_readpage+e/20>
Trace; c0136a60 <blkdev_get_block+0/40>
Trace; c0126ab4 <do_generic_file_read+274/3f0>
Trace; c0126ee4 <generic_file_read+84/140>
Trace; c0126e00 <file_read_actor+0/60>
Trace; c0131ae6 <sys_read+96/d0>
Trace; c01086ea <system_call+32/38>
Code; c012fbf0 <create_bounce+40/250>
00000000 <_EIP>:
Code; c012fbf0 <create_bounce+40/250> <=====
0: 0f 0b ud2a <=====
Code; c012fbf2 <create_bounce+42/250>
2: 8b 35 04 59 29 c0 mov 0xc0295904,%esi
Code; c012fbf8 <create_bounce+48/250>
8: c7 44 24 18 70 00 00 movl $0x70,0x18(%esp,1)
Code; c012fbfe <create_bounce+4e/250>
f: 00
Code; c012fc00 <create_bounce+50/250>
10: 89 74 24 14 mov %esi,0x14(%esp,1)
I rebooted, and tried to create the loopback reiserfs again and
got:
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012fbf0>] Not tainted
EFLAGS: 00010287
eax: 00000070 ebx: 00000700 ecx: c02a45dc edx: 00038001
esi: 00000000 edi: 00000000 ebp: f4d0e000 esp: f4c31e38
ds: 0018 es: 0018 ss: 0018
Process mkreiserfs (pid: 118, stackpage=f4c31000)
Stack: 00000700 00000000 00000000 f4d0e000 f4c4c2c0 00000246 f7ef1900 00000000
00000000 fac28887 00038001 00000070 f4c31e98 00000700 00000000 c02a45dc
f7ef1900 00000000 00000001 00000030 00000000 00000000 c018a4a0 c02a45dc
Call Trace: [<fac28887>] [<c018a4a0>] [<c018a54c>] [<c018a5f6>] [<c01340f0>]
[<c012c923>] [<c0136aff>] [<c0136a60>] [<c0126ab5>] [<c0126ee5>] [<c0126e00>]
[<c0131ae6>] [<c01086eb>]
Code: 0f 0b 8b 35 04 59 29 c0 c7 44 24 18 70 00 00 00 89 74 24 14
>>EIP; c012fbf0 <create_bounce+40/250> <=====
Trace; fac28886 <[loop]loop_make_request+96/200>
Trace; c018a4a0 <generic_make_request+170/190>
Trace; c018a54c <submit_bio+4c/60>
Trace; c018a5f6 <submit_bh+96/a0>
Trace; c01340f0 <block_read_full_page+1a0/1c0>
Trace; c012c922 <__alloc_pages+32/170>
Trace; c0136afe <blkdev_readpage+e/20>
Trace; c0136a60 <blkdev_get_block+0/40>
Trace; c0126ab4 <do_generic_file_read+274/3f0>
Trace; c0126ee4 <generic_file_read+84/140>
Trace; c0126e00 <file_read_actor+0/60>
Trace; c0131ae6 <sys_read+96/d0>
Trace; c01086ea <system_call+32/38>
Code; c012fbf0 <create_bounce+40/250>
00000000 <_EIP>:
Code; c012fbf0 <create_bounce+40/250> <=====
0: 0f 0b ud2a <=====
Code; c012fbf2 <create_bounce+42/250>
2: 8b 35 04 59 29 c0 mov 0xc0295904,%esi
Code; c012fbf8 <create_bounce+48/250>
8: c7 44 24 18 70 00 00 movl $0x70,0x18(%esp,1)
Code; c012fbfe <create_bounce+4e/250>
f: 00
Code; c012fc00 <create_bounce+50/250>
10: 89 74 24 14 mov %esi,0x14(%esp,1)
--
Randy Hron
On Fri, Dec 21 2001, [email protected] wrote:
> On Fri, Dec 21, 2001 at 06:01:56PM +0100, Jens Axboe wrote:
> > Thanks -- could you also try and do sysrq-t back traces when it seems
> > stuck?
> >
> > Does a non-highmem kernel run ok?
> >
> > --
> > Jens Axboe
>
> I recompiled with highmem turned off.
> # CONFIG_HIGHMEM4G is not set
> # CONFIG_HIGHMEM64G is not set
>
> I run a script that executes dbench 32, then dbench 128.
Ok, please try something for me. In drivers/block/elevator.c, comment
out this block:
if (q->last_merge) {
__rq = list_entry_rq(q->last_merge);
BUG_ON(__rq->flags & REQ_STARTED);
if ((ret = elv_try_merge(__rq, bio))) {
*req = __rq;
return ret;
}
}
(just #if 0 the entire thing) -- the one inside elevator_linus_merge()
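
To illustrate what the experiment above switches off, here is a toy,
user-space model of a one-entry "last merge" hint in front of a queue scan.
It is a sketch under stated assumptions and not the kernel's
elevator_linus_merge(): the toy_* names, the structs and the
USE_LAST_MERGE_HINT switch are all hypothetical, and the real function does
more than this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the merge results; not the kernel's constants. */
enum merge { NO_MERGE = 0, BACK_MERGE, FRONT_MERGE };

struct toy_rq  { unsigned long sector, nr_sectors; };
struct toy_bio { unsigned long sector, nr_sectors; };

/* A bio merges at the back if it starts where the request ends,
 * and at the front if it ends where the request starts. */
enum merge toy_try_merge(const struct toy_rq *rq, const struct toy_bio *bio)
{
    if (rq->sector + rq->nr_sectors == bio->sector)
        return BACK_MERGE;
    if (bio->sector + bio->nr_sectors == rq->sector)
        return FRONT_MERGE;
    return NO_MERGE;
}

#ifndef USE_LAST_MERGE_HINT
#define USE_LAST_MERGE_HINT 0   /* 0 mirrors the "#if 0" experiment */
#endif

/* Find a request that `bio` can merge with; *req is set on success. */
enum merge toy_merge(struct toy_rq *queue, size_t n, struct toy_rq *last_merge,
                     const struct toy_bio *bio, struct toy_rq **req)
{
    enum merge ret;

    assert(queue != NULL && bio != NULL && req != NULL);
#if USE_LAST_MERGE_HINT
    /* Fast path: try the request that was merged into most recently. */
    if (last_merge && (ret = toy_try_merge(last_merge, bio)) != NO_MERGE) {
        *req = last_merge;
        return ret;
    }
#else
    (void)last_merge;           /* hint compiled out */
#endif
    /* Slow path: scan the queue from the tail. */
    for (size_t i = n; i-- > 0; ) {
        if ((ret = toy_try_merge(&queue[i], bio)) != NO_MERGE) {
            *req = &queue[i];
            return ret;
        }
    }
    return NO_MERGE;
}
```

In this toy model, compiling the hint out removes only the shortcut; the
scan still finds any mergeable request. If a hang persisted with the hint
disabled, that would point away from the hint itself, which matches the
purpose of the test requested above.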
Loop back highmem issue is different, I'll take a look at that later.
I'll be pretty unresponsive over christmas, though.
--
Jens Axboe
> Ok, please try something for me. In drivers/block/elevator.c, comment
> out this block:
After commenting out the block of code, make clean, etc., I rebooted and ran
the dbench 32, 128 script. It completed dbench 32 again, but dbench
128 hung again. I could quit some tools, but df and ps wouldn't return
and didn't respond to <ctrl c>.
> Loop back highmem issue is different, I'll take a look at that later.
> I'll be pretty unresponsive over christmas, though.
>
> Jens Axboe
Enjoy the holidays!
--
Randy Hron
On Fri, Dec 21 2001, [email protected] wrote:
> > Ok, please try something for me. In drivers/block/elevator.c, comment
> > out this block:
>
> After commenting out the block of code, make clean, etc., I rebooted and ran
> the dbench 32, 128 script. It completed dbench 32 again, but dbench
> 128 hung again. I could quit some tools, but df and ps wouldn't return
> and didn't respond to <ctrl c>.
What IDE controller are you using? The two other reports so far have
been with VIA, maybe that's a clue.
Anyways, could you please reproduce with this applied?
--
Jens Axboe
On Mon, Dec 24, 2001 at 03:03:37PM +0100, Jens Axboe wrote:
> On Fri, Dec 21 2001, [email protected] wrote:
> What IDE controller are you using? The two other reports so far have
> been with VIA, maybe that's a clue.
I do have one of the perhaps buggier VIA chipsets.
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
Subsystem: VIA Technologies, Inc. Bus Master IDE
Flags: bus master, medium devsel, latency 32
I/O ports at d000 [size=16]
Capabilities: <available only to root>
It's been reliable for a long time, but it wouldn't compile an Athlon
optimized kernel until 2.4.1x. (Kernel would Oops at boot time unless
compiled with CONFIG_M586=y)
It was reliable when not optimized for Athlon.
> Anyways, could you please reproduce with this applied?
>
> --
> Jens Axboe
With the patch, it still hangs on this system. I recompiled with
CONFIG_NOHIGHMEM=y and CONFIG_M586=y, but that ended up with all processes
in "b" state during dbench 32 too.
I tried unpatched 2.5.2-pre1 on a k6-2. dbench 32 hung similarly with
32 in "b", bo and bi = 0, and id = 100. That machine is ill now and can't
find "init" when booting, boot single, or boot init=/bin/bash.
--
Randy Hron
On Mon, Dec 24 2001, [email protected] wrote:
> On Mon, Dec 24, 2001 at 03:03:37PM +0100, Jens Axboe wrote:
> > On Fri, Dec 21 2001, [email protected] wrote:
> > What IDE controller are you using? The two other reports so far have
> > been with VIA, maybe that's a clue.
>
> I do have one of the perhaps buggier VIA chipsets.
>
> 00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
> 00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP]
> 00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
> 00:07.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
> 00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
> 00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
> 01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400 AGP (rev 04)
>
> 00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
> Subsystem: VIA Technologies, Inc. Bus Master IDE
> Flags: bus master, medium devsel, latency 32
> I/O ports at d000 [size=16]
> Capabilities: <available only to root>
>
> It's been reliable for a long time, but it wouldn't compile an Athlon
> optimized kernel until 2.4.1x. (Kernel would Oops at boot time unless
> compiled with CONFIG_M586=y)
Ok noted
> > Anyways, could you please reproduce with this applied?
> >
> > --
> > Jens Axboe
>
> With the patch, it still hangs on this system. I recompiled with
> CONFIG_NOHIGHMEM=y and CONFIG_M586=y, but that ended up with all processes
> in "b" state during dbench 32 too.
I would suspect that, do you get any kernel messages?
> I tried unpatched 2.5.2-pre1 on a k6-2. dbench 32 hung similarly with
> 32 in "b", bo and bi = 0, and id = 100. That machine is ill now and can't
> find "init" when booting, boot single, or boot init=/bin/bash.
Please send ps -eo cmd,wchan info for a hung machine.
--
Jens Axboe
On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
>
> I would suspect that, do you get any kernel messages?
When the machine gets in this state, it won't save any files,
so kern.log doesn't have anything after the initial boot message.
> Please send ps -eo cmd,wchan info for a hung machine.
>
> --
> Jens Axboe
Strangely (to me anyway), when dbench 32 hangs the machine,
ps will not print anything. vmstat will continue its 8
second cycle though.
--
Randy Hron
On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
> > I tried unpatched 2.5.2-pre1 on a k6-2. dbench 32 hung similarly with
> > 32 in "b", bo and bi = 0, and id = 100. That machine is ill now and can't
> > find "init" when booting, boot single, or boot init=/bin/bash.
>
> Please send ps -eo cmd,wchan info for a hung machine.
>
> --
> Jens Axboe
>
I rebuilt the reiserfs that dbench writes to.
Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
CMD WCHAN
init do_select
[keventd] context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd] kswapd
[bdflush] bdflush
[kupdated] get_request_wait
[kreiserfsd] get_request_wait
/usr/sbin/syslog get_request_wait
/usr/sbin/klogd do_syslog
[eth0] rtl8139_thread
/usr/sbin/iplog do_select
/usr/sbin/iplog do_poll
/usr/sbin/iplog get_request_wait
/usr/sbin/iplog do_select
/usr/sbin/iplog wait_for_packet
/usr/sbin/sshd do_select
/sbin/agetty tty read_chan
/bin/login -- down
/usr/sbin/sshd do_select
-bash wait4
-su wait4
/usr/sbin/sshd do_select
-bash wait4
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/dbench 32 get_request_wait
/usr/sbin/sshd do_select
/usr/sbin/sshd get_request_wait
ed /tmp/ls get_request_wait
ps -eo cmd,wchan -
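
The listing above is easier to read once the wait channels are tallied. The
following is a minimal sketch of such a tally, assuming each line of
"ps -eo cmd,wchan" output ends in the WCHAN name (the command column may
itself contain spaces). The helper names last_token and count_wchan are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the last whitespace-separated token of `line` into `out`.
 * Returns 1 if a token was found, 0 for blank lines.
 * Tokens longer than outsz - 1 bytes are truncated. */
int last_token(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);

    size_t end = strlen(line);
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t' ||
                       line[end - 1] == '\n'))
        end--;                      /* skip trailing whitespace */

    size_t start = end;
    while (start > 0 && line[start - 1] != ' ' && line[start - 1] != '\t')
        start--;                    /* walk back to the token start */

    size_t len = end - start;
    if (len == 0)
        return 0;
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, line + start, len);
    out[len] = '\0';
    return 1;
}

/* Count how many of the `n` lines end in the wait channel `wchan`. */
size_t count_wchan(const char *const *lines, size_t n, const char *wchan)
{
    char tok[64];
    size_t hits = 0;

    assert(lines != NULL && wchan != NULL);
    for (size_t i = 0; i < n; i++)
        if (last_token(lines[i], tok, sizeof tok) && strcmp(tok, wchan) == 0)
            hits++;
    return hits;
}
```

Calling count_wchan() with "get_request_wait" over the captured lines gives
the number of processes parked in that function, which is the figure of
interest when most of the system sits in one wait channel.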
vmstat 3
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 37 2 0 25464 3224 333252 0 0 13 371 107 33 0 4 96
0 37 2 0 25460 3224 333252 0 0 0 0 102 6 0 0 100
0 37 2 0 25460 3224 333252 0 0 0 0 101 7 0 0 100
I rebooted and ran dbench 32 on a new ext2 filesystem. dbench runs okay for about
30 seconds. Towards the end of the vmstat output below, I try to ssh in; the "b"
column goes up, but I don't get a bash prompt.
mountain:~$ vmstat 10
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 346236 20012 6316 0 0 793 67 174 164 3 8 90
0 32 0 0 182364 21396 162428 0 0 79 3492 136 109 2 26 72
21 11 0 0 163904 21532 180264 0 0 0 11683 209 97 0 11 89
0 32 0 0 32416 23224 306540 0 0 5 6375 226 108 1 27 72
0 32 1 0 22552 23392 315972 0 0 3 9807 206 98 0 8 92
0 32 2 132 4584 7128 349660 0 0 13 2905 192 204 2 29 69
0 32 2 132 4580 7128 349660 0 0 0 0 101 44 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 100 45 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 100 44 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 100 44 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 100 44 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 100 44 0 0 100
0 32 2 132 4580 7128 349660 0 0 0 0 101 45 0 0 100
0 35 2 132 4156 7128 349672 0 0 1 1 104 52 1 0 99
0 35 2 132 4156 7128 349672 0 0 0 0 100 44 0 0 100
Below is software, hardware, and kernel configs:
Linux (none) 2.5.2-pre2 #1 Thu Dec 27 12:32:39 EST 2001 i586 unknown
Gnu C 2.95.3
Gnu make 3.79.1
binutils 2.11.2
util-linux 2.11n
mount 2.11n
modutils 2.4.11
e2fsprogs 1.25
reiserfsprogs 3.x.0k-pre14
PPP 2.4.1
Linux C Library 2.2.4
Dynamic linker (ldd) 2.2.4
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0
Modules Loaded
This machine has a VIA chipset. No proprietary drivers.
384 MB RAM.
Root filesystem on /dev/hdc2 # not the usual /dev/hda
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA [Apollo VP] (rev 47)
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
00:13.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
01:00.0 VGA compatible controller: nVidia Corporation Vanta [NV6] (rev 15)
2.4.18-pre1 (and other 2.4.17* kernels) run dbench 32, 128 okay on this system.
This is the config difference:
diff 2.5.2-pre2 2.4.18-pre1
> CONFIG_NETLINK_DEV=y
< CONFIG_RAMFS=y
# 2.5.2-pre2 config
CONFIG_X86=y
CONFIG_ISA=y
CONFIG_UID16=y
CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_KMOD=y
CONFIG_MK6=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_TSC=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_PM=y
CONFIG_APM=m
CONFIG_APM_DO_ENABLE=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=y
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_NAT_FTP=m
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_PCI=y
CONFIG_8139TOO=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=64
CONFIG_MOUSE=m
CONFIG_PSMOUSE=y
CONFIG_REISERFS_FS=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=m
CONFIG_NTFS_FS=m
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_EXT2_FS=y
CONFIG_CODA_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_MAGIC_SYSRQ=y
--
Randy Hron
On Thu, Dec 27 2001, [email protected] wrote:
> On Mon, Dec 24, 2001 at 06:02:44PM +0100, Jens Axboe wrote:
> > > I tried unpatched 2.5.2-pre1 on a k6-2. dbench 32 hung similarly with
> > > 32 in "b", bo and bi = 0, and id = 100. That machine is ill now and can't
> > > find "init" when booting, boot single, or boot init=/bin/bash.
> >
> > Please send ps -eo cmd,wchan info for a hung machine.
> >
> > --
> > Jens Axboe
> >
>
> I rebuilt the reiserfs that dbench writes to.
> Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
Ah this is interesting, all stuck in get_request_wait. I cannot
reproduce your problem here whatever I do, no reiser though.
--
Jens Axboe
On Fri, Dec 28, 2001 at 12:40:37PM +0100, Jens Axboe wrote:
> > I rebuilt the reiserfs that dbench writes to.
> > Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
>
> Ah this is interesting, all stuck in get_request_wait. I cannot
> reproduce your problem here whatever I do, no reiser though.
>
> --
> Jens Axboe
That's good news. It's probably something with my configuration
or hardware. I saw the livelock on both ext2 and reiserfs.
I removed these options from the config and rebuilt 2.5.2-pre2:
CONFIG_PM=y
CONFIG_APM=m
CONFIG_APM_DO_ENABLE=y
CONFIG_NTFS_FS=m
CONFIG_CODA_FS=m
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_VIDEO_SELECT=y
The initial dbench on ext2 completed for 32 processes but 128 didn't:
vmstat 8
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 128 1 132 14796 22136 314916 0 0 0 5467 272 122 1 13 86
0 128 1 636 3968 21844 328132 0 9 1 1338 132 125 1 18 81
0 128 1 636 3964 21844 328132 0 0 0 0 101 44 0 0 100
0 128 1 636 3964 21844 328132 0 0 0 0 101 45 0 0 100
0 128 1 636 3964 21844 328132 0 0 0 0 101 45 0 0 100
ps -eo cmd,wchan | uniq
CMD WCHAN
init pollwait
[keventd] context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd] refill_inactive
[bdflush] try_to_free_buffers
[kupdated] init_private_file
[kreiserfsd] reiserfs_get_block
/usr/sbin/syslog pollwait
/usr/sbin/klogd do_syslog
[eth0] timer_do_blank_screen
/usr/sbin/iplog pollwait
/usr/sbin/iplog select
/usr/sbin/iplog rt_sigsuspend
/usr/sbin/iplog pollwait
/usr/sbin/iplog netdev_ethtool_ioctl
/usr/sbin/sshd pollwait
/sbin/agetty tty is_internal
/bin/login -- write_chan
/usr/sbin/sshd pollwait
-bash wait4
/usr/sbin/sshd pollwait
-bash wait4
/bin/bash ./chk wait4
/dbench 128 wait4
/dbench 128 down
/dbench 128 write_chan
/dbench 128 init_private_file
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 add_to_page_cache_unique
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
/dbench 128 write_chan
/dbench 128 down
ps -eo cmd,wchan -
uniq do_execve
I stripped down the config a little more by removing these:
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=m
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_LIMIT=y
CONFIG_IP_NF_MATCH_MULTIPORT=m
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_NAT=y
CONFIG_IP_NF_NAT_NEEDED=y
CONFIG_IP_NF_TARGET_MASQUERADE=y
CONFIG_IP_NF_NAT_FTP=m
CONFIG_BLK_DEV_IDECD=m
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_ISO9660_FS=m
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=m
With the stripped config, I built 2.5.2-pre3, and it panic'd
at boot. 2.5.2-pre3 also panic'd yesterday
on this machine's normal config.
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
8139too Fast Ethernet driver 0.9.22
PCI: Found IRQ 11 for device 00:13.0
IRQ routing conflict for 00:13.0, have irq 9, want irq 11
eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Kernel panic: Out of memory and no killable processes...
I haven't noticed any reports of this panic on 2.5.2-pre3.
Back to 2.5.2-pre2, I removed these:
CONFIG_BLK_DEV_LOOP=m
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETFILTER=y
CONFIG_BLK_DEV_VIA82CXXX=y
CONFIG_BLK_DEV_IDE_MODES=y
dbench 32 locked up again.
I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and
it worked fine. I'll try 2.5.1 on this machine (2.5.1 was
okay on another machine).
--
Randy Hron
On Fri, Dec 28 2001, [email protected] wrote:
> On Fri, Dec 28, 2001 at 12:40:37PM +0100, Jens Axboe wrote:
> > > I rebuilt the reiserfs that dbench writes to.
> > > Here is ps -eo cmd,wchan on the k6-2 running 2.5.2-pre2:
> >
> > Ah this is interesting, all stuck in get_request_wait. I cannot
> > reproduce your problem here whatever I do, no reiser though.
> >
> > --
> > Jens Axboe
>
> That's good news. It's probably something with my configuration
> or hardware. I saw the livelock on both ext2 and reiserfs.
Thanks for an excellent report. I can't quite see what the problem
should be yet, especially since the problems seem to start with
2.5.2-pre1 which doesn't really have a lot of interesting changes. I'll
keep looking, though. Could you do sysrq-t for a livelocked system?
The livelocks in this mail appear different than the previous ones.
Could you try running without swap?
> With the stripped config, I built 2.5.2-pre3. It panic'd
> with the stripped config. 2.5.2-pre3 panic'd yesterday
> on this machine's normal config too.
>
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a post-1991 82077
> 8139too Fast Ethernet driver 0.9.22
> PCI: Found IRQ 11 for device 00:13.0
> IRQ routing conflict for 00:13.0, have irq 9, want irq 11
> eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
> NET4: Linux TCP/IP 1.0 for NET4.0
> IP Protocols: ICMP, UDP, TCP
> IP: routing cache hash table of 4096 buckets, 32Kbytes
> TCP: Hash tables configured (established 32768 bind 32768)
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> Kernel panic: Out of memory and no killable processes...
>
> I haven't noticed any reports of this panic on 2.5.2-pre3.
Someone else did report a similar case. Very strange, doesn't look bio
related at all. What's the entire boot message for a 2.5.2-pre3 boot
attempt like the above?
> I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and
> it worked fine. I'll try 2.5.1 on this machine (2.5.1 was
> okay on another machine).
2.5.1 vs 2.5.2-preX is much more interesting.
(btw, attached patch should fix your highmem oops)
--- /opt/kernel/linux-2.5.2-pre3/include/linux/blkdev.h Fri Dec 28 11:43:04 2001
+++ include/linux/blkdev.h Fri Dec 28 15:25:36 2001
@@ -228,8 +228,8 @@
* BLK_BOUNCE_ANY : don't bounce anything
* BLK_BOUNCE_ISA : bounce pages above ISA DMA boundary
*/
-#define BLK_BOUNCE_HIGH ((blk_max_low_pfn + 1) << PAGE_SHIFT)
-#define BLK_BOUNCE_ANY ((blk_max_pfn + 1) << PAGE_SHIFT)
+#define BLK_BOUNCE_HIGH (blk_max_low_pfn << PAGE_SHIFT)
+#define BLK_BOUNCE_ANY (blk_max_pfn << PAGE_SHIFT)
#define BLK_BOUNCE_ISA (ISA_DMA_THRESHOLD)
extern int init_emergency_isa_pool(void);
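
The effect of the patch above can be seen with a little arithmetic. This is
a sketch under stated assumptions: 4 KiB pages (PAGE_SHIFT of 12) and an
illustrative max_low_pfn of 0x38000, i.e. 896 MiB of low memory. The
function names are hypothetical. The earlier oops shows 00038001 in edx,
which would equal this pfn plus one, but reading it that way is an
assumption.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Limit as computed before the patch: (pfn + 1) << PAGE_SHIFT. */
unsigned long long bounce_limit_old(unsigned long long max_low_pfn)
{
    assert(max_low_pfn < (1ULL << 51));     /* keep the shift in range */
    return (max_low_pfn + 1) << TOY_PAGE_SHIFT;
}

/* Limit as computed after the patch: pfn << PAGE_SHIFT. */
unsigned long long bounce_limit_new(unsigned long long max_low_pfn)
{
    assert(max_low_pfn < (1ULL << 51));
    return max_low_pfn << TOY_PAGE_SHIFT;
}
```

For max_low_pfn = 0x38000 the old macro yields 0x38001000 and the patched
one 0x38000000, so the two limits differ by exactly one page.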
--
Jens Axboe
On Fri, Dec 28, 2001 at 03:30:22PM +0100, Jens Axboe wrote:
> Thanks for an excellent report. I can't quite see what the problem
> should be yet, especially since the problems seem to start with
> 2.5.2-pre1 which doesn't really have a lot of interesting changes. I'll
> keep looking, though. Could you do sysrq-t for a livelocked system?
I don't know how to do sysrq-t via serial console. If I put a monitor
and keyboard on the box, syslogd is blocked when the livelock occurs,
and I haven't figured out a workaround yet.
2.5.1 runs dbench 32, 128, by the way.
> The livelocks in this mail appear different than the previous ones.
> Could you try running without swap?
Here is without swap on 2.5.2-pre2.
vmstat 8
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 350756 19484 5464 0 0 0 0 100 41 0 0 100
0 0 0 0 350756 19484 5464 0 0 0 0 100 41 0 0 100
3 29 0 0 344668 19588 8464 0 0 29 0 108 70 1 1 98
0 32 1 0 184264 20824 162556 0 0 32 9123 1085 59 3 86 11
21 11 3 0 181748 20864 164916 0 0 1 10500 1503 20 1 83 16
0 32 1 0 148560 21272 196764 0 0 4 4838 893 52 2 47 51
6 26 2 0 106532 21804 237140 0 0 2 5590 836 62 2 35 64
0 32 2 0 4448 5380 353332 0 0 11 44 253 120 2 26 73
0 32 2 0 4448 5380 353332 0 0 0 0 101 41 0 0 100
0 32 2 0 4448 5380 353332 0 0 0 0 101 41 0 0 100
ps -eo cmd,wchan
CMD WCHAN
init do_select
[keventd] context_thread
[ksoftirqd_CPU0] ksoftirqd
[kswapd] kswapd
[bdflush] wait_on_buffer
[kupdated] wait_on_buffer
[kreiserfsd] reiserfs_journal_commit_thread
/usr/sbin/syslog do_select
/usr/sbin/klogd do_syslog
[eth0] rtl8139_thread
/usr/sbin/sshd do_select
/sbin/agetty tty read_chan
/sbin/agetty -h read_chan
/usr/sbin/sshd do_select
-bash wait4
/usr/sbin/sshd -
-bash wait4
/bin/bash ./chk wait4
/dbench 32 wait4
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 wait_on_buffer
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
/dbench 32 down
ps -eo cmd,wchan -
> > Kernel panic: Out of memory and no killable processes...
>
> Someone else did report a similar case. Very strange, doesn't look bio
> related at all. What's the entire boot message for a 2.5.2-pre3 boot
> attempt like the above?
I rebuilt 2.5.2-pre3 with mrproper using the config that worked for 2.5.1
first and noticed some depmod errors during the build:
if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.5.2-pre3; fi
depmod: *** Unresolved symbols in /lib/modules/2.5.2-pre3/kernel/fs/nfs/nfs.o
depmod: seq_escape
depmod: seq_printf
make[1]: Entering directory `/usr/src/linux/arch/i386/boot'
sh -x ./install.sh 2.5.2-pre3 bzImage /usr/src/linux/System.map "/boot"
So I removed initrd, loopback, nfs, coda, ntfs, dosfs, vfat, and rebuilt
with mrproper.
Here is the boot message and panic:
LILO 22.1 boot:
Loading lfs.............
Linux version 2.5.2-pre3 (root@mountain) (gcc version 2.95.3 20010315 (release)) #1 Fri Dec 28 12:33:00 EST 2001
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000018000000 (usable)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
On node 0 totalpages: 98304
zone(0): 4096 pages.
zone(1): 94208 pages.
zone(2): 0 pages.
Kernel command line: BOOT_IMAGE=lfs ro root=1602 console=ttyS1,38400n8
Initializing CPU#0
Detected 501.155 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 999.42 BogoMIPS
Memory: 385036k/393216k available (962k kernel code, 7796k reserved, 243k data, 200k init, 0k highmem)
Dentry-cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 8192 (order: 4, 65536 bytes)
Buffer-cache hash table entries: 32768 (order: 5, 131072 bytes)
Page-cache hash table entries: 131072 (order: 7, 524288 bytes)
CPU: L1 I Cache: 32K (32 bytes/line), D cache 32K (32 bytes/line)
CPU: AMD-K6(tm) 3D processor stepping 0c
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch ([email protected])
mtrr: detected mtrr type: AMD K6
PCI: PCI BIOS revision 2.10 entry at 0xfb3c0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router VIA [1106/0586] at 00:07.0
Activating ISA DMA hang workarounds.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec: init pool 0, 1 entries, 12 bytes
biovec: init pool 1, 4 entries, 48 bytes
biovec: init pool 2, 16 entries, 192 bytes
biovec: init pool 3, 64 entries, 768 bytes
biovec: init pool 4, 128 entries, 1536 bytes
biovec: init pool 5, 256 entries, 3072 bytes
Journalled Block Device driver loaded
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
keyboard: Timeout - AT keyboard not present?(ed)
keyboard: Timeout - AT keyboard not present?(f4)
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
block: 256 slots per queue, batch=32
Uniform Multi-Platform E-IDE driver Revision: 6.32
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI slot 00:07.1
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c586b (rev 47) IDE UDMA33 controller on pci00:07.1
ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xe008-0xe00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 51536U3, ATA DISK drive
hdb: ATAPI CDROM, ATAPI CD/DVD-ROM drive
hdc: Maxtor 52049U4, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
blk: queue c028dcc4, I/O limit 4095Mb (mask 0xffffffff)
hda: 30015216 sectors (15368 MB) w/2048KiB Cache, CHS=1868/255/63, UDMA(33)
blk: queue c028e054, I/O limit 4095Mb (mask 0xffffffff)
hdc: 40020624 sectors (20491 MB) w/2048KiB Cache, CHS=39703/16/63, UDMA(33)
Partition check:
hda: hda1 hda2 hda3 < hda5 hda6 hda7 >
hdc: hdc1 hdc2 hdc3 < hdc5 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
8139too Fast Ethernet driver 0.9.22
PCI: Found IRQ 11 for device 00:13.0
IRQ routing conflict for 00:13.0, have irq 9, want irq 11
eth0: RealTek RTL8139 Fast Ethernet at 0xd8800000, 00:50:bf:25:68:f3, IRQ 9
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 32768 bind 32768)
ip_conntrack (3072 buckets, 24576 max)
ip_tables: (c)2000 Netfilter core team
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Kernel panic: Out of memory and no killable processes...
> > I re-ran dbench 32, 128 with 2.4.17rc2aa2 on this machine and
>
> 2.5.1 vs 2.5.2-preX is much more interesting.
2.5.1 finishes dbench 32, 128 on this machine. :)
Throughput 21.6466 MB/sec (NB=27.0582 MB/sec 216.466 MBit/sec) 32 procs
Throughput 5.91991 MB/sec (NB=7.39989 MB/sec 59.1991 MBit/sec) 128 procs
> (btw, attached patch should fix your highmem oops)
>
> --
> Jens Axboe
I'm going to hold off testing on my highmem box for a while.
BTW, the original "cannot find init" after 2.5.1-pre1 was because
I had an invalid "root=" entry in lilo.conf for the kernels
other than current and "old".
--
Randy Hron
On Fri, Dec 28, 2001 at 03:30:22PM +0100, Jens Axboe wrote:
> keep looking, though. Could you do sysrq-t for a livelocked system?
> --
> Jens Axboe
Using a tip from Russell King:
This is while running dbench 32 on an ext2 filesystem.
SysRq : Show State
free sibling
task PC stack pid father child younger older
init S C177FF24 4608 1 0 43 3 (NOTLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c01398d4>] [<c0139c82>] [<c01337d6>]
[<c01085b3>]
keventd S 00010000 6596 2 1 7 (L-TLB)
Call Trace: [<c011e245>] [<c0106efc>]
ksoftirqd_CPU S C1770000 6412 3 0 4 1 (L-TLB)
Call Trace: [<c01179b2>] [<c0106efc>]
kswapd S C176E000 6652 4 0 5 3 (L-TLB)
Call Trace: [<c01282c6>] [<c0106efc>]
bdflush S 00000286 6568 5 0 6 4 (L-TLB)
Call Trace: [<c0111b29>] [<c0130b53>] [<c0106efc>]
kupdated D 00000048 5860 6 0 5 (L-TLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012f265>] [<c015bd08>] [<c015bd95>]
[<c013dd35>] [<c01309bd>] [<c0130c45>] [<c0106efc>]
kreiserfsd S D7D1BFB4 6148 7 1 25 2 (L-TLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c0111b7e>] [<c0177257>] [<c0106efc>]
syslogd D 00000048 4788 25 1 27 7 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c01665b8>]
[<c0124c35>] [<c012db02>] [<c012479c>] [<c012dc0f>] [<c01085b3>]
klogd S 7FFFFFFF 2656 27 1 32 25 (NOTLB)
Call Trace: [<c011153f>] [<c01dd4ad>] [<c01ddd37>] [<c01aed94>] [<c01aef9f>]
[<c012d91a>] [<c01085b3>]
eth0 S D7945F98 2656 32 1 41 27 (L-TLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c0111b7e>] [<c01a0d7e>] [<c0106efc>]
sshd S 7FFFFFFF 4788 41 1 52 42 32 (NOTLB)
Call Trace: [<c011153f>] [<c01af15d>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
agetty S 7FFFFFFF 4364 42 1 43 41 (NOTLB)
Call Trace: [<c011153f>] [<c018350d>] [<c017f786>] [<c012d855>] [<c01085b3>]
agetty S 7FFFFFFF 0 43 1 42 (NOTLB)
Call Trace: [<c011153f>] [<c018350d>] [<c017f786>] [<c012d855>] [<c01085b3>]
sshd S 7FFFFFFF 5484 45 41 46 52 (NOTLB)
Call Trace: [<c011153f>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
bash S 00000000 4580 46 45 59 (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
sshd S 7FFFFFFF 1568 52 41 53 45 (NOTLB)
Call Trace: [<c0183b6f>] [<c011153f>] [<c01398d4>] [<c0139c82>] [<c01085b3>]
bash S 00000000 2656 53 52 58 (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
vmstat S D72B5F8C 644 58 53 (NOTLB)
Call Trace: [<c011159a>] [<c01114dc>] [<c011a959>] [<c01085b3>]
chk S 00000000 5284 59 46 60 (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
dbench S 00000000 5208 60 59 93 (NOTLB)
Call Trace: [<c01169ee>] [<c01085b3>]
dbench D 00000048 5692 62 60 63 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5532 63 60 64 62 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
[<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench D 00000048 5684 64 60 65 63 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5624 65 60 66 64 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5700 66 60 67 65 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5660 67 60 68 66 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e34b9>] [<c0159886>] [<c015c03c>]
[<c013655d>] [<c01366ca>] [<c012d07a>] [<c012d3b7>] [<c01085b3>]
dbench D 00000048 5688 68 60 69 67 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5532 69 60 70 68 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
[<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench D 00000048 5780 70 60 71 69 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5756 71 60 72 70 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5692 72 60 73 71 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5692 73 60 74 72 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
[<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench D D7744244 5612 74 60 75 73 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
[<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench D 00000048 5740 75 60 76 74 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5600 76 60 77 75 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5448 77 60 78 76 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5692 78 60 79 77 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5640 79 60 80 78 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012f265>] [<c0159085>] [<c015a85c>]
[<c015aa70>] [<c015ae09>] [<c012f8f2>] [<c012fee1>] [<c015ac38>] [<c015aff6>]
[<c015ac38>] [<c0124bed>] [<c012d91a>] [<c01085b3>]
dbench D 00000048 5692 80 60 81 79 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5468 81 60 82 80 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5412 82 60 83 81 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5400 83 60 84 82 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5700 84 60 85 83 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5692 85 60 86 84 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5336 86 60 87 85 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5628 87 60 88 86 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e34b9>] [<c0159886>] [<c015c35d>]
[<c0136da4>] [<c0136e65>] [<c01085b3>]
dbench D 00000048 5484 88 60 89 87 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D D7744244 5740 89 60 90 88 (NOTLB)
Call Trace: [<c01073ed>] [<c0107538>] [<c01e3473>] [<c0124c35>] [<c0124c8d>]
[<c015a7de>] [<c0159e43>] [<c012e304>] [<c012d48b>] [<c012d4d7>] [<c01085b3>]
dbench D 00000048 5420 90 60 91 89 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5652 91 60 92 90 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5660 92 60 93 91 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
dbench D 00000048 5592 93 60 92 (NOTLB)
Call Trace: [<c019656d>] [<c0196baf>] [<c0196e70>] [<c0196f04>] [<c0196fa7>]
[<c012e575>] [<c012e5fe>] [<c012f1ef>] [<c012faff>] [<c012ff4b>] [<c0124c35>]
[<c012d91a>] [<c01085b3>]
PID CMD WCHAN
1 init do_select
2 [keventd] context_thread
3 [ksoftirqd_CPU0] ksoftirqd
4 [kswapd] kswapd
5 [bdflush] bdflush
6 [kupdated] get_request_wait
7 [kreiserfsd] reiserfs_journal_commit_thread
25 /usr/sbin/syslog get_request_wait
27 /usr/sbin/klogd unix_wait_for_peer
32 [eth0] rtl8139_thread
41 /usr/sbin/sshd do_select
42 /sbin/agetty tty read_chan
43 /sbin/agetty -h read_chan
45 /usr/sbin/sshd do_select
46 -bash wait4
52 /usr/sbin/sshd do_select
53 -bash wait4
59 /bin/bash ./chk wait4
60 ./dbench 32 wait4
62 ./dbench 32 get_request_wait
63 ./dbench 32 down
64 ./dbench 32 get_request_wait
65 ./dbench 32 get_request_wait
66 ./dbench 32 get_request_wait
67 ./dbench 32 down
68 ./dbench 32 get_request_wait
69 ./dbench 32 down
70 ./dbench 32 get_request_wait
71 ./dbench 32 get_request_wait
72 ./dbench 32 get_request_wait
73 ./dbench 32 down
74 ./dbench 32 down
75 ./dbench 32 get_request_wait
76 ./dbench 32 get_request_wait
77 ./dbench 32 get_request_wait
78 ./dbench 32 get_request_wait
79 ./dbench 32 get_request_wait
80 ./dbench 32 get_request_wait
81 ./dbench 32 get_request_wait
82 ./dbench 32 get_request_wait
83 ./dbench 32 get_request_wait
84 ./dbench 32 get_request_wait
85 ./dbench 32 get_request_wait
86 ./dbench 32 get_request_wait
87 ./dbench 32 down
88 ./dbench 32 get_request_wait
89 ./dbench 32 down
90 ./dbench 32 get_request_wait
91 ./dbench 32 get_request_wait
92 ./dbench 32 get_request_wait
93 ./dbench 32 get_request_wait
97 ps -eo pid,cmd,w -
SysRq : Show Regs
Pid: 0, comm: swapper
EIP: 0010:[<c0106c03>] CPU: 0 EFLAGS: 00000246 Not tainted
EAX: 00000000 EBX: c0220000 ECX: d7d1a270 EDX: d7d1a270
ESI: c0106be0 EDI: ffffe000 EBP: 0008e000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 080cc00c CR3: 17d02000 CR4: 00000090
Call Trace: [<c0106c67>] [<c0105000>] [<c0105027>]
SysRq : Show Memory
Mem-info:
Free pages: 83640kB ( 0kB HighMem)
Zone:DMA freepages: 14632kB min: 128kB low: 256kB high: 384kB
Zone:Normal freepages: 69008kB min: 1020kB low: 2040kB high: 3060kB
Zone:HighMem freepages: 0kB min: 0kB low: 0kB high: 0kB
( Active: 1576, inactive: 69884, free: 20910 )
4*4kB 3*8kB 2*16kB 3*32kB 4*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 6*2048kB = 14632kB)
10*4kB 3*8kB 1*16kB 2*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 33*2048kB = 69008kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap: 136512kB
98304 pages of RAM
0 pages of HIGHMEM
1980 reserved pages
75748 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory: 4252kB
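The two free-list lines in the Show Memory output above can be cross-checked by hand: each term is a count of free blocks times the block size, and the sum should match the freepages figure printed for that zone. A minimal sketch in C, assuming 4 kB pages; the helper names are made up for illustration and are not kernel functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not kernel code: sum one free-list line.
 * counts[i] is the number of free blocks of order i, each of which is
 * (page_kb << i) kB in size. Returns the total in kB. */
unsigned long buddy_total_kb(const unsigned long *counts, size_t orders,
                             unsigned long page_kb)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < orders; i++)
        total += counts[i] * (page_kb << i);
    return total;
}

/* Returns 1 if both zone lines add up to the totals SysRq printed.
 * The counts are copied from the dump above (4kB ... 2048kB blocks). */
int show_memory_adds_up(void)
{
    static const unsigned long dma[10]    = { 4, 3, 2, 3, 4, 3, 2, 0, 1, 6 };
    static const unsigned long normal[10] = { 10, 3, 1, 2, 0, 0, 1, 0, 1, 33 };

    return buddy_total_kb(dma, 10, 4) == 14632UL &&
           buddy_total_kb(normal, 10, 4) == 69008UL;
}
```

The two zone totals also add up to 83640 kB, the "Free pages" figure, which is the 20910 free pages from the Active/inactive line at 4 kB each. So the dump is internally consistent and the box is not short of memory while it is hung.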
--
Randy Hron
> > Kernel panic: Out of memory and no killable processes...
>
> Someone else did report a similar case. Very strange, doesn't look bio
Al Viro posted a fix:
http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
and no panic.
Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.
free sibling
task PC stack pid father child younger older
init S C177FF24 4592 1 0 45 3 (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0139d54>] [<c013a102>] [<c0133c46>]
[<c01085b3>]
keventd S 00010000 6580 2 1 7 (L-TLB)
Call Trace: [<c011e3f5>] [<c0106ef0>]
ksoftirqd_CPU S C1770000 6396 3 0 4 1 (L-TLB)
Call Trace: [<c0117b12>] [<c0106ef0>]
kswapd S C176E000 6636 4 0 5 3 (L-TLB)
Call Trace: [<c0128716>] [<c0106ef0>]
bdflush S 00000286 6552 5 0 6 4 (L-TLB)
Call Trace: [<c0111c69>] [<c0130fb3>] [<c0106ef0>]
kupdated D C176BFAC 5864 6 0 5 (L-TLB)
Call Trace: [<c012e96a>] [<c012eb2b>] [<c0131023>] [<c0106ef0>]
kreiserfsd S D68E9FB4 6156 7 1 25 2 (L-TLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0111cbe>] [<c0177717>] [<c0106ef0>]
syslogd D 00000048 4772 25 1 27 7 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0166aa8>]
[<c0125085>] [<c012df62>] [<c0124bec>] [<c012e06f>] [<c01085b3>]
klogd S 7FFFFFFF 4772 27 1 32 25 (NOTLB)
Call Trace: [<c011157b>] [<c01e6c4d>] [<c01e74e7>] [<c01b0d77>] [<c01b0f87>]
[<c012dd7a>] [<c01085b3>]
eth0 S D646FF98 0 32 1 37 27 (L-TLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0111cbe>] [<c01a125e>] [<c0106ef0>]
iplog S 7FFFFFFF 5304 37 1 38 43 32 (NOTLB)
Call Trace: [<c011157b>] [<c0139bd1>] [<c0139d54>] [<c013a102>] [<c01085b3>]
iplog S D616DF28 188 38 37 41 (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c013a37c>] [<c013a57d>] [<c011191c>]
[<c01085b3>]
iplog S D6169FB0 5684 39 38 40 (NOTLB)
Call Trace: [<c0107767>] [<c01085b3>]
iplog S D6165F24 6280 40 38 41 39 (NOTLB)
Call Trace: [<c01115d9>] [<c0111500>] [<c0139d54>] [<c013a102>] [<c01085b3>]
iplog S 7FFFFFFF 5656 41 38 40 (NOTLB)
Call Trace: [<c011157b>] [<c01bed35>] [<c01b51e2>] [<c01b52fe>] [<c01e960f>]
[<c01b0dd5>] [<c01b1b47>] [<c011b314>] [<c011b550>] [<c011bc78>] [<c01b6c4b>]
[<c01b2267>] [<c01085b3>]
sshd S 7FFFFFFF 4772 43 1 55 44 37 (NOTLB)
Call Trace: [<c011157b>] [<c01b113d>] [<c0139d54>] [<c013a102>] [<c01085b3>]
agetty S 7FFFFFFF 4468 44 1 45 43 (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
agetty S 7FFFFFFF 0 45 1 44 (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
sshd S 7FFFFFFF 548 47 43 48 55 (NOTLB)
Call Trace: [<c011157b>] [<c0139d54>] [<c013a102>] [<c01085b3>]
bash S 00000000 4564 48 47 62 (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
sshd S 7FFFFFFF 0 55 43 56 47 (NOTLB)
Call Trace: [<c011157b>] [<c0139d54>] [<c013a102>] [<c01085b3>]
bash S 7FFFFFFF 2640 56 55 (NOTLB)
Call Trace: [<c011157b>] [<c0183a0d>] [<c017fc76>] [<c012dcb5>] [<c01085b3>]
chk S 00000000 0 62 48 63 (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
dbench S 00000000 5192 63 62 96 (NOTLB)
Call Trace: [<c0116b4e>] [<c01085b3>]
dbench D 00000048 5372 65 63 66 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5620 66 63 67 65 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c015974b>]
[<c015a1fd>] [<c015c847>] [<c0137224>] [<c01372e5>] [<c01085b3>]
dbench D 00000000 5620 67 63 68 66 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5728 68 63 69 67 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5608 69 63 70 68 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5948 70 63 71 69 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5572 71 63 72 70 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5264 72 63 73 71 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5464 73 63 74 72 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5728 74 63 75 73 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012f6e5>] [<c015b32a>] [<c012f974>]
[<c012fb2d>] [<c012fd51>] [<c0130341>] [<c015b0d8>] [<c015b496>] [<c015b0d8>]
[<c012503d>] [<c012dd7a>] [<c01085b3>]
dbench D 00000048 5528 75 63 76 74 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5676 76 63 77 75 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5332 77 63 78 76 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5584 78 63 79 77 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5644 79 63 80 78 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea8e>] [<c012f974>] [<c012fb73>] [<c012fd6e>] [<c012f745>]
[<c012f629>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>] [<c012dd7a>]
[<c01085b3>]
dbench D 00000048 5620 80 63 81 79 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5600 81 63 82 80 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5620 82 63 83 81 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5604 83 63 84 82 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5620 84 63 85 83 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5632 85 63 86 84 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5676 86 63 87 85 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5676 87 63 88 86 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5620 88 63 89 87 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5620 89 63 90 88 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012f6e5>] [<c015b32a>] [<c012f974>]
[<c012fb2d>] [<c012fd51>] [<c0130341>] [<c015b0d8>] [<c015b496>] [<c015b0d8>]
[<c012503d>] [<c012dd7a>] [<c01085b3>]
dbench D 00000048 5728 90 63 91 89 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5628 91 63 92 90 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5676 92 63 93 91 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
dbench D 00000000 5948 93 63 94 92 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5692 94 63 95 93 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000000 5488 95 63 96 94 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
dbench D 00000048 5372 96 63 95 (NOTLB)
Call Trace: [<c0196a06>] [<c019708f>] [<c0197340>] [<c01973dc>] [<c019747f>]
[<c012e9d5>] [<c012ea5e>] [<c012f66f>] [<c012ff5f>] [<c01303ab>] [<c0125085>]
[<c012dd7a>] [<c01085b3>]
SysRq : Show Regs
Pid: 0, comm: swapper
EIP: 0010:[<c0106c03>] CPU: 0 EFLAGS: 00000246 Not tainted
EAX: 00000000 EBX: c022e000 ECX: d68e8280 EDX: d68e8280
ESI: c0106be0 EDI: ffffe000 EBP: 0008e000 DS: 0018 ES: 0018
CR0: 8005003b CR2: 40014000 CR3: 16177000 CR4: 00000090
Call Trace: [<c0106c59>] [<c0105000>] [<c0105027>]
SysRq : Show Memory
Mem-info:
Free pages: 95596kB ( 0kB HighMem)
Zone:DMA freepages: 14572kB min: 128kB low: 256kB high: 384kB
Zone:Normal freepages: 81024kB min: 1020kB low: 2040kB high: 3060kB
Zone:HighMem freepages: 0kB min: 0kB low: 0kB high: 0kB
( Active: 1427, inactive: 67074, free: 23899 )
3*4kB 2*8kB 1*16kB 2*32kB 2*64kB 4*128kB 2*256kB 0*512kB 1*1024kB 6*2048kB = 14572kB)
10*4kB 3*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 39*2048kB = 81024kB)
= 0kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap: 136512kB
98304 pages of RAM
0 pages of HIGHMEM
1995 reserved pages
72913 pages shared
0 pages swap cached
0 pages in page table cache
Buffer memory: 23796kB
mountain:~/dbench$ ps -eo pid,cmd,wchan
PID CMD WCHAN
1 init do_select
2 [keventd] context_thread
3 [ksoftirqd_CPU0] ksoftirqd
4 [kswapd] kswapd
5 [bdflush] bdflush
6 [kupdated] wait_on_buffer
7 [kreiserfsd] reiserfs_journal_commit_thread
25 /usr/sbin/syslog get_request_wait
27 /usr/sbin/klogd unix_wait_for_peer
32 [eth0] rtl8139_thread
37 /usr/sbin/iplog do_select
38 /usr/sbin/iplog do_poll
39 /usr/sbin/iplog rt_sigsuspend
40 /usr/sbin/iplog do_select
41 /usr/sbin/iplog wait_for_packet
43 /usr/sbin/sshd do_select
44 /sbin/agetty tty read_chan
45 /sbin/agetty -h read_chan
47 /usr/sbin/sshd -
48 -bash wait4
55 /usr/sbin/sshd do_select
56 -bash read_chan
65 ./dbench 32 get_request_wait
66 ./dbench 32 get_request_wait
67 ./dbench 32 get_request_wait
68 ./dbench 32 get_request_wait
69 ./dbench 32 get_request_wait
70 ./dbench 32 get_request_wait
71 ./dbench 32 get_request_wait
72 ./dbench 32 get_request_wait
73 ./dbench 32 get_request_wait
74 ./dbench 32 get_request_wait
75 ./dbench 32 get_request_wait
76 ./dbench 32 get_request_wait
77 ./dbench 32 get_request_wait
78 ./dbench 32 get_request_wait
79 ./dbench 32 get_request_wait
80 ./dbench 32 get_request_wait
81 ./dbench 32 get_request_wait
82 ./dbench 32 get_request_wait
83 ./dbench 32 get_request_wait
84 ./dbench 32 get_request_wait
85 ./dbench 32 get_request_wait
86 ./dbench 32 get_request_wait
87 ./dbench 32 get_request_wait
88 ./dbench 32 get_request_wait
89 ./dbench 32 get_request_wait
90 ./dbench 32 get_request_wait
91 ./dbench 32 get_request_wait
92 ./dbench 32 get_request_wait
93 ./dbench 32 get_request_wait
94 ./dbench 32 get_request_wait
95 ./dbench 32 get_request_wait
96 ./dbench 32 get_request_wait
97 ps -eo pid,cmd,w -
dbench was run on the ext2 filesystem.
mountain:~/dbench$ df -kT .
Filesystem Type 1k-blocks Used Available Use% Mounted on
/dev/hda6 ext2 4032092 249208 3578060 7% /home
--
Randy Hron
On Sat, Dec 29 2001, [email protected] wrote:
> > > Kernel panic: Out of memory and no killable processes...
> >
> > Someone else did report a similar case. Very strange, doesn't look bio
>
> Al Viro posted a fix:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
>
> I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
> and no panic.
>
> Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.
Thanks, could you try with this patch? It's not a fix (haven't found the
bug yet), but I think we are looking at list corruption so please check
if this patch at least alters when it hangs etc.
--- /opt/kernel/linux-2.5.2-pre3/drivers/block/elevator.c Sat Dec 29 12:17:53 2001
+++ drivers/block/elevator.c Sat Dec 29 12:30:20 2001
@@ -142,7 +142,7 @@
int elevator_linus_merge(request_queue_t *q, struct request **req,
struct bio *bio)
{
- struct list_head *entry;
+ struct list_head *entry, *head = &q->queue_head;
struct request *__rq;
int ret;
@@ -160,17 +160,22 @@
}
}
+ if ((__rq = __elv_next_request(q)))
+ if (__rq->flags & REQ_STARTED)
+ head = head->next;
+
entry = &q->queue_head;
ret = ELEVATOR_NO_MERGE;
- while ((entry = entry->prev) != &q->queue_head) {
+ while ((entry = entry->prev) != head) {
__rq = list_entry_rq(entry);
+ if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
+ break;
+
/*
* simply "aging" of requests in queue
*/
if (__rq->elevator_sequence-- <= 0)
- break;
- if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
break;
if (!(__rq->flags & REQ_CMD))
continue;
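
For anyone reading the hunk above without the full source at hand, the
idea can be sketched in userspace C: walk the queue from the tail toward
the front, but stop before a first request that has already been started.
This is only an illustration under simplifying assumptions. struct
toy_node, toy_list_init, toy_list_add_tail and toy_count_candidates are
made-up names, and the "started" field stands in for the REQ_STARTED
flag; none of this is the kernel's list or request code.

```c
#include <stddef.h>

/* Hypothetical circular doubly linked list node; the queue head is a
 * sentinel node of the same type. Not the kernel's struct list_head. */
struct toy_node {
    struct toy_node *prev, *next;
    int started;                 /* stand-in for REQ_STARTED */
};

static void toy_list_init(struct toy_node *head)
{
    head->prev = head->next = head;
    head->started = 0;
}

static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Count the requests a backward merge scan would look at. */
static int toy_count_candidates(struct toy_node *queue_head)
{
    struct toy_node *head = queue_head;
    struct toy_node *entry = queue_head;
    int n = 0;

    /* If the first request is already started, end the walk before it. */
    if (queue_head->next != queue_head && queue_head->next->started)
        head = head->next;

    while ((entry = entry->prev) != head) {
        if (entry->started)      /* never scan past a started request */
            break;
        n++;
    }
    return n;
}
```

With three queued requests and the first one started, the scan looks at
two candidates; with none started it looks at all three.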
--
Jens Axboe
On Sat, Dec 29 2001, Jens Axboe wrote:
> On Sat, Dec 29 2001, [email protected] wrote:
> > > > Kernel panic: Out of memory and no killable processes...
> > >
> > > Someone else did report a similar case. Very strange, doesn't look bio
> >
> > Al Viro posted a fix:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
> >
> > I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
> > and no panic.
> >
> > Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.
>
> Thanks, could you try with this patch? It's not a fix (haven't found the
> bug yet), but I think we are looking at list corruption so please check
> if this patch at least alters when it hangs etc.
Ah, I think I got it: it appears to be down to not rechecking for an
empty queue after the queue_lock may have been dropped (busy I/O, no
requests left, get_request returns NULL, so we drop the lock and run
get_request_wait). This explains the get_request_wait deadlock.
Compiling right now...
--- /opt/kernel/linux-2.5.2-pre3/drivers/block/ll_rw_blk.c Sat Dec 29 12:17:53 2001
+++ drivers/block/ll_rw_blk.c Sat Dec 29 12:45:04 2001
@@ -881,7 +881,9 @@
BUG_ON(rw != READ && rw != WRITE);
+ spin_lock_irq(q->queue_lock);
rq = get_request(q, rw);
+ spin_unlock_irq(q->queue_lock);
if (!rq && (gfp_mask & __GFP_WAIT))
rq = get_request_wait(q, rw);
@@ -1081,7 +1083,7 @@
{
struct request *req, *freereq = NULL;
int el_ret, latency = 0, rw, nr_sectors, cur_nr_sectors, barrier;
- struct list_head *insert_here = &q->queue_head;
+ struct list_head *insert_here;
elevator_t *elevator = &q->elevator;
sector_t sector;
@@ -1103,15 +1105,14 @@
barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
spin_lock_irq(q->queue_lock);
+again:
+ req = NULL;
+ insert_here = q->queue_head.prev;
if (blk_queue_empty(q) || barrier) {
blk_plug_device(q);
goto get_rq;
}
-
-again:
- req = NULL;
- insert_here = q->queue_head.prev;
el_ret = elevator->elevator_merge_fn(q, &req, bio);
switch (el_ret) {
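
The race described above (the queue state changing while the lock is
dropped in get_request_wait) can be illustrated with a small toy model in
plain C. This is a sketch under simplifying assumptions, not the kernel
code: toy_queue, toy_get_slot, toy_wait_for_slot and toy_submit are
hypothetical names, there is no real locking, and the sleep without the
lock is simulated by a function that drains the queue and frees one
request. The "recheck" argument selects between jumping back above the
empty-queue check (as the moved again: label does) and jumping back below
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a request queue. */
struct toy_queue {
    int pending;      /* requests currently on the queue */
    int free_slots;   /* free request structures */
    bool plugged;     /* set when a submitter saw an empty queue */
};

/* Simulates sleeping with the lock dropped: meanwhile the driver
 * finishes all pending I/O and one request structure becomes free. */
static void toy_wait_for_slot(struct toy_queue *q)
{
    q->pending = 0;
    q->free_slots = 1;
}

static bool toy_get_slot(struct toy_queue *q)
{
    if (q->free_slots == 0)
        return false;
    q->free_slots--;
    return true;
}

/* recheck == true: re-evaluate the empty-queue check after every wait.
 * recheck == false: evaluate it only once, before the first wait. */
static void toy_submit(struct toy_queue *q, bool recheck)
{
    bool checked = false;

    assert(q != NULL);
again:
    if (recheck || !checked) {
        checked = true;
        if (q->pending == 0)
            q->plugged = true;   /* empty queue: plug it */
    }
    if (!toy_get_slot(q)) {
        toy_wait_for_slot(q);    /* queue may change while we wait */
        goto again;
    }
    q->pending++;                /* queue our request */
}
```

Starting from a busy queue with no free requests, toy_submit(&q, false)
ends with a request queued on an unplugged queue that was empty when the
request was added, while toy_submit(&q, true) notices the empty queue
after the wait and plugs it.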
--
Jens Axboe
On Sat, Dec 29, 2001 at 06:48:37PM +0100, Jens Axboe wrote:
> Ah, I think I got it: it appears to be down to not rechecking for an
> empty queue after the queue_lock may have been dropped (busy I/O, no
> requests left, get_request returns NULL, so we drop the lock and run
> get_request_wait). This explains the get_request_wait deadlock.
> Compiling right now...
>
> --
> Jens Axboe
Two thumbs up!! With your ll_rw_blk.c and elevator.c patches,
2.5.2-pre3 completes both dbench 32 and dbench 128.
I'm running a more complete battery of tests and will let you know
if there are any unusual results.
Thanks!
--
Randy Hron