2001-02-13 14:20:09

by Martin Rode

Subject: BUG in sched.c, Kernel 2.4.1?

After upgrading to kernel 2.4.1 my box locked hard after starting the
regular arkeia backup. I previously had problems with MM on kernel
2.2.18 and > 2.2.19pre2. My report a few days ago is still unanswered.

I would be glad if someone would comment this time.

Here's the typed crash dump:

st0: Block limits 1 - 16777215 bytes.
Scheduling an interrupt
kernel BUG at sched.c:714!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0113781>]
EFLAGS: 00010282
eax: 0000001b ebx 00000000 ecx df4f6000 edx 00000001
esi: 001cffe3 edi db5eede0 ebp dc0e9f40 esp dc0e9ef0
ds 0018 es 0018 ss 0018
process o3flow (pid 13180 stackpage dc0e9000)
stack: c01f26f3 c01f2856 000002ca db5eed80 dc0e8000 db5eede0 dc0e9f18
dc0e8000 000033ba 00000000 00000000 000000e7 0000001c 0000001c
fffffff3 dc0e8000 00000800 00000000 dc0e8000 dc0e9f68 c0139c44
d488bf80 00000000

call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]

code: 0f 0b 8d 65 bc 5b 5e 5f 89 ec 5d c3 8d 76 00 55 89 e5 83 ec
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

If necessary I can send you my kernel .config. ACPI is disabled.


Loaded Modules are:

[martin@bart martin]$ /sbin/lsmod
Module Size Used by
parport_pc 23328 1 (autoclean)
lp 5520 1 (autoclean)
parport 24832 1 (autoclean) [parport_pc lp]
vmnet 17984 1
vmmon 17936 0 (unused)
nfsd 71792 8 (autoclean)
nfs 80032 2 (autoclean)
lockd 50192 1 (autoclean) [nfsd nfs]
sunrpc 62368 1 (autoclean) [nfsd nfs lockd]
tulip 37264 1 (autoclean)
st 27168 0 (unused)
vfat 11696 0 (unused)
fat 32480 0 [vfat]
AM53C974 13248 0 (unused)
ncr53c8xx 56928 8
sd_mod 11648 8
scsi_mod 91056 4 [st AM53C974 ncr53c8xx sd_mod]
[martin@bart martin]$


Regards,


Martin





2001-02-13 14:31:11

by Brian Gerst

Subject: Re: BUG in sched.c, Kernel 2.4.1?

Martin Rode wrote:
>
> After upgrading to kernel 2.4.1 my box locked hard after starting the
> regular arkeia backup. I previously had problems with MM on kernel
> 2.2.18 and > 2.2.19pre2. My report a few days ago is still unanswered.
>
> I would be glad if someone would comment this time.
>
> Here's the typed crash dump:
>
> st0: Block limits 1 - 16777215 bytes.
> Scheduling an interrupt
> kernel BUG at sched.c:714!
> invalid operand: 0000
> CPU: 0
> EIP: 0010:[<c0113781>]
> EFLAGS: 00010282
> eax: 0000001b ebx 00000000 ecx df4f6000 edx 00000001
> esi: 001cffe3 edi db5eede0 ebp dc0e9f40 esp dc0e9ef0
> ds 0018 es 0018 ss 0018
> process o3flow (pid 13180 stackpage dc0e9000)
> stack: c01f26f3 c01f2856 000002ca db5eed80 dc0e8000 db5eede0 dc0e9f18
> dc0e8000 000033ba 00000000 00000000 000000e7 0000001c 0000001c
> fffffff3 dc0e8000 00000800 00000000 dc0e8000 dc0e9f68 c0139c44
> d488bf80 00000000
>
> call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]
>
> code: 0f 0b 8d 65 bc 5b 5e 5f 89 ec 5d c3 8d 76 00 55 89 e5 83 ec
> Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing

Run this oops message through ksymoops please. It will make debugging
it a lot easier.

--

Brian Gerst

2001-02-13 15:22:22

by Martin Rode

Subject: Re: BUG in sched.c, Kernel 2.4.1?

>
> Run this oops message through ksymoops please. It will make debugging
> it a lot easier.
>
>

Since I did not compile the kernel myself, ksymoops is not too happy with
what it has available to analyse the dump. I tried to compile the Mandrake
kernel myself, but something still does not match. See below for what
ksymoops gives me.

Warning (compare_maps): mismatch on symbol vt_cons , ksyms_base says
c02b06e0, vmlinux says c02ac6e0. Ignoring ksyms_base entry

(I get more than 300 messages of that kind.)
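
The warning means that a symbol's address in the running kernel's symbol table differs from its address in the vmlinux image given to ksymoops, so the two do not describe the same build. A minimal sketch of that kind of comparison, using only the one pair of addresses quoted in the warning; the function name and the dict layout are illustrative and are not ksymoops internals:

```python
def compare_maps(ksyms, vmlinux):
    """Return (symbol, ksyms_addr, vmlinux_addr) for every symbol whose
    address differs between the two tables."""
    mismatches = []
    for name, addr in sorted(ksyms.items()):
        other = vmlinux.get(name)
        if other is not None and other != addr:
            mismatches.append((name, addr, other))
    return mismatches

# Addresses taken from the warning quoted above.
ksyms_base = {"vt_cons": 0xC02B06E0}
vmlinux = {"vt_cons": 0xC02AC6E0}

for name, a, b in compare_maps(ksyms_base, vmlinux):
    print(f"{name}: ksyms {a:08x}, vmlinux {b:08x}, delta {a - b:#x}")
```

For the quoted pair the two addresses are 0x4000 apart, which is what one would expect when the vmlinux comes from a different build than the kernel that is actually running.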

Let me know how I can prepare for the next crash with my own kernel. Are
there any options I have to turn on for compiling?




kernel BUG at sched.c:714!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0113781>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 0000001b ebx 00000000 ecx df4f6000 edx 00000001
esi: 001cffe3 edi db5eede0 ebp dc0e9f40 esp dc0e9ef0
stack: c01f26f3 c01f2856 000002ca db5eed80 dc0e8000 db5eede0 dc0e9f18
dc0e8000 000033ba 00000000 00000000 000000e7 0000001c 0000001c
fffffff3 dc0e8000 00000800 00000000 dc0e8000 dc0e9f68 c0139c44
d488bf80 00000000
call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]
code: 0f 0b 8d 65 bc 5b 5e 5f 89 ec 5d c3 8d 76 00 55 89 e5 83 ec

>>EIP; c0113781 <schedule+421/430> <=====
Trace; cc0139c44 <END_OF_CODE+bdf830401/????>
Trace; c0139d1c <pipe_read+80/238>
Trace; c0130af6 <sys_read+5e/c4>
Trace; c0108e93 <system_call+33/40>
Code; c0113781 <schedule+421/430>
00000000 <_EIP>:
Code; c0113781 <schedule+421/430> <=====
0: 0f 0b ud2a <=====
Code; c0113783 <schedule+423/430>
2: 8d 65 bc lea 0xffffffbc(%ebp),%esp
Code; c0113786 <schedule+426/430>
5: 5b pop %ebx
Code; c0113787 <schedule+427/430>
6: 5e pop %esi
Code; c0113788 <schedule+428/430>
7: 5f pop %edi
Code; c0113789 <schedule+429/430>
8: 89 ec mov %ebp,%esp
Code; c011378b <schedule+42b/430>
a: 5d pop %ebp
Code; c011378c <schedule+42c/430>
b: c3 ret
Code; c011378d <schedule+42d/430>
c: 8d 76 00 lea 0x0(%esi),%esi
Code; c0113790 <__wake_up+0/9c>
f: 55 push %ebp
Code; c0113791 <__wake_up+1/9c>
10: 89 e5 mov %esp,%ebp
Code; c0113793 <__wake_up+3/9c>
12: 83 ec 00 sub $0x0,%esp

Kernel panic: Aiee, killing interrupt handler!

971 warnings and 5 errors issued. Results may not be reliable.

;Martin




2001-02-13 15:49:58

by Manfred Spraul

Subject: Re: BUG in sched.c, Kernel 2.4.1?

Martin Rode wrote:
>
> >
> > Run this oops message through ksymoops please. It will make debugging
> > it a lot easier.
> >
> >
>
> Since I did not compile the kernel myself, ksymoops is not too happy with
> what it has available to analyse the dump. I tried to compile the Mandrake
> kernel myself, but something still does not match. See below for what
> ksymoops gives me.
>
looks good.

> Warning (compare_maps): mismatch on symbol vt_cons , ksyms_base says
> c02b06e0, vmlinux says c02ac6e0. Ignoring ksyms_base entry
>
> (I get more than 300 messages of that kind.)
>
> Let me know how I can prepare for the next crash with my own kernel. Are
> there any options I have to turn on for compiling?
>
> kernel BUG at sched.c:714!
> invalid operand: 0000
> CPU: 0
> EIP: 0010:[<c0113781>]
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010282
> eax: 0000001b ebx 00000000 ecx df4f6000 edx 00000001
> esi: 001cffe3 edi db5eede0 ebp dc0e9f40 esp dc0e9ef0
> stack: c01f26f3 c01f2856 000002ca db5eed80 dc0e8000 db5eede0 dc0e9f18
> dc0e8000 000033ba 00000000 00000000 000000e7 0000001c 0000001c
> fffffff3 dc0e8000 00000800 00000000 dc0e8000 dc0e9f68 c0139c44
> d488bf80 00000000

esp is quite high, only 0x110 bytes of the stack are used.
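
The 0x110 figure follows from the register dump if one assumes the usual 8 KiB kernel stack of an i386 2.4 kernel, aligned on an 8 KiB boundary. The stack size is an assumption here; the oops itself does not state it. A small sketch of the arithmetic:

```python
THREAD_SIZE = 0x2000  # assumed: 8 KiB kernel stack per task on i386

def stack_bytes_used(esp, thread_size=THREAD_SIZE):
    """Bytes between esp and the top of the aligned stack area."""
    base = esp & ~(thread_size - 1)   # bottom of the stack area
    top = base + thread_size          # the stack grows down from here
    return top - esp

esp = 0xDC0E9EF0  # esp from the oops above
print(hex(stack_bytes_used(esp)))  # 0x110
```

Under that assumption the bottom of the stack area is dc0e8000, a value that also recurs several times in the stack dump.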

> call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]
                ^^^^^^^^^
> code: 0f 0b 8d 65 bc 5b 5e 5f 89 ec 5d c3 8d 76 00 55 89 e5 83 ec
>
> >>EIP; c0113781 <schedule+421/430> <=====
> Trace; cc0139c44 <END_OF_CODE+bdf830401/????>
         ^^^^^^^^^

did you manually copy the oops from the screen?
that value should be c0139c44 <pipe_wait...>
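
A hand-copied oops can be sanity-checked before it goes to ksymoops: on a 32-bit kernel every bracketed address should be exactly eight hex digits. A minimal sketch of such a check (the helper name is made up for illustration):

```python
import re

def suspicious_addresses(line, digits=8):
    """Return bracketed addresses in an oops line that are not exactly
    `digits` hex digits long."""
    addrs = re.findall(r"\[<([0-9a-fA-F]+)>\]", line)
    return [a for a in addrs if len(a) != digits]

trace = "call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]"
print(suspicious_addresses(trace))  # ['cc0139c44']
```

The check only catches wrong lengths. A mistyped digit inside an eight-digit address would still pass.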

> Trace; c0139d1c <pipe_read+80/238>
> Trace; c0130af6 <sys_read+5e/c4>
> Trace; c0108e93 <system_call+33/40>
>
> [snip]
>
> Kernel panic: Aiee, killing interrupt handler!
>
I don't see that interrupt handler!
it seems to be a normal syscall, just a pipe read that blocks because
the pipe is empty.

Is that the first oops, or was there another oops before this one?

--
Manfred

2001-02-13 18:45:14

by Martin Rode

Subject: Re: BUG in sched.c, Kernel 2.4.1?

Re-ran ksymoops:

; Martin



kernel BUG at sched.c:714!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0113781>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 0000001b ebx 00000000 ecx df4f6000 edx 00000001
esi: 001cffe3 edi db5eede0 ebp dc0e9f40 esp dc0e9ef0
stack: c01f26f3 c01f2856 000002ca db5eed80 dc0e8000 db5eede0 dc0e9f18
dc0e8000 000033ba 00000000 00000000 000000e7 0000001c 0000001c
fffffff3 dc0e8000 00000800 00000000 dc0e8000 dc0e9f68 c0139c44
d488bf80 00000000
call trace: [<c0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]
code: 0f 0b 8d 65 bc 5b 5e 5f 89 ec 5d c3 8d 76 00 55 89 e5 83 ec

>>EIP; c0113781 <schedule+421/430> <=====
Trace; c0139c44 <pipe_wait+44/9c>
Trace; c0139d1c <pipe_read+80/238>
Trace; c0130af6 <sys_read+5e/c4>
Trace; c0108e93 <system_call+33/40>
Code; c0113781 <schedule+421/430>
00000000 <_EIP>:
Code; c0113781 <schedule+421/430> <=====
0: 0f 0b ud2a <=====
Code; c0113783 <schedule+423/430>
2: 8d 65 bc lea 0xffffffbc(%ebp),%esp
Code; c0113786 <schedule+426/430>
5: 5b pop %ebx
Code; c0113787 <schedule+427/430>
6: 5e pop %esi
Code; c0113788 <schedule+428/430>
7: 5f pop %edi
Code; c0113789 <schedule+429/430>
8: 89 ec mov %ebp,%esp
Code; c011378b <schedule+42b/430>
a: 5d pop %ebp
Code; c011378c <schedule+42c/430>
b: c3 ret
Code; c011378d <schedule+42d/430>
c: 8d 76 00 lea 0x0(%esi),%esi
Code; c0113790 <__wake_up+0/9c>
f: 55 push %ebp
Code; c0113791 <__wake_up+1/9c>
10: 89 e5 mov %esp,%ebp
Code; c0113793 <__wake_up+3/9c>
12: 83 ec 00 sub $0x0,%esp

Kernel panic: Aiee, killing interrupt handler!

971 warnings and 5 errors issued. Results may not be reliable.
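
The symbol+offset/size annotations in the trace can be reproduced by looking each address up in a sorted symbol table. A minimal sketch: the start addresses below are derived from the ksymoops output itself (address minus offset) rather than read from a real System.map, the table is deliberately tiny, and taking the size as the gap to the next symbol is a simplification.

```python
import bisect

# Start addresses derived from the output above (address minus offset).
SYMBOLS = sorted([
    (0xC0113360, "schedule"),
    (0xC0113790, "__wake_up"),
    (0xC0139C00, "pipe_wait"),
    (0xC0139C9C, "pipe_read"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def resolve(addr):
    """Format addr as name+offset/size; size is the gap to the next symbol."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0 or i + 1 >= len(SYMBOLS):
        return None  # below the table, or size unknown for the last symbol
    start, name = SYMBOLS[i]
    size = STARTS[i + 1] - start
    return f"{name}+{addr - start:x}/{size:x}"

print(resolve(0xC0113781))  # EIP
print(resolve(0xC0139C44))  # first call-trace entry
```

With this table the two lookups give schedule+421/430 and pipe_wait+44/9c, matching the annotations in the output. Addresses that fall between the listed functions would be attributed to the wrong symbol, because the table omits everything in between.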



2001-02-13 18:57:26

by Martin Rode

Subject: Re: BUG in sched.c, Kernel 2.4.1?

Manfred Spraul wrote:

> Martin Rode wrote:
> >
> > >
> > > Run this oops message through ksymoops please. It will make debugging
> > > it a lot easier.
> > >
> > >
> >
> > Since I did not compile the kernel myself, ksymoops is not too happy with
> > what it has available to analyse the dump. I tried to compile the Mandrake
> > kernel myself, but something still does not match. See below for what
> > ksymoops gives me.
> >
> looks good.
>
> > call trace: [<cc0139c44>] [<c0139d1c>] [<c0130af6>] [<c0108e93>]
>                 ^^^^^^^^^
>

I re-ran ksymoops and posted it again to the list.

> Is that the first oops, or was there another oops before this one?

I don't know. The server is in a different room from my office, so all I saw
was the lock-up, and all I could do was copy the oops from the screen. Our
backup regularly starts at 9 P.M., so I'm very confident that it is going to
happen again.

It did not happen with 2.2.18 or 2.2.19pre, that's for sure. With these
kernels I only had MM problems like vm_trying_to_free_unused_page etc.

Martin


