2000-12-13 19:48:04

by Pete Toscano

Subject: test1[12] + sparc + bind 9.1.0b1 == bad things

hello,

i tried using the first beta release of bind 9.1.0 on an ultra 5
running 2.4.0-test11 or test12 (modified redhat 6.2 distro -- mostly
ipv6-related mods). as soon as i start up named, the machine goes nuts
and continuously prints the following oops (from test12):


\|/ ____ \|/
"@'/ .. \`@"
/_| \__/ |_\
\__U_/
(10): Oops
TSTATE: 0000000080f09606 TPC: 0000000000448264 TNPC: 0000000000448268 Y: 01800000
g0: 000000000041a198 g1: ffffffffffffffff g2: 0000000000000000 g3: 3030386666666666
g4: fffff80000000000 g5: 000000000000000f g6: fffff800167dc000 g7: 0000000000000000
o0: 0000000000000001 o1: 000000000000000f o2: fffff800167dc178 o3: 0000000000000000
o4: 0000000000624f3b o5: 0000000000624f3f sp: fffff8001295bdd1 ret_pc: 0000000000443848
l0: 0000000000000006 l1: fffff800167dc000 l2: 0000000000629000 l3: 000000000068dc00
l4: 0000000000629000 l5: 0000000000003fff l6: 000000000000000f l7: 0000000000625318
i0: 0000000000000009 i1: 0000000000000400 i2: fffff800167dc000 i3: 0000000000000001
i4: 0000000000624f1b i5: 0000000000624f26 i6: fffff8001295be91 i7: 000000000041a198
Instruction DUMP: 10680005 90102000 c45aa008 <c470e008> c6708000 913a2000 c0728000 c072a008 91924000
Aiee, killing interrupt handler
??<1>Unable to handle kernel paging request in mna handler<1> at virtual address 303038666666666e
current->{mm,active_mm}->context = 00000000625318ff
current->{mm,active_mm}->pgd = 0000000003402a00
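
A side note on the dump above, offered as a reading aid rather than a diagnosis: every byte of g3, and of the faulting virtual address, falls in the printable ASCII range. The minimal sketch below decodes the value as big-endian bytes (sparc64 is big-endian). The helper name `decode_ascii` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def decode_ascii(value, width=8):
    """Render a register value as big-endian bytes, '.' for non-printables."""
    raw = value.to_bytes(width, "big")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

g3 = 0x3030386666666666            # g3 from the register dump
fault = 0x303038666666666E         # faulting virtual address from the oops

print(decode_ascii(g3))            # 008fffff
print(hex(fault - g3))             # 0x8
print(fault % 8)                   # 6, so the address is not 8-byte aligned
```

The faulting address is g3 + 8 and is misaligned, which fits the "mna handler" wording in the message. Whether the text-like value means that string data was written over a kernel structure is not something this thread settles.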

here's the ksymoops output:

ksymoops 2.3.5 on sparc64 2.4.0-test12-1. Options used
-V (default)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.0-test12-1/ (default)
-m /usr/src/linux/System.map (default)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
(10): Oops
TSTATE: 0000000080f09606 TPC: 0000000000448264 TNPC: 0000000000448268 Y: 01800000
Using defaults from ksymoops -t elf32-sparc -a sparc
g0: 000000000041a198 g1: ffffffffffffffff g2: 0000000000000000 g3: 3030386666666666
g4: fffff80000000000 g5: 000000000000000f g6: fffff800167dc000 g7: 0000000000000000
o0: 0000000000000001 o1: 000000000000000f o2: fffff800167dc178 o3: 0000000000000000
o4: 0000000000624f3b o5: 0000000000624f3f sp: fffff8001295bdd1 ret_pc: 0000000000443848
l0: 0000000000000006 l1: fffff800167dc000 l2: 0000000000629000 l3: 000000000068dc00
l4: 0000000000629000 l5: 0000000000003fff l6: 000000000000000f l7: 0000000000625318
i0: 0000000000000009 i1: 0000000000000400 i2: fffff800167dc000 i3: 0000000000000001
i4: 0000000000624f1b i5: 0000000000624f26 i6: fffff8001295be91 i7: 000000000041a198
Instruction DUMP: 10680005 90102000 c45aa008 <c470e008> c6708000 913a2000 c0728000 c072a008 91924000

>>PC; 00448264 <del_timer+24/60> <=====
>>O7; 00443848 <do_exit+68/220>
>>I7; 0041a198 <die_if_kernel+f8/120>
Code; 00448258 <del_timer+18/60>
0000000000000000 <_PC>:
Code; 00448258 <del_timer+18/60>
0: 10 68 00 05 unknown
Code; 0044825c <del_timer+1c/60>
4: 90 10 20 00 clr %o0
Code; 00448260 <del_timer+20/60>
8: c4 5a a0 08 unknown
Code; 00448264 <del_timer+24/60> <=====
c: c4 70 e0 08 unknown <=====
Code; 00448268 <del_timer+28/60>
10: c6 70 80 00 unknown
Code; 0044826c <del_timer+2c/60>
14: 91 3a 20 00 sra %o0, 0, %o0
Code; 00448270 <del_timer+30/60>
18: c0 72 80 00 unknown
Code; 00448274 <del_timer+34/60>
1c: c0 72 a0 08 unknown
Code; 00448278 <del_timer+38/60>
20: 91 92 40 00 unknown

Aiee, killing interrupt handler
<1>Unable to handle kernel paging request in mna handler<1> at virtual address 303038666666666e
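
For readers unfamiliar with annotations such as `<del_timer+24/60>`, here is a rough sketch of the address-to-symbol lookup behind them. It is not ksymoops code. The symbol start addresses are derived from the annotations above (for example 0x448264 - 0x24 for del_timer); a real System.map holds far more entries, and the `/60` size part is omitted because it needs the address of the following symbol.

```python
import bisect

# (start address, name) pairs, derived from the ">>PC/O7/I7" lines above.
SYMBOLS = sorted([
    (0x41A0A0, "die_if_kernel"),
    (0x4437E0, "do_exit"),
    (0x448240, "del_timer"),
])

def resolve(addr, symbols=SYMBOLS):
    """Return 'name+offset' for the nearest symbol at or below addr."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None                      # address lies below every known symbol
    start, name = symbols[i]
    return f"{name}+{addr - start:x}"

for addr in (0x448264, 0x443848, 0x41A198):
    print(f"{addr:08x} <{resolve(addr)}>")
```

The lookup is only as good as the map it is given: a System.map from a different build would yield plausible-looking but wrong names.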

is there any further info i can provide? would the test11 oops help
too?

is it not bad enough that i spent the whole day frustrated, working with
this system? but then the computer had to keep making faces at me,
mocking me. *sigh* =;]

pete

--
Pete Toscano p:[email protected] w:[email protected]
GPG fingerprint: D8F5 A087 9A4C 56BB 8F78 B29C 1FF0 1BA7 9008 2736



2000-12-13 19:58:36

by David Miller

Subject: Re: test1[12] + sparc + bind 9.1.0b1 == bad things

Date: Wed, 13 Dec 2000 14:17:15 -0500
From: Pete Toscano <[email protected]>

i tried using the first beta release of bind 9.1.0 on an ultra 5
running 2.4.0-test11 or test12 (modified redhat 6.2 distro -- mostly
ipv6-related mods). as soon as i start up named, the machine goes nuts
and continuously prints the following oops (from test12):

Is this the first OOPS it prints out? I don't think so. I am
very sure it printed out messages from die_if_kernel first and
we need that initial OOPS to diagnose this bug and fix it.

All the rest of the OOPS messages are useless and won't tell
us what the real problem is.

Later,
David S. Miller
[email protected]
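
Since only the first oops is useful, one way to pull it out of a captured console log is sketched below. This is an illustration under assumptions: the header pattern is taken from the oops lines quoted in this thread (a task name, a pid in parentheses, then ": Oops"), the function name `first_oops` is hypothetical, and the sample text is a shortened stand-in for a real capture.

```python
import re

# Matches headers such as "named(465): Oops" or "(10): Oops", with or
# without a syslog prefix in front of them.
OOPS_HEADER = re.compile(r"\(\d+\): Oops\s*$")

def first_oops(log_text):
    """Return the lines of the first oops record, up to the next oops header."""
    record = []
    for line in log_text.splitlines():
        if OOPS_HEADER.search(line):
            if record:
                break                    # a second oops starts here, stop
            record.append(line)
        elif record:
            record.append(line)
    return record

sample = """\
unrelated console output
named(465): Oops
TSTATE: 00000000f0f09603 TPC: 000000000043f730
(10): Oops
TSTATE: 0000000080f09606 TPC: 0000000000448264
"""
print(first_oops(sample)[0])             # named(465): Oops
```

This only helps if the first oops reached the log at all, which is not guaranteed when the machine keeps faulting.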

2000-12-14 00:05:34

by Pete Zaitcev

Subject: Re: test1[12] + sparc + bind 9.1.0b1 == bad things

> Is this the first OOPS it prints out? I don't think so. I am
> very sure it printed out messages from die_if_kernel first and
> we need that initial OOPS to diagnose this bug and fix it.
>
> All the rest of the OOPS messages are useless and won't tell
> us what the real problem is.

> Later,
> David S. Miller

The bad news about recursive oopses is that too often the system
cannot continue and the oopses never reach /var/log/messages.

This problem was so common on sparc(32) that I run all my
kernels with the attached patch. I think applying a similar
change should be mandatory if you are interested in any sort
of debugging.

The alternative is to use a serial console, captured at all times.

--Pete

diff -u -r1.63 traps.c
--- arch/sparc/kernel/traps.c 2000/06/04 06:23:52 1.63
+++ arch/sparc/kernel/traps.c 2000/06/26 18:19:10
@@ -114,18 +116,23 @@
* bound in case our stack is trashed and we loop.
*/
while(rw &&
- count++ < 30 &&
+ count++ < 10 && /* P3 30 */
(((unsigned long) rw) >= PAGE_OFFSET) &&
!(((unsigned long) rw) & 0x7)) {
printk("Caller[%08lx]\n", rw->ins[7]);
rw = (struct reg_window *)rw->ins[6];
}
}
+#if 0
printk("Instruction DUMP:");
instruction_dump ((unsigned long *) regs->pc);
if(regs->psr & PSR_PS)
do_exit(SIGKILL);
do_exit(SIGSEGV);
+#else
+ printk("Looping...");
+ for (;;) { }
+#endif
}

void do_hw_interrupt(unsigned long type, unsigned long psr, unsigned long pc)
Index: arch/sparc/mm/fault.c
===================================================================
RCS file: /vger-cvs/linux/arch/sparc/mm/fault.c,v
retrieving revision 1.116
diff -u -r1.116 fault.c
--- arch/sparc/mm/fault.c 2000/05/03 06:37:03 1.116
+++ arch/sparc/mm/fault.c 2000/06/26 18:19:11
@@ -146,11 +146,15 @@
printk(KERN_ALERT "Unable to handle kernel paging request "
"at virtual address %08lx\n", address);
}
+ if (tsk->active_mm == NULL) {
+ printk(KERN_ALERT "tsk->active_mm = NULL\n");
+ } else {
printk(KERN_ALERT "tsk->{mm,active_mm}->context = %08lx\n",
(tsk->mm ? tsk->mm->context : tsk->active_mm->context));
printk(KERN_ALERT "tsk->{mm,active_mm}->pgd = %08lx\n",
(tsk->mm ? (unsigned long) tsk->mm->pgd :
(unsigned long) tsk->active_mm->pgd));
+ }
die_if_kernel("Oops", regs);
}
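
The bounded stack walk in the first hunk can be modelled outside the kernel. The sketch below is a toy model, not kernel code: `frames` is a hypothetical dictionary standing in for memory, mapping a frame address to its (ins[7], ins[6]) pair, and the `PAGE_OFFSET` value is assumed for illustration. It shows why the loop needs a counter: a trashed stack whose frame pointer points back at itself would otherwise print callers forever.

```python
PAGE_OFFSET = 0xF0000000   # assumed kernel base, for illustration only
MAX_FRAMES = 10            # the bound used in the patch (previously 30)

def walk_callers(frames, rw, limit=MAX_FRAMES):
    """Follow saved frame pointers; stop on NULL, a low or misaligned
    pointer, an unknown frame, or after `limit` frames."""
    callers = []
    count = 0
    while rw and count < limit and rw >= PAGE_OFFSET and rw % 8 == 0:
        count += 1
        if rw not in frames:             # stand-in for unreadable memory
            break
        return_pc, rw = frames[rw]       # ins[7] is the caller, ins[6] the next frame
        callers.append(return_pc)
    return callers

# A trashed stack: the only frame points back at itself.
looping = {0xF0001000: (0xF0012345, 0xF0001000)}
print(len(walk_callers(looping, 0xF0001000)))   # 10
```

Lowering the bound from 30 to 10 trades trace depth for a better chance that the first oops stays on screen.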

2000-12-14 00:31:26

by Pete Toscano

Subject: Re: test1[12] + sparc + bind 9.1.0b1 == bad things



On Wed, 13 Dec 2000, Pete Zaitcev wrote:

> > Is this the first OOPS it prints out? I don't think so. I am
> > very sure it printed out messages from die_if_kernel first and
> > we need that initial OOPS to diagnose this bug and fix it.
> >
> > All the rest of the OOPS messages are useless and won't tell
> > us what the real problem is.
>
> > Later,
> > David S. Miller

no, you're right. here's the first oops:

named(465): Oops
TSTATE: 00000000f0f09603 TPC: 000000000043f730 TNPC: 000000000043f734 Y: 0c000000
g0: 70029eb470029ea0 g1: 000000000000003d g2: 0000000000000002 g3: 0000000000000000
g4: fffff80000000000 g5: 0000000000000004 g6: fffff8001318c000 g7: 000000000000003d
o0: 000000000068dd00 o1: 0000000000000001 o2: 0000000000000000 o3: 0000000000000071
o4: 0000000000000000 o5: 0000000000000000 sp: fffff8001318ed91 ret_pc: 000000000042d5c0
l0: 0000000000000000 l1: 0000000070188270 l2: fffff8001398b8f0 l3: 00000000005b4400
l4: 000000000068fc00 l5: 00000000005b45c0 l6: 000000000000000f l7: 0000000000000000
i0: 0000000000000000 i1: fffff80010528908 i2: 0000000000000001 i3: 0000000000000001
i4: 0000000000000000 i5: 0000000000000003 i6: fffff8001318ee51 i7: 00000000004b2878
Caller[00000000004b2878]
Caller[00000000004b2b3c]
Caller[00000000004e205c]
Caller[00000000004ef3d8]
Caller[00000000004e3e5c]
Caller[000000000041b154]
Caller[0000000000408874]
Caller[000000000042d5c0]
Caller[000000000042da28]
Caller[00000000004100b4]
Caller[000000007005ccd4]
Instruction DUMP: a4063ff0 d85ca008 f05e0000 <d05b0000> 900f4008 80a22000 0247fff8 80a60019 02f6ffcc
Aiee, killing interrupt handler
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 00000000000005c9
tsk->{mm,active_mm}->pgd = fffff80013789000

..and here's its ksymoops output:

ksymoops 2.3.5 on sparc64 2.4.0-test12-1. Options used
-V (default)
-K (specified)
-L (specified)
-O (specified)
-m /usr/src/linux/System.map (default)

named(465): Oops
TSTATE: 00000000f0f09603 TPC: 000000000043f730 TNPC: 000000000043f734 Y:0c000000
Using defaults from ksymoops -t elf32-sparc -a sparc
g0: 70029eb470029ea0 g1: 000000000000003d g2: 0000000000000002 g3: 0000000000000000
g4: fffff80000000000 g5: 0000000000000004 g6: fffff8001318c000 g7: 000000000000003d
o0: 000000000068dd00 o1: 0000000000000001 o2: 0000000000000000 o3: 0000000000000071
o4: 0000000000000000 o5: 0000000000000000 sp: fffff8001318ed91 ret_pc: 000000000042d5c0
l0: 0000000000000000 l1: 0000000070188270 l2: fffff8001398b8f0 l3: 00000000005b4400
l4: 000000000068fc00 l5: 00000000005b45c0 l6: 000000000000000f l7: 0000000000000000
i0: 0000000000000000 i1: fffff80010528908 i2: 0000000000000001 i3: 0000000000000001
i4: 0000000000000000 i5: 0000000000000003 i6: fffff8001318ee51 i7: 00000000004b2878
Caller[00000000004b2878]
Caller[00000000004b2b3c]
Caller[00000000004e205c]
Caller[00000000004ef3d8]
Caller[00000000004e3e5c]
Caller[000000000041b154]
Caller[0000000000408874]
Caller[000000000042d5c0]
Caller[000000000042da28]
Caller[00000000004100b4]
Caller[000000007005ccd4]
Instruction DUMP: a4063ff0 d85ca008 f05e0000 <d05b0000> 900f4008 80a22000 0247fff8 80a60019 02f6ffcc

>>PC; 0043f730 <__wake_up+110/220> <=====
>>O7; 0042d5c0 <cmsg32_recvmsg_fixup+80/120>
>>I7; 004b2878 <end_buffer_io_sync+58/80>
Trace; 004b2878 <end_buffer_io_sync+58/80>
Trace; 004b2b3c <end_that_request_first+5c/e0>
Trace; 004e205c <ide_end_request+1c/80>
Trace; 004ef3d8 <ide_dma_intr+78/c0>
Trace; 004e3e5c <ide_intr+13c/1a0>
Trace; 0041b154 <handler_irq+114/1c0>
Trace; 00408874 <tl0_irq3+14/40>
Trace; 0042d5c0 <cmsg32_recvmsg_fixup+80/120>
Trace; 0042da28 <sys32_recvmsg+1e8/2e0>
Trace; 004100b4 <linux_sparc_syscall32+34/40>
Trace; 7005ccd4 <END_OF_CODE+6f9ab454/????>
Code; 0043f724 <__wake_up+104/220>
0000000000000000 <_PC>:
Code; 0043f724 <__wake_up+104/220>
0: a4 06 3f f0 add %i0, -16, %l2
Code; 0043f728 <__wake_up+108/220>
4: d8 5c a0 08 unknown
Code; 0043f72c <__wake_up+10c/220>
8: f0 5e 00 00 unknown
Code; 0043f730 <__wake_up+110/220> <=====
c: d0 5b 00 00 unknown <=====
Code; 0043f734 <__wake_up+114/220>
10: 90 0f 40 08 and %i5, %o0, %o0
Code; 0043f738 <__wake_up+118/220>
14: 80 a2 20 00 cmp %o0, 0
Code; 0043f73c <__wake_up+11c/220>
18: 02 47 ff f8 unknown
Code; 0043f740 <__wake_up+120/220>
1c: 80 a6 00 19 cmp %i0, %i1
Code; 0043f744 <__wake_up+124/220>
20: 02 f6 ff cc unknown

Aiee, killing interrupt handler
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 00000000000005c9
tsk->{mm,active_mm}->pgd = fffff80013789000
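
ksymoops marks the faulting word as "unknown", probably because it fell back to its 32-bit defaults ("-t elf32-sparc -a sparc") and so did not decode the 64-bit load and store instructions. As a hedged aid, the sketch below decodes just the two SPARC V9 memory operations that sit at the faulting PC in this thread (ldx and stx, format 3). It is not a general disassembler, and the function name `decode_mem` is made up for illustration.

```python
# Register names by number: %g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7.
REGS = [f"%{bank}{n}" for bank in "goli" for n in range(8)]

# op3 field values for the two 64-bit memory ops handled here.
MEM_OPS = {0x0B: "ldx", 0x0E: "stx"}

def decode_mem(word):
    """Decode a format-3 ldx/stx instruction word; return None otherwise."""
    if word >> 30 != 3:                  # op field must be 3 for load/store
        return None
    op3 = (word >> 19) & 0x3F
    if op3 not in MEM_OPS:
        return None
    rd = REGS[(word >> 25) & 0x1F]
    rs1 = REGS[(word >> 14) & 0x1F]
    if (word >> 13) & 1:                 # i bit set: 13-bit signed immediate
        simm13 = word & 0x1FFF
        if simm13 & 0x1000:
            simm13 -= 0x2000
        addr = f"[{rs1} + {simm13}]"
    else:                                # i bit clear: second register
        addr = f"[{rs1} + {REGS[word & 0x1F]}]"
    if MEM_OPS[op3] == "ldx":
        return f"ldx {addr}, {rd}"
    return f"stx {rd}, {addr}"

print(decode_mem(0xD05B0000))            # ldx [%o4 + %g0], %o0
print(decode_mem(0xC470E008))            # stx %g2, [%g3 + 8]
```

In the register dump above o4 is zero, so a load through %o4 agrees with the "NULL pointer dereference" message. The second word is the faulting instruction from the earlier del_timer oops, where g3 + 8 is the reported bad address.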

thanks,
pete

--
Pete Toscano p:[email protected] w:[email protected]
GPG fingerprint: D8F5 A087 9A4C 56BB 8F78 B29C 1FF0 1BA7 9008 2736

