2003-11-26 16:52:59

by Vince

Subject: [kernel panic @ reboot] 2.6.0-test10-mm1

Module Size Used by
sch_ingress 3012 1
cls_u32 6532 7
sch_sfq 4480 3
sch_htb 22208 1
ip_conntrack_ftp 71092 0
ipt_MASQUERADE 2816 2
iptable_mangle 2112 0
iptable_nat 19500 2 ipt_MASQUERADE
ipt_REJECT 5312 8
ipt_limit 1856 29
ipt_state 1472 4
ip_conntrack 27248 4 ip_conntrack_ftp,ipt_MASQUERADE,iptable_nat,ipt_state
ipt_LOG 4928 15
ipt_ULOG 5672 12
iptable_filter 2176 1
ip_tables 15616 9 ipt_MASQUERADE,iptable_mangle,iptable_nat,ipt_REJECT,ipt_limit,ipt_state,ipt_LOG,ipt_ULOG,iptable_filter
binfmt_misc 8072 1
af_packet 17032 2
snd_seq_oss 32000 0
snd_seq_midi_event 6272 1 snd_seq_oss
snd_seq 51600 4 snd_seq_oss,snd_seq_midi_event
snd_pcm_oss 48164 0
snd_mixer_oss 16768 1 snd_pcm_oss
snd_via82xx 21792 0
snd_pcm 85668 2 snd_pcm_oss,snd_via82xx
snd_timer 21572 2 snd_seq,snd_pcm
snd_ac97_codec 51716 1 snd_via82xx
snd_page_alloc 8964 2 snd_via82xx,snd_pcm
snd_mpu401_uart 6016 1 snd_via82xx
snd_rawmidi 20384 1 snd_mpu401_uart
snd_seq_device 6600 3 snd_seq_oss,snd_seq,snd_rawmidi
snd 43492 12 snd_seq_oss,snd_seq_midi_event,snd_seq,snd_pcm_oss,snd_mixer_oss,snd_via82xx,snd_pcm,snd_timer,snd_ac97_codec,snd_mpu401_uart,snd_rawmidi,snd_seq_device
soundcore 7168 1 snd
speedtch 12848 1
clip 13668 1
uhci_hcd 29584 0
ehci_hcd 21764 0
usbcore 97244 5 speedtch,uhci_hcd,ehci_hcd
isofs 31544 0
zlib_inflate 21184 1 isofs
nls_cp437 5376 1
vfat 12672 1
fat 40512 1 vfat
nls_iso8859_1 3776 2
ntfs 96364 1


Attachments:
messages.txt (16.00 kB)
dmesg (14.97 kB)
config-2.6.0-test10-mm1 (25.45 kB)
lsmod (2.02 kB)

2003-11-26 17:17:25

by Zwane Mwaikambo

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

On Wed, 26 Nov 2003, Vince wrote:

> I get a kernel panic each time I reboot my system on all
> recent 2.6.0-testX kernels (cpu is an Athlon 1800XP, kernel compiled with
> preempt and ACPI; config and dmesg are attached).
>
> This time, I got tired of seeing this and finally installed kmsgdump
> in order to collect some data, available in messages.txt (*)
>
> For my particular case, X was not loaded: I just logged in in console
> mode and did a reboot. No nvidia or other binary driver loaded. Any hint
> on tracking down this bug is appreciated (I can compile my kernel with
> additional debugging options if required).

I can't see the first oops, it looks like it's been spewing them out for a
while too;

<4>Oops: 0000 [#49]

At the point you're at there really isn't much state left to work from.
Any chance you can get at the logs (if it hit disk) and get the first
oops?
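
The number in square brackets is the running oops count, so `[#49]` means 48 earlier oopses have already scrolled out of the buffer. As a rough illustration (not part of kmsgdump or the kernel), the sketch below scans a saved log for those counters and reports whether oops #1 is still present. The function names are invented for this example, and the second sample line is made up to show several matches.

```python
import re

# Matches lines such as "<4>Oops: 0000 [#49]"; the "<4>" log-level
# prefix is treated as optional.
OOPS_RE = re.compile(r"^(?:<\d>)?Oops: ([0-9a-fA-F]{4}) \[#(\d+)\]")


def oops_counters(log_text):
    """Return the oops sequence numbers found in log_text, in order."""
    counters = []
    for line in log_text.splitlines():
        match = OOPS_RE.match(line.strip())
        if match:
            counters.append(int(match.group(2)))
    return counters


def has_first_oops(log_text):
    """True if the log still contains oops #1, the one worth debugging."""
    return 1 in oops_counters(log_text)


# First line is from the report above; the second is a made-up follow-up.
sample = "<4>Oops: 0000 [#49]\n<4>Oops: 0000 [#50]\n"
print(oops_counters(sample), has_first_oops(sample))
```

If `has_first_oops` returns False, the buffer wrapped before it was dumped and later oopses are probably fallout from the first one.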

2003-11-26 17:34:41

by Vince

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

Zwane Mwaikambo wrote:
> On Wed, 26 Nov 2003, Vince wrote:
>
> <4>Oops: 0000 [#49]
>
> At the point you're at there really isn't much state left to work from.
> Any chance you can get at the logs (if it hit disk) and get the first
> oops?
>

Nothing ever hits the disk (In interrupt handler - not syncing ...),
that's the reason why I had to install kmsgdump in the first place.
(Sidenote: a few days ago, I intended to install the lkcd kernel
patches, but gave up because of the time required to
patch/compile/install/set up the kernel and userspace utilities
correctly (no .deb of lkcd-utils available...)).
I suppose I could enlarge the kernel message log size, but the kmsgdump
documentation states:
---------------------------------
If you have changed your messages buffer size (which is 16 kB by
default), you should modify the size in "include/asm/kmsgdump.h",
parameter LOG_BUF_LEN. Some people required 32 kB. But you shouldn't
exceed 60 kB since the dump is done in real mode (16 bits).
For kernel versions 2.5.6x and later, the LOG_BUF_LEN parameter is part
of the kernel .config file (LOG_BUF_SHIFT) so you don't need to modify
it at all.
---------------------------------

...so if you think 60kB would be enough to catch the first oops -- or if
the doc is outdated -- I can try this...

2003-11-26 17:42:05

by Zwane Mwaikambo

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

On Wed, 26 Nov 2003, Vince wrote:

> parameter LOG_BUF_LEN. Some people required 32 kB. But you shouldn't
> exceed 60 kB since the dump is done in real mode (16 bits).
> For kernel versions 2.5.6x and later, the LOG_BUF_LEN parameter is part
> of the kernel .config file (LOG_BUF_SHIFT) so you don't need to modify
> it at all.
> ---------------------------------
>
> ...so if you think 60kB would be enough to catch the first oops -- or if
> the doc is outdated -- I can try this...

*groan* do you have a PDA?

2003-11-26 17:42:13

by Randy.Dunlap

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

On Wed, 26 Nov 2003 18:34:34 +0100 Vince <[email protected]> wrote:

| Zwane Mwaikambo wrote:
| > On Wed, 26 Nov 2003, Vince wrote:
| >
| > <4>Oops: 0000 [#49]
| >
| > At the point you're at there really isn't much state left to work from.
| > Any chance you can get at the logs (if it hit disk) and get the first
| > oops?
| >
|
| Nothing ever hits the disk (In interrupt handler - not syncing ...),
| that's the reason why I had to install kmsgdump in the first place.
| (Sidenote: a few days ago, I intended to install the lkcd kernel
| patches, but gave up because of the time required to
| patch/compile/install/set up the kernel and userspace utilities
| correctly (no .deb of lkcd-utils available...)).
| I suppose I could enlarge the kernel message log size, but the kmsgdump
| documentation states:
| ---------------------------------
| If you have changed your messages buffer size (which is 16 kB by
| default), you should modify the size in "include/asm/kmsgdump.h",
| parameter LOG_BUF_LEN. Some people required 32 kB. But you shouldn't
| exceed 60 kB since the dump is done in real mode (16 bits).
| For kernel versions 2.5.6x and later, the LOG_BUF_LEN parameter is part
| of the kernel .config file (LOG_BUF_SHIFT) so you don't need to modify
| it at all.
| ---------------------------------
|
| ...so if you think 60kB would be enough to catch the first oops -- or if
| the doc is outdated -- I can try this...

wow... ooops. a kmsgdump user. :)

No, the doc is not outdated, and since the log buf size must be a
power of 2, 32 KB is the largest that is currently supported.
Sorry about that.

--
~Randy
MOTD: Always include version info.
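
The 32 KB figure follows from simple arithmetic. The sketch below is only an illustration and is not taken from kmsgdump. It assumes the log buffer length is `1 << LOG_BUF_SHIFT` bytes and reads the documented "60 kB" ceiling as 60 * 1024 bytes; the function name `largest_log_buf_shift` is invented for this example.

```python
def largest_log_buf_shift(limit_bytes):
    """Return the largest shift such that (1 << shift) <= limit_bytes."""
    if limit_bytes < 1:
        raise ValueError("limit must be at least 1 byte")
    # For a positive integer, bit_length() - 1 equals floor(log2(n)).
    return limit_bytes.bit_length() - 1


# Assumption: "60 kB" in the kmsgdump documentation means 60 * 1024 bytes.
KMSGDUMP_LIMIT = 60 * 1024

shift = largest_log_buf_shift(KMSGDUMP_LIMIT)
size_kb = (1 << shift) // 1024
print(f"LOG_BUF_SHIFT={shift} gives a {size_kb} kB buffer")
```

Under those assumptions the next power of two, 64 kB, is already over the limit, which is why 32 KB is the ceiling.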

2003-11-26 17:54:20

by Vince

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

Zwane Mwaikambo wrote:
> On Wed, 26 Nov 2003, Vince wrote:
>>parameter LOG_BUF_LEN. Some people required 32 kB. But you shouldn't
>>exceed 60 kB since the dump is done in real mode (16 bits).
>>For kernel versions 2.5.6x and later, the LOG_BUF_LEN parameter is part
>>of the kernel .config file (LOG_BUF_SHIFT) so you don't need to modify
>>it at all.
>>---------------------------------
>>
>>...so if you think 60kB would be enough to catch the first oops -- or if
>>the doc is outdated -- I can try this...
>
>
> *groan* do you have a PDA?
>

Nope. I could probably borrow a laptop from a friend but am not excited at
the idea of having to set up some serial console thing (I do not even
have a serial cable). Dump to floppy/swap/disk would be much easier in
my case... if it could be made to work, of course ;-)

2003-11-26 18:20:14

by Zwane Mwaikambo

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

On Wed, 26 Nov 2003, Vince wrote:

> > *groan* do you have a PDA?
> >
>
> Nope. I could probably borrow a laptop from a friend but am not excited at
> the idea of having to set up some serial console thing (I do not even
> have a serial cable). Dump to floppy/swap/disk would be much easier in
> my case... if it could be made to work, of course ;-)

Those oopses looked rather spurious, i'm not sure what help those other
methods would be here. Try applying the following patch and be sure to
have access to the console. You may have to hand transcribe...

Index: linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test10-mm1/arch/i386/kernel/traps.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 traps.c
--- linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 05:28:50 -0000 1.1.1.1
+++ linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 18:17:37 -0000
@@ -329,6 +329,10 @@ void die(const char * str, struct pt_reg
 	if (in_interrupt())
 		panic("Fatal exception in interrupt");
 
+	local_irq_disable();
+	while (1)
+		__asm__ __volatile__("hlt");
+
 	if (panic_on_oops) {
 		printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
 		set_current_state(TASK_UNINTERRUPTIBLE);

2003-11-26 23:37:43

by Mike Fedyk

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

On Wed, Nov 26, 2003 at 01:18:48PM -0500, Zwane Mwaikambo wrote:
> On Wed, 26 Nov 2003, Vince wrote:
>
> > > *groan* do you have a PDA?
> > >
> >
> > Nope. I could probably borrow a laptop from a friend but am not excited at
> > the idea of having to set up some serial console thing (I do not even
> > have a serial cable). Dump to floppy/swap/disk would be much easier in
> > my case... if it could be made to work, of course ;-)
>
> Those oopses looked rather spurious, i'm not sure what help those other
> methods would be here. Try applying the following patch and be sure to
> have access to the console. You may have to hand transcribe...

Interesting. It would be nice to have a boot option that halts the system
after the first oops, instead of trying to continue.

Vince/Randy:
Did you use the 2.5.65 patch at http://w.ods.org/tools/kmsgdump/ or is there
some other place that has newer patches?

BTW, http://www.xenotime.net/linux/kmsgdump gives a 404 error.

2003-11-26 23:41:55

by Vince

Subject: Re: [kernel panic @ reboot] 2.6.0-test10-mm1

Mike Fedyk wrote:
> Interesting. It would be nice to have a boot option that halts the system
> after the first oops, instead of trying to continue.
>
> Vince/Randy:
> Did you use the 2.5.65 patch at http://w.ods.org/tools/kmsgdump/ or is there
> some other place that has newer patches?
>
> BTW, http://www.xenotime.net/linux/kmsgdump gives a 404 error.

My version comes from:
http://developer.osdl.org/rddunlap/kmsgdump/

2003-11-27 00:59:48

by Vince

Subject: Re: [kernel panic @ reboot in usbcore] 2.6.0-test10-mm1 (culprit: modem_run)

It worked, but I had -- as expected -- to write the oops by hand.
(user request to Randy: would it be possible to have an option in
kmsgdump to only write the first oops on floppy ???)

I have it all on paper, but I'm too lazy to write the full stack right
now (available later on request: I have to go to bed now 8):

------------------------------------------------------------------
CPU: 0
EIP: 0060 : [<d0ae9822>]
EFLAGS: 00010246
EIP is at releaseintf+0x62/0x80 [usbcore]
eax:00000000 ebx:ceddc224 ecx:ce6d5dc0 edx:00000000
esi:ceddc200 edi:00000000 ebp:cd647f0c esp:cd647ef8
ds: 007b es:007b ss:0068

Process: modem_run (pid: 1121, threadinfo=cd646000, task=ce644040)
Stack: c016ffe3 ce0bfb24 ce6d5dc0 ...
[...]

Call trace
[<c016ffe3>] iput+0x63/0x80
[<d0ae9c27>] usbdev_release+0xb7/0xc0 [usbcore]
[<c0157a5c>] __fput+0x10c/0x120
[<c0156047>] filp_close+0x57/0x80
[<c0123d17>] put_files_struct+0x67/0xd0
[<c012491e>] do_exit+0x3a/0xb0
[<c0124c4a>] do_group_exit+0x3a/0xb0
[<c02a302e>] sysenter_past_esp+0x43/0x65
-------------------------------------------------------------------

The modem_run process is the one uploading the firmware for my
speedtouch dsl modem. I'm using the kernel-space speedtouch driver, with
modem_run from http://speedtouch.sourceforge.net/
Manually shutting down the network and killing modem_run before
rebooting makes the oops disappear.

However, I believe the fact that modem_run can cause a kernel panic is
still a bug that should be fixed. I'm willing to test any patch to fix
this issue, which has annoyed me for a long time (in the meantime, I'll
work around this in my shutdown scripts). :-)



Zwane Mwaikambo wrote:
> On Wed, 26 Nov 2003, Vince wrote:
>
>
>>>*groan* do you have a PDA?
>>>
>>
>>Nope. I could probably borrow a laptop from a friend but am not excited at
>>the idea of having to set up some serial console thing (I do not even
>>have a serial cable). Dump to floppy/swap/disk would be much easier in
>>my case... if it could be made to work, of course ;-)
>
>
> Those oopses looked rather spurious, i'm not sure what help those other
> methods would be here. Try applying the following patch and be sure to
> have access to the console. You may have to hand transcribe...
>
> Index: linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c
> ===================================================================
> RCS file: /build/cvsroot/linux-2.6.0-test10-mm1/arch/i386/kernel/traps.c,v
> retrieving revision 1.1.1.1
> diff -u -p -B -r1.1.1.1 traps.c
> --- linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 05:28:50 -0000 1.1.1.1
> +++ linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 18:17:37 -0000
> @@ -329,6 +329,10 @@ void die(const char * str, struct pt_reg
> if (in_interrupt())
> panic("Fatal exception in interrupt");
>
> + local_irq_disable();
> + while (1)
> + __asm__ __volatile__("hlt");
> +
> if (panic_on_oops) {
> printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
> set_current_state(TASK_UNINTERRUPTIBLE);


2003-11-27 03:15:06

by Zwane Mwaikambo

Subject: Re: [kernel panic @ reboot in usbcore] 2.6.0-test10-mm1 (culprit: modem_run)

On Thu, 27 Nov 2003, Vince wrote:

> It worked, but I had -- as expected -- to write the oops by hand.
> (user request to Randy: would it be possible to have an option in
> kmsgdump to only write the first oops on floppy ???)
>
> I have it all on paper, but I'm too lazy to write the full stack right
> now (available later on request: I have to go to bed now 8):

Yes please get it all done =) Especially the first line and the bottom
"Code:" line.

> CPU: 0
> EIP: 0060 : [<d0ae9822>]
> EFLAGS: 00010246
> EIP is at releaseintf+0x62/0x80 [usbcore]
> eax:00000000 ebx:ceddc224 ecx:ce6d5dc0 edx:00000000
> esi:ceddc200 edi:00000000 ebp:cd647f0c esp:cd647ef8
> ds: 007b es:007b ss:0068
>
> Process: modem_run (pid: 1121, threadinfo=cd646000, task=ce644040)
> Stack: c016ffe3 ce0bfb24 ce6d5dc0 ...
> [...]
>
> Call trace
> [<c016ffe3>] iput+0x63/0x80
> [<d0ae9c27>] usbdev_release+0xb7/0xc0 [usbcore]
> [<c0157a5c>] __fput+0x10c/0x120
> [<c0156047>] filp_close+0x57/0x80
> [<c0123d17>] put_files_struct+0x67/0xd0
> [<c012491e>] do_exit+0x3a/0xb0
> [<c0124c4a>] do_group_exit+0x3a/0xb0
> [<c02a302e>] sysenter_past_esp+0x43/0x65

2003-11-27 08:14:50

by Vince

Subject: Re: [kernel panic @ reboot in usbcore] 2.6.0-test10-mm1 (culprit: modem_run)

Zwane Mwaikambo wrote:

>
> Yes please get it all done =) Especially the first line and the bottom
> "Code:" line.
>

Hmm either the "Code:" line was not available, or I simply forgot to
write it :-/
Here is all the info I have written:


CPU: 0
EIP: 0060 : [<d0ae9822>]
EFLAGS: 00010246
EIP is at releaseintf+0x62/0x80 [usbcore]
eax:00000000 ebx:ceddc224 ecx:ce6d5dc0 edx:00000000
esi:ceddc200 edi:00000000 ebp:cd647f0c esp:cd647ef8
ds: 007b es:007b ss:0068

Process: modem_run (pid: 1121, threadinfo=cd646000, task=ce644040)
Stack: c016ffe3 ce0bfb24 ce6d5dc0 00000000 cffe4dc0 cd647f24 d0ae9c27
00000000 ce773800 00000000 cd647f48 c0157a5c ce529240 ce773800
ce511000 ce773800 00000000 cf699c80 cd647f64 c0156047 ce773800

Call trace
[<c016ffe3>] iput+0x63/0x80
[<d0ae9c27>] usbdev_release+0xb7/0xc0 [usbcore]
[<c0157a5c>] __fput+0x10c/0x120
[<c0156047>] filp_close+0x57/0x80
[<c0123d17>] put_files_struct+0x67/0xd0
[<c012491e>] do_exit+0x3a/0xb0
[<c0124c4a>] do_group_exit+0x3a/0xb0
[<c02a302e>] sysenter_past_esp+0x43/0x65


(If another trace is required, I'll do it... just ask!)
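
Each call-trace entry has the form `[<address>] symbol+0xoffset/0xsize [module]`, so subtracting the offset from the address gives the start of the function. This helps when matching the trace against a disassembly. The sketch below is only an illustration: the `Frame` type and `parse_frames` helper are invented here, the regular expression is written for the lines in this report only, and it assumes an entry without a bracketed module name belongs to the core kernel image.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    address: int           # return address recorded in the trace
    symbol: str            # function name
    offset: int            # offset of the address inside the function
    size: int              # total size of the function
    module: Optional[str]  # module name, or None for the core kernel

    @property
    def symbol_start(self):
        """Address where the function begins (address minus offset)."""
        return self.address - self.offset


FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?:\s+\[(\w+)\])?"
)


def parse_frames(text):
    """Extract every call-trace entry found in text."""
    frames = []
    for m in FRAME_RE.finditer(text):
        addr, sym, off, size, mod = m.groups()
        frames.append(Frame(int(addr, 16), sym, int(off, 16), int(size, 16), mod))
    return frames


# First three entries of the trace above, copied verbatim.
trace = """\
[<c016ffe3>] iput+0x63/0x80
[<d0ae9c27>] usbdev_release+0xb7/0xc0 [usbcore]
[<c0157a5c>] __fput+0x10c/0x120
"""

for frame in parse_frames(trace):
    where = frame.module or "core kernel"
    print(f"{frame.symbol} starts at {frame.symbol_start:#x} ({where})")
```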


2003-11-27 08:13:14

by Duncan Sands

Subject: Re: [kernel panic @ reboot in usbcore] 2.6.0-test10-mm1 (culprit: modem_run)

This looks like a problem I have been seeing. I have a fix for this
in the works. Unfortunately I'm pretty busy right now, so I can't
say when I'll have it ready. The problem is in drivers/usb/core/devio.c.
Actually there are lots of problems in devio.c :) One problem is that
devio.c is not protected against actconfig changing under it (thanks
to a usb core change, it becomes NULL before becoming something
else, which causes devio.c to oops rather than quietly do the wrong
thing). The use of dev->serialize (usb semaphore) in releaseintf is
also wrong because it can lead to deadlock. Another problem comes
from devio.c (i.e. usbfs) thinking that disconnects are for the whole
device and not just an interface. Furthermore, there are various
oopsen that come from devio.c not handling urb unlink failures.
That's all I remember off the top of my head :) I sent a couple of
emails about this (especially the locking problems) to the usb
mailing list lately. Hang on, I just remembered another one:
releaseintf needs to be called with a write lock taken on
ps->devsem rather than a read lock, otherwise it can boot other
parts of devio.c off an interface when they think they still have
it.

Ciao,

Duncan.


On Thursday 27 November 2003 01:59, Vince wrote:
> It worked, but I had -- as expected -- to write the oops by hand.
> (user request to Randy: would it be possible to have an option in
> kmsgdump to only write the first oops on floppy ???)
>
> I have it all on paper, but I'm too lazy to write the full stack right
> now (available later on request: I have to go to bed now 8):
>
> ------------------------------------------------------------------
> CPU: 0
> EIP: 0060 : [<d0ae9822>]
> EFLAGS: 00010246
> EIP is at releaseintf+0x62/0x80 [usbcore]
> eax:00000000 ebx:ceddc224 ecx:ce6d5dc0 edx:00000000
> esi:ceddc200 edi:00000000 ebp:cd647f0c esp:cd647ef8
> ds: 007b es:007b ss:0068
>
> Process: modem_run (pid: 1121, threadinfo=cd646000, task=ce644040)
> Stack: c016ffe3 ce0bfb24 ce6d5dc0 ...
> [...]
>
> Call trace
> [<c016ffe3>] iput+0x63/0x80
> [<d0ae9c27>] usbdev_release+0xb7/0xc0 [usbcore]
> [<c0157a5c>] __fput+0x10c/0x120
> [<c0156047>] filp_close+0x57/0x80
> [<c0123d17>] put_files_struct+0x67/0xd0
> [<c012491e>] do_exit+0x3a/0xb0
> [<c0124c4a>] do_group_exit+0x3a/0xb0
> [<c02a302e>] sysenter_past_esp+0x43/0x65
> -------------------------------------------------------------------
>
> The modem_run process is the one uploading the firmware for my
> speedtouch dsl modem. I'm using the kernel-space speedtouch driver, with
> modem_run from http://speedtouch.sourceforge.net/
> Manually shutting down the network and killing modem_run before
> rebooting makes the oops disappear.
>
> However, I believe the fact that modem_run can cause a kernel panic is
> still a bug that should be fixed. I'm willing to test any patch to fix
> this issue, which has annoyed me for a long time (in the meantime, I'll
> work around this in my shutdown scripts). :-)
>
> Zwane Mwaikambo wrote:
> > On Wed, 26 Nov 2003, Vince wrote:
> >>>*groan* do you have a PDA?
> >>
> >>Nope. I could probably borrow a laptop from a friend but am not excited at
> >>the idea of having to set up some serial console thing (I do not even
> >>have a serial cable). Dump to floppy/swap/disk would be much easier in
> >>my case... if it could be made to work, of course ;-)
> >
> > Those oopses looked rather spurious, i'm not sure what help those other
> > methods would be here. Try applying the following patch and be sure to
> > have access to the console. You may have to hand transcribe...
> >
> > Index: linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c
> > ===================================================================
> > RCS file: /build/cvsroot/linux-2.6.0-test10-mm1/arch/i386/kernel/traps.c,v
> > retrieving revision 1.1.1.1
> > diff -u -p -B -r1.1.1.1 traps.c
> > --- linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 05:28:50 -0000 1.1.1.1
> > +++ linux-2.6.0-test10-mm1-bochs/arch/i386/kernel/traps.c 26 Nov 2003 18:17:37 -0000
> > @@ -329,6 +329,10 @@ void die(const char * str, struct pt_reg
> > if (in_interrupt())
> > panic("Fatal exception in interrupt");
> >
> > + local_irq_disable();
> > + while (1)
> > + __asm__ __volatile__("hlt");
> > +
> > if (panic_on_oops) {
> > printk(KERN_EMERG "Fatal exception: panic in 5 seconds\n");
> > set_current_state(TASK_UNINTERRUPTIBLE);