2002-02-15 08:52:07

by Robert Jameson

Subject: oops with 2.4.18-pre9-mjc2

I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!

Reading Oops report from the terminal
Unable to handle kernel paging request at virtual address 0004004c
dc838114
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<dc838114>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00013202
eax: 00040038 ebx: d335d094 ecx: cc83c000 edx: 00040038
esi: cc83c560 edi: d335d000 ebp: d335d0f0 esp: c7223ea4
ds: 0018 es: 0018 ss: 0018
Process jpilot (pid: 23909, stackpage=c7223000)
Stack: dc9472df 00040038 d335d094 d335d000 c9299ec0 00000000 dc845484
d335d094 c9299ec0 c2743000 c859e280 c142e320 c01a2990 c2743000
c9299ec0 c9299ec0 c859e280 c142e320 c5bfb240 00000001 00007053
00003829 00000000 c012b45b
Call Trace: [<dc9472df>] [<dc845484>] [<c01a2990>] [<c012b45b>]
[<c011fb50>] [<c011ff72>] [<c01a2fb6>] [<c0131044>] [<c013014c>]
[<c01162aa>] [<c0116834>] [<c011697a>] [<c0106d13>]
Code: 8b 42 14 85 c0 74 1b 8b 80 bc 00 00 00 85 c0 74 11 8b 40 18

>>EIP; dc838114 <[usbcore]usb_unlink_urb+8/30> <=====
Trace; dc9472de <[visor]visor_close+13a/168>
Trace; dc845484 <[usbserial]serial_close+a0/b0>
Trace; c01a2990 <release_dev+240/4fc>
Trace; c012b45a <free_page_and_swap_cache+32/38>
Trace; c011fb50 <__free_pte+40/48>
Trace; c011ff72 <do_zap_page_range+18e/238>
Trace; c01a2fb6 <tty_release+a/10>
Trace; c0131044 <fput+4c/d0>
Trace; c013014c <filp_close+5c/64>
Trace; c01162aa <put_files_struct+4e/b4>
Trace; c0116834 <do_exit+a8/1c8>
Trace; c011697a <sys_exit+e/10>
Trace; c0106d12 <system_call+32/38>
Code; dc838114 <[usbcore]usb_unlink_urb+8/30>
00000000 <_EIP>:
Code; dc838114 <[usbcore]usb_unlink_urb+8/30> <=====
0: 8b 42 14 mov 0x14(%edx),%eax <=====
Code; dc838116 <[usbcore]usb_unlink_urb+a/30>
3: 85 c0 test %eax,%eax
Code; dc838118 <[usbcore]usb_unlink_urb+c/30>
5: 74 1b je 22 <_EIP+0x22> dc838136 <[usbcore]usb_unlink_urb+2a/30>
Code; dc83811a <[usbcore]usb_unlink_urb+e/30>
7: 8b 80 bc 00 00 00 mov 0xbc(%eax),%eax
Code; dc838120 <[usbcore]usb_unlink_urb+14/30>
d: 85 c0 test %eax,%eax
Code; dc838122 <[usbcore]usb_unlink_urb+16/30>
f: 74 11 je 22 <_EIP+0x22> dc838136 <[usbcore]usb_unlink_urb+2a/30>
Code; dc838124 <[usbcore]usb_unlink_urb+18/30>
11: 8b 40 18 mov 0x18(%eax),%eax
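
A note on reading the decode: the faulting instruction is `mov 0x14(%edx),%eax`, and the register dump shows edx = 00040038, so the load targets 00040038 + 0x14 = 0004004c, which matches the address in the "Unable to handle kernel paging request" line. The short Python sketch below reproduces only that arithmetic, using values copied from the report; the function name is hypothetical. The value in edx is far from the c.../d... kernel addresses seen in the other registers, which suggests (but does not prove) that the pointer passed to usb_unlink_urb was stale or corrupted rather than NULL.

```python
def effective_address(base: int, displacement: int) -> int:
    """Address touched by an x86 `disp(%reg)` operand, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


edx = 0x00040038   # from the register dump in the oops
disp = 0x14        # from `mov 0x14(%edx),%eax`

fault = effective_address(edx, disp)
print(f"{fault:08x}")  # prints 0004004c
```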

This error occurs while hotsyncing my Handspring Visor PDA with
pilot-link/jpilot.

--
Robert Jameson http://rj.open-net.org
C2 Village at Wexford Hwy 278, Tel: +1 (843) 757 9428
Hilton Head Isl, SC Cel: +1 (843) 298 0957
US, 29928. mailto:[email protected]


Attachments:
config-2.4.18-pre9-mjc2 (32.37 kB)

2002-02-15 08:55:57

by William Lee Irwin III

Subject: Re: oops with 2.4.18-pre9-mjc2

On Fri, Feb 15, 2002 at 03:51:35AM -0500, Robert Jameson wrote:
> I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!

This sounds like you might need bits from -gregkh as well.

Cheers,
Bill

2002-02-15 13:29:55

by Michael Cohen

Subject: Re: oops with 2.4.18-pre9-mjc2

On Fri, 2002-02-15 at 03:51, Robert Jameson wrote:
> I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!

Ouch. Does stock 2.4.18-pre9 do this?

------
Michael Cohen
OhDarn.net

2002-02-15 13:38:17

by Robert Love

Subject: Re: oops with 2.4.18-pre9-mjc2

On Fri, 2002-02-15 at 03:51, Robert Jameson wrote:
> I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!

Do you see this on device close? It looks like there may be a race
between device close -> usb release.

Can you reproduce it without the binary module you are loading?

Robert Love

2002-02-15 14:12:27

by Alan

Subject: Re: oops with 2.4.18-pre9-mjc2

> I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!
>
> Reading Oops report from the terminal
> CPU: 0
> EIP: 0010:[<dc838114>] Tainted: P
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What strange modules do you have loaded?

If it's a binary-only one I'd like to know. If it's just the base kernel I'd
appreciate an lsmod so I can find which module is missing a license tag.
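
For readers unfamiliar with the "Tainted: P" marker: the sketch below decodes a taint value under the assumption that it is a small bitmask in which bit 0 means a proprietary (non-GPL-licensed) module was loaded and bit 1 means a module was force-loaded. Later kernels define further bits that are not covered here, and the names `TAINT_BITS` and `decode_taint` are hypothetical, not taken from any kernel tool.

```python
from typing import List

# Assumed bit meanings (only the two oldest taint bits are modelled).
TAINT_BITS = {
    0: "proprietary (non-GPL) module was loaded",
    1: "module was force-loaded",
}


def decode_taint(mask: int) -> List[str]:
    """Return a description for every assumed taint bit set in `mask`."""
    return [desc for bit, desc in sorted(TAINT_BITS.items()) if mask & (1 << bit)]


# A kernel reporting "Tainted: P" would correspond to bit 0 being set.
for reason in decode_taint(1):
    print(reason)
```

A module with no license tag at all is treated the same way as a binary-only one, which is why an lsmod listing helps tell the two cases apart.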

2002-02-15 15:21:03

by Robert Jameson

Subject: Re: oops with 2.4.18-pre9-mjc2

It appears right after my PDA finishes syncing, so I'm guessing it's
during a device close. To answer Alan's question, I'm using nVidia's kernel
driver, therefore I tainted the kernel (tm) (c).

On 15 Feb 2002 08:37:52 -0500
Robert Love <[email protected]> wrote:

> On Fri, 2002-02-15 at 03:51, Robert Jameson wrote:
> > I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we
> > go!
>
> Do you see this on device close? It looks like there may be a race
> between device close -> usb release.
>
> Can you reproduce it without the binary module you are loading?
>
> Robert Love


--
Robert Jameson http://rj.open-net.org
C2 Village at Wexford Hwy 278, Tel: +1 (843) 757 9428
Hilton Head Isl, SC Cel: +1 (843) 298 0957
US, 29928. mailto:[email protected]



2002-02-15 15:30:55

by Arjan van de Ven

Subject: Re: oops with 2.4.18-pre9-mjc2

Robert Jameson wrote:
>
> It appears right after my PDA finishes syncing, so I'm guessing it's
> during a device close. To answer Alan's question, I'm using nVidia's kernel
> driver, therefore I tainted the kernel (tm) (c).

You're using a binary-only kernel driver AND a preempt kernel?
Brave. Very brave.

preempt works on the assumption that it can change the content of inline
functions and such....

2002-02-15 16:11:50

by Greg KH

Subject: Re: oops with 2.4.18-pre9-mjc2

On Fri, Feb 15, 2002 at 03:51:35AM -0500, Robert Jameson wrote:
> I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!

Known problem, sorry. I can't duplicate this myself to try to fix it.
Other people have reported workarounds by using a different host
controller driver, or running an SMP kernel on a UP machine.

Patches gladly accepted :)

thanks,

greg k-h

2002-02-15 17:35:01

by Alan

Subject: Re: oops with 2.4.18-pre9-mjc2

> It appears right after my PDA finishes syncing, so I'm guessing it's
> during a device close. To answer Alan's question, I'm using nVidia's kernel
> driver, therefore I tainted the kernel (tm) (c).

Please take your bug report to Nvidia. You'll find the binary module
needs recompiling for pre-empt. Have fun with them 8)

2002-02-15 19:15:55

by Robert Love

Subject: Re: oops with 2.4.18-pre9-mjc2

On Fri, 2002-02-15 at 12:48, Alan Cox wrote:

> Please take your bug report to Nvidia. You'll find the binary module
> needs recompiling for pre-empt. Have fun with them 8)

According to his config, preempt-kernel wasn't enabled.

We don't tend to see problems with preempt-kernel + evil-closed-nvidia
driver (for whatever odd reason), anyway.

Robert Love

2002-02-15 20:18:12

by Mike Fedyk

Subject: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, Feb 15, 2002 at 08:37:52AM -0500, Robert Love wrote:
> On Fri, 2002-02-15 at 03:51, Robert Jameson wrote:
> > I have been seeing this oops from 2.4.16 -> 2.4.18-pre9, so here we go!
>
> Do you see this on device close? It looks like there may be a race
> between device close -> usb release.
>

I don't use USB, and I have had several machines lock up hard while doing
medium to heavy IO. I've had this happen with pre9-mjc2 and another patch
that just contained pre9-preempt-schedo1
(nyu.dyn.dhs.org:8080/patches/2.4.18-pre9-to-rmap12e-schedO1-rml.patch.bz2)

I'm running 2.4.18-pre9-ac3 now to see if I can reproduce without preempt and
O(1).

I have someone else from IRC who has the same issue with preempt+O(1)
against vanilla 2.4.17. He should be sending you a bug report soon.

BTW, all machines ran the same kernel compiled for SMP, but some machines
were UP.

Has anyone else seen this?

Mike

2002-02-15 22:00:31

by Robert Love

Subject: Re: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, 2002-02-15 at 15:18, Mike Fedyk wrote:

> I don't use USB, and I have had several machines lock up hard while doing
> medium to heavy IO. I've had this happen with pre9-mjc2 and another patch
> that just contained pre9-preempt-schedo1
> (nyu.dyn.dhs.org:8080/patches/2.4.18-pre9-to-rmap12e-schedO1-rml.patch.bz2)

The -mjc and similar patches make debugging a bit, uh, hard ;)

> I'm running 2.4.18-pre9-ac3 now to see if I can reproduce without preempt and
> O(1).

If you can't reproduce it, I'd like to see if you can reproduce it
_only_ with preempt. Also, whether it happens on stock pre9 (no -ac)
would be of interest, since that doesn't have Andre's IDE patch.

> I have someone else from IRC who has the same issue with preempt+O(1)
> against vanilla 2.4.17. He should be sending you a bug report soon.

Now this would be of interest, thanks.

> BTW, all machines ran the same kernel compiled for SMP, but some machines
> were UP.
>
> Has anyone else seen this?

Robert Love

2002-02-15 23:22:45

by Mike Fedyk

Subject: Re: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, Feb 15, 2002 at 05:00:10PM -0500, Robert Love wrote:
> On Fri, 2002-02-15 at 15:18, Mike Fedyk wrote:
>
> > I don't use USB, and I have had several machines lock up hard while doing
> > medium to heavy IO. I've had this happen with pre9-mjc2 and another patch
> > that just contained pre9-preempt-schedo1
> > (nyu.dyn.dhs.org:8080/patches/2.4.18-pre9-to-rmap12e-schedO1-rml.patch.bz2)
>
> The -mjc and similar patches make debugging a bit, uh, hard ;)
>

Yep, I understand. When I was patching in rmap12f I had to manually
merge the little bit into mm/bootmem.c and the offset was several hundred
lines. Then I realized just how much WLI's bootmem patch changes.

> > I'm running 2.4.18-pre9-ac3 now to see if I can reproduce without preempt and
> > O(1).
>
> If you can't reproduce it, I'd like to see if you can reproduce it
> _only_ with preempt. Also, whether it happens on stock pre9 (no -ac)
> would be of interest, since that doesn't have Andre's IDE patch.
>

Actually, I'm going to recompile -mjc2 without lock breaking to see if that
helps. Then try without preempt altogether. If either of those two fixes the
problem, I'll see if I can reproduce against the latest kernel from Marcelo
and your latest patch, and do the merge myself. Heh, I want to keep testing
-mjc. There are so many nice things in there. ;)

> > I have someone else from IRC who has the same issue with preempt+O(1)
> > against vanilla 2.4.17. He should be sending you a bug report soon.
>
> Now this would be of interest, thanks.
>

I asked him to cc me so that I may be able to help too...

> Robert Love

Mike
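
The elimination plan in the message above (drop lock breaking first, then preempt as well) can be written down as a list of builds to test. This is only an illustrative sketch: the feature names are taken from the thread subject, the function name `cumulative_removals` is hypothetical, and nothing here reflects how the patches are actually configured.

```python
from typing import List


def cumulative_removals(features: List[str], removal_order: List[str]) -> List[List[str]]:
    """Return the builds to test: all features first, then with each
    feature in `removal_order` dropped in turn (cumulatively)."""
    remaining = list(features)
    builds = [list(remaining)]
    for feature in removal_order:
        remaining = [f for f in remaining if f != feature]
        builds.append(list(remaining))
    return builds


features = ["preempt", "lock break", "O(1) scheduler", "rmap"]
plan = cumulative_removals(features, ["lock break", "preempt"])

for step, build in enumerate(plan):
    print(step, ", ".join(build))
```

The first build that no longer locks up points at the feature removed in that step as the one to retest on its own against a stock kernel.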

2002-02-15 23:31:05

by William Lee Irwin III

Subject: Re: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, Feb 15, 2002 at 03:22:21PM -0800, Mike Fedyk wrote:
> Yep, I understand. When I was patching in rmap12f I had to manually
> merge the little bit into mm/bootmem.c and the offset was several hundred
> lines. Then I realized just how much WLI's bootmem patch changes.

It's a rewrite. Of course it changes the whole file. Lucky for you it
interacts with nothing else. I seem to remember this conflict being
somewhat trivial to resolve though.


Cheers,
Bill

2002-02-15 23:42:45

by Mike Fedyk

Subject: Re: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, Feb 15, 2002 at 03:30:40PM -0800, William Lee Irwin III wrote:
> On Fri, Feb 15, 2002 at 03:22:21PM -0800, Mike Fedyk wrote:
> > Yep, I understand. When I was patching in rmap12f I had to manually
> > merge the little bit into mm/bootmem.c and the offset was several hundred
> > lines. Then I realized just how much WLI's bootmem patch changes.
>
> It's a rewrite. Of course it changes the whole file. Lucky for you it
> interacts with nothing else. I seem to remember this conflict being
> somewhat trivial to resolve though.

Yes, it was quite easy to hand merge/fix it up when I added in rmap12f (was
12e... so not much to change.)

Mike

2002-02-16 09:39:14

by Zwane Mwaikambo

Subject: Re: Hard lockup with 2.4.18-pre9 + preempt + lock break + O1k[23] + rmap

On Fri, 15 Feb 2002, Mike Fedyk wrote:

> I don't use USB, and I have had several machines lock up hard while doing
> medium to heavy IO. I've had this happen with pre9-mjc2 and another patch
> that just contained pre9-preempt-schedo1
> (nyu.dyn.dhs.org:8080/patches/2.4.18-pre9-to-rmap12e-schedO1-rml.patch.bz2)
>
> I'm running 2.4.18-pre9-ac3 now to see if I can reproduce without preempt and
> O(1).

I run the same configuration here but UP, with quite a bit of I/O: the box
NFS-exports a large build directory to 2 other boxes, usually has 2
builds going locally, and is also used as a storage area for
creating ISOs and burning CDs. VM load is medium for a box with 512 MB of
RAM and 512 MB of swap: ~140 processes and ~150 MB into swap.

Cheers,
Zwane Mwaikambo