2002-10-08 20:32:32

by David Brownell

Subject: Re: [linux-usb-devel] 2.5.40 panic in uhci-hcd

>>>>How does 2.5.41 work for you?
>>>
>>>It seems to be fixed. Thanks.
>>
>>Heh, that's pretty funny. There were not any uhci specific fixes in
>>2.5.41...
>>
>>Not complaining,
>
>
> Actually, there were. This patch is in 2.5.41.

And wouldn't have changed any oopsing behavior, I assure you.

Your panic was being caused by something else. I saw plenty
of strange 2.5.40 behavior indicative of someone walking over
memory they didn't own, and maybe your panic was another case.


> - sizeof(struct uhci_td), 16, 0, GFP_ATOMIC);
> + sizeof(struct uhci_td), 16, 0);


2002-10-16 17:27:05

by Peter Osterlund

Subject: Re: [linux-usb-devel] 2.5.40 panic in uhci-hcd

David Brownell <[email protected]> writes:

> >>>>How does 2.5.41 work for you?
> >>>
> >>>It seems to be fixed. Thanks.
> >>
> >>Heh, that's pretty funny. There were not any uhci specific fixes in
> >>2.5.41...
> >>
> >>Not complaining,
> > Actually, there were. This patch is in 2.5.41.
>
> And wouldn't have changed any oopsing behavior, I assure you.
>
> Your panic was being caused by something else. I saw plenty
> of strange 2.5.40 behavior indicative of someone walking over
> memory they didn't own, and maybe your panic was another case.

The problem is back in 2.5.43, although it doesn't happen on every
boot. I think I first saw this problem in 2.5.35.

The oops looks the same as usual. The oops happens because urb->hcpriv
is NULL in uhci_result_control() so the list_empty() check oopses.

At the end of uhci_urb_enqueue() this code

	if (ret != -EINPROGRESS) {
		uhci_destroy_urb_priv (uhci, urb);
		return ret;
	}

appears to be calling uhci_destroy_urb_priv() without having acquired
the urb_list_lock. Can this be the cause of my problem?


Unable to handle kernel NULL pointer dereference at virtual address 00000014
*pde = 00000000
Oops: 0000
usb-storage uhci-hcd usbcore
CPU: 0
EIP: 0060:[<c482e4d7>] Not tainted
EFLAGS: 00010006
EIP is at uhci_result_control+0x17/0x210 [uhci-hcd]
eax: 00000000 ebx: c3b2a420 ecx: 00010002 edx: ffffffea
esi: 00000014 edi: 00010002 ebp: c3b2a420 esp: c3b81db8
ds: 0068 es: 0068 ss: 0068
Process usb.agent (pid: 203, threadinfo=c3b80000 task=c3e760a0)
Stack: c3c7d15c 00000082 00000000 c3b2a420 00000000 00010002 c1145600 c482f357
c1145600 c3b2a420 00000202 c1145740 c1145740 c1145600 c1145600 c482fd51
c1145600 c3b2a420 c1145600 00000003 0000000a c3b81e68 c4818de7 c1145600
Call Trace:
[<c482f357>] uhci_transfer_result+0x67/0x1a0 [uhci-hcd]
[<c482fd51>] uhci_irq+0xf1/0x130 [uhci-hcd]
[<c4818de7>] usb_hcd_irq+0x17/0x30 [usbcore]
[<c010881d>] handle_IRQ_event+0x2d/0x50
[<c01089fd>] do_IRQ+0xad/0x140
[<c0107478>] common_interrupt+0x18/0x20
[<c0127182>] do_wp_page+0x1c2/0x3d0
[<c0111c49>] __wake_up+0x39/0x40
[<c0127eaf>] handle_mm_fault+0xdf/0x150
[<c0150c4c>] dput+0x1c/0x1a0
[<c01102fd>] do_page_fault+0x14d/0x4cf
[<c011d1fb>] update_wall_time+0xb/0x40
[<c011fb55>] do_sigaction+0xd5/0x110
[<c011ff29>] sys_rt_sigaction+0x99/0xf0
[<c013c793>] filp_close+0xa3/0xb0
[<c011f2b4>] sys_rt_sigprocmask+0x144/0x200
[<c01101b0>] do_page_fault+0x0/0x4cf
[<c01074bd>] error_code+0x2d/0x40

Code: 8b 40 14 39 f0 75 0a b8 ea ff ff ff e9 d4 01 00 00 8b 54 24
<0>Kernel panic: Aiee, killing interrupt handler!
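
The register dump above seems consistent with that reading: eax is 00000000, the first code bytes (8b 40 14) decode to a load from 0x14(%eax), and the faulting address is 00000014. A minimal sketch of the arithmetic follows. The struct layout is hypothetical (it is not copied from uhci-hcd.h) and was chosen only so that td_list sits at offset 0x14; pointers are modelled as uint32_t so the offsets match i386 on any build host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model, NOT the real uhci-hcd definitions. */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

struct urb_priv_model {
	struct list_head32 urb_list;	/* 0x00 */
	uint32_t urb;			/* 0x08 */
	uint32_t dev;			/* 0x0c */
	uint32_t qh;			/* 0x10 */
	struct list_head32 td_list;	/* 0x14 */
};

/*
 * list_empty(head) starts by reading head->next.  For
 * head == &urbp->td_list, that read lands at this address.
 */
uint32_t list_empty_read_addr(uint32_t urbp)
{
	return urbp
		+ (uint32_t)offsetof(struct urb_priv_model, td_list)
		+ (uint32_t)offsetof(struct list_head32, next);
}

/* Sanity check on the assumed layout. */
void check_model_layout(void)
{
	assert(sizeof(struct list_head32) == 8);
	assert(offsetof(struct urb_priv_model, td_list) == 0x14);
}
```

With urbp == NULL, list_empty_read_addr(0) gives 0x14 under this assumed layout, the same address the kernel reported it could not dereference.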

--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340

2002-10-16 17:28:53

by Johannes Erdfelt

Subject: Re: [linux-usb-devel] 2.5.40 panic in uhci-hcd

On Wed, Oct 16, 2002, Peter Osterlund <[email protected]> wrote:
> David Brownell <[email protected]> writes:
>
> > >>>>How does 2.5.41 work for you?
> > >>>
> > >>>It seems to be fixed. Thanks.
> > >>
> > >>Heh, that's pretty funny. There were not any uhci specific fixes in
> > >>2.5.41...
> > >>
> > >>Not complaining,
> > > Actually, there were. This patch is in 2.5.41.
> >
> > And wouldn't have changed any oopsing behavior, I assure you.
> >
> > Your panic was being caused by something else. I saw plenty
> > of strange 2.5.40 behavior indicative of someone walking over
> > memory they didn't own, and maybe your panic was another case.
>
> The problem is back in 2.5.43, although it doesn't happen on every
> boot. I think I first saw this problem in 2.5.35.
>
> The oops looks the same as usual. The oops happens because urb->hcpriv
> is NULL in uhci_result_control() so the list_empty() check oopses.
>
> At the end of uhci_urb_enqueue() this code
>
> 	if (ret != -EINPROGRESS) {
> 		uhci_destroy_urb_priv (uhci, urb);
> 		return ret;
> 	}
>
> appears to be calling uhci_destroy_urb_priv() without having acquired
> the urb_list_lock. Can this be the cause of my problem?

Have you tried this patch? It's in Greg's BK tree, but hasn't been
picked up by Linus yet.

JE

# This is a BitKeeper generated patch for the following project:
# Project Name: greg k-h's linux 2.5 USB kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.892 -> 1.893
# drivers/usb/host/uhci-hcd.c 1.25 -> 1.26
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/10/13 johannes@devel.(none) 1.893
# uhci-hcd.c:
# If we fail adding the URB to the schedule, we need to make
# sure that we remove it from the urb_list. Thanks to
# Dan Streetman for finding and fixing this bug.
# --------------------------------------------
#
diff -Nru a/drivers/usb/host/uhci-hcd.c b/drivers/usb/host/uhci-hcd.c
--- a/drivers/usb/host/uhci-hcd.c Sun Oct 13 18:11:20 2002
+++ b/drivers/usb/host/uhci-hcd.c Sun Oct 13 18:11:20 2002
@@ -1496,12 +1496,19 @@
 		break;
 	}

-	spin_unlock_irqrestore(&uhci->urb_list_lock, flags);
-
 	if (ret != -EINPROGRESS) {
+		/* Submit failed, so delete it from the urb_list */
+		struct urb_priv *urbp = urb->hcpriv;
+
+		list_del_init(&urbp->urb_list);
+		spin_unlock_irqrestore(&uhci->urb_list_lock, flags);
 		uhci_destroy_urb_priv (uhci, urb);
+
 		return ret;
 	}
+
+	spin_unlock_irqrestore(&uhci->urb_list_lock, flags);
+
 	return 0;
 }
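
To illustrate what the changeset log describes, here is a small user-space model, not driver code. All names ending in _model, plus count_stale() and simulate_failed_submit(), are made up for this sketch. It assumes that destroying the private data amounts to clearing urb->hcpriv, which matches the NULL hcpriv reported in the oops but simplifies whatever uhci_destroy_urb_priv() really does. The list helpers are minimal re-implementations, not the kernel's <linux/list.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-ins for struct urb and struct urb_priv. */
struct urb_model {
	void *hcpriv;
};

struct urb_priv_model {
	struct list_head urb_list;	/* first member, so casts below are valid */
	struct urb_model *urb;
};

/* Count entries an interrupt-time walker would find with hcpriv == NULL. */
static int count_stale(const struct list_head *urb_list)
{
	const struct list_head *p;
	int stale = 0;

	for (p = urb_list->next; p != urb_list; p = p->next) {
		const struct urb_priv_model *urbp =
			(const struct urb_priv_model *)p;

		assert(urbp->urb != NULL);
		if (urbp->urb->hcpriv == NULL)
			stale++;
	}
	return stale;
}

/* Model a submit that fails after the URB was put on the urb_list. */
int simulate_failed_submit(int apply_fix)
{
	struct list_head urb_list;
	struct urb_model urb = { NULL };
	struct urb_priv_model urbp;

	list_init(&urb_list);
	urbp.urb = &urb;
	urb.hcpriv = &urbp;
	list_add_tail(&urbp.urb_list, &urb_list);

	if (apply_fix)
		list_del_init(&urbp.urb_list);	/* what the patch adds */

	urb.hcpriv = NULL;	/* assumed effect of destroying the priv data */

	return count_stale(&urb_list);
}
```

In this model simulate_failed_submit(0) leaves one entry on the list whose URB has no private data, which is the kind of entry a later list walk would trip over, while simulate_failed_submit(1) leaves the list empty.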

2002-10-16 18:22:40

by Peter Osterlund

Subject: Re: [linux-usb-devel] 2.5.40 panic in uhci-hcd

Johannes Erdfelt <[email protected]> writes:

> On Wed, Oct 16, 2002, Peter Osterlund <[email protected]> wrote:
> >
> > The problem is back in 2.5.43, although it doesn't happen on every
> > boot. I think I first saw this problem in 2.5.35.
> >
> > The oops looks the same as usual. The oops happens because urb->hcpriv
> > is NULL in uhci_result_control() so the list_empty() check oopses.
> >
> > At the end of uhci_urb_enqueue() this code
> >
> > 	if (ret != -EINPROGRESS) {
> > 		uhci_destroy_urb_priv (uhci, urb);
> > 		return ret;
> > 	}
> >
> > appears to be calling uhci_destroy_urb_priv() without having acquired
> > the urb_list_lock. Can this be the cause of my problem?
>
> Have you tried this patch? It's in Greg's BK tree, but hasn't been
> picked up by Linus yet.

I applied it to 2.5.39 (which always died at boot before this patch)
and now it boots without problems, so this looks like the correct fix
for my problem. Thanks.

--
Peter Osterlund - [email protected]
http://w1.894.telia.com/~u89404340